I agree that if we can achieve most of the performance gain by
loosening the commit log prune thresholds, that might be a temporary
option (although one must consider the effect on recovery time). I
don't think this is a rare case, though. Random reads over large data
sets that don't fit in memory should be a reasonably common use case.

-Sanjit


On Thu, Nov 26, 2009 at 6:08 PM, Luke <[email protected]> wrote:
> I'm not sure this case is common enough to be worth tackling now. The
> range server can fully take advantage of available memory in most
> production cases. For certain benchmarks (just enough RAM to hold the
> test data), we can fiddle with the commit log prune thresholds to
> minimize the impact of the difference. We should try that before
> implementing the feature, which I do think is a good idea in general.
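>
> (To be concrete about the tuning I have in mind, here is a sketch of a
> hypertable.cfg excerpt. Treat the property names and values below as
> illustrative rather than authoritative; check them against your own
> config. Raising the prune thresholds lets the range server keep more
> data in its cell caches before a compaction is forced:)
>
>   # hypertable.cfg -- illustrative property names/values
>   Hypertable.RangeServer.CommitLog.PruneThreshold.Min=1G
>   Hypertable.RangeServer.CommitLog.PruneThreshold.Max=4G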
>
>
> On Nov 25, 2009, at 2:15 PM, Doug Judd <[email protected]> wrote:
>
>> I have a proposal that should improve Hypertable performance in
>> certain situations.  When running the HBase benchmark, the one test
>> that we didn't significantly beat HBase on was the random read
>> test.  During the test, the RangeServers were using just a little
>> more than 800MB, which was the configured size of the block cache.
>> However, HBase was using all of the RAM that was configured.  I
>> suspect the problem is that when we loaded the data into Hypertable,
>> the RangeServers aggressively compacted the data to keep the commit
>> log pruned back to a minimum, whereas HBase had left a significant
>> amount of data in their cell cache equivalent.  This would give
>> HBase an unfair advantage in the random read test since more of the
>> dataset would have been resident in memory.
>>
>> In general, if the RangeServers have memory available to them, they
>> should use it if possible.  I propose that after a minor compaction,
>> we keep the immutable cell cache in memory and have it overshadow
>> the corresponding CellStore on disk.  When the system determines
>> that it needs more memory in its regular maintenance task, it can
>> purge these cell caches.
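>>
>> (To make that concrete, here is a rough C++ sketch of the read path
>> and the purge hook.  All of the type and method names below are
>> invented for illustration; they are not the actual RangeServer
>> interfaces.)
>>
>>   #include <map>
>>   #include <memory>
>>   #include <string>
>>
>>   using Key = std::string;
>>   using Value = std::string;
>>
>>   // Stand-in for the immutable cell cache frozen by a minor compaction.
>>   struct CellCache {
>>     std::map<Key, Value> cells;
>>     bool lookup(const Key &k, Value &v) const {
>>       auto it = cells.find(k);
>>       if (it == cells.end())
>>         return false;
>>       v = it->second;
>>       return true;
>>     }
>>   };
>>
>>   // Stand-in for the on-disk CellStore the cache was flushed to.
>>   struct CellStore {
>>     bool lookup(const Key &, Value &) const {
>>       // ... would read blocks out of the underlying DFS file ...
>>       return false;
>>     }
>>   };
>>
>>   struct AccessGroup {
>>     std::shared_ptr<CellCache> shadow_cache;  // kept after the compaction
>>     std::shared_ptr<CellStore> cell_store;    // on-disk image of same data
>>
>>     bool lookup(const Key &k, Value &v) const {
>>       if (shadow_cache)                     // overshadow the CellStore:
>>         return shadow_cache->lookup(k, v);  // same data, no disk I/O
>>       return cell_store->lookup(k, v);
>>     }
>>
>>     // Called from the regular maintenance task when memory is needed.
>>     void purge_shadow_cache() { shadow_cache.reset(); }
>>   };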
>>
>> At some point we should probably have a learning algorithm, or at
>> the very least a heuristic that determines the best use of memory
>> among these shadow cell caches, the block cache, and the query cache.
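>>
>> (Again only a sketch with invented names: one simple heuristic would
>> be to re-apportion the memory budget in proportion to each cache's
>> recent hit rate, falling back to an even split when there is no
>> history yet.)
>>
>>   #include <cstdint>
>>
>>   struct CacheStats {
>>     std::uint64_t hits = 0;
>>     std::uint64_t lookups = 0;
>>     double hit_rate() const {
>>       return lookups ? double(hits) / double(lookups) : 0.0;
>>     }
>>   };
>>
>>   struct MemoryBudget {
>>     std::uint64_t shadow_bytes, block_bytes, query_bytes;
>>   };
>>
>>   MemoryBudget apportion(std::uint64_t total, const CacheStats &shadow,
>>                          const CacheStats &block, const CacheStats &query) {
>>     double s = shadow.hit_rate(), b = block.hit_rate(), q = query.hit_rate();
>>     double sum = s + b + q;
>>     if (sum == 0.0)                    // no history yet: split evenly
>>       return {total / 3, total / 3, total / 3};
>>     return {std::uint64_t(total * (s / sum)),  // give memory to the
>>             std::uint64_t(total * (b / sum)),  // caches that actually
>>             std::uint64_t(total * (q / sum))}; // absorb lookups
>>   }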
>>
>> - Doug
