This case is very common.  According to the Virident co-founder who worked
at Google, Bigtable sees a 90% read workload.  Fiddling with the commit log
threshold is not a good option: a larger log means more data to replay on
restart, which can significantly lengthen the time it takes to bring the
system down and back up again.

- Doug

On Thu, Nov 26, 2009 at 6:08 PM, Luke <[email protected]> wrote:

> I'm not sure this is a common enough case to be worth tackling now. The
> range server can take full advantage of available memory in most
> production cases. For certain benchmarks (just enough RAM to hold the
> test data), we can fiddle with the commit log prune thresholds to
> minimize the impact of the difference. We should try that before
> implementing the feature, which I do think is a good idea in general.
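>
> Roughly, the knobs I mean are the commit log prune thresholds (property
> names from memory and values purely illustrative -- double-check both
> against the current release):
>
>   # hypertable.cfg -- larger values retain more of the commit log, so
>   # less data gets force-compacted out of the cell caches (at the cost
>   # of a longer log replay on restart)
>   Hypertable.RangeServer.CommitLog.PruneThreshold.Min=1G
>   Hypertable.RangeServer.CommitLog.PruneThreshold.Max=4G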
>
>
> On Nov 25, 2009, at 2:15 PM, Doug Judd <[email protected]> wrote:
>
> > I have a proposal that should improve Hypertable performance in
> > certain situations.  When running the HBase benchmark, the one test
> > that we didn't significantly beat HBase on was the random read
> > test.  During the test, the RangeServers were using just a little
> > more than 800MB, which was the configured size of the block cache.
> > However, HBase was using all of the RAM that was configured.  I
> > suspect the problem is that when we loaded the data into Hypertable,
> > the RangeServers aggressively compacted the data to keep the commit
> > log pruned back to a minimum, whereas HBase had left a significant
> > amount of data in their cell cache equivalent.  This would give
> > HBase an unfair advantage in the random read test, since more of the
> > dataset would have been resident in memory.
> >
> > In general, if the RangeServers have memory available to them, they
> > should use it if possible.  I propose that after a minor compaction,
> > we keep the immutable cell cache in memory and have it overshadow
> > the corresponding CellStore on disk.  When the system determines
> > that it needs more memory in its regular maintenance task, it can
> > purge these cell caches.
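> >
> > To make the idea concrete, here's a minimal C++ sketch (all of the
> > types and method names below are hypothetical stand-ins, not the
> > actual RangeServer classes):
> >
> >   #include <cstddef>
> >   #include <map>
> >   #include <memory>
> >   #include <optional>
> >   #include <string>
> >   #include <utility>
> >
> >   // Immutable in-memory cache produced by a minor compaction.
> >   struct CellCache {
> >     std::map<std::string, std::string> cells;  // key -> value
> >     size_t memory_used = 0;
> >   };
> >
> >   // Stand-in for the on-disk CellStore the compaction wrote.
> >   struct CellStore {
> >     std::optional<std::string> lookup(const std::string &) {
> >       return std::nullopt;  // disk read elided in this sketch
> >     }
> >   };
> >
> >   struct AccessGroup {
> >     std::shared_ptr<CellCache> shadow;  // survives the compaction
> >     std::shared_ptr<CellStore> store;
> >
> >     // After a minor compaction, keep the immutable cell cache as a
> >     // shadow over the new CellStore instead of dropping it.
> >     void finish_minor_compaction(std::shared_ptr<CellCache> immutable,
> >                                  std::shared_ptr<CellStore> new_store) {
> >       store = std::move(new_store);
> >       shadow = std::move(immutable);
> >     }
> >
> >     // Reads consult the shadow first and touch disk only on a miss.
> >     std::optional<std::string> lookup(const std::string &key) {
> >       if (shadow) {
> >         auto it = shadow->cells.find(key);
> >         if (it != shadow->cells.end())
> >           return it->second;
> >       }
> >       return store ? store->lookup(key) : std::nullopt;
> >     }
> >
> >     // The maintenance task calls this under memory pressure; it is
> >     // safe because the CellStore holds exactly the same data.
> >     size_t purge_shadow() {
> >       size_t freed = shadow ? shadow->memory_used : 0;
> >       shadow.reset();
> >       return freed;
> >     }
> >   };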
> >
> > At some point we should probably have a learning algorithm, or at
> > the very least a heuristic that determines the best use of memory
> > among these shadow cell caches, the block cache, and the query cache.
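> >
> > As a strawman for the heuristic, splitting the budget in proportion to
> > recent hit rates might be a starting point (again, the names and the
> > policy here are hypothetical; a real policy should also weigh per-hit
> > cost, since a shadow cell cache hit saves a CellStore read while a
> > query cache hit saves a whole scan):
> >
> >   #include <cstddef>
> >
> >   struct CacheStats {
> >     double hits = 0, lookups = 0;
> >     double hit_rate() const {
> >       return lookups > 0 ? hits / lookups : 0.0;
> >     }
> >   };
> >
> >   // Divide a memory budget among shadow, block, and query caches in
> >   // proportion to their hit rates; fall back to an even split when
> >   // no statistics are available yet.
> >   void divide_budget(size_t budget, const CacheStats &shadow,
> >                      const CacheStats &block, const CacheStats &query,
> >                      size_t limits[3]) {
> >     double r[3] = {shadow.hit_rate(), block.hit_rate(),
> >                    query.hit_rate()};
> >     double total = r[0] + r[1] + r[2];
> >     for (int i = 0; i < 3; ++i)
> >       limits[i] = total > 0.0
> >                       ? static_cast<size_t>(budget * r[i] / total)
> >                       : budget / 3;
> >   }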
> >
> > - Doug
> >
