Note, this case is not simply "random read over large data sets that don't fit in memory". It's a random read of every item exactly _once_ over a data set _just_ over the configured memory size (i.e., there is exactly _one_ cell store created for each access group). In this particular case there is no data locality to speak of, so a query cache wouldn't help; setting the block cache size large enough would help, though.
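To make the contrast concrete, here is a rough model of the point above (all names and numbers are illustrative, not Hypertable internals): a query cache keyed by exact query never hits during a single random pass over every cell, but a block cache does, because each CellStore block packs many cells together.

```python
import random

def dfs_fetches(num_cells, cells_per_block, block_cache_blocks):
    """Count simulated DFS block fetches for one random pass that reads
    every cell exactly once, with a simple block cache in front."""
    cached = set()
    fetches = 0
    for cell in random.sample(range(num_cells), num_cells):
        block = cell // cells_per_block
        if block not in cached:
            fetches += 1
            if len(cached) >= block_cache_blocks:
                cached.pop()  # crude eviction; irrelevant when the cache is big enough
            cached.add(block)
    return fetches

# Block cache large enough to hold every block: one fetch per block,
# i.e. num_cells / cells_per_block fetches instead of num_cells.
print(dfs_fetches(10000, 100, 100))  # 100 fetches
# One cell per block (no block-level locality): every read is a miss.
print(dfs_fetches(1000, 1, 10))      # 1000 fetches
```

Each item is read once, so the only reuse available is cells sharing a block; that is why a large enough block cache helps here while a query cache cannot.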
A more common "random read over large data sets" case typically has a much bigger data set, with more than one cell store created for each access group, which means each random read would incur seeks in the DFS anyway, so shadow cell caches wouldn't help. There is typically some locality in this case, where a query cache would help and is a better use of memory.

On Thu, Nov 26, 2009 at 10:04 PM, Sanjit Jhala <[email protected]> wrote:
> I agree that if we can achieve most of the performance gain by
> loosening the CL prune thresholds, that might be a temporary option
> (although one must consider the effect on recovery time). I don't
> think this is a rare case though. Random read over large data sets
> which don't fit in memory should be a reasonably common use case.
>
> -Sanjit
>
> On Thu, Nov 26, 2009 at 6:08 PM, Luke <[email protected]> wrote:
>> I'm not sure this is a common enough case to be worth tackling now. The range
>> server can fully take advantage of available memory in most production
>> cases. For certain benchmarks (just enough RAM to hold the test data),
>> we can fiddle with the commit log prune thresholds to minimize the impact
>> of the difference. We should try that before implementing the feature,
>> which I do think is a good idea in general.
>>
>> On Nov 25, 2009, at 2:15 PM, Doug Judd <[email protected]> wrote:
>>
>>> I have a proposal that should improve Hypertable performance in
>>> certain situations. When running the HBase benchmark, the one test
>>> on which we didn't significantly beat HBase was the random read
>>> test. During the test, the RangeServers were using just a little
>>> more than 800MB, which was the configured size of the block cache.
>>> However, HBase was using all of the RAM it was configured with.
>>> I suspect the problem is that when we loaded the data into Hypertable,
>>> the RangeServers aggressively compacted the data to keep the commit
>>> log pruned back to a minimum, whereas HBase had left a significant
>>> amount of data in their cell-cache equivalent. This would give
>>> HBase an unfair advantage in the random read test, since more of the
>>> dataset would have been resident in memory.
>>>
>>> In general, if the RangeServers have memory available to them, they
>>> should use it if possible. I propose that after a minor compaction,
>>> we keep the immutable cell cache in memory and have it overshadow
>>> the corresponding CellStore on disk. When the system determines
>>> in its regular maintenance task that it needs more memory, it can
>>> purge these cell caches.
>>>
>>> At some point we should probably have a learning algorithm, or at
>>> the very least a heuristic, that determines the best use of memory
>>> among these shadow cell caches, the block cache, and the query cache.
>>>
>>> - Doug
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Hypertable Development" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/hypertable-dev?hl=en.
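Doug's shadow-cell-cache proposal from the thread above could be sketched roughly like this (class and method names are mine for illustration, not Hypertable's actual API): after a minor compaction, the immutable cell cache stays resident and shadows the CellStore just written to the DFS; the maintenance task purges it under memory pressure, after which reads fall through to disk.

```python
class ShadowedCellStore:
    """Sketch of a CellStore read path with an overshadowing cell cache."""

    def __init__(self, disk_store, shadow_cache):
        self.disk_store = disk_store  # dict standing in for the on-disk CellStore
        self.shadow = shadow_cache    # immutable cell cache kept after minor compaction
        self.disk_reads = 0           # simulated DFS reads, for illustration

    def get(self, key):
        # Serve from the shadow cell cache while it is resident.
        if self.shadow is not None and key in self.shadow:
            return self.shadow[key]
        # Otherwise incur a (simulated) DFS read against the CellStore.
        self.disk_reads += 1
        return self.disk_store[key]

    def purge_shadow(self):
        """Called by the maintenance task when it needs memory back."""
        self.shadow = None

# Reads hit memory until the maintenance task purges the shadow cache:
store = ShadowedCellStore({"a": 1, "b": 2}, {"a": 1, "b": 2})
store.get("a")        # served from the shadow, no DFS read
store.purge_shadow()  # memory pressure: drop the shadow cache
store.get("b")        # now falls through to the on-disk CellStore
```

The purge step is what makes this safe as an opportunistic use of memory: the shadow is a pure cache of data already durable on disk, so dropping it loses nothing but speed.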
