I have a proposal that should improve Hypertable performance in certain situations. When running the HBase benchmark, the one test where we didn't significantly beat HBase was the random read test. During that test, the RangeServers were using only a little more than 800MB, which was the configured size of the block cache, whereas HBase was using all of the RAM it was configured with. I suspect the problem is that when we loaded the data into Hypertable, the RangeServers aggressively compacted the data to keep the commit log pruned back to a minimum, whereas HBase had left a significant amount of data in its cell cache equivalent. This gave HBase an unfair advantage in the random read test, since more of the dataset was resident in memory.
In general, if the RangeServers have memory available to them, they should use it. I propose that after a minor compaction we keep the immutable cell cache in memory and have it overshadow the corresponding CellStore on disk. When the regular maintenance task determines that the system needs memory back, it can purge these shadow caches. At some point we should probably add a learning algorithm, or at the very least a heuristic, that decides how to divide memory among the shadow cell caches, the block cache, and the query cache. A rough sketch of the idea is below.
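To make the mechanism concrete, here is a minimal C++ sketch of the shadow behavior. It is illustrative only, not actual RangeServer code; the names (ShadowableAccessGroup, CellMap, retain_shadow, purge_shadow) are made up for the example and stand in for the real CellCache/CellStore/AccessGroup plumbing.

    // Illustrative sketch only -- not actual Hypertable code.
    #include <cstddef>
    #include <map>
    #include <memory>
    #include <optional>
    #include <string>

    // Immutable in-memory snapshot of cells produced by a minor compaction.
    using CellMap = std::map<std::string, std::string>;

    class ShadowableAccessGroup {
    public:
      // Called at the end of a minor compaction: instead of discarding the
      // immutable cell cache that was just written to disk, keep it as a
      // shadow of the new CellStore.
      void retain_shadow(std::shared_ptr<const CellMap> immutable_cache) {
        shadow_cache_ = std::move(immutable_cache);
      }

      // Read path: the shadow cache overshadows the CellStore, so a hit
      // here avoids a disk read entirely.
      std::optional<std::string> get(const std::string &key) const {
        if (shadow_cache_) {
          auto it = shadow_cache_->find(key);
          if (it != shadow_cache_->end())
            return it->second;
        }
        return read_from_cell_store(key);  // fall back to disk
      }

      // Called from the regular maintenance task when the RangeServer needs
      // memory back; the on-disk CellStore remains authoritative, so the
      // shadow can be dropped at any time. Returns the number of cells freed.
      std::size_t purge_shadow() {
        std::size_t freed = shadow_cache_ ? shadow_cache_->size() : 0;
        shadow_cache_.reset();
        return freed;
      }

    private:
      // Placeholder for the real CellStore read path.
      std::optional<std::string> read_from_cell_store(const std::string &) const {
        return std::nullopt;
      }

      std::shared_ptr<const CellMap> shadow_cache_;
    };

The maintenance task's heuristic would then call purge_shadow() on whichever access groups it chooses to evict, weighing the shadow caches against the block cache and the query cache.
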
- Doug