[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195745#comment-16195745 ]
Eshcar Hillel commented on HBASE-16417:
---------------------------------------

Thanks all for your questions.

bq. Can this go into branch-2?

Sure, why not :)

bq. How long did the tests run for in each of the five cases?

Write-only runs started from an empty table and performed 500M puts. This took over an hour on SSD and less than 2 hours on HDD. Read-write runs first loaded 10GB of data and then ran 500K reads with heavy writes running in the background. These runs took 2-4 hours each.

bq. What would you recommend as default? Should we enable adaptive by default?

This is a good question. We performed rigorous benchmarks, however these are still only micro-benchmarks, i.e., they rely on synthetic workloads. I think it is best to have Basic as the default for 2.0, since its behavior is more predictable and it requires no configuration. Once we have user feedback we can also suggest trying Adaptive to see where it can further improve performance. They can certainly configure it for specific column families that can benefit from data reduction.

bq. The effect of HDD/SSDs does it come from the fact as how fast these segments in the pipeline are released after flushes?

In the write-only workload we see that the improvement in throughput is highly correlated with the reduction in total GC time. With fast SSD hardware this has a larger effect on throughput, as memory management is more of a bottleneck.

bq. here we capture the throughput of writes and flushes are not in the hot path so does it mean that we get blocking updates and the throughput depends on how fast the blocking updates are cleared and that depends on the segment count?

You can see in the parameter tuning report that throughput increases as the number of segments in the pipeline increases (up to some point), so I don't think we get more blocking updates with more segments in the pipeline. Also note that the number of segments in the snapshot depends on the timing of the flush; it could be less than the limit.
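For column families expected to benefit, the policy can be set per family. As I recall the knobs (the attribute and property names here are from memory, so please verify them against the patch):

```
# Per column family, from the HBase shell:
alter 'usertable', {NAME => 'cf', IN_MEMORY_COMPACTION => 'ADAPTIVE'}

# Cluster-wide default, in hbase-site.xml:
<property>
  <name>hbase.hregion.compacting.memstore.type</name>
  <value>ADAPTIVE</value>
</property>
```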
bq. So these tests were done with changing back to the old way of per region flush decision based on heap size NOT on data size?

NO. I did not have time to apply these changes yet; I plan to do this next. However, global pressure triggered many flushes, and there, as you know, the check is against heap size and not data size.

bq. The more the data size, the lesser will be the gain. To have a fair eval what should be the val size to used?

I agree. With greater values the gain will be smaller, but I believe we'll still see a gain. A flat index not only takes less space but is also more friendly for memory management, which is an advantage. Moreover, with Adaptive we'll still see a reduction in space, flushes, disk compactions, etc. In addition, recent work claims that small values are typical in production workloads, for example at Facebook and Twitter (see "LSM-trie: An LSM-tree-based ultra-large key-value store for small data items"). We ran experiments with large values in the past. We can repeat some of the experiments with 500B values, which are also reported in that work.

I need to rebase, and I will implement the comments above plus any other comments you put on RB. Anyway, happy to answer any further questions/concerns you may have.
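To illustrate the value-size point above with a back-of-the-envelope sketch (all byte counts here are hypothetical, chosen for illustration, not measured from HBase): a roughly fixed per-cell index/metadata overhead means the relative memory saving from flattening shrinks as values grow, because larger values dilute the fixed overhead.

```python
# Illustrative only: hypothetical per-cell sizes, not HBase measurements.
def relative_saving(value_size, key_size=50, overhead_before=100, overhead_after=40):
    """Fraction of per-cell memory saved when the per-cell index
    overhead shrinks from overhead_before to overhead_after bytes."""
    before = key_size + value_size + overhead_before
    after = key_size + value_size + overhead_after
    return (before - after) / before

# Small values: the fixed overhead dominates, so the relative saving is large.
print(round(relative_saving(100), 3))   # 0.24 with these assumed sizes
# Larger values dilute the fixed overhead, so the relative saving shrinks.
print(round(relative_saving(1000), 3))  # 0.052 with these assumed sizes
```

This is consistent with the expectation that the gain is smaller (but still positive) for large values.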
> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>
>                 Key: HBASE-16417
>                 URL: https://issues.apache.org/jira/browse/HBASE-16417
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>            Assignee: Eshcar Hillel
>             Fix For: 3.0.0
>
>         Attachments: HBASE-16417.01.patch, HBASE-16417 - Adaptive Compaction Policy - 20171001.pdf, HBASE-16417-benchmarkresults-20161101.pdf, HBASE-16417-benchmarkresults-20161110.pdf, HBASE-16417-benchmarkresults-20161123.pdf, HBASE-16417-benchmarkresults-20161205.pdf, HBASE-16417-benchmarkresults-20170309.pdf, HBASE-16417-benchmarkresults-20170317.pdf, HBASE-16417 - parameter tuning - 20171001.pdf, HBASE-16417-V01.patch

-- This message was sent by Atlassian JIRA (v6.4.14#64029)