[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195745#comment-16195745
 ] 

Eshcar Hillel commented on HBASE-16417:
---------------------------------------

Thanks all for your questions.

bq. Can this go into branch-2?
Sure why not :)

bq. How long did the tests run for in each of the five cases?
Write-only runs started from an empty table and performed 500M puts. This took 
over an hour on SSD and less than 2 hours on HDD.
Read-write runs first loaded 10GB of data and then ran 500K reads with heavy 
writes running in the background. These runs took 2-4 hours each.

bq. What would you recommend as default? Should we enable adaptive by default?
This is a good question.
We performed rigorous benchmarks; however, these are still only 
micro-benchmarks, i.e., they rely on synthetic workloads.
I think it is best to have Basic as the default for 2.0, since its behavior is 
more predictable and it requires no configuration.
Once we have user feedback we can suggest they also try playing with Adaptive 
and see where it can further improve their performance.
They can certainly configure it for specific column families that can benefit 
from data reduction.
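For reference, this is roughly how the policy would be set cluster-wide or 
overridden per column family (a sketch only; property and attribute names 
follow the current patch and may change before commit):

```xml
<!-- hbase-site.xml: cluster-wide default memstore policy (sketch) -->
<property>
  <name>hbase.hregion.compacting.memstore.type</name>
  <value>BASIC</value>
</property>
```

A specific column family could then opt in to Adaptive from the shell, e.g. 
`alter 'mytable', {NAME => 'cf', IN_MEMORY_COMPACTION => 'ADAPTIVE'}`.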

bq. The effect of HDD/SSDs does it come from the fact as how fast these 
segments in the pipeline are released after flushes?
In the write-only workload we see that the improvement in throughput is highly 
correlated with the reduction in total GC time. With fast SSD hardware this has 
a greater effect on throughput, as memory management is more of a bottleneck.

bq. here we capture the throughput of writes and flushes are not in the hot 
path so does it mean that we get blocking updates and the throughput depends on 
how fast the blocking updates are cleared and that depends on the segment count?
You can see in the parameter-tuning report that throughput increases as the 
number of segments in the pipeline increases (up to some point), so I don't 
think we get more blocking updates with more segments in the pipeline.
Also note that the number of segments in the snapshot depends on the timing of 
the flush; it can be fewer than the limit.

bq. So these tests were done with changing back to the old way of per region 
flush decision based on heap size NOT on data size?
No.
We did not have time to apply these changes yet; I plan to do this next.
However, global memory pressure triggered many flushes, and there, as you know, 
the check is on heap size and not on data size.

bq. The more the data size, the lesser will be the gain. To have a fair 
evaluation, what value size should be used?
I agree. With greater values the gain will be smaller, but I believe we'll 
still see gain. A flat index not only takes less space but is also more 
friendly to memory management, which is an advantage. Moreover, with Adaptive 
we'll still see a reduction in space, flushes, disk compactions, etc.
In addition, a recent work claims that small values are typical in production 
workloads, such as those at Facebook and Twitter (see "LSM-trie: An 
LSM-tree-based ultra-large key-value store for small data items").
We ran experiments with large values in the past.
We can repeat some of the experiments with 500B values, which are also reported 
in that work.
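To illustrate why the relative gain shrinks as values grow: flattening removes 
a roughly fixed per-cell index overhead, so its share of the total segment heap 
falls as the value size rises. A back-of-the-envelope sketch (the overhead 
constants below are illustrative assumptions, not measured HBase numbers):

```python
# Back-of-the-envelope: heap saved by flattening a skip-list segment
# into a flat cell array, as a fraction of total segment heap.
# All per-cell byte counts are assumed for illustration only.

SKIPLIST_OVERHEAD_PER_CELL = 100   # bytes: skip-list node + references (assumed)
FLAT_OVERHEAD_PER_CELL = 20        # bytes: one slot in a flat cell array (assumed)
KEY_SIZE = 40                      # bytes of key/metadata per cell (assumed)

def saving_fraction(value_size: int) -> float:
    """Fraction of segment heap saved by flattening, for a given value size."""
    before = KEY_SIZE + value_size + SKIPLIST_OVERHEAD_PER_CELL
    after = KEY_SIZE + value_size + FLAT_OVERHEAD_PER_CELL
    return (before - after) / before

for v in (100, 500, 1024, 4096):
    print(f"value={v:>5}B  heap saved by flattening: {saving_fraction(v):.1%}")
```

Under these assumptions the saving drops from about a third of the heap at 
100B values to only a few percent at 4KB values, which is why larger values 
dilute the benefit while small values (common in production, per the LSM-trie 
paper) benefit most.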

I need a rebase, and I will implement the comments above as well as any other 
comments you put on RB.
Anyway, happy to answer any further questions/concerns you may have.


> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>
>                 Key: HBASE-16417
>                 URL: https://issues.apache.org/jira/browse/HBASE-16417
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>            Assignee: Eshcar Hillel
>             Fix For: 3.0.0
>
>         Attachments: HBASE-16417.01.patch, HBASE-16417 - Adaptive Compaction 
> Policy - 20171001.pdf, HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf, 
> HBASE-16417-benchmarkresults-20161123.pdf, 
> HBASE-16417-benchmarkresults-20161205.pdf, 
> HBASE-16417-benchmarkresults-20170309.pdf, 
> HBASE-16417-benchmarkresults-20170317.pdf, HBASE-16417 - parameter tuning - 
> 20171001.pdf, HBASE-16417-V01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
