[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655444#comment-15655444
 ] 

Eshcar Hillel commented on HBASE-16417:
---------------------------------------

While running the benchmarks this week I realized I made a mistake when running 
data compaction in previous rounds. I turned off the mslab flag but did not 
remove the chunk pool parameters, and as a result a chunk pool was allocated but 
never used. I re-ran these experiments this week with no mslabs and no chunk 
pool, and indeed the performance improved. For a fair comparison I also ran the 
no-compaction option with no mslabs and no chunk pool, which turned out to be 
the best performing setting. (See full details in the latest report.)
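For reference, the "no mslabs and no chunk pool" setting boils down to an 
hbase-site.xml fragment along these lines (a sketch assuming the standard 
property names of this HBase generation; the point is that the chunk pool 
sizing must be zeroed together with the mslab flag):

```xml
<!-- Disable MSLAB; the chunk pool parameter must also be zeroed,
     otherwise a pool is allocated but never used. -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>false</value>
</property>
<property>
  <name>hbase.hregion.memstore.chunkpool.maxsize</name>
  <value>0</value>
</property>
```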

The focus of this week's benchmarks was a mixed workload: 50% reads, 50% writes. 
Results show that in a mixed workload, running with no mslabs and no chunk pool 
has a significant advantage over running with mslabs and a chunk pool. This 
holds both when running with no compaction and with data compaction.

So far the benchmarks do not show an advantage of index-/data-compaction over 
no-compaction. This might be due to several reasons:
1. Running index-/data-compaction should reduce the number of disk compactions 
- but the price tag of running a disk compaction in the current system (single 
SSD machine) is not as high as it would be in a production cluster.
2. Index compaction would have a greater effect as the size of the cells 
decreases - the values we are using now are medium size (1KB), not small.
3. Index-/data-compaction should result in more reads being served from memory, 
thereby reducing read latency - but we might be using too small a data set, one 
which is efficiently served from the block cache; this is not always the case 
in production data sets.
4. Index-/data-compaction should result in more reads being served from memory, 
thereby reducing read latency - but the current implementation of reads *always* 
seeks the key in all store files that may contain it, even if it resides in 
memory, effectively masking any memory optimization, including in-memory 
compaction.

Directions we intend to explore next:
1. Run benchmarks on commodity machines (namely HDD rather than SSD); run the 
cluster on more than one machine (2 RS, 3-way replication); the scale might be 
smaller though, since our HDD machines are modest compared to the SSD machine 
we have.
2. Run with smaller values - 100B instead of 1KB.
3. Run bigger data sets - 10-20M keys instead of 5M keys.
4. Change the read (get) implementation to first seek the key in the 
memstore(s) only, and only if no matching entry is found seek in all memstore 
segments and all relevant store files. This could be the subject of another 
Jira. We believe this would be beneficial also with no compaction, and even 
more so when index-/data-compaction is employed. Any thoughts on this 
direction?
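To make item 4 concrete, here is a minimal sketch of the memstore-first lookup, 
using plain Java maps to stand in for memstore segments and store files (all 
names here are hypothetical illustrations, not HBase internals, and versioning/ 
timestamp semantics are ignored):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the proposed two-phase get: consult in-memory segments
// first, and fall back to store files only on a memstore miss.
public class MemstoreFirstGet {
    private final List<Map<String, String>> memstoreSegments = new ArrayList<>();
    private final List<Map<String, String>> storeFiles = new ArrayList<>();
    int storeFileSeeks = 0; // counts how often disk would have been touched

    Map<String, String> newMemstoreSegment() {
        Map<String, String> seg = new TreeMap<>();
        memstoreSegments.add(0, seg); // newest segment is consulted first
        return seg;
    }

    Map<String, String> newStoreFile() {
        Map<String, String> file = new TreeMap<>();
        storeFiles.add(0, file);
        return file;
    }

    // Phase 1: memstore segments only. Phase 2 (on miss): store files too.
    String get(String key) {
        for (Map<String, String> seg : memstoreSegments) {
            String v = seg.get(key);
            if (v != null) return v; // served from memory, no disk seek
        }
        for (Map<String, String> file : storeFiles) {
            storeFileSeeks++;
            String v = file.get(key);
            if (v != null) return v;
        }
        return null;
    }

    public static void main(String[] args) {
        MemstoreFirstGet store = new MemstoreFirstGet();
        store.newStoreFile().put("a", "old");
        store.newMemstoreSegment().put("a", "new");

        System.out.println(store.get("a"));       // memstore hit shadows disk
        System.out.println(store.storeFileSeeks); // no store-file seek yet
        System.out.println(store.get("missing")); // falls through to disk
        System.out.println(store.storeFileSeeks);
    }
}
```

The contrast with the current behavior is that today's read path would seek 
"a" in the store file as well, even though the freshest value is in memory; 
the sketch above skips that seek entirely on a memstore hit.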

Finally, a small note: a small bug was found that does not allow 
index-compaction to run without mslabs. This bug is about to be fixed in a new 
patch Anastasia is working on.
 

> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>
>                 Key: HBASE-16417
>                 URL: https://issues.apache.org/jira/browse/HBASE-16417
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
