[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243816#comment-16243816 ]

Eshcar Hillel commented on HBASE-18294:
---------------------------------------

Last thing I wrote in the developer thread:
bq. Here is a suggestion: we can track both heap and off-heap sizes and have two 
thresholds, one for limiting heap size and one for limiting off-heap size. At 
every decision point we check whether either threshold is exceeded, and if so 
we trigger a flush. We can choose which entity to flush based on the cause. For 
example, if we decided to flush because the heap size exceeds the heap threshold, 
then we flush the region/store with the greatest heap size, and likewise for an 
off-heap flush. I can prepare a patch.
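The two-threshold proposal could look roughly like the following Java sketch. It is a hypothetical illustration, not HBase code: `StoreSizes`, `FlushCause` and `DualThresholdFlushPolicy` are invented names, region-wide sizes are assumed to be the sum of per-store sizes, and giving the heap threshold priority when both thresholds are exceeded is an arbitrary assumption.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical per-store snapshot of the two tracked sizes (not an HBase class).
record StoreSizes(String name, long heapSize, long offHeapSize) {}

// Which threshold triggered the flush; hypothetical, for illustration only.
enum FlushCause { NONE, HEAP, OFF_HEAP }

// Sketch of the two-threshold policy described in the suggestion.
final class DualThresholdFlushPolicy {
    private final long heapThreshold;
    private final long offHeapThreshold;

    DualThresholdFlushPolicy(long heapThreshold, long offHeapThreshold) {
        this.heapThreshold = heapThreshold;
        this.offHeapThreshold = offHeapThreshold;
    }

    // Sum the per-store sizes and report which threshold, if any, is exceeded.
    // Assumption: if both are exceeded, the heap threshold takes priority.
    FlushCause check(List<StoreSizes> stores) {
        long heap = stores.stream().mapToLong(StoreSizes::heapSize).sum();
        long offHeap = stores.stream().mapToLong(StoreSizes::offHeapSize).sum();
        if (heap > heapThreshold) {
            return FlushCause.HEAP;
        }
        if (offHeap > offHeapThreshold) {
            return FlushCause.OFF_HEAP;
        }
        return FlushCause.NONE;
    }

    // Choose the store with the greatest size of the kind that caused the flush.
    Optional<StoreSizes> selectStore(List<StoreSizes> stores, FlushCause cause) {
        if (cause == FlushCause.NONE) {
            return Optional.empty();
        }
        Comparator<StoreSizes> bySize;
        if (cause == FlushCause.HEAP) {
            bySize = Comparator.comparingLong(StoreSizes::heapSize);
        } else {
            bySize = Comparator.comparingLong(StoreSizes::offHeapSize);
        }
        return stores.stream().max(bySize);
    }
}
```

For instance, with thresholds of 100 (heap) and 200 (off-heap), stores cf1 <heap=60, off-heap=10> and cf2 <heap=50, off-heap=150> sum to a heap size of 110, so check returns HEAP and selectStore picks cf1.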

I plan to start working on this. The main changes in the code would be:
(1) Change MemStoreSize and MemStoreSizing to a 3-tuple: data size, heap size, 
and off-heap size. For example, if a store holds 100MB of key-values allocated 
off-heap and 20MB of metadata allocated on-heap, the counters will be 
<100,20,100>; if both the key-values and the metadata are allocated on-heap, 
the counters will be <100,120,0>.
(2) Have each segment manage a MemStoreSizing object instead of separate 
counters. The segment's MSLAB would advise which counters to increment or 
decrement.
(3) Whenever the counters are increased, check both the on-heap and the off-heap 
thresholds, and trigger a flush if either one is exceeded.
(4) Upon a flush request, collect the counters from all stores (and the stores' 
segments) to decide which store to flush.
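A minimal sketch of items (1) to (4), assuming a hypothetical `ThreeWaySizing` class in place of the real MemStoreSizing and a boolean `dataOffHeap` flag standing in for the MSLAB's advice about where cell data lives. The actual HBase classes and method signatures are not shown in this thread and will differ.

```java
import java.util.List;

// Hypothetical stand-in for the proposed three-counter MemStoreSizing;
// the real HBase class and method names may differ.
final class ThreeWaySizing {
    private long dataSize;
    private long heapSize;
    private long offHeapSize;

    // Item (2): account for cells added to a segment. dataDelta is the key-value
    // bytes, metaDelta the on-heap metadata bytes (index, object overhead).
    // dataOffHeap models the MSLAB's advice on where the data lives (assumption).
    void inc(long dataDelta, long metaDelta, boolean dataOffHeap) {
        dataSize += dataDelta;
        if (dataOffHeap) {
            offHeapSize += dataDelta;
            heapSize += metaDelta;
        } else {
            heapSize += dataDelta + metaDelta;
        }
    }

    // Decrement is the mirror image of inc.
    void dec(long dataDelta, long metaDelta, boolean dataOffHeap) {
        inc(-dataDelta, -metaDelta, dataOffHeap);
    }

    // Item (3): after an increment, check both thresholds.
    boolean exceedsEither(long heapThreshold, long offHeapThreshold) {
        return heapSize > heapThreshold || offHeapSize > offHeapThreshold;
    }

    // Item (4): aggregate counters collected from several segments or stores.
    static ThreeWaySizing sum(List<ThreeWaySizing> parts) {
        ThreeWaySizing total = new ThreeWaySizing();
        for (ThreeWaySizing p : parts) {
            total.dataSize += p.dataSize;
            total.heapSize += p.heapSize;
            total.offHeapSize += p.offHeapSize;
        }
        return total;
    }

    long dataSize() { return dataSize; }
    long heapSize() { return heapSize; }
    long offHeapSize() { return offHeapSize; }

    // Item (1): print the counters as <data,heap,off-heap>.
    @Override
    public String toString() {
        return "<" + dataSize + "," + heapSize + "," + offHeapSize + ">";
    }
}
```

Calling inc(100, 20, true) yields <100,20,100> and inc(100, 20, false) yields <100,120,0>, matching the two cases in item (1).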

Let me know if you have any concerns or foresee any problems with this solution.


> Flush is based on data size instead of heap size
> ------------------------------------------------
>
>                 Key: HBASE-18294
>                 URL: https://issues.apache.org/jira/browse/HBASE-18294
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-values only) to the threshold, whereas it should compare the heap size 
> (which also includes the index and metadata).
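To make the reported bug concrete, the hypothetical Java sketch below checks an all-on-heap memstore against the 128MB default using both measures. The `OnHeapMemStore` and `FlushCheck` names, the MB units, the strict `>` comparison and the 100MB/40MB figures are illustrative assumptions, not values taken from HBase.

```java
// Hypothetical snapshot of an all-on-heap memstore, in MB (illustrative only).
record OnHeapMemStore(long dataSizeMB, long metadataSizeMB) {
    // Heap size counts the key-value data plus index and metadata overhead.
    long heapSizeMB() {
        return dataSizeMB + metadataSizeMB;
    }
}

final class FlushCheck {
    // Default region flush threshold from the issue description.
    static final long FLUSH_THRESHOLD_MB = 128;

    // Behavior the issue reports: only the key-value data size is compared.
    static boolean flushByDataSize(OnHeapMemStore m) {
        return m.dataSizeMB() > FLUSH_THRESHOLD_MB;
    }

    // Behavior the issue asks for: the full heap size is compared.
    static boolean flushByHeapSize(OnHeapMemStore m) {
        return m.heapSizeMB() > FLUSH_THRESHOLD_MB;
    }
}
```

With 100MB of key-values and 40MB of metadata, flushByDataSize returns false (100 is below 128) while flushByHeapSize returns true (140 is above 128), so under the data-size check the memstore occupies more than 128MB of heap without triggering a flush.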



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
