[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243816#comment-16243816 ]
Eshcar Hillel commented on HBASE-18294:
---------------------------------------

The last thing I wrote in the developer thread:
bq. Here is a suggestion: We can track both heap and off-heap sizes and have 2 thresholds, one for limiting heap size and one for limiting off-heap size. At every decision-making junction we check whether one of the thresholds is exceeded, and if it is we trigger a flush. We can choose which entity to flush based on the cause. For example, if we decided to flush because the heap size exceeds the heap threshold, then we flush the region/store with the greatest heap size, and likewise for an off-heap flush. I can prepare a patch.

I plan to start working on this. The main changes in the code would be:
(1) Change MemStoreSize and MemStoreSizing to a tuple of 3: data size, heap size, and off-heap size. So for example, if a store holds key-values of size 100MB allocated off-heap and 20MB of metadata allocated on-heap, the counters will be <100,20,100>; if both key-values and metadata are allocated on-heap, the counters will be <100,120,0>.
(2) Have each segment manage a MemStoreSizing object instead of separate counters. Which counters to increment/decrement would be decided on the advice of the segment's MSLAB.
(3) Whenever the counters are increased, check both the on-heap and off-heap thresholds; trigger a flush if either is exceeded.
(4) Upon a flush request, collect the counters from all stores (and the stores' segments) to decide which store to flush.

Let me know if you have any concerns or foresee any problems with this solution.

> Flush is based on data size instead of heap size
> ------------------------------------------------
>
>                 Key: HBASE-18294
>                 URL: https://issues.apache.org/jira/browse/HBASE-18294
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>
> A region is flushed if its memory component exceeds a threshold (default size
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the
> store to another threshold (that can be configured with
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size
> (key-value only) to the threshold where it should compare the heap size
> (which includes index size, and metadata).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
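
For illustration, the <data, heap, off-heap> tuple and the two-threshold check proposed in the comment could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the actual patch: MemStoreSizeSketch, Size, FlushCause, check and pickStore are hypothetical names and are not the HBase MemStoreSize/MemStoreSizing API; using 128MB for both limits is an arbitrary choice for the example, and so is checking the heap threshold first when both are exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; names and limits are assumptions, not HBase code.
public class MemStoreSizeSketch {
    static final long MB = 1024L * 1024L;

    /** Immutable <data size, heap size, off-heap size> tuple. */
    static final class Size {
        final long data, heap, offHeap;

        Size(long data, long heap, long offHeap) {
            this.data = data;
            this.heap = heap;
            this.offHeap = offHeap;
        }

        /** Metadata is always on-heap; cell bytes are on-heap or off-heap. */
        static Size of(long cellBytes, long metadataBytes, boolean cellsOffHeap) {
            return cellsOffHeap
                ? new Size(cellBytes, metadataBytes, cellBytes)
                : new Size(cellBytes, cellBytes + metadataBytes, 0);
        }

        Size plus(Size o) {
            return new Size(data + o.data, heap + o.heap, offHeap + o.offHeap);
        }
    }

    enum FlushCause { NONE, HEAP, OFF_HEAP }

    /** Step (3): flush if either threshold is exceeded (heap checked first). */
    static FlushCause check(Size total, long heapLimit, long offHeapLimit) {
        if (total.heap > heapLimit) return FlushCause.HEAP;
        if (total.offHeap > offHeapLimit) return FlushCause.OFF_HEAP;
        return FlushCause.NONE;
    }

    /** Step (4): pick the store with the largest counter matching the cause. */
    static String pickStore(Map<String, Size> stores, FlushCause cause) {
        String best = null;
        long bestSize = -1;
        for (Map.Entry<String, Size> e : stores.entrySet()) {
            Size s = e.getValue();
            long size = (cause == FlushCause.OFF_HEAP) ? s.offHeap : s.heap;
            if (size > bestSize) {
                bestSize = size;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Size> stores = new LinkedHashMap<>();
        stores.put("cf1", Size.of(100 * MB, 20 * MB, true));   // <100, 20, 100>
        stores.put("cf2", Size.of(100 * MB, 20 * MB, false));  // <100, 120, 0>

        Size total = new Size(0, 0, 0);
        for (Size s : stores.values()) {
            total = total.plus(s);
        }

        FlushCause cause = check(total, 128 * MB, 128 * MB);
        System.out.println(cause + " -> flush " + pickStore(stores, cause));
        // prints "HEAP -> flush cf2"
    }
}
```

In this example the region holds 140MB on-heap and 100MB off-heap, so only the heap limit is exceeded and the store with the largest heap footprint (cf2) is chosen, even though both stores have the same data size.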