[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071856#comment-16071856
 ] 

Anoop Sam John commented on HBASE-18294:
----------------------------------------

When the data size of a memstore is high (so it is the one chosen to flush), 
its heap occupancy will also be on the higher side, no?  I am speaking about 
the common case.  We have basic flattening as the default now, and 
CompactingMemstore is the default, so we can expect all the memstores to 
behave this way, rather than some being flattened and some never.  (I am 
speaking with respect to the default configs as we have them now.)
One can tune the region flush size to account for the new changes.  Maybe we 
can even change the default itself now.
But considering only data size for the per-region flush decision is more in 
line with how a normal user thinks: I have configured a 128 MB size, and the 
data is flushed when it reaches that data size.  What the heap overhead is 
should not be the user's headache (it differs between DefaultMemstore and 
CompactingMemstore in basic mode, and if a new algorithm comes tomorrow it 
may even be reduced).  We still have to consider it, since we cannot let our 
RS hit OOME or suffer bad GC impacts, but we are doing that at the global 
level anyway.

Even in normal cases, flushes due to global pressure might be happening.  Say 
we have 100 regions per RS; then, with the default settings, the ideal heap 
size needed for global memstores is
100 * 128 MB = 12.5 GB
12.5 GB * 4 = 50 GB
because we allow a memstore to grow to 4 times 128 MB before blocking.
Configuring a size this big might not be the usual case.  Agreed that we 
kick off the flush of a region once its size reaches 128 MB, but if the write 
pressure is high the size can grow beyond 2x.
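
The arithmetic above can be reproduced with a rough sketch like the one below. 
The constant and function names are made up for illustration and are not HBase 
APIs; the values are just the defaults and the 100-region example quoted in 
this comment.

```python
# Illustrative only: names are hypothetical, values come from the comment above.
FLUSH_SIZE_MB = 128     # per-region flush size (default mentioned above)
BLOCK_MULTIPLIER = 4    # a memstore may grow to 4x the flush size before blocking
REGIONS_PER_RS = 100    # example region count per region server


def global_memstore_need_gb(regions, per_region_mb):
    """Total memstore size in GB if every region holds per_region_mb."""
    return regions * per_region_mb / 1024


# Every region sitting exactly at its flush size.
at_flush_size = global_memstore_need_gb(REGIONS_PER_RS, FLUSH_SIZE_MB)

# Every region sitting at the blocking limit (4 x flush size).
at_blocking_size = global_memstore_need_gb(
    REGIONS_PER_RS, FLUSH_SIZE_MB * BLOCK_MULTIPLIER
)

print(f"at flush size:    {at_flush_size} GB")
print(f"at blocking size: {at_blocking_size} GB")
```

Changing REGIONS_PER_RS or FLUSH_SIZE_MB shows how quickly the worst case 
outgrows what one would normally give to the global memstore.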





> Flush is based on data size instead of heap size
> ------------------------------------------------
>
>                 Key: HBASE-18294
>                 URL: https://issues.apache.org/jira/browse/HBASE-18294
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold where it should compare the heap size 
> (which includes index size, and metadata).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
