[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068379#comment-16068379
 ] 

Eshcar Hillel commented on HBASE-18294:
---------------------------------------

Currently, the total memstore size tracked for a region also measures only the 
data size, not the overall heap size:
{code}
long size = this.memstoreDataSize.addAndGet(memstoreSize.getDataSize());
{code}
I think this is wrong.
When the data is flushed to disk:
(1) the metadata changes,
(2) the data is compacted, and
(3) the data is compressed.
Any of these can make the file smaller than 128MB, so the user cannot expect 
to get files of a certain size.
The purpose of a flush is to release memory once memory use exceeds the 
threshold. It does not guarantee creating files of a minimum size.
If there are indeed cases where the data takes only half of the memory or even 
less, with the rest used for index and metadata, and we only check whether the 
data exceeds 128MB, then we may use double the intended amount of memory. 
As a result:
(*) the system can exceed the blocking threshold (where updates are blocked) 
more frequently than we would like, and 
(*) we can even get an out-of-memory exception in certain scenarios.
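
To make the arithmetic concrete, here is a minimal sketch. It is not HBase 
code: the class FlushTriggerEstimate, the method heapAtFlush and the 
dataFraction parameter are hypothetical names used only for illustration. It 
estimates how much heap a memstore occupies at the moment a data-size-only 
check fires, assuming the data takes a fixed share of the heap.

```java
// Hypothetical helper, not part of HBase: estimates memstore heap use at the
// moment a flush is triggered by a check on data size alone.
final class FlushTriggerEstimate {

    /**
     * @param dataThresholdBytes flush threshold, compared against data size only
     * @param dataFraction       assumed share of the memstore heap taken by
     *                           key-value data, in the range (0, 1]
     * @return approximate heap size in bytes when the data-size check fires
     */
    static long heapAtFlush(long dataThresholdBytes, double dataFraction) {
        if (dataFraction <= 0.0 || dataFraction > 1.0) {
            throw new IllegalArgumentException("dataFraction must be in (0, 1]");
        }
        // heap = data / dataFraction; the remainder is index and metadata.
        return Math.round(dataThresholdBytes / dataFraction);
    }

    public static void main(String[] args) {
        long threshold = 128L * 1024 * 1024; // 128MB
        // Data is half of the heap, so the flush fires at twice the threshold.
        System.out.println(heapAtFlush(threshold, 0.5) / (1024 * 1024) + "MB"); // prints 256MB
    }
}
```

With a smaller data share the gap grows further, which is how the blocking 
threshold can be reached before any flush is requested.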

I think a better practice would be to respect the thresholds while considering 
the total heap size, both at the region level when triggering a flush and at 
the store level when selecting stores to be flushed. 
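
A rough sketch of what I mean follows. This is not the actual HBase 
FlushPolicy API: HeapSizeFlushSketch, StoreSize, regionShouldFlush and 
selectStoresToFlush are hypothetical names, and the real implementation may 
well look different. It only shows both checks comparing heap size, rather 
than data size, to their thresholds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the HBase FlushPolicy API.
final class HeapSizeFlushSketch {

    /** Per-store sizes in bytes; heapSize covers data plus index and metadata. */
    static final class StoreSize {
        final long dataSize;
        final long heapSize;

        StoreSize(long dataSize, long heapSize) {
            this.dataSize = dataSize;
            this.heapSize = heapSize;
        }
    }

    /** Region level: trigger a flush once the total heap size exceeds the flush size. */
    static boolean regionShouldFlush(Map<String, StoreSize> stores, long regionFlushSize) {
        long totalHeap = 0;
        for (StoreSize size : stores.values()) {
            totalHeap += size.heapSize;
        }
        return totalHeap > regionFlushSize;
    }

    /** Store level: select the stores whose heap size exceeds the lower bound. */
    static List<String> selectStoresToFlush(Map<String, StoreSize> stores, long lowerBound) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, StoreSize> entry : stores.entrySet()) {
            if (entry.getValue().heapSize > lowerBound) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```

For example, a store holding 60MB of data in 120MB of heap would be skipped 
by a 64MB lower bound applied to data size, but selected when the same bound 
is applied to heap size.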

> Flush policy checks data size instead of heap size
> --------------------------------------------------
>
>                 Key: HBASE-18294
>                 URL: https://issues.apache.org/jira/browse/HBASE-18294
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>
> A flush policy decides whether to flush a store by comparing the size of the 
> store to a threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation compares the data size (key-value only) to the 
> threshold, whereas it should compare the heap size (which includes index 
> size and metadata).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
