[ 
https://issues.apache.org/jira/browse/HBASE-16747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614388#comment-15614388
 ] 

ramkrishna.s.vasudevan commented on HBASE-16747:
------------------------------------------------

+1. Great patch.

> Track memstore data size and heap overhead separately 
> ------------------------------------------------------
>
>                 Key: HBASE-16747
>                 URL: https://issues.apache.org/jira/browse/HBASE-16747
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16747.patch, HBASE-16747.patch, 
> HBASE-16747_V2.patch, HBASE-16747_V2.patch, HBASE-16747_V3.patch, 
> HBASE-16747_V3.patch, HBASE-16747_V3.patch, HBASE-16747_V4.patch, 
> HBASE-16747_WIP.patch
>
>
> We track the memstore size in 3 places.
> 1. Global at RS level in RegionServerAccounting. This tracks all memstore's 
> size and used to calculate whether forced flushes needed because of global 
> heap pressure
> 2. At region level in HRegion. This is sum of sizes of all memstores within 
> this region. This is used to decide whether region reaches flush size (128 MB)
> 3. Segment level. This tracks the in memory flush/compaction decisions.
> All these use the Cell's heap size which include the data bytes# as well as 
> Cell object heap overhead.  Also we include the overhead because of addition 
> of Cells into Segment's data structures (Like CSLM).
> Once we have off heap memstore, we will keep the cell data bytes in off heap 
> area. So we can not track both data size and heap overhead as one entity. We 
> need to separate them and track.
> Proposal here is to track both cell data size and heap overhead separately at 
> global accounting layer.  As of now we have only on heap memstore. So the 
> global memstore boundary checks will consider both (adds up and check against 
> global max memstore size)
> Track cell data size alone (This can be on heap or off heap) in region level. 
>  Region flushes use cell data size alone for the region flush decision. A 
> user configuring 128 MB as flush size, normally he will expect to get a 128MB 
> data flush size. But as we were including the heap overhead also, once the 
> flush happens, the actual data size getting flushed is way behind this 128 
> MB.  Now with this change we will behave more like what a user thinks.
> Segment level in memory flush/compaction also considers cell data size alone. 
>  But we will need to track the heap overhead also. (Once the in memory flush 
> or normal flush happens, we will have to adjust both cell data size and heap 
> overhead)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to