[ https://issues.apache.org/jira/browse/HBASE-16747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614388#comment-15614388 ]
ramkrishna.s.vasudevan commented on HBASE-16747:
------------------------------------------------

+1. Great patch.

> Track memstore data size and heap overhead separately
> ------------------------------------------------------
>
>                 Key: HBASE-16747
>                 URL: https://issues.apache.org/jira/browse/HBASE-16747
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16747.patch, HBASE-16747.patch,
> HBASE-16747_V2.patch, HBASE-16747_V2.patch, HBASE-16747_V3.patch,
> HBASE-16747_V3.patch, HBASE-16747_V3.patch, HBASE-16747_V4.patch,
> HBASE-16747_WIP.patch
>
>
> We track the memstore size in 3 places.
> 1. Globally, at the RS level, in RegionServerAccounting. This tracks the
> size of all memstores and is used to decide whether forced flushes are
> needed because of global heap pressure.
> 2. At the region level, in HRegion. This is the sum of the sizes of all
> memstores within the region. It is used to decide whether the region has
> reached the flush size (128 MB).
> 3. At the Segment level. This drives the in-memory flush/compaction
> decisions.
> All of these use the Cell's heap size, which includes the data bytes as
> well as the Cell object's heap overhead. We also include the overhead of
> adding Cells to the Segment's data structures (like CSLM).
> Once we have an off-heap memstore, we will keep the cell data bytes in the
> off-heap area, so we can no longer track data size and heap overhead as one
> entity. We need to separate them and track each.
> The proposal here is to track cell data size and heap overhead separately
> at the global accounting layer. As of now we have only the on-heap
> memstore, so the global memstore boundary checks will consider both (add
> them up and check against the global max memstore size).
> Track cell data size alone (this can be on heap or off heap) at the region
> level. Region flushes use cell data size alone for the region flush
> decision. A user who configures 128 MB as the flush size will normally
> expect a flush of 128 MB of data. But because we were also including the
> heap overhead, the actual data size flushed was well below 128 MB. With
> this change we will behave more like what a user expects.
> The Segment-level in-memory flush/compaction also considers cell data size
> alone. But we will still need to track the heap overhead. (Once an
> in-memory flush or a normal flush happens, we will have to adjust both cell
> data size and heap overhead.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
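
For readers skimming the proposal quoted above, here is a minimal Java sketch of the two-counter idea. It is not taken from the attached patches: the class name MemstoreAccountingSketch and all of its methods are hypothetical, and the use of ">=" for both threshold checks is an assumption. It only illustrates that the region flush decision looks at cell data size alone, while the global check (with only an on-heap memstore) adds data size and heap overhead together.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical illustration only; these names do not come from the
 * HBASE-16747 patch. Keeps cell data size and heap overhead in two
 * separate counters instead of a single combined heap size.
 */
class MemstoreAccountingSketch {
    private final AtomicLong dataSize = new AtomicLong();
    private final AtomicLong heapOverhead = new AtomicLong();

    /** Record a cell being added: its data bytes and its heap overhead. */
    void add(long cellDataBytes, long cellHeapOverheadBytes) {
        dataSize.addAndGet(cellDataBytes);
        heapOverhead.addAndGet(cellHeapOverheadBytes);
    }

    /** After a flush (in-memory or normal), both counters are adjusted. */
    void subtract(long flushedDataBytes, long flushedHeapOverheadBytes) {
        dataSize.addAndGet(-flushedDataBytes);
        heapOverhead.addAndGet(-flushedHeapOverheadBytes);
    }

    long getDataSize() {
        return dataSize.get();
    }

    long getHeapOverhead() {
        return heapOverhead.get();
    }

    /** Region-level decision: compares cell data size alone. */
    boolean reachedFlushSize(long flushSizeBytes) {
        return dataSize.get() >= flushSizeBytes;
    }

    /**
     * Global decision while only an on-heap memstore exists: data size and
     * heap overhead are added up and checked against the global limit.
     */
    boolean reachedGlobalLimit(long globalMaxBytes) {
        return dataSize.get() + heapOverhead.get() >= globalMaxBytes;
    }
}
```

With a single combined counter, a region configured with a 128 MB flush size would hit the threshold before it held 128 MB of actual cell data; with the split counters, reachedFlushSize ignores the overhead, which is the behaviour change the description argues for.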