[ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171458#comment-15171458 ]
Anoop Sam John commented on HBASE-14921:
----------------------------------------

Thanks for the details in the doc. It explains things neatly.

So during in-memory compaction we will convert the CSLM of Cells into an array of Cells. When will the compaction get triggered? Time based and/or based on the #ImmutableSegments in the pipeline?

To build the array of Cells we need to know how many cells will survive into the compacted result. So will we scan over the ImmutableSegments 2 times? Once to know the #cells, and then again to actually move them into the array.

Yes, when most of the Cells in the ImmutableSegments can go away during this compaction, the copy to a new MSLAB area is very much needed. But can we consider other use cases also? Normally there might not be that many versions of cells, and only rarely will cells expire within the memstore. In that case this extra copy will double the memory needed from the pool. (Yes, our aim is to move all of this into the off-heap area, so a pool is a must then.) If we know the #cells coming out of the compaction and the #cells that will go away, we can decide whether the copy to a new area is worth it or not.

Also, my earlier thinking was that when an in-memory flush happens, we should do away with the CSLM at that point itself. And we may have multiple CellBlocks associated with a memstore. In the case where most cell versions have expired, yes, the compaction is very important. So I am very interested to know when you think we can compact the CSLM cells into an array.

The extra overhead per cell is not just 8 bytes when we have an array of cells instead of a plain-bytes cellblock (like an HFile data block):
ref to cell in array (8 bytes) + Cell object (16 bytes) + ref to the byte array held by the Cell (8) + offset and length ints (8) = 40 bytes per cell.
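
To make the arithmetic and the copy decision above concrete, here is a minimal Java sketch. It is not HBase code: the class name, the method names, and the `maxSurvivingRatio` threshold are all hypothetical, and the byte sizes are simply the figures quoted in the comment (real sizes depend on JVM settings such as compressed oops).

```java
// Hypothetical sketch, not part of HBase. Byte sizes are the figures quoted
// in the comment; actual JVM object layouts may differ.
final class CellArrayOverhead {
    static final int ARRAY_REF_BYTES = 8;       // ref to the cell in the array
    static final int CELL_OBJECT_BYTES = 16;    // Cell object header
    static final int BYTE_ARRAY_REF_BYTES = 8;  // ref to the backing byte array
    static final int OFFSET_LENGTH_BYTES = 8;   // two ints: offset and length

    // Per-cell overhead of an array of Cells versus a plain-bytes cellblock.
    static int perCellOverheadBytes() {
        return ARRAY_REF_BYTES + CELL_OBJECT_BYTES
                + BYTE_ARRAY_REF_BYTES + OFFSET_LENGTH_BYTES;
    }

    // Total overhead for a given number of cells.
    static long totalOverheadBytes(long cellCount) {
        return cellCount * perCellOverheadBytes();
    }

    // Assumed heuristic: copy to a new MSLAB area only when few enough cells
    // survive the compaction. The threshold is an illustrative parameter,
    // not something proposed in the design doc.
    static boolean worthCopying(long totalCells, long survivingCells,
                                double maxSurvivingRatio) {
        if (totalCells <= 0) {
            return false; // nothing to compact
        }
        double survivingRatio = (double) survivingCells / totalCells;
        return survivingRatio <= maxSurvivingRatio;
    }

    public static void main(String[] args) {
        System.out.println(perCellOverheadBytes());               // 40
        System.out.println(totalOverheadBytes(1_000_000L));       // 40000000
        // Most cells expired: copying reclaims a lot of space.
        System.out.println(worthCopying(1000, 100, 0.5));         // true
        // Almost everything survives: the copy mostly doubles pool usage.
        System.out.println(worthCopying(1000, 950, 0.5));         // false
    }
}
```

The point of `worthCopying` is only that the decision needs both counts, which is why the first scan over the ImmutableSegments matters; the right threshold would have to come from measurement.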
> Memory optimizations
> --------------------
>
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>   Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>         Attachments: CellBlocksSegmentInMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap
> allocations

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)