[ 
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171458#comment-15171458
 ] 

Anoop Sam John commented on HBASE-14921:
----------------------------------------

Thanks for the details in the doc. It explains the design neatly.
So in-memory compaction will convert the CSLM (ConcurrentSkipListMap) of Cells
into an array of Cells.
When will the compaction get triggered? Is it time based and/or based on the
number of ImmutableSegments in the pipeline?
To build the array of Cells we need to know how many cells will survive into
the compacted result. So will we scan over the ImmutableSegments twice, once
to count the cells and once to actually move them into the array?
Yes, when most of the Cells in the ImmutableSegments can go away during this
compaction, the copy to a new MSLAB area is very much needed.
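
To make the two-scan question concrete, here is a minimal sketch of what I
mean. It is not taken from the doc or from HBase code: SimpleCell, ORDER,
countSurvivors and compact are hypothetical names, and the only survival rule
modelled is a max-versions limit per row. It assumes the input is immutable
between the two passes, as the ImmutableSegments would be.

```java
import java.util.Comparator;

final class TwoPassCompactionSketch {
    /** Hypothetical stand-in for a Cell: row key, timestamp, value. */
    static final class SimpleCell {
        final String row;
        final long ts;
        final String value;

        SimpleCell(String row, long ts, String value) {
            this.row = row;
            this.ts = ts;
            this.value = value;
        }
    }

    /** Row ascending, then newest timestamp first. */
    static final Comparator<SimpleCell> ORDER = (a, b) -> {
        int byRow = a.row.compareTo(b.row);
        return byRow != 0 ? byRow : Long.compare(b.ts, a.ts);
    };

    /** Pass 1: count the cells that survive the max-versions rule. */
    static int countSurvivors(Iterable<SimpleCell> sorted, int maxVersions) {
        int count = 0;
        int seen = 0;
        String currentRow = null;
        for (SimpleCell c : sorted) {
            if (!c.row.equals(currentRow)) {
                currentRow = c.row;
                seen = 0;
            }
            if (seen++ < maxVersions) {
                count++;
            }
        }
        return count;
    }

    /** Pass 2: allocate an exactly sized array and move the survivors in. */
    static SimpleCell[] compact(Iterable<SimpleCell> sorted, int maxVersions) {
        SimpleCell[] out = new SimpleCell[countSurvivors(sorted, maxVersions)];
        int i = 0;
        int seen = 0;
        String currentRow = null;
        for (SimpleCell c : sorted) {
            if (!c.row.equals(currentRow)) {
                currentRow = c.row;
                seen = 0;
            }
            if (seen++ < maxVersions) {
                out[i++] = c;
            }
        }
        return out;
    }
}
```

Feeding it a ConcurrentSkipListSet built with ORDER stands in for the CSLM
here. The point is only that the array size is known before any cell is moved,
at the cost of iterating the segments twice.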

But can we consider other use cases as well? Normally there are not that many
versions of a cell, and cells will only rarely expire while still in the
memstore. In that case this extra copy doubles the memory needed from the
pool. (Yes, our aim is to move all of this off heap, so a pool is a must
then.) If we know the number of cells that survive the compaction and the
number that will go away, we can decide whether the copy to a new area is
worth it or not.
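
Something like the check below is what I have in mind. This is only a sketch
under my own assumptions: the class and method names and the threshold are
hypothetical, nothing like this exists in the patch or the doc, and the right
threshold would need measuring.

```java
final class CopyDecisionSketch {
    /**
     * Returns true when enough cells are eliminated by the compaction that
     * copying the survivors to a new MSLAB area is likely to pay off.
     *
     * @param totalCells     cells in the segments before compaction
     * @param survivingCells cells that remain after compaction
     * @param minEliminatedFraction hypothetical tuning knob, between 0 and 1
     */
    static boolean shouldCopyToNewArea(long totalCells, long survivingCells,
                                       double minEliminatedFraction) {
        if (totalCells <= 0) {
            return false; // nothing to compact, so nothing to copy
        }
        double eliminated = (double) (totalCells - survivingCells) / totalCells;
        return eliminated >= minEliminatedFraction;
    }

    public static void main(String[] args) {
        // 90% of the cells go away: the copy frees most of the old chunks.
        System.out.println(shouldCopyToNewArea(1000, 100, 0.5));
        // Only 5% go away: the copy would nearly double the pool usage.
        System.out.println(shouldCopyToNewArea(1000, 950, 0.5));
    }
}
```

With the counts from the first scan already in hand, this decision costs
nothing extra.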

Also, my earlier thinking was that as soon as an in-memory flush happens we
should get rid of the CSLM, and we may then have multiple CellBlocks
associated with a memstore. In the case where most cell versions have
expired, yes, the compaction is very important. So I am very interested to
know when you think we can compact the CSLM cells into an array.

The overhead is not just 8 extra bytes per cell when we have an array of Cells
instead of a plain-bytes cellblock (like an HFile data block):
ref to the Cell in the array (8 bytes) + Cell object (16 bytes) + ref to the
byte[] within the Cell (8 bytes) + offset and length ints (8 bytes)
= 40 bytes per cell.
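
To show how this adds up at scale, a small sketch of the same arithmetic. The
per-component sizes are just the figures above; the real numbers depend on
JVM settings such as compressed oops. The class and constant names are
hypothetical.

```java
final class CellArrayOverheadSketch {
    static final int ARRAY_SLOT_REF_BYTES = 8;    // ref to the Cell in the array
    static final int CELL_OBJECT_BYTES = 16;      // Cell object itself
    static final int BYTE_ARRAY_REF_BYTES = 8;    // ref to the byte[] within the Cell
    static final int OFFSET_AND_LENGTH_BYTES = 8; // two 4-byte ints

    /** Per-cell overhead of an array of Cells, using the figures above. */
    static int perCellOverheadBytes() {
        return ARRAY_SLOT_REF_BYTES + CELL_OBJECT_BYTES
            + BYTE_ARRAY_REF_BYTES + OFFSET_AND_LENGTH_BYTES;
    }

    /** Total overhead for a given number of cells. */
    static long totalOverheadBytes(long cellCount) {
        return cellCount * perCellOverheadBytes();
    }

    public static void main(String[] args) {
        System.out.println(perCellOverheadBytes());         // prints 40
        System.out.println(totalOverheadBytes(1_000_000L)); // prints 40000000
    }
}
```

So one million cells in the array carry about 40 MB of overhead on top of the
cell data itself.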

> Memory optimizations
> --------------------
>
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>         Attachments: CellBlocksSegmentInMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap 
> allocations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
