[ 
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244119#comment-15244119
 ] 

Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------

[~yuzhih...@gmail.com], thanks for taking a look!

bq. Can you explain in bit more detail on the savings ?

Flattening just means replacing the CellSet based on ConcurrentSkipListMap 
with one based on CellArrayMap. CellArrayMap is the new name for 
CellBlockObjectArray, and it carries less overhead (metadata) per cell than 
ConcurrentSkipListMap. I am quoting [~anoop.hbase] below:
{quote}
an entry added to CSLM (Cell object) will have ~100 bytes overhead per cell.
The Cell[] way of CellBlock (CellBlockObjectArray) will have per Cell overhead 
of 48 bytes
{quote}
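
To put the quoted figures in perspective, here is a back-of-the-envelope 
estimate. It is only a sketch: the two constants are the approximate per-cell 
numbers quoted above, while the class name, method name and the segment size 
are made up for illustration.

```java
// Rough estimate only; both constants are the approximate figures quoted above.
final class OverheadEstimate {
    static final long CSLM_OVERHEAD_BYTES = 100;      // ~100 bytes per cell in CSLM
    static final long CELL_ARRAY_OVERHEAD_BYTES = 48; // 48 bytes per cell in Cell[]

    /** Metadata bytes saved by flattening a segment that holds cellCount cells. */
    static long savedBytes(long cellCount) {
        return cellCount * (CSLM_OVERHEAD_BYTES - CELL_ARRAY_OVERHEAD_BYTES);
    }

    public static void main(String[] args) {
        long cells = 1_000_000L; // hypothetical segment size
        System.out.println("approx. bytes saved: " + savedBytes(cells));
    }
}
```

So the saving is roughly 52 bytes of metadata per cell, independent of the 
size of the cell data itself.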

\\
bq. What's plan for flattening to CellChunkMap w.r.t. getting the chunk Id ?

The following answer also answers the questions raised by [~stack]

bq. What is this "...currently impossible to get the chunk ID out of already 
created cell metadata"

\\
In CellChunkMap we save a kind of reference to each Cell using three integers 
(two would be possible, but we use three for now). Assuming that all the data 
of Cell A is saved on Chunk C, in CellChunkMap we save the following per cell:
1.      Reference to C (some way to access the byte array of Chunk C)
2.      Offset from the beginning of Chunk C
3.      Length of Cell A on C
The problem is in 1. In Java we cannot store the pointer/address of an object 
as an integer. To resolve that, we added an ID for each Chunk, which is 
assigned in the MemStoreChunkPool. In addition, we added a mapping from Chunk 
IDs to Chunk references. So in 1 we save the Chunk ID and translate it to a 
Chunk reference when we need to access the Cell data. This is OK when we 
create a CellChunkMap from scratch.
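
The three-integer reference and the ID-to-Chunk translation can be sketched 
roughly as follows. This is an illustration only, not the code in the patch: 
ChunkRegistry, CellRef and their methods are hypothetical names, and a plain 
byte[] stands in for a Chunk.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the mapping from Chunk IDs to Chunk references.
final class ChunkRegistry {
    private final AtomicInteger nextId = new AtomicInteger(1);
    private final Map<Integer, byte[]> chunksById = new ConcurrentHashMap<>();

    /** Assigns a fresh ID to the chunk and remembers the mapping. */
    int register(byte[] chunkData) {
        int id = nextId.getAndIncrement();
        chunksById.put(id, chunkData);
        return id;
    }

    /** Translates a Chunk ID back to the chunk's byte array (null if unknown). */
    byte[] lookup(int chunkId) {
        return chunksById.get(chunkId);
    }
}

// The three integers saved per cell: chunk ID, offset in the chunk, cell length.
final class CellRef {
    final int chunkId;
    final int offset;
    final int length;

    CellRef(int chunkId, int offset, int length) {
        this.chunkId = chunkId;
        this.offset = offset;
        this.length = length;
    }

    /** Resolves the chunk by ID and copies out this cell's bytes. */
    byte[] read(ChunkRegistry registry) {
        byte[] chunk = registry.lookup(chunkId);
        return Arrays.copyOfRange(chunk, offset, offset + length);
    }
}

final class CellRefDemo {
    public static void main(String[] args) {
        ChunkRegistry registry = new ChunkRegistry();
        byte[] chunk = "..rowA/cf:q/value..".getBytes(StandardCharsets.US_ASCII);
        int id = registry.register(chunk);
        CellRef ref = new CellRef(id, 2, 15); // cell starts at offset 2, 15 bytes long
        System.out.println(new String(ref.read(registry), StandardCharsets.US_ASCII));
    }
}
```

The copy in read() is only there to keep the sketch simple; the point is that 
step 1 stores an int and needs the registry to get back to the bytes.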

But in the case of flattening, we have an existing segment with MSLAB and 
ConcurrentSkipListMap, and we do not want to copy the data in MSLAB. So as it 
is now, we cannot simply translate the ConcurrentSkipListMap to CellChunkMap, 
because we do not know the Chunk IDs of the Cells. But we can translate 
ConcurrentSkipListMap to CellArrayMap, which already reduces some metadata 
overhead.
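
As a rough illustration of why this translation needs no Chunk IDs and no data 
copy: the skip list already iterates in sorted order, so its entries can be 
moved into a plain array of references and searched with binary search. The 
sketch below uses String as a stand-in for Cell, and FlatSegment is a 
hypothetical name, not the CellArrayMap implementation.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical flat, read-only view built from an existing skip-list segment.
final class FlatSegment {
    private final String[] cells; // stand-in for Cell[]; holds references only

    FlatSegment(ConcurrentSkipListMap<String, String> source) {
        // values() iterates in ascending key order, so the array is already sorted.
        // Only references are copied; the underlying cell data is not duplicated.
        this.cells = source.values().toArray(new String[0]);
    }

    int size() {
        return cells.length;
    }

    String get(int index) {
        return cells[index];
    }

    /** Binary search replaces the skip-list lookup. */
    boolean contains(String cell) {
        return Arrays.binarySearch(cells, cell) >= 0;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> cslm = new ConcurrentSkipListMap<>();
        for (String c : new String[] {"row2", "row1", "row3"}) {
            cslm.put(c, c); // key and value are the same cell, as in a cell set
        }
        FlatSegment flat = new FlatSegment(cslm);
        System.out.println(flat.size() + " cells, first = " + flat.get(0));
    }
}
```

The array drops the per-entry node and index objects of the skip list, which 
is where the metadata saving comes from.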

To allow translation to CellChunkMap, the Cells need to know where they are 
stored and what their Chunk IDs are. That is quite a big change, and it is 
planned for after the performance evaluation phase.


> Memory optimizations
> --------------------
>
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, 
> CellBlocksSegmentinthecontextofMemStore(1).pdf, HBASE-14921-V01.patch, 
> HBASE-14921-V02.patch, HBASE-14921-V03.patch, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap 
> allocations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
