[ https://issues.apache.org/jira/browse/HBASE-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953177#comment-15953177 ]
Anastasia Braginsky commented on HBASE-16438:
---------------------------------------------

bq. What specific question in RB are you looking out for?

OK. I will write here the questions that bother me and for which I don't see responses:

1. In ByteBufferChunkCell, please explain why this new class is needed. Why can't the existing BBKV just have a new method, getChunkId(), that returns the chunk id stored at offset 0 of the backing BB?
2. In ByteBufferKeyValue, in MSLAB, or anywhere else, please add a constant giving the size in bytes of the ChunkCell, or what I call the cell representation (chunkId + offset + length + seqId), so I can use it later.

I will review the existing patch once again.

bq. ChunkId is per ByteBuffer backing the chunk. I can change the chunkId to be an int.

You got it yourself; I also thought so for a moment. I am talking about the ChunkID of the chunk where each cell is located, which is saved per cell. Please do change chunkID to int, but check for overflow (at least log an error). I believe we should strive to decrease the number of bytes the cell representation takes, because this is the reason why we are doing the CellChunkMap...

bq. My Q was, this Cell meta data (ChunkId, offset, length) also we planned to write to chunks. So what is the difference? In this chunk or that chunk?

Do you mean the seqID is going to be written in the index chunk only and is not going to be written in the main chunk, which holds the key, value, etc.? So no duplication? Are you sure? If so, that is already a little better, but I would still like to keep the Cell meta data smaller. The smaller the Cell meta data is (hopefully only chunkId, offset, and length, so only 12 bytes), the lower the meta-data overhead per cell and the more cells we can squeeze into a single index chunk (CellChunkMap). The smaller the CellChunkMap is, the more we all enjoy locality for scans, and the more easily the binary search can hit the processor cache.

bq. The only thing is we should go with fixed 8 bytes for that.
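
To make the byte counts in this exchange concrete, here is a minimal sketch of the size arithmetic and of the overflow check requested above. The class, constant, and method names are hypothetical and do not come from the patch. The field widths are an assumption: chunkId, offset, and length as ints, and seqId as a fixed 8-byte long. The 2 MB chunk size is used only for illustration.

```java
// Hypothetical sketch: names are illustrative and not taken from the HBASE-16438 patch.
final class CellRepresentationSketch {
    // Assumed field widths: chunkId, offset and length as ints, seqId as a long.
    static final int CHUNK_ID_BYTES = Integer.BYTES; // 4
    static final int OFFSET_BYTES = Integer.BYTES;   // 4
    static final int LENGTH_BYTES = Integer.BYTES;   // 4
    static final int SEQ_ID_BYTES = Long.BYTES;      // 8

    // Cell representation without and with the seqId stored per cell.
    static final int SIZE_WITHOUT_SEQ_ID = CHUNK_ID_BYTES + OFFSET_BYTES + LENGTH_BYTES;
    static final int SIZE_WITH_SEQ_ID = SIZE_WITHOUT_SEQ_ID + SEQ_ID_BYTES;

    /** Number of whole cell representations that fit into one index chunk. */
    static int cellsPerIndexChunk(int chunkSizeBytes, int cellRepBytes) {
        return chunkSizeBytes / cellRepBytes;
    }

    /** Narrows a long chunk counter to an int id; throws ArithmeticException on overflow. */
    static int toIntChunkId(long chunkCounter) {
        return Math.toIntExact(chunkCounter);
    }

    public static void main(String[] args) {
        int chunkSize = 2 * 1024 * 1024; // assumed chunk size, for illustration only
        System.out.println("without seqId: " + cellsPerIndexChunk(chunkSize, SIZE_WITHOUT_SEQ_ID));
        System.out.println("with seqId:    " + cellsPerIndexChunk(chunkSize, SIZE_WITH_SEQ_ID));
    }
}
```

Throwing on overflow is only one option. Logging an error and continuing, as suggested above, would be the softer alternative.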

This is not a desired situation. We are increasing from 12 bytes to 20 bytes, almost twice as much... We should not do it unless it is really necessary...

bq. So now if you are going to write the seqId in the BB backing every cell, then the seqId as the state variable is not needed at all and hence you may need a new cell representation for it.

OK. So let's have a new cell representation.

bq. Otherwise we should still go with it and use the seqID as a caching value in addition to having it in the BB.

Why duplicate the same value?


> Create a cell type so that chunk id is embedded in it
> -----------------------------------------------------
>
>                 Key: HBASE-16438
>                 URL: https://issues.apache.org/jira/browse/HBASE-16438
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: HBASE-16438_1.patch,
> HBASE-16438_3_ChunkCreatorwrappingChunkPool.patch,
> HBASE-16438_4_ChunkCreatorwrappingChunkPool.patch,
> HBASE-16438_8_ChunkCreatorwrappingChunkPool_withchunkRef.patch,
> HBASE-16438_9_ChunkCreatorwrappingChunkPool_withchunkRef.patch,
> HBASE-16438.patch, MemstoreChunkCell_memstoreChunkCreator_oldversion.patch,
> MemstoreChunkCell_trunk.patch
>
>
> For CellChunkMap we may need a cell such that the chunk out of which it was
> created, the id of the chunk be embedded in it so that when doing flattening
> we can use the chunk id as a meta data. More details will follow once the
> initial tasks are completed.
> Why we need to embed the chunkid in the Cell is described by [~anastas] in
> this remark over in parent issue
> https://issues.apache.org/jira/browse/HBASE-14921?focusedCommentId=15244119&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15244119



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)