[ https://issues.apache.org/jira/browse/HBASE-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940016#comment-15940016 ]

ramkrishna.s.vasudevan edited comment on HBASE-16438 at 3/24/17 8:54 AM:
-------------------------------------------------------------------------

I have a way to solve this problem. Let's discuss it before I put up the patch. 
Most of the other RB comments are fixed.
-> Since we now need to know whether a chunk is from the pool, the Chunk will 
have a boolean indicating whether it was created for the pool; say, 
isFromPool() returns true for those chunks.
-> Every chunk will have an AtomicInteger ref count.
-> When the MSLAB does a copyToChunkCell, where we know the cell has to have a 
chunk (one that comes out of chunkCreator), we increment that chunk's refCount.
-> Now in the MemstoreImpl, when we do getCellSet().add(), we need a new API in 
CellSet that returns the cell that was already in the CSLM, i.e. the value 
that CSLM.put() returns. Currently we only have cellSet#add(), which returns a 
boolean.
-> From this returned cell (the actual duplicate cell) we get the chunkId. 
Remember we now have a BbChunkCell, which can give the chunkId from the 0th 
offset.
-> Use this chunkId to decrement the reference count of that chunk. For this we 
need a decrementChunkRefCount in the MSLAB interface. I think that is valid 
because an MSLAB impl is nothing but Chunks.
-> On doing this decrementChunkRefCount, we can check whether the result is now 
0 and, if so, remove that chunk from the chunkCreator map. This way we make 
sure the reference to the chunk is released immediately.
-> One thing to note: if the chunk is from the pool, this increment/decrement 
will not have any impact. It matters only for on-demand chunks.
-> There is now an atomic ref count operation on the write path, which may add 
overhead; we may need to measure the impact. But remember the decrement (and 
the chunk release) only kicks in when there are lots of duplicates, as in 
HBASE-16195. In the normal case this should not be a problem, because 
CSLM#put() returns null when there is no duplicate, so nothing is decremented. 
And in fact, in that case the GC issue mentioned in HBASE-16195 will not 
happen, as all the chunks are needed until the MSLAB is closed.
Thoughts!!!
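The steps above can be sketched roughly as follows. This is plain Java, not HBase code: SketchChunk, SketchChunkCreator, SketchCell, SketchMemStore and addAndReturnPrevious are hypothetical names, the cell set is assumed to be a ConcurrentSkipListMap<Cell, Cell> keyed by the cell itself, and the cell's chunk id is modelled as a plain field. It shows the ref count bumped on copy, put() handing back the displaced cell, a decrement on that cell's chunk, and removal from the creator's map at zero for on-demand chunks only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an MSLAB chunk (not the HBase Chunk class).
final class SketchChunk {
  final int id;
  final boolean fromPool;                             // true if created for the chunk pool
  final AtomicInteger refCount = new AtomicInteger(); // cells currently pointing into this chunk

  SketchChunk(int id, boolean fromPool) { this.id = id; this.fromPool = fromPool; }

  boolean isFromPool() { return fromPool; }
}

// Hypothetical creator that tracks live chunks by id (the "chunkCreator map").
final class SketchChunkCreator {
  final Map<Integer, SketchChunk> chunks = new ConcurrentHashMap<>();
  private final AtomicInteger nextId = new AtomicInteger(1);

  SketchChunk create(boolean fromPool) {
    SketchChunk chunk = new SketchChunk(nextId.getAndIncrement(), fromPool);
    chunks.put(chunk.id, chunk);
    return chunk;
  }

  // Called for the chunk of a cell that was displaced by a duplicate.
  void decrementChunkRefCount(int chunkId) {
    SketchChunk chunk = chunks.get(chunkId);
    if (chunk == null) {
      return;
    }
    // Pool chunks are counted too, but are never dropped from the map here.
    if (chunk.refCount.decrementAndGet() == 0 && !chunk.isFromPool()) {
      chunks.remove(chunkId); // release the on-demand chunk immediately
    }
  }
}

// Hypothetical cell: a row key plus the id of the chunk it was copied into.
final class SketchCell implements Comparable<SketchCell> {
  final String row;
  final int chunkId;

  SketchCell(String row, int chunkId) { this.row = row; this.chunkId = chunkId; }

  @Override
  public int compareTo(SketchCell other) { return row.compareTo(other.row); }
}

final class SketchMemStore {
  final SketchChunkCreator creator = new SketchChunkCreator();
  final ConcurrentSkipListMap<SketchCell, SketchCell> cellSet = new ConcurrentSkipListMap<>();

  // copyToChunkCell analogue: the new cell lives in `chunk`, so bump its ref count.
  SketchCell copyToChunkCell(String row, SketchChunk chunk) {
    chunk.refCount.incrementAndGet();
    return new SketchCell(row, chunk.id);
  }

  // Analogue of the proposed CellSet API: return the value put() replaced, or null.
  SketchCell addAndReturnPrevious(SketchCell cell) {
    return cellSet.put(cell, cell);
  }

  void add(SketchCell cell) {
    SketchCell previous = addAndReturnPrevious(cell);
    if (previous != null) { // duplicate key: drop one reference on the old cell's chunk
      creator.decrementChunkRefCount(previous.chunkId);
    }
  }
}

final class RefCountDemo {
  public static void main(String[] args) {
    SketchMemStore ms = new SketchMemStore();
    SketchChunk first = ms.creator.create(false);
    SketchChunk second = ms.creator.create(false);
    ms.add(ms.copyToChunkCell("row1", first));
    ms.add(ms.copyToChunkCell("row1", second)); // same row: duplicate
    System.out.println(ms.creator.chunks.containsKey(first.id)); // prints false
  }
}
```

One detail worth checking against the real CellSet: ConcurrentSkipListMap.put replaces only the value of an existing mapping and keeps the original key object, so in a map keyed by the cell itself (as in this sketch) the displaced cell can remain reachable as the key even after its chunk leaves the creator's map.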



> Create a cell type so that chunk id is embedded in it
> -----------------------------------------------------
>
>                 Key: HBASE-16438
>                 URL: https://issues.apache.org/jira/browse/HBASE-16438
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: HBASE-16438_1.patch, 
> HBASE-16438_3_ChunkCreatorwrappingChunkPool.patch, 
> HBASE-16438_4_ChunkCreatorwrappingChunkPool.patch, HBASE-16438.patch, 
> MemstoreChunkCell_memstoreChunkCreator_oldversion.patch, 
> MemstoreChunkCell_trunk.patch
>
>
> For CellChunkMap we may need a cell type in which the id of the chunk it was 
> created from is embedded, so that when flattening we can use the chunk id as 
> metadata. More details will follow once the initial tasks are completed. 
> Why we need to embed the chunkid in the Cell is described by [~anastas] in 
> this remark in the parent issue 
> https://issues.apache.org/jira/browse/HBASE-14921?focusedCommentId=15244119&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15244119
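The idea of embedding the chunk id, together with the comment's BbChunkCell that "can give the chunkid from the 0th offset", can be sketched with plain java.nio. The class names IdPrefixedChunk and ChunkBackedCell are hypothetical, not HBase's Chunk or cell classes, and the sketch assumes a heap ByteBuffer chunk whose first four bytes hold an int id. A cell that keeps only (buffer, offset, length) can then recover its chunk id with an absolute getInt(0), which is the metadata flattening would need.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical chunk layout: bytes 0..3 hold the chunk id, cell data follows.
final class IdPrefixedChunk {
  static final int ID_SIZE = Integer.BYTES;
  final ByteBuffer data;
  private int nextFree = ID_SIZE; // first free offset after the embedded id

  IdPrefixedChunk(int id, int capacity) {
    data = ByteBuffer.allocate(capacity);
    data.putInt(0, id); // absolute put: embed the id at offset 0
  }

  // Copy bytes into the chunk and return a cell view over them, or null if the chunk is full.
  ChunkBackedCell copyIn(byte[] bytes) {
    if (nextFree + bytes.length > data.capacity()) {
      return null;
    }
    int offset = nextFree;
    ByteBuffer writer = data.duplicate(); // shares content, has its own position
    writer.position(offset);
    writer.put(bytes);
    nextFree += bytes.length;
    return new ChunkBackedCell(data, offset, bytes.length);
  }
}

// Hypothetical cell holding only (buffer, offset, length); it can still recover its chunk id.
final class ChunkBackedCell {
  final ByteBuffer buf;
  final int offset;
  final int length;

  ChunkBackedCell(ByteBuffer buf, int offset, int length) {
    this.buf = buf;
    this.offset = offset;
    this.length = length;
  }

  // The chunk id sits at offset 0 of the backing chunk, whatever this cell's own offset is.
  int getChunkId() {
    return buf.getInt(0);
  }

  byte[] copyBytes() {
    byte[] out = new byte[length];
    ByteBuffer reader = buf.duplicate();
    reader.position(offset);
    reader.get(out);
    return out;
  }
}

final class ChunkIdDemo {
  public static void main(String[] args) {
    IdPrefixedChunk chunk = new IdPrefixedChunk(7, 64);
    ChunkBackedCell cell = chunk.copyIn("row1".getBytes(StandardCharsets.UTF_8));
    System.out.println(cell.getChunkId()); // prints 7
  }
}
```

In this layout the id costs four bytes per chunk rather than a field per cell object, since every cell in the chunk reads the same header.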



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
