[ https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563359#comment-14563359 ]
ramkrishna.s.vasudevan commented on HBASE-12295:
------------------------------------------------

bq.returnBlock(BlockCacheKey cacheKey, HFileBlock block)
bq.Do we need to return the block too in the above? Won't the key be enough?
Ideally yes, the key would be enough. But in our current implementation the block carries a type that says whether it came from L1 or L2, which is why we needed the block there. Maybe we can pass only the type of the block instead? That should be possible. Not a big deal.

bq.Or, consider that we will want to stream out Cells as they come up out of the server when we implement a streaming Interface on the server.
OK. When we tried writing the cells directly to the socket as part of the POC, things were really slow. A different type of protocol or approach may be needed there.

bq.Hmm... pulling the CellBlock into the Region from the ipc layer? I have thought that Result should carry CellBlocks.... This would be an extra copy, right? If we wanted to get to zero copy, would it be possible if we went this route?
Yes, this will be zero copy. Currently, when creating the cell block, we do no copy and use the encoder directly to create it. It is the same here, except that it now happens in HRegion. Making Result carry it is one option; I think you mean the PB Result, right? The approach here was to keep it simple and use the existing payload. When you say Result, would that not be the same as what we currently do for non-Java clients?

bq.Nah. You can't pull an oddball RPC datastructure back into HRegion. Could it be done in the Result itself?
Same as above.

bq.He has added a bunch of accounting on where scan is at... state, and has scans doing heartbeating, and early returns. Can you make use of this work of his?
I had a look at it. I will check once more before commenting back. But in our case we need to handle both scans and gets. Scans have state; gets do not, as gets operate within the Region.

bq.Tell us more about the marking of Cells from L2 with a new Interface and why CP need special treatment, need Cells copied when read from CP. We have to do this?
CPs are a bit tricky. Take a CP that implements a postScannerOpen hook by wrapping the original scanner. In the non-CP path we control the result and the cell block creation, and we are sure that once the cell block is created we no longer refer to the cells from the HFile blocks. But with a CP there is a high chance that those cells are referenced for longer, with the CP using them as its state. In those cases, if we decrement the block's ref count just because the results have been fetched, we end up corrupting the state of those CPs. Hence we need to copy the result.

bq.finalizeScan(boolean finalizeAll).
Though we have completed the implementation, we are still checking whether there is a better way. I have done some analysis, and I fear it may be very tricky. I can come up with a write-up after some more analysis, but overall the problem is that the scanner flow has some optimizations where we proactively close some of the scanners in the heap just because they don't return any result (in fact we also nullify them). In such cases just calling close() will not be enough, because those StoreFileScanners could already be closed and we would have lost the reference to them. Hence the thought of adding an explicit API for it. In addition, for scans the close() call alone won't work, because a scan takes a series of next() calls to finish, and it is better to clear the references to those cells then and there. Also, for scans the latest block is needed for the subsequent next() calls, as scans are stateful.

bq. "In such a case we don’t evict the block if the ref count > 0, instead we mark those blocks with a Boolean."
This is a special case.
After a compaction we know that the compacted files are no longer needed, so we forcefully try to evict their blocks from the block cache. But if parallel scans are operating on those files, we cannot just evict them. So we use the same ref count mechanism to check whether the block can really be evicted (even when the eviction is forced). All such blocks are automatically evicted once the read operation using them completes (that is, when the ref count of a 'marked' block drops to 0, we force the eviction). This ensures that the results are not corrupted.

> Prevent block eviction under us if reads are in progress from the BBs
> ---------------------------------------------------------------------
>
>                 Key: HBASE-12295
>                 URL: https://issues.apache.org/jira/browse/HBASE-12295
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-12295.pdf, HBASE-12295_trunk.patch
>
>
> While we try to serve the reads from the BBs directly from the block cache,
> we need to ensure that the blocks does not get evicted under us while
> reading. This JIRA is to discuss and implement a strategy for the same.
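
For readers following the thread, the ref count plus 'marked' flag behaviour described in the comment can be sketched roughly as below. This is only an illustration under assumptions, not the code in the attached patch: the class RefCountedCacheSketch, its String keys, and all method signatures are hypothetical, and the real BlockCache interface differs. The thread does not say whether a marked block may still be handed to new readers; the sketch allows it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not HBase code: every name here is illustrative.
class RefCountedCacheSketch {

    /** A cached block with a reader count and a deferred-eviction flag. */
    private static final class Entry {
        final byte[] data;
        int refCount;            // readers currently using this block
        boolean markedForEvict;  // a forced eviction arrived while in use
        Entry(byte[] data) { this.data = data; }
    }

    private final Map<String, Entry> blocks = new HashMap<>();

    synchronized void cacheBlock(String key, byte[] data) {
        blocks.put(key, new Entry(data));
    }

    /** Hands out the block and increments its ref count; null if absent. */
    synchronized byte[] getBlock(String key) {
        Entry e = blocks.get(key);
        if (e == null) return null;
        e.refCount++;
        return e.data;
    }

    /** Reader is done; the last reader of a marked block triggers eviction. */
    synchronized void returnBlock(String key) {
        Entry e = blocks.get(key);
        if (e == null || e.refCount == 0) return;
        e.refCount--;
        if (e.refCount == 0 && e.markedForEvict) blocks.remove(key);
    }

    /** Forced eviction, e.g. after compaction. True if evicted right away. */
    synchronized boolean evictBlock(String key) {
        Entry e = blocks.get(key);
        if (e == null) return false;
        if (e.refCount > 0) {
            e.markedForEvict = true;  // defer until the readers finish
            return false;
        }
        blocks.remove(key);
        return true;
    }

    synchronized boolean contains(String key) {
        return blocks.containsKey(key);
    }
}
```

In this sketch a forced evictBlock() on a block that is being read only sets the flag, and the matching returnBlock() that brings the count to 0 removes the block, so a reader never sees the block disappear underneath it.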