[ 
https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563359#comment-14563359
 ] 

ramkrishna.s.vasudevan commented on HBASE-12295:
------------------------------------------------

bq.returnBlock(BlockCacheKey cacheKey, HFileBlock block)
bq.Do we need to return the block too in the above? Won't the key be enough?
Ideally, yes. But in our current implementation we track the type of the block, 
whether it came from L1 or L2, and hence the block is needed there. Maybe we can 
pass only the block type instead? That should be possible; not a big deal.
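A minimal sketch of that idea (the class and method names here are illustrative, not HBase's actual BlockCache API): if the cache remembers which tier served each block at check-out time, then returning a block needs only the key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a combined cache that records, per key, which tier
// (L1 or L2) served the block, so returnBlock() needs only the cache key.
public class TieredRefCache {
    public enum Tier { L1, L2 }

    static final class Entry {
        final Tier tier;
        final AtomicInteger refCount = new AtomicInteger(0);
        Entry(Tier tier) { this.tier = tier; }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    // Record where the block came from when it is handed out to a reader.
    public void checkOut(String cacheKey, Tier tier) {
        entries.computeIfAbsent(cacheKey, k -> new Entry(tier))
               .refCount.incrementAndGet();
    }

    // The key alone is enough: the tier was remembered at check-out time.
    public Tier returnBlock(String cacheKey) {
        Entry e = entries.get(cacheKey);
        if (e == null) throw new IllegalStateException("unknown key: " + cacheKey);
        e.refCount.decrementAndGet();
        return e.tier;
    }

    public int refCount(String cacheKey) {
        Entry e = entries.get(cacheKey);
        return e == null ? 0 : e.refCount.get();
    }
}
```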
bq.Or, consider that we will want to stream out Cells as they come up out of 
the server when we implement a streaming Interface on the server.
Okay. When we tried to write the cells directly to the socket as part of the 
POC, things were quite slow. A different type of protocol/approach may be 
needed there.
bq.Hmm... pulling the CellBlock into the Region from the ipc layer? I have 
thought that Result should carry CellBlocks.... This would be an extra copy, 
right? If we wanted to get to zero copy, would it be possible if we went this 
route?
Yes, this will be zero copy. Currently we do no copy while creating the cell 
block; we use the encoder to create it directly. The same applies here, except 
that it now happens in HRegion.
Making Result carry it is one option. I think you mean the PB Result, right? 
The approach here was simply to reuse the existing payload. When you say 
Result, would that not be the current way, as we do for non-Java clients?
bq.Nah. You can't pull an oddball RPC datastructure back into HRegion. Could it 
be done in the Result itself?
Same as above.  
bq.He has added a bunch of accounting on where scan is at... state, and has 
scans doing heartbeating, and early returns. Can you make use of this work of 
his?
I had a look at it and will check once more before commenting back. But in our 
case we need to handle both scans and gets. Scans have state, while gets do 
not, since gets operate within the Region.
bq.Tell us more about the marking of Cells from L2 with a new Interface and why 
CP need special treatment, need Cells copied when read from CP. We have to do 
this?
CPs are a bit tricky. Take a CP that implements a postScannerOpen hook by 
wrapping the original scanner.
In the non-CP path we control the result and the cell block creation, so we are 
sure that once the cell block is created we no longer refer to the cells from 
the HFileBlocks. But with a CP there is a high chance that those cells are 
referenced for much longer, because the CP may keep those Cells as its state. 
In such cases, if we decrement a block's ref count just because the results 
have been fetched, we end up corrupting the state of those CPs. Hence we need 
to copy the result.
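The copy step can be sketched as follows (illustrative names only, not HBase's actual code; cells are modeled as plain byte arrays backed by a shared block buffer): deep-copy each cell's payload so the CP can keep it as state after the block's ref count drops and the block is potentially evicted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: before handing results to a coprocessor that may retain
// them, copy each cell's backing bytes into a fresh array so the cells are no
// longer tied to the cached block's lifetime.
public class CellCopy {
    public static List<byte[]> copyForCoprocessor(List<byte[]> cellsBackedByBlock) {
        List<byte[]> copies = new ArrayList<>(cellsBackedByBlock.size());
        for (byte[] cell : cellsBackedByBlock) {
            // Arrays.copyOf detaches the cell bytes from the shared block buffer.
            copies.add(Arrays.copyOf(cell, cell.length));
        }
        return copies;
    }
}
```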
bq.finalizeScan(boolean finalizeAll).
Though we have completed the implementation, we are still checking whether 
there is a better way. I have done some analysis, and I fear it may be very 
tricky. I can come up with a write-up after some more analysis, but the overall 
problem is that the scanner flow has some optimizations where we proactively 
close some of the scanners in the heap just because they don't return any 
result (in fact, we nullify them as well). In such cases, just calling close() 
will not be enough, because those StoreFileScanners could already be closed and 
we would lose the references to those scanners.
Hence we thought of adding an explicit API for this. On top of that, for the 
scan case the close() call alone won't work, because a scan finishes only after 
a series of next() calls, and it is better to clear the references to those 
cells then and there. Also, in the case of scans the latest block is needed for 
the subsequent next() calls, since scans carry state.
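One way to picture the explicit API (a sketch with made-up names, not the actual HBase-12295 patch): blocks checked out by any scanner in the heap are registered centrally, so even after individual StoreFileScanners have been closed and nulled out, finalizeScan() can still release their blocks; with finalizeAll=false, the most recent block stays alive for the next next() call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: central registry of blocks checked out during a scan,
// surviving the proactive close/nullify of individual StoreFileScanners.
public class ScanBlockTracker {
    private final List<String> checkedOutBlocks = new ArrayList<>();

    // Called whenever any scanner in the heap checks a block out of the cache.
    public void onBlockCheckedOut(String cacheKey) {
        checkedOutBlocks.add(cacheKey);
    }

    // Release references no longer needed. When finalizeAll is false, keep the
    // most recent block alive because the next next() call may still read it.
    public List<String> finalizeScan(boolean finalizeAll) {
        int keep = finalizeAll ? 0 : Math.min(1, checkedOutBlocks.size());
        List<String> toRelease =
            checkedOutBlocks.subList(0, checkedOutBlocks.size() - keep);
        List<String> released = new ArrayList<>(toRelease);
        toRelease.clear(); // clears the released prefix from the backing list
        return released;
    }

    public int outstanding() { return checkedOutBlocks.size(); }
}
```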
bq. "In such a case we don't evict the block if the ref count > 0, instead we mark those blocks with a Boolean."
This is a special case. After a compaction, we know that the compacted files 
are no longer needed, so we forcefully try to evict their blocks from the block 
cache. But if there are any parallel scans operating on those files, we simply 
cannot evict those blocks. So we use the same ref count mechanism to see 
whether a block can really be evicted (even if the eviction is forced). All 
such marked blocks are automatically evicted once the read operation using them 
completes (in the sense that on decrementing a 'marked' block's ref count to 0, 
we evict it forcefully). This ensures that the results are not corrupted.
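The mark-then-evict scheme described above can be sketched like this (illustrative names, not HBase's actual BlockCache code): a forced eviction of a block that is still being read only marks it, and the last reader's return triggers the real eviction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: forced eviction of a referenced block is deferred via a
// boolean mark; the block is removed when its ref count drops back to zero.
public class MarkedEvictionCache {
    static final class Block {
        int refCount;
        boolean markedForEviction;
    }

    private final Map<String, Block> cache = new HashMap<>();

    public synchronized void checkOut(String key) {
        cache.computeIfAbsent(key, k -> new Block()).refCount++;
    }

    // Forced eviction after compaction: evict now if unreferenced, else mark.
    public synchronized boolean evict(String key) {
        Block b = cache.get(key);
        if (b == null) return false;
        if (b.refCount > 0) {
            b.markedForEviction = true; // evict later, when readers finish
            return false;
        }
        cache.remove(key);
        return true;
    }

    // A reader finished with the block; a marked block at ref count 0 goes away.
    public synchronized void returnBlock(String key) {
        Block b = cache.get(key);
        if (b == null) return;
        if (--b.refCount == 0 && b.markedForEviction) {
            cache.remove(key);
        }
    }

    public synchronized boolean contains(String key) {
        return cache.containsKey(key);
    }
}
```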

> Prevent block eviction under us if reads are in progress from the BBs
> ---------------------------------------------------------------------
>
>                 Key: HBASE-12295
>                 URL: https://issues.apache.org/jira/browse/HBASE-12295
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-12295.pdf, HBASE-12295_trunk.patch
>
>
> While we try to serve the reads from the BBs directly from the block cache, 
> we need to ensure that the blocks does not get evicted under us while 
> reading.  This JIRA is to discuss and implement a strategy for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
