[jira] [Commented] (HBASE-12295) Prevent block eviction under us if reads are in progress from the BBs

ramkrishna.s.vasudevan (JIRA) Fri, 12 Jun 2015 02:10:12 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583173#comment-14583173
 ]


ramkrishna.s.vasudevan commented on HBASE-12295:
------------------------------------------------

Had some internal discussions on some of the questions raised
Thanks for the reviews so far over in RB. We discussed some of your comments 
and thought of finalizing things and also thought it would help to clarify 
certain aspects as why it was done.

-> CellBlock creation 
The first area is the cellblock creation.  As per the previous mail discussion 
now we will have now create the cellBlock in the RsRpcServices layer both for 
scan and gets.
The scans already were creating the cellScanner in the RsRpcServices layer - 
now that we have added the cellBlock creation here.

But in order to do this we are creating a new getScanner() API in HRegion - 
(note that it is not in Region.java) so it is not exposed to CPs. 
This new API would indicate if the client is java based or not.  For java based 
client we will allow the cellblock creation in the RsRpcServices layer and then 
the blocks are returned back (ref count decremented).  
But for the non-java cases we will go with the old getScanner() API only but 
internally it would set a state in the RegionScanner saying that the results 
are to be copied then and there as there is no cellblock creation for it. (We 
don't create cellblock for non-java cases as you may be already knowing it). 
Once the copying of the results are done we are sure that we can return the 
blocks back.

This new state in the REgionScanner also helps us with the case of CPs wrapping 
the scanner where though the scan is from java -client the partial results 
could be used in the CPs and we may not know how the CPs will use the result. 
Hence it is better to copy those results like we do for non-java client cases.
In case of scan() after every next() we make a call to return the blocks so 
that we don't hold them up till the scanners are done. 

Now coming to get(), we have now tweaked the logic of get() in RsrpcServices 
such that what ever was being done in Region.get() is now moved to 
RsRpcServices included CP hooks and once the result is formed we create the 
cellblock in RsRpcServices.  
For the non-java case the old Region.get() API is still used.  Just like in 
scan the results would be copied once it is formed. 
Remember we need to use it for CP when it needs to do a get() on a region 
directly.  
In case of gets() the scanner.close() woudl ensure that the block ref count 
decrement happens. 

-> MemType and CacheType
First of all we can be sure that none of the blocks coming from L1 cache needs 
to go through all these shared memory or copying stuff.  Because they are just 
objects referring to the blocks created from HDFS and nothing is serialized and 
there is no problem of eviction.

We need to do this only for Bucketcache.  BucketCache again has 3 things
-> RamQueue - which is like initially the blocks are added to this queue and 
then moved to the actual buckets. So there are chances the reads are served 
from these queue itself. Those cases there is no need for any copying or 
sharing of memory. (this is similar to L1).
-> ByteBufferIOEngine
This is the actual case where we need to solve.  Here the block is for sure 
from L2 and also the memory is shared.  So if you see in our patch we would 
have set this L2 and shared_memory only at this layer.
-> fileIOEngine
Here though it is a bucketcache impl there is no need for having a shared_mem 
because already there is copy from the file. 

But who is responsible for setting the memory type-? It should be the engine.  
The engine knows what type of memory it is creating.  The engines just return a 
BB and we don't know the type of it. so better the engine set the type of the 
memory also.

-> ReturnBlock in between code in HFileREaders
This is important.  As I earlier commented in the RB, the index or bloom blocks 
from L1 cache can get demoted to L2 cache.  Our code is such that in some case 
we try to read a block and we iterate till we reach a data block. Those blocks 
could have been from L1 or L2 - we are not sure. 
So we have analysed the code and ensured that where ever we read blocks and 
just throw it away till we end up in a data block- all such blocks have to be 
returned back. If the block is from L1 it is going to be a noop.  

I verified it once again and to be safe added some try/finally around it.  I 
can explain more on this if you have some queries.

->IPCUtil singleton
IPCUtil singleton was mainly for ease of use.  The IPCutil had the code to 
create Cellblock. So we wanted to reuse that code rather than duplicating it. 
We could create IPCUTil inside Payload also.  But for doing that we need the 
'conf' object. Most of the cases Payload is created from RPcControllerfactory 
where already we have 'conf'.  But in case of AbstractRPcClient, 
AbstractRpcCaller etc we need to get the access to these conf object so that we 
can pass a 'conf' to Payload and use that conf to create the IPCUtil object.
So in order to avoid this in the latest will pass the IPCtuil to the Payload 
using a helper class which will also have all the other ingredients needed to 
create  a cellblock.  This wil avoid all the singleton related changes to 
IPCUtil.

To which Stack had some comments
Hmm.

RsRpcServices is a weird class. It has not character. It is just something 
Jimmy made to move a bunch of HRegionServer to its own class.

So, will Get be the only thing in Region that cannot be intercepted by CPs?  
Other methods in Region can be intercepted by CPs? Having Get be an exception 
will cause confusion.

Or is it that you are saying that the old Get can be used by CPs to 
intercept... and in this case there will be the unavoidable copy. If so, that 
sounds ok (where can I see current state of code?)

For the MemType and CacheType his reply was
bq.
Thats fine. I just don't like labeling each individual block. High-level, the 
engine knows where the block came from, right? It can stamp the return with 
where it came from rather than do it per block?



> Prevent block eviction under us if reads are in progress from the BBs
> ---------------------------------------------------------------------
>
>                 Key: HBASE-12295
>                 URL: https://issues.apache.org/jira/browse/HBASE-12295
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-12295.pdf, HBASE-12295_1.patch, HBASE-12295_1.pdf, 
> HBASE-12295_2.patch, HBASE-12295_4.patch, HBASE-12295_trunk.patch
>
>
> While we try to serve the reads from the BBs directly from the block cache, 
> we need to ensure that the blocks does not get evicted under us while 
> reading.  This JIRA is to discuss and implement a strategy for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-12295) Prevent block eviction under us if reads are in progress from the BBs

Reply via email to