[ 
https://issues.apache.org/jira/browse/HBASE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026231#comment-16026231
 ] 

Anoop Sam John commented on HBASE-17874:
----------------------------------------

We are doing a slice() call on the duplicated BB (4 MB sized) when we read out 
a block from it. The slice call changes the capacity to (newLimit - newPos), 
i.e. 64 KB. So functionally it looks ok. It just behaves the same way as the 
other blocks which are read from DFS or the L1 cache. Sorry, I was missing the 
slice() call that we have at the end:
{code}
for (int i = startBuffer, j = 0; i <= endBuffer; ++i, j++) {
  ByteBuffer bb = buffers[i].duplicate();
  if (i == startBuffer) {
    ...
  } else if (i == endBuffer) {
    ...
  } else {
    ...
  }
  mbb[j] = bb.slice();
}
{code}
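
For reference, here is a minimal standalone sketch of that capacity behaviour 
(plain java.nio; the 4 MB / 64 KB sizes mirror the segment and block sizes 
discussed here, the class name and offsets are only illustrative):
{code}
import java.nio.ByteBuffer;

public class SliceCapacityDemo {
  public static void main(String[] args) {
    // Stand-in for one 4 MB BucketCache segment.
    ByteBuffer segment = ByteBuffer.allocate(4 * 1024 * 1024);

    // Duplicate it and position/limit it around one 64 KB block,
    // the way the HFileBlock creation code above does.
    ByteBuffer dup = segment.duplicate();
    int blockOffset = 128 * 1024;
    int blockSize = 64 * 1024;
    dup.limit(blockOffset + blockSize).position(blockOffset);

    // duplicate() keeps the full backing capacity...
    System.out.println(dup.capacity());         // 4194304 (4 MB)
    // ...while slice() creates a view whose capacity is limit - position.
    System.out.println(dup.slice().capacity()); // 65536 (64 KB)
  }
}
{code}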

> Limiting of read request response size based on block size may go wrong when 
> blocks are read from onheap or off heap bucket cache
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17874
>                 URL: https://issues.apache.org/jira/browse/HBASE-17874
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>            Priority: Blocker
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17874.patch
>
>
> HBASE-14978 added this size limiting so as to make sure that multi read 
> requests do not retain too many blocks. This works well when the blocks are 
> obtained from anywhere other than a memory-mode BucketCache. In case of an 
> on-heap or off-heap BucketCache, the entire cache area is split into N 
> ByteBuffers, each of size 4 MB. When we hit a block in this cache, we no 
> longer copy the data into a temp array. We use the same shared memory (BB). 
> Its capacity is 4 MB.
> The block size accounting logic in RSRpcServices is like below:
> {code}
> if (c instanceof ByteBufferCell) {
>   ByteBufferCell bbCell = (ByteBufferCell) c;
>   ByteBuffer bb = bbCell.getValueByteBuffer();
>   if (bb != lastBlock) {
>     context.incrementResponseBlockSize(bb.capacity());
>     lastBlock = bb;
>   }
> } else {
>   // We're using the last block being the same as the current block as
>   // a proxy for pointing to a new block. This won't be exact.
>   // If there are multiple gets that bounce back and forth
>   // Then it's possible that this will over count the size of
>   // referenced blocks. However it's better to over count and
>   // use two rpcs than to OOME the regionserver.
>   byte[] valueArray = c.getValueArray();
>   if (valueArray != lastBlock) {
>     context.incrementResponseBlockSize(valueArray.length);
>     lastBlock = valueArray;
>   }
> }
> {code}
> We take the BBCell's value buffer and take its capacity. The cell is backed 
> by the same BB that backs the HFileBlock. When the HFileBlock is created from 
> the BC, we duplicate the BB and set the proper position and limit, as below:
> {code}
> ByteBuffer bb = buffers[i].duplicate();
> if (i == startBuffer) {
>   cnt = bufferSize - startBufferOffset;
>   if (cnt > len) cnt = len;
>   bb.limit(startBufferOffset + cnt).position(startBufferOffset);
> {code}
> Still this BB's capacity is 4 MB.
> This will make the size limit breach happen too soon. What we expect is that 
> a block defaults to 64 KB in size, and so we thereby allow cells from several 
> different blocks to appear in the response. We have a way to check whether we 
> move from one block to the next.
> {code}
> if (bb != lastBlock) {
> ...
>             lastBlock = bb;
> }
> {code}
> But just by considering the 1st cell, we have already added 4 MB!
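> For illustration, the charge the accounting loop above sees for such a cell, 
> versus what the block really occupies (the 4 MB / 64 KB sizes are the segment 
> and block sizes above; the buffer setup is only a stand-in for the 
> cache-backed value buffer):
> {code}
> ByteBuffer segment = ByteBuffer.allocate(4 * 1024 * 1024); // one BucketCache segment
> ByteBuffer block = segment.duplicate();
> block.limit(64 * 1024).position(0);    // one 64 KB block view, no slice()
>
> int charged = block.capacity();        // 4194304 - what incrementResponseBlockSize() is given
> int actual = block.remaining();        // 65536   - what the block really occupies
> System.out.println(charged / actual);  // 64, i.e. a 64x over-count per block hit
> {code}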



