Anoop Sam John created HBASE-17874:
--------------------------------------

             Summary: Limiting of read request response size based on block 
size may go wrong when blocks are read from onheap or off heap bucket cache
                 Key: HBASE-17874
                 URL: https://issues.apache.org/jira/browse/HBASE-17874
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: Anoop Sam John
            Assignee: Anoop Sam John
            Priority: Blocker
             Fix For: 2.0.0


HBASE-14978 added this size limiting so as to make sure the multi read requests 
do not retain two many blocks. This works well when the blocks are obtained 
from any where other than memory mode BucketCache. In case of on heap or off 
heap Bucket Cache, the entire cache area is split into N ByteBuffers each of 
size 4 MB. When we hit a block in this cache, we no longer do copy data into 
temp array. We use the same shared memory (BB).  Its  capacity is 4 MB.
The block size accounting logic is RSRpcServices is like below
{code}
 if (c instanceof ByteBufferCell) {
          ByteBufferCell bbCell = (ByteBufferCell) c;
          ByteBuffer bb = bbCell.getValueByteBuffer();
          if (bb != lastBlock) {
            context.incrementResponseBlockSize(bb.capacity());
            lastBlock = bb;
          }
        } else {
          // We're using the last block being the same as the current block as
          // a proxy for pointing to a new block. This won't be exact.
          // If there are multiple gets that bounce back and forth
          // Then it's possible that this will over count the size of
          // referenced blocks. However it's better to over count and
          // use two rpcs than to OOME the regionserver.
          byte[] valueArray = c.getValueArray();
          if (valueArray != lastBlock) {
            context.incrementResponseBlockSize(valueArray.length);
            lastBlock = valueArray;
          }
        }
                {code}
We take the BBCell's value buffer and takes its capacity. The cell is backed by 
the same BB that backs the HFileBlock. When the HFileBlock is created from the 
BC, we do as below duplicating and proper positioning and limiting the BB
{code}
 ByteBuffer bb = buffers[i].duplicate();
      if (i == startBuffer) {
        cnt = bufferSize - startBufferOffset;
        if (cnt > len) cnt = len;
        bb.limit(startBufferOffset + cnt).position(startBufferOffset);
                {code}
Still this BB's capacity is 4 MB.
This will make the size limit breach to happen too soon. What we expect is 
block size defaults to 64 KB and so we here by allow cells from different 
blocks to appear in response. We have a way to check whether we move from one 
block to next.
{code}
if (bb != lastBlock) {
...
            lastBlock = bb;
}
{code}
But already just by considering the 1st cell, we added 4 MB size!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to