[ https://issues.apache.org/jira/browse/HBASE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993554#comment-15993554 ]
stack commented on HBASE-17874: ------------------------------- Looks good. It makes it so we stride through in other than 4MB steps? A test [~anoop.hbase]? > Limiting of read request response size based on block size may go wrong when > blocks are read from onheap or off heap bucket cache > --------------------------------------------------------------------------------------------------------------------------------- > > Key: HBASE-17874 > URL: https://issues.apache.org/jira/browse/HBASE-17874 > Project: HBase > Issue Type: Bug > Affects Versions: 2.0.0 > Reporter: Anoop Sam John > Assignee: Anoop Sam John > Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-17874.patch > > > HBASE-14978 added this size limiting so as to make sure the multi read > requests do not retain two many blocks. This works well when the blocks are > obtained from any where other than memory mode BucketCache. In case of on > heap or off heap Bucket Cache, the entire cache area is split into N > ByteBuffers each of size 4 MB. When we hit a block in this cache, we no > longer do copy data into temp array. We use the same shared memory (BB). Its > capacity is 4 MB. > The block size accounting logic is RSRpcServices is like below > {code} > if (c instanceof ByteBufferCell) { > ByteBufferCell bbCell = (ByteBufferCell) c; > ByteBuffer bb = bbCell.getValueByteBuffer(); > if (bb != lastBlock) { > context.incrementResponseBlockSize(bb.capacity()); > lastBlock = bb; > } > } else { > // We're using the last block being the same as the current block as > // a proxy for pointing to a new block. This won't be exact. > // If there are multiple gets that bounce back and forth > // Then it's possible that this will over count the size of > // referenced blocks. However it's better to over count and > // use two rpcs than to OOME the regionserver. > byte[] valueArray = c.getValueArray(); > if (valueArray != lastBlock) { > context.incrementResponseBlockSize(valueArray.length); > lastBlock = valueArray; > } > } > {code} > We take the BBCell's value buffer and takes its capacity. The cell is backed > by the same BB that backs the HFileBlock. When the HFileBlock is created from > the BC, we do as below duplicating and proper positioning and limiting the BB > {code} > ByteBuffer bb = buffers[i].duplicate(); > if (i == startBuffer) { > cnt = bufferSize - startBufferOffset; > if (cnt > len) cnt = len; > bb.limit(startBufferOffset + cnt).position(startBufferOffset); > {code} > Still this BB's capacity is 4 MB. > This will make the size limit breach to happen too soon. What we expect is > block size defaults to 64 KB and so we here by allow cells from different > blocks to appear in response. We have a way to check whether we move from one > block to next. > {code} > if (bb != lastBlock) { > ... > lastBlock = bb; > } > {code} > But already just by considering the 1st cell, we added 4 MB size! -- This message was sent by Atlassian JIRA (v6.3.15#6346)