[ https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773086#comment-16773086 ]

Zheng Hu commented on HBASE-21879:
----------------------------------

I am thinking about another problem: we now have two allocators, the 
ByteBufferAllocator for RPC and the BucketAllocator for BucketCache, and when 
reading a block, we do:
step.1  allocate a ByteBuff from the ByteBufferAllocator and read the block into it;
step.2  put the ByteBuff into the ramCache of BucketCache;
step.3  the writer thread persists the ByteBuff into the corresponding IOEngine of 
BucketCache;

The problem is: when can we free the ByteBuff allocated in step.1?
In theory we could free it once the RPC finishes, but the ByteBuff may have 
disappeared after the async writer thread persisted it to the IOEngine, leaving 
us no chance to free it; and we cannot free the ByteBuff as soon as the writer 
thread finishes the persistence, because the RPC may not have finished yet and 
may still reference the ByteBuff.

On the other hand, if many RPCs hit the same ByteBuff in the ramCache, the 
ByteBuff cannot be freed even when some of those RPCs have finished.

So maybe a solution is: keep a refCount in the ByteBuff. Once it is allocated, 
increment the refCount; when it is put into the ramCache, increment it again; 
when it is removed from the ramCache or an RPC finishes, decrement it. Only when 
the refCount drops to 0 can we free the ByteBuff.

If so, I think we'll need a new patch for HBASE-21916: each ByteBuff will carry a 
refCount to track its references, just like Netty's ByteBuf.
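
A rough sketch of that lifecycle, assuming a hypothetical wrapper (the class and 
method names below are made up for illustration and are not the HBASE-21916 API; 
Netty's ReferenceCounted gives the same retain/release semantics):

{code}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only, not a real HBase class: a reference-counted buffer
// wrapper that is freed exactly once, when the last owner releases it.
public class RefCountedBuff {
  private final ByteBuffer buf;
  // Starts at 1 for the allocating reader (the RPC read path).
  private final AtomicInteger refCnt = new AtomicInteger(1);

  public RefCountedBuff(ByteBuffer buf) {
    this.buf = buf;
  }

  /** Called when another owner starts referencing the buffer, e.g. when it is put into ramCache. */
  public RefCountedBuff retain() {
    refCnt.incrementAndGet();
    return this;
  }

  /** Called when an owner is done: the RPC finished, or the block was removed from ramCache. */
  public void release() {
    if (refCnt.decrementAndGet() == 0) {
      deallocate();
    }
  }

  private void deallocate() {
    // Here the ByteBuffer would be returned to the allocator's pool; the sketch just logs it.
    System.out.println("buffer freed");
  }

  public ByteBuffer nioBuffer() {
    return buf;
  }

  public static void main(String[] args) {
    RefCountedBuff block = new RefCountedBuff(ByteBuffer.allocateDirect(64 * 1024)); // refCnt = 1 (rpc)
    block.retain();   // put into ramCache                                           -> refCnt = 2
    block.release();  // writer thread persisted the block, removed from ramCache    -> refCnt = 1
    block.release();  // rpc finished                                                -> refCnt = 0, freed
  }
}
{code}

With this, whichever of the RPC path or the cache releases last is the one that 
actually frees the buffer, regardless of the ordering of the two events.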

> Read HFile's block to ByteBuffer directly instead of to byte[] for reducing young gc purpose
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21879
>                 URL: https://issues.apache.org/jira/browse/HBASE-21879
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4
>
>         Attachments: QPS-latencies-before-HBASE-21879.png, 
> gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal,  we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
>     long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum,
>     boolean updateMetrics) throws IOException {
>   // .....
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
>   byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>       onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
>   if (headerBuf != null) {
>     // ...
>   }
>   // ...
> }
> {code}
> In the read path, we still read the block from the HFile into an on-heap byte[], 
> then copy that on-heap byte[] into the offheap bucket cache asynchronously. In my 
> 100% get performance test I also observed some frequent young GCs, and the 
> largest memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of a 
> byte[] to reduce young GC pressure. We did not implement this before because the 
> older HDFS client had no ByteBuffer reading interface, but 2.7+ supports it now, 
> so I think we can fix this now.
> Will provide a patch and some perf-comparison for this.
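
For illustration only, a minimal sketch (not the actual patch) of what a 
ByteBuffer-based read could look like, assuming the stream supports 
ByteBufferReadable; the class and method names here are made up:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.fs.FSDataInputStream;

// Sketch only: read a block straight into a (possibly direct/off-heap) ByteBuffer.
// FSDataInputStream#read(ByteBuffer) comes from the ByteBufferReadable interface;
// streams that do not implement it throw UnsupportedOperationException, so a real
// patch would still need a byte[] fallback (and a pread variant).
public final class ByteBufferBlockRead {

  /** Seek to the given offset and fill dst completely from the stream. */
  static void readFully(FSDataInputStream is, long offset, ByteBuffer dst) throws IOException {
    is.seek(offset);
    while (dst.hasRemaining()) {
      int n = is.read(dst); // fills dst directly, no intermediate on-heap byte[]
      if (n < 0) {
        throw new IOException("Premature EOF at pos=" + is.getPos());
      }
    }
  }

  /** Example: read a block of blockSize bytes into a direct buffer. */
  static ByteBuffer readBlock(FSDataInputStream is, long offset, int blockSize) throws IOException {
    ByteBuffer block = ByteBuffer.allocateDirect(blockSize);
    readFully(is, offset, block);
    block.flip();
    return block;
  }
}
{code}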



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
