[ https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767850#comment-16767850 ]
Zheng Hu commented on HBASE-21879:
----------------------------------

There is an earlier issue, HDFS-3246, for ByteBuffer pread, but it has seen no progress since 12/May/16. I think we need to move it forward.

> Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21879
>                 URL: https://issues.apache.org/jira/browse/HBASE-21879
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4
>
>         Attachments: QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal, we have the following:
> {code}
>   @VisibleForTesting
>   protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
>       long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum,
>       boolean updateMetrics)
>       throws IOException {
>     // .....
>     // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
>     byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>     int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>         onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
>     if (headerBuf != null) {
>       // ...
>     }
>     // ...
>   }
> {code}
> In the read path, we still read the block from the HFile into an on-heap byte[],
> then copy the on-heap byte[] to the offheap bucket cache asynchronously. In my
> 100% get performance test, I also observed frequent young GCs. The largest
> memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of
> into a byte[] to reduce young GC. We did not implement this before because the
> older HDFS client had no ByteBuffer reading interface, but 2.7+ supports it
> now, so I think we can fix this now.
> Will provide a patch and some perf comparison for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
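
For illustration only, here is a minimal sketch of the idea in the description: a positional read (pread) that fills an off-heap ByteBuffer directly, so the read path allocates no per-block on-heap byte[]. It is not the HBase patch and does not use the HDFS client. It uses the plain JDK FileChannel as a stand-in for the input stream, and the class name DirectBlockRead and the helper readBlock are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for the block read path; uses JDK NIO, not HDFS.
public class DirectBlockRead {

  /**
   * Reads {@code length} bytes starting at {@code offset} into a direct
   * (off-heap) buffer using positional reads. This code allocates no
   * intermediate on-heap byte[].
   */
  static ByteBuffer readBlock(FileChannel ch, long offset, int length) throws IOException {
    ByteBuffer buf = ByteBuffer.allocateDirect(length);
    long pos = offset;
    while (buf.hasRemaining()) {
      // Positional read: does not change the channel's own position.
      int n = ch.read(buf, pos);
      if (n < 0) {
        throw new IOException("EOF after " + buf.position() + " of " + length + " bytes");
      }
      pos += n;
    }
    buf.flip(); // make the bytes just read available to the caller
    return buf;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("block", ".bin");
    try {
      byte[] data = new byte[64];
      for (int i = 0; i < data.length; i++) {
        data[i] = (byte) i;
      }
      Files.write(tmp, data);
      try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
        ByteBuffer block = readBlock(ch, 16, 8);
        // prints "true 8 16"
        System.out.println(block.isDirect() + " " + block.remaining() + " " + block.get(0));
      }
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```

In a real read path the direct buffer would presumably come from a reusable pool rather than a fresh allocateDirect call per block, since allocating direct buffers is comparatively expensive. That pooling is an assumption of this sketch and is not shown.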