+1, as the numbers suggest significant perf improvement.
On Thu, Mar 9, 2023 at 9:36 AM Bryan Beaudreault <bbeaudrea...@apache.org> wrote: > Hey all, > > We have a use-case for HFile.Reader at my company. The use-case in question > is scanning many hfiles for a small subset of cells, so it mostly is > just iterating a large number of HFiles and discarding most of the cells. > We recently upgraded that deployable from super old cdh 5.16 (hbase > 1.2-ish) to hbase 2.5.3. In doing so, we noticed a performance regression > of around 4x. I imagine this regression would also affect > ClientSideRegionScanner. > > We did some profiling and noticed that a large amount of time is spent in > SingleByteBuff.checkRefCnt. It seems like every SingleByteBuff method calls > checkRefCnt and this checks compares a volatile > in AtomicIntegerFieldUpdater in the netty code. > > I believe ref counting is mostly necessary for off-heap buffers, but > on-heap buffers are also wrapped in SingleByteBuff and so also go through > checkRefCnt. We removed the checkRefCnt call, and the regression > disappeared. > > We created a simple test method which just does HFile.createReader, > reader.getScanner(), and then iterates the scanner counting the total > cells. The test reads an hfile with 100M cells and takes over 1 minute > with checkRefCnt. Removing checkRefCnt brings the runtime down to 20s. > > It's worth noting that the regression is most prominent on java17. It's > slightly less obvious on java11, with runtime being 40s vs 28s. > > Thoughts on updating SingleByteBuff to skip the checkRefCnt call for > on-heap buffers? We can handle this in the constructor, when wrapping an > on-heap buffer here [1]. > > [1] > > https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBuffAllocator.java#L300 >