Much thanks to Duo for his review, this will be merged shortly to active branches.
In the PR we brainstormed ways to also optimize *off-heap* ByteBuffs. That discussion is continuing in https://issues.apache.org/jira/browse/HBASE-27730 if interested. On Tue, Mar 14, 2023 at 8:45 PM Bryan Beaudreault <bbeaudrea...@apache.org> wrote: > For those following along, I’ve submitted an initial PR for this issue. > https://github.com/apache/hbase/pull/5104 > > On Mon, Mar 13, 2023 at 11:46 AM Bryan Beaudreault <bbeaudrea...@gmail.com> > wrote: > >> Thanks everyone for chiming in. I submitted >> https://issues.apache.org/jira/browse/HBASE-27710 and will work on it in >> the near future. >> >> This does make me wonder what overhead checkRefCount causes in off-heap >> cases too, but we just dont see it in profiling because the server does so >> much other work. It has me thinking if there are other optimizations we >> could do there. For example, currently we do checkRefCount for every single >> ByteBuff operation (slice/duplicate/get/etc/etc). A lot of the buffer >> operations happen before creating a Cell for return to user in >> HFileBlock/DBE. I wonder if we could do all the parsing we need in >> HFileBlock/DBE without checkRefCount, do a single checkRefCount call when >> we create the Cell, then enable checkRefCount for all future ByteBuff calls >> before returning a cell to upper levels. This would probably be a larger >> project, so not investigating now. Just an idea i had. >> >> On Mon, Mar 13, 2023 at 2:06 AM Viraj Jasani <vjas...@apache.org> wrote: >> >>> +1, as the numbers suggest significant perf improvement. >>> >>> >>> On Thu, Mar 9, 2023 at 9:36 AM Bryan Beaudreault < >>> bbeaudrea...@apache.org> >>> wrote: >>> >>> > Hey all, >>> > >>> > We have a use-case for HFile.Reader at my company. The use-case in >>> question >>> > is scanning many hfiles for a small subset of cells, so it mostly is >>> > just iterating a large number of HFiles and discarding most of the >>> cells. >>> > We recently upgraded that deployable from super old cdh 5.16 (hbase >>> > 1.2-ish) to hbase 2.5.3. In doing so, we noticed a performance >>> regression >>> > of around 4x. I imagine this regression would also affect >>> > ClientSideRegionScanner. >>> > >>> > We did some profiling and noticed that a large amount of time is spent >>> in >>> > SingleByteBuff.checkRefCnt. It seems like every SingleByteBuff method >>> calls >>> > checkRefCnt and this checks compares a volatile >>> > in AtomicIntegerFieldUpdater in the netty code. >>> > >>> > I believe ref counting is mostly necessary for off-heap buffers, but >>> > on-heap buffers are also wrapped in SingleByteBuff and so also go >>> through >>> > checkRefCnt. We removed the checkRefCnt call, and the regression >>> > disappeared. >>> > >>> > We created a simple test method which just does HFile.createReader, >>> > reader.getScanner(), and then iterates the scanner counting the total >>> > cells. The test reads an hfile with 100M cells and takes over 1 minute >>> > with checkRefCnt. Removing checkRefCnt brings the runtime down to 20s. >>> > >>> > It's worth noting that the regression is most prominent on java17. It's >>> > slightly less obvious on java11, with runtime being 40s vs 28s. >>> > >>> > Thoughts on updating SingleByteBuff to skip the checkRefCnt call for >>> > on-heap buffers? We can handle this in the constructor, when wrapping >>> an >>> > on-heap buffer here [1]. >>> > >>> > [1] >>> > >>> > >>> https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBuffAllocator.java#L300 >>> > >>> >>