wchevreuil commented on PR #8353: URL: https://github.com/apache/hbase/pull/8353#issuecomment-4710109315
> > > Mind explaining more about the changes here? I do not fully get what you are trying to fix here...
> >
> > As explained in the jira description:
> >
> > When executing ycsb read workloads, we observed a ~30% latency degradation (please see the flame graphs attached to the jira).
> > The problem was that we added logic for parsing the file Path into region name and family name, as well as checks for archiving, all in the BlockCacheKey constructor used by HFileReaderImpl at the beginning of each block read. As seen on the flame graphs attached, covering a five-minute window on one of the RSes, around 30% of the CPU time was spent on the BlockCacheKey constructor, either calling Path.getParent() or HFileUtils.isHFileArchived().
>
> So the intention here is to not always call intern when creating BlockCacheKey, and move intern to other places?

Sort of. The main problem is not the intern call itself, but the parsing of the file name, region name and CF name from the path. Doing it inside the BlockCacheKey constructor is too costly: we saw a 10% degradation in latency and throughput of ycsb workloads. Moving this parsing to HFileWriterImpl/HFileReaderImpl initialisation means we only need to do it once per reader/writer, as all block cache keys created by a given writer/reader instance refer to the same values.

We could maybe leave the pools and intern calls private to BlockCacheKey, but then the original parsed strings held by the reader/writer would not get pooled. That is less efficient, but I think it's not a big deal, as we would only be doing it at the reader/writer level.
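To make the trade-off above concrete, here is a minimal Java sketch of the "parse once per reader, reuse in every block key" idea. It is not HBase code: `StringPool`, `PathInfo`, `CacheKey` and `SketchReader` are hypothetical names, the `.../<region>/<family>/<hfile>` path layout is an assumption, and the archive check is a plain substring test standing in for `HFileUtils.isHFileArchived()`.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these classes are HBase APIs.
public class BlockKeySketch {

  /** Minimal interning pool: returns one canonical instance per distinct string. */
  static final class StringPool {
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    String intern(String s) {
      String existing = pool.putIfAbsent(s, s);
      return existing != null ? existing : s;
    }
  }

  static final StringPool POOL = new StringPool();

  /** Path components parsed (and pooled) once per reader/writer. Assumes .../<region>/<family>/<hfile>. */
  static final class PathInfo {
    final String region;
    final String family;
    final String hfileName;
    final boolean archived;

    PathInfo(String path) {
      String[] parts = path.split("/");
      if (parts.length < 3) {
        throw new IllegalArgumentException("unexpected path: " + path);
      }
      this.hfileName = POOL.intern(parts[parts.length - 1]);
      this.family = POOL.intern(parts[parts.length - 2]);
      this.region = POOL.intern(parts[parts.length - 3]);
      // Hypothetical stand-in for a real archive check.
      this.archived = path.contains("/archive/");
    }
  }

  /** Per-block key: the constructor only stores a reference and an offset, no path parsing. */
  static final class CacheKey {
    final PathInfo file;
    final long offset;

    CacheKey(PathInfo file, long offset) {
      this.file = file;
      this.offset = offset;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof CacheKey)) {
        return false;
      }
      CacheKey k = (CacheKey) o;
      return offset == k.offset && file.hfileName.equals(k.file.hfileName);
    }

    @Override
    public int hashCode() {
      return Objects.hash(file.hfileName, offset);
    }
  }

  /** Reader parses its path once at construction and reuses the result for every block key. */
  static final class SketchReader {
    private final PathInfo info;

    SketchReader(String path) {
      this.info = new PathInfo(path);
    }

    CacheKey keyFor(long blockOffset) {
      return new CacheKey(info, blockOffset);
    }
  }

  public static void main(String[] args) {
    SketchReader reader = new SketchReader("/hbase/data/default/t1/abc123/cf/hfile1");
    CacheKey k1 = reader.keyFor(0L);
    CacheKey k2 = reader.keyFor(65536L);
    System.out.println(k1.file == k2.file); // true: both keys share one PathInfo
    System.out.println(k1.file.region + "/" + k1.file.family); // abc123/cf
  }
}
```

In this sketch the per-block cost of building a key is constant and independent of the path, while the parsing and interning happen once per reader. A second reader opened on a file in the same region and family gets the same pooled `String` instances back from `StringPool`, which is the reader/writer-level pooling the comment describes.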
