wchevreuil commented on PR #8353:
URL: https://github.com/apache/hbase/pull/8353#issuecomment-4710109315

   > > > Mind explaining more about the changes here? I do not fully get what 
you are trying to fix here...
   > > 
   > > 
   > > As explained in the jira description:
   > > When executing ycsb read workloads, we observed a ~30% latency 
degradation (please see the flame graphs attached to the jira).
   > > The problem was that we added logic for parsing the file Path into 
region name and family name, as well as archive checks, all in the BlockCacheKey 
constructor used by HFileReaderImpl at the beginning of each block read. As 
seen in the attached flame graphs, covering a five-minute window on one of the 
RSes, around 30% of the CPU time was spent in the BlockCacheKey constructor, 
either calling Path.getParent() or HFileUtils.isHFileArchived().
   > 
   > So the intention here is to not always call intern when creating 
BlockCacheKey, and to move intern to other places?
   
   Sort of. The main problem is not the intern call itself, but the parsing of 
the file name, region and CF name from the path. Doing it inside the 
BlockCacheKey constructor is too costly: we saw 10% degradation in latency and 
throughput of ycsb workloads. Moving this parsing to 
HFileWriterImpl/HFileReaderImpl initialisation means we only need to do it 
once, as all block cache keys on each writer/reader instance will refer to the 
same values. 
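
To illustrate the idea, here is a minimal sketch, not the actual patch. `ParseOnceSketch`, `Reader`, `Key`, `POOL` and the `.../<region>/<family>/<hfile>` path layout are all hypothetical stand-ins, assumed only for the example. The reader parses and interns the names once at construction, and every key it creates just copies the references.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ParseOnceSketch {

  // Hypothetical stand-in for the string pools discussed above.
  static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

  // Returns the pooled instance equal to s, adding s to the pool if absent.
  static String intern(String s) {
    String prev = POOL.putIfAbsent(s, s);
    return prev == null ? s : prev;
  }

  /** Hypothetical cache key: takes already-parsed names and does no path work. */
  static final class Key {
    final String hfileName;
    final String regionName;
    final String cfName;
    final long offset;

    Key(String hfileName, String regionName, String cfName, long offset) {
      this.hfileName = hfileName;
      this.regionName = regionName;
      this.cfName = cfName;
      this.offset = offset;
    }
  }

  /** Hypothetical reader: parses the path once, at initialisation. */
  static final class Reader {
    final String hfileName;
    final String regionName;
    final String cfName;

    Reader(String path) {
      // Assumed layout for this sketch only: .../<region>/<family>/<hfile>
      String[] parts = path.split("/");
      int n = parts.length;
      if (n < 3) {
        throw new IllegalArgumentException("unexpected path: " + path);
      }
      this.hfileName = intern(parts[n - 1]);
      this.cfName = intern(parts[n - 2]);
      this.regionName = intern(parts[n - 3]);
    }

    // Called on every block read: no parsing, no pool lookup.
    Key keyFor(long offset) {
      return new Key(hfileName, regionName, cfName, offset);
    }
  }

  public static void main(String[] args) {
    Reader reader = new Reader("/hbase/data/default/t1/region1/cf1/hfile1");
    Key a = reader.keyFor(0L);
    Key b = reader.keyFor(65536L);
    // Both keys share the same String instances held by the reader.
    System.out.println(a.regionName == b.regionName);
  }
}
```

With this shape, the per-block cost of building a key is a few field copies, while the path split and pool lookups happen once per reader/writer instance.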
   
   We could maybe leave the pools and intern calls private to BlockCacheKey, 
but we would then need to create the original parsed strings before they get 
pooled. Less efficient, but I think it's not a big deal, as we would only be 
doing it at the reader/writer level.

