Charles Connell created HBASE-29218:
---------------------------------------
Summary: Reduce calls to Configuration#get() in decompression path
Key: HBASE-29218
URL: https://issues.apache.org/jira/browse/HBASE-29218
Project: HBase
Issue Type: Improvement
Reporter: Charles Connell
Assignee: Charles Connell
Part of a series of changes I'm making to improve decompression speed
(HBASE-29123, HBASE-29135, HBASE-29193). Looking up values through the
{{org.apache.hadoop.conf.Configuration}} class is not especially fast. That is
fine most of the time, but in a very hot code path it takes up noticeable CPU
time.
{{ByteBuffDecompressor}}s are pooled and reused to avoid garbage collection
churn. This means that sometimes their settings are not right for the block
they're being asked to decompress. To handle this, before every decompression
action, we call {{ByteBuffDecompressor#reinit(Configuration)}}, so it can pull
settings from the Configuration in preparation for the decompression it's about
to do. This {{reinit()}} happens once per block, even though the settings it
reads are the same across the entire StoreFile, so the CPU cycles it spends are
wasted. I've attached two flamegraphs from RegionServers at my
company that do a heavy amount of decompression. One was taken from a period of
notable slowness for that server, and one was taken randomly at a "normal"
time. In both profiles, {{reinit()}} accounts for 2-3% of CPU time.
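The per-block pattern described above can be sketched roughly as follows. This is a hypothetical, self-contained illustration, not HBase code: {{CountingConf}} stands in for {{Configuration}} and only counts lookups, while {{PooledDecompressor}}, {{borrow()}}, {{giveBack()}}, {{readFile()}} and the key {{example.decompression.buffer.size}} are invented names. It shows how a setting that never changes within a file still costs one Configuration lookup per block when {{reinit()}} runs before every decompression.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-block reinit() on a pooled decompressor.
// None of these classes or keys are real HBase APIs.
public class PerBlockReinitSketch {

  /** Stand-in for Configuration that counts how many lookups are made. */
  static final class CountingConf {
    private final Map<String, String> values = new HashMap<>();
    private int lookups = 0;

    void set(String key, String value) {
      values.put(key, value);
    }

    int getInt(String key, int defaultValue) {
      lookups++; // each call represents a lookup on the hot path
      String v = values.get(key);
      return v == null ? defaultValue : Integer.parseInt(v);
    }

    int lookups() {
      return lookups;
    }
  }

  /** Hypothetical pooled decompressor whose settings may be stale when borrowed. */
  static final class PooledDecompressor {
    int bufferSize;

    void reinit(CountingConf conf) {
      // Re-reads the (hypothetical) setting before every block.
      bufferSize = conf.getInt("example.decompression.buffer.size", 64 * 1024);
    }

    void decompress(byte[] block) {
      // Placeholder: real decompression is omitted; only config traffic matters here.
    }
  }

  static final ArrayDeque<PooledDecompressor> POOL = new ArrayDeque<>();

  static PooledDecompressor borrow() {
    PooledDecompressor d = POOL.poll(); // null when the pool is empty
    return d != null ? d : new PooledDecompressor();
  }

  static void giveBack(PooledDecompressor d) {
    POOL.push(d);
  }

  /** Decompresses every block of one file, calling reinit() once per block. */
  static void readFile(CountingConf conf, int blockCount) {
    for (int i = 0; i < blockCount; i++) {
      PooledDecompressor d = borrow();
      d.reinit(conf); // one Configuration lookup per block
      d.decompress(new byte[0]);
      giveBack(d);
    }
  }

  public static void main(String[] args) {
    CountingConf conf = new CountingConf();
    conf.set("example.decompression.buffer.size", "131072");
    readFile(conf, 1000);
    System.out.println(conf.lookups()); // 1000: one lookup per block
  }
}
```

Pooling keeps allocation churn low, but because a borrowed instance may carry settings from another file, the sketch has to refresh it on every borrow, which is where the repeated lookups come from.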
Because the settings used by a {{ByteBuffDecompressor}} don't actually change
within a StoreFile, we can pull the settings it needs from a {{Configuration}}
when opening the StoreFile, and then not check again. Attached is a PR to do so.
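A minimal sketch of the proposed approach, assuming the settings can be captured in a small immutable object when a file is opened. This is not the attached PR: {{DecompressionSettings}}, {{StoreFileReaderSketch}}, {{CountingConf}} (a lookup-counting stand-in for {{Configuration}}) and the key {{example.decompression.buffer.size}} are hypothetical names, and pooling is left out for brevity. Per-block {{reinit()}} still runs, but it only copies fields from the snapshot, so the Configuration is consulted once per file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: read decompression settings once per file, not once per block.
public class StoreFileReaderSketch {

  /** Stand-in for Configuration that counts how many lookups are made. */
  static final class CountingConf {
    private final Map<String, String> values = new HashMap<>();
    private int lookups = 0;

    void set(String key, String value) {
      values.put(key, value);
    }

    int getInt(String key, int defaultValue) {
      lookups++;
      String v = values.get(key);
      return v == null ? defaultValue : Integer.parseInt(v);
    }

    int lookups() {
      return lookups;
    }
  }

  /** Immutable snapshot of the settings a decompressor needs (hypothetical). */
  static final class DecompressionSettings {
    final int bufferSize;

    private DecompressionSettings(int bufferSize) {
      this.bufferSize = bufferSize;
    }

    /** Reads the Configuration once, e.g. when the file is opened. */
    static DecompressionSettings from(CountingConf conf) {
      return new DecompressionSettings(
          conf.getInt("example.decompression.buffer.size", 64 * 1024));
    }
  }

  /** Hypothetical decompressor whose per-block reinit is just a field copy. */
  static final class Decompressor {
    int bufferSize;

    void reinit(DecompressionSettings settings) {
      bufferSize = settings.bufferSize; // no Configuration lookup here
    }

    void decompress(byte[] block) {
      // Placeholder: real decompression is omitted.
    }
  }

  private final DecompressionSettings settings;
  private final Decompressor decompressor = new Decompressor();

  StoreFileReaderSketch(CountingConf conf) {
    this.settings = DecompressionSettings.from(conf); // the only lookup for this file
  }

  void readBlocks(int blockCount) {
    for (int i = 0; i < blockCount; i++) {
      decompressor.reinit(settings);
      decompressor.decompress(new byte[0]);
    }
  }

  int currentBufferSize() {
    return decompressor.bufferSize;
  }

  public static void main(String[] args) {
    CountingConf conf = new CountingConf();
    conf.set("example.decompression.buffer.size", "131072");
    StoreFileReaderSketch reader = new StoreFileReaderSketch(conf);
    reader.readBlocks(1000);
    System.out.println(conf.lookups()); // 1: one lookup for the whole file
  }
}
```

The trade-off in this sketch is that a configuration change made after the file is opened is not picked up by that reader, which matches the observation that these settings do not change within a StoreFile.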