[
https://issues.apache.org/jira/browse/HBASE-29123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Dimiduk resolved HBASE-29123.
----------------------------------
Resolution: Fixed
Pushed to branch-2.5+. Thanks a lot [~charlesconnell]!
> A faster CodecPool for HBase
> ----------------------------
>
> Key: HBASE-29123
> URL: https://issues.apache.org/jira/browse/HBASE-29123
> Project: HBase
> Issue Type: Improvement
> Components: HFile, io
> Reporter: Charles Connell
> Assignee: Charles Connell
> Priority: Minor
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.3, 2.5.12
>
> Attachments: borrow-decompressor.html, lease-counting.html,
> return-decompressor.html
>
>
> I regularly review profiling flamegraphs from my company's RegionServers, and
> I sometimes see memory allocation inside
> {{org.apache.hadoop.io.compress.CodecPool}} taking up roughly 1% of CPU time.
> The point of a CodecPool is to avoid allocating short-lived objects, so this
> is not good. Luckily, these allocations can be avoided. The three attached
> flamegraphs show the allocations I'm talking about.
>
> I plan this ticket as the first of a series relating to decompression
> performance. In the context of the overall series, it makes sense to fork
> CodecPool out of hadoop-common and start a new copy of it in HBase. I'll do
> that in this ticket and include my improvements:
>
> Change the pool data structure from {{HashMap<Class<Compressor>,
> HashSet<Compressor>>}} to {{ConcurrentHashMap<Class<Compressor>,
> ConcurrentSkipListSet<Compressor>>}}. This allows the "borrow" code:
> {code}
> T codec = null;
> Set<T> codecSet;
> synchronized (pool) {
>   codecSet = pool.get(codecClass);
> }
> if (codecSet != null) {
>   synchronized (codecSet) {
>     if (!codecSet.isEmpty()) {
>       codec = codecSet.iterator().next();
>       codecSet.remove(codec);
>     }
>   }
> }
> {code}
> to be re-written as:
> {code}
> if (codecClass == null) {
>   return null;
> }
> NavigableSet<T> codecSet = pool.get(codecClass);
> if (codecSet != null) {
>   return codecSet.pollFirst();
> } else {
>   return null;
> }
> {code}
> thus avoiding the allocation of an iterator and the necessity of locking.
>
> The lease counters are only read in unit tests, so I'll stop updating those
> outside of testing.
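
For readers who want to try the lock-free borrow/return pattern described in the quoted issue, here is a minimal, self-contained sketch. It is not the HBase or Hadoop CodecPool: the class name LockFreePool and the methods borrow and giveBack are hypothetical, and the pool holds arbitrary objects rather than Compressor or Decompressor instances. A ConcurrentSkipListSet needs an ordering for its elements; ordering by identity hash code is an assumption made for this sketch only.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical, simplified object pool; illustrates the pattern only. */
class LockFreePool<T> {
  // Assumption for this sketch: order pooled instances by identity hash code.
  // Two distinct instances whose identity hashes collide would be treated as
  // duplicates, in which case giveBack returns false and the second instance
  // is simply not pooled.
  private static final Comparator<Object> BY_IDENTITY =
      Comparator.comparingInt(System::identityHashCode);

  private final ConcurrentMap<Class<?>, NavigableSet<T>> pool =
      new ConcurrentHashMap<>();

  /** Returns a pooled instance of the given class, or null if none is available. */
  T borrow(Class<?> type) {
    if (type == null) {
      return null;
    }
    NavigableSet<T> set = pool.get(type);
    // pollFirst() atomically removes and returns an element, or returns null
    // when the set is empty, so no iterator and no synchronized block is needed.
    return set != null ? set.pollFirst() : null;
  }

  /** Puts an instance back; returns false if it was null or already pooled. */
  boolean giveBack(T instance) {
    if (instance == null) {
      return false;
    }
    return pool
        .computeIfAbsent(instance.getClass(), k -> new ConcurrentSkipListSet<>(BY_IDENTITY))
        .add(instance);
  }
}
```

A caller would typically try borrow first and construct a new instance only when it returns null, then hand the instance to giveBack when finished.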