[ https://issues.apache.org/jira/browse/HBASE-29123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Dimiduk resolved HBASE-29123. ---------------------------------- Resolution: Fixed Pushed to branch-2.5+. Thanks a lot [~charlesconnell]! > A faster CodecPool for HBase > ---------------------------- > > Key: HBASE-29123 > URL: https://issues.apache.org/jira/browse/HBASE-29123 > Project: HBase > Issue Type: Improvement > Components: HFile, io > Reporter: Charles Connell > Assignee: Charles Connell > Priority: Minor > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.3, 2.5.12 > > Attachments: borrow-decompressor.html, lease-counting.html, > return-decompressor.html > > > I look at many profile flamegraphs of my company's RegionServers. I sometimes > see memory allocation inside of {{org.apache.hadoop.io.compress.CodecPool}} > taking up roughly 1% of my CPU time. The point of a CodecPool is to avoid > allocating short-lived objects, so this is not good. Luckily, these > allocations can be avoided. Attached are three flamegraphs showing the > allocations I'm talking about. > I plan this ticket as the first of a series relating to decompression > performance. In the context of the overall series, it makes sense to fork > CodecPool out of hadoop-common and start a new copy of it in HBase. I'll do > that in this ticket and include my improvements: > Change the pool data structure from {{HashMap<Class<Compressor>, > HashSet<Compressor>>}} to {{ConcurrentHashMap<Class<Compressor>, > ConcurrentSkipListSet<Compressor>>}}. This allows the "borrow" code: > {code} > T codec = null; > Set<T> codecSet; > synchronized (pool) { > codecSet = pool.get(codecClass); > } > if (codecSet != null) { > synchronized (codecSet) { > if (!codecSet.isEmpty()) { > codec = codecSet.iterator().next(); > codecSet.remove(codec); > } > } > } > {code} > to be re-written as: > {code} > if (codecClass == null) { > return null; > } > NavigableSet<T> codecSet = pool.get(codecClass); > if (codecSet != null) { > return codecSet.pollFirst(); > } else { > return null; > } > {code} > thus avoiding the allocation of an iterator and the necessity of locking. > The lease counters are only read in unit tests, so I'll stop updating those > outside of testing. -- This message was sent by Atlassian Jira (v8.20.10#820010)