Josh Elser created HBASE-25692:
----------------------------------

             Summary: Failure to instantiate WALCellCodec leaks socket
                 Key: HBASE-25692
                 URL: https://issues.apache.org/jira/browse/HBASE-25692
             Project: HBase
          Issue Type: Bug
          Components: Replication
    Affects Versions: 2.4.2, 2.4.1, 2.3.4, 2.3.2, 2.2.6, 2.2.5, 2.4.0, 2.2.4, 
2.1.9, 2.3.3, 2.2.3, 2.1.8, 2.2.2, 2.1.7, 2.1.6, 2.2.1, 2.1.5, 2.0.6, 2.1.4, 
2.3.1, 2.3.0, 2.1.3, 2.1.2, 2.1.1, 2.2.0, 2.1.0
            Reporter: Josh Elser
            Assignee: Josh Elser


I was looking at an HBase user's deployment with [~danilocop]. They had two 
otherwise identical clusters, but one of them regularly had sockets in 
CLOSE_WAIT going from RegionServers to a distributed storage appliance.

After a lot of analysis, we eventually figured out that these sockets in 
CLOSE_WAIT were directly related to an FSDataInputStream which we forgot to 
close inside of the RegionServer. The subtlety was that only one of these HBase 
clusters was set up to do replication (to the other cluster). The HBase cluster 
experiencing this problem was shipping edits to a peer, and had previously been 
using Phoenix. At some point, the cluster had Phoenix removed from it.

What we found was that replication still had WALs to ship which were for 
Phoenix tables. Phoenix, in this version, still used the custom WALCellCodec; 
however, this codec class was missing from the RS classpath after the owner of 
the cluster removed Phoenix.

When we try to instantiate the Codec implementation via ReflectionUtils, we end 
up throwing an UnsupportedOperationException which wraps a 
NoClassDefFoundException. However, in WALFactory, we _only_ close the 
FSDataInputStream when we catch an IOException. 
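
To illustrate the pattern, here is a minimal, self-contained sketch. It is not 
the actual WALFactory code: the class ReaderOpenSketch, the methods openLeaky 
and openSafely, the TrackingStream helper, and the codec class name 
org.example.MissingCodec are all hypothetical stand-ins, and a plain 
InputStream takes the place of FSDataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; none of these names come from the HBase code base.
final class ReaderOpenSketch {

    /** Stand-in for FSDataInputStream that records whether close() ran. */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream() {
            super(new byte[0]);
        }

        @Override
        public void close() throws IOException {
            closed = true;
        }
    }

    /** Mimics reflective codec creation: a missing class surfaces as an unchecked exception. */
    static Object instantiateCodec(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new UnsupportedOperationException("Unable to find " + className, e);
        }
    }

    /** Leaky variant: the stream is closed only when an IOException is caught. */
    static Object openLeaky(InputStream in, String codecClass) throws IOException {
        try {
            in.available(); // touch the stream; this call may throw IOException
            return instantiateCodec(codecClass);
        } catch (IOException e) {
            in.close();
            throw e;
        }
    }

    /** Safer variant: the stream is closed on unchecked exceptions as well. */
    static Object openSafely(InputStream in, String codecClass) throws IOException {
        try {
            in.available();
            return instantiateCodec(codecClass);
        } catch (IOException | RuntimeException e) {
            try {
                in.close();
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure); // keep the original failure as the primary one
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        String missing = "org.example.MissingCodec"; // hypothetical class, not on the classpath

        TrackingStream leaky = new TrackingStream();
        try {
            openLeaky(leaky, missing);
        } catch (RuntimeException expected) {
            // UnsupportedOperationException bypasses the IOException handler
        }

        TrackingStream safe = new TrackingStream();
        try {
            openSafely(safe, missing);
        } catch (RuntimeException expected) {
            // same failure, but the stream was closed before rethrowing
        }

        System.out.println("leaky closed: " + leaky.closed);
        System.out.println("safe closed: " + safe.closed);
    }
}
```

Running main should print "leaky closed: false" and "safe closed: true". 
Whether the real fix should catch RuntimeException, catch Throwable, or 
restructure the open path some other way is a design choice this sketch does 
not settle.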

Thus, replication sits in a "fast" loop, trying to ship these edits, each time 
leaking a new socket because of the InputStream not being closed. There is an 
obvious workaround for this specific issue, but we should not leak this inside 
HBase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
