[ https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322510#comment-17322510 ]
Hudson commented on HBASE-25692:
--------------------------------

Results for branch branch-2.2
  [build #205 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/205/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/205//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/205//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/205//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(x) {color:red}-1 client integration test{color}
-- Failed when running client tests on top of Hadoop 2. [see log for details|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/205//artifact/output-integration/hadoop-2.log].
(note that this means we didn't run on Hadoop 3)

> Failure to instantiate WALCellCodec leaks socket in replication
> ---------------------------------------------------------------
>
>                 Key: HBASE-25692
>                 URL: https://issues.apache.org/jira/browse/HBASE-25692
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.1.0, 2.2.0, 2.1.1, 2.1.2, 2.1.3, 2.3.0, 2.3.1, 2.1.4, 2.0.6, 2.1.5, 2.2.1, 2.1.6, 2.1.7, 2.2.2, 2.1.8, 2.2.3, 2.3.3, 2.1.9, 2.2.4, 2.4.0, 2.2.5, 2.2.6, 2.3.2, 2.3.4, 2.4.1, 2.4.2
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.3, 2.3.6
>
>
> I was looking at an HBase user's cluster with [~danilocop], where they had two otherwise identical clusters, one of which regularly had sockets in CLOSE_WAIT going from RegionServers to a distributed storage appliance.
>
> After a lot of analysis, we eventually figured out that these sockets in CLOSE_WAIT were directly related to an FSDataInputStream which we forgot to close inside of the RegionServer. The subtlety was that only one of these HBase clusters was set up to do replication (to the other cluster). The HBase cluster experiencing this problem was shipping edits to a peer, and had previously been using Phoenix. At some point, Phoenix was removed from the cluster.
>
> What we found was that replication still had WALs to ship which were for Phoenix tables. Phoenix, in this version, still used a custom WALCellCodec (IndexedWALEditCodec); however, this codec class was missing from the RS classpath after the owner of the cluster removed Phoenix.
>
> When we try to instantiate the Codec implementation via ReflectionUtils, we end up throwing an UnsupportedOperationException which wraps a ClassNotFoundException. However, in WALFactory, we _only_ close the FSDataInputStream when we catch an IOException.
>
> Thus, replication sits in a "fast" loop, trying to ship these edits, each time leaking a new socket because the InputStream is never closed. There is an obvious workaround for this specific issue, but we should not leak this inside HBase.
>
> An approximate 2.1.x stack trace which led us to this is below.
> {noformat}
> 2021-03-11 18:19:20,364 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader: Failed to read stream of replication entries
> java.io.IOException: Cannot get log reader
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:366)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:291)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:427)
>         at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:354)
>         at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:302)
>         at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:293)
>         at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:174)
>         at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:100)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.readWALEntries(ReplicationSourceWALReader.java:192)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:138)
> Caused by: java.lang.UnsupportedOperationException: Unable to find org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
>         at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:47)
>         at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec.create(WALCellCodec.java:106)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.getCodec(ProtobufLogReader.java:301)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:311)
>         at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:81)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:168)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:321)
>         ... 10 more
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:264)
>         at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:43)
>         ... 16 more
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
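A minimal Java sketch of the leak pattern described in the issue quoted above. Every name here is a hypothetical simplification, not HBase code: TrackedStream stands in for FSDataInputStream, instantiateCodec approximates how a missing codec class surfaces as an unchecked UnsupportedOperationException (wrapping a ClassNotFoundException) rather than an IOException, and openReaderLeaky / openReaderFixed are reductions of the reader-creation path in WALFactory. It only illustrates why catching IOException alone leaves the stream open; it is not the patch committed for HBASE-25692.

```java
import java.io.Closeable;
import java.io.IOException;

public class StreamLeakSketch {

    /** Hypothetical stand-in for FSDataInputStream that records whether close() ran. */
    static final class TrackedStream implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true; // a real stream would release its socket here
        }
    }

    /**
     * Hypothetical reduction of reflective codec loading: a missing class surfaces as an
     * unchecked UnsupportedOperationException wrapping the ClassNotFoundException.
     */
    static Object instantiateCodec(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new UnsupportedOperationException("Unable to find " + className, e);
        }
    }

    /** Hypothetical reader initialization that resolves the codec named in the WAL header. */
    static Object initReader(TrackedStream stream, String codecClass) throws IOException {
        return instantiateCodec(codecClass);
    }

    /**
     * Leaky pattern: the stream is closed only when an IOException is caught, so an
     * UnsupportedOperationException propagates with the stream still open.
     */
    static Object openReaderLeaky(TrackedStream stream, String codecClass) throws IOException {
        try {
            return initReader(stream, codecClass);
        } catch (IOException e) {
            stream.close();
            throw e;
        }
    }

    /** Fixed pattern: the stream is closed on any failure, checked or unchecked, then rethrown. */
    static Object openReaderFixed(TrackedStream stream, String codecClass) throws IOException {
        try {
            return initReader(stream, codecClass);
        } catch (IOException | RuntimeException | Error e) {
            stream.close();
            throw e;
        }
    }

    public static void main(String[] args) {
        String missing = "org.example.MissingCodec"; // hypothetical class absent from the classpath

        TrackedStream leaky = new TrackedStream();
        try {
            openReaderLeaky(leaky, missing);
        } catch (IOException | UnsupportedOperationException e) {
            System.out.println("leaky closed=" + leaky.closed); // prints "leaky closed=false"
        }

        TrackedStream fixed = new TrackedStream();
        try {
            openReaderFixed(fixed, missing);
        } catch (IOException | UnsupportedOperationException e) {
            System.out.println("fixed closed=" + fixed.closed); // prints "fixed closed=true"
        }
    }
}
```

Catching RuntimeException and Error alongside IOException (or closing in a finally block guarded by a success flag) releases the stream no matter which exception escapes reader initialization, so a retry loop like the one described above no longer accumulates open sockets.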