[ https://issues.apache.org/jira/browse/HBASE-27249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571989#comment-17571989 ]
Duo Zhang commented on HBASE-27249:
-----------------------------------

Mind explaining more on the problem?

Thanks.

> Remove invalid peer RegionServer crash
> --------------------------------------
>
>                 Key: HBASE-27249
>                 URL: https://issues.apache.org/jira/browse/HBASE-27249
>             Project: HBase
>          Issue Type: Bug
>            Reporter: zhengsicheng
>            Assignee: zhengsicheng
>            Priority: Major
>
> add_peer 'test', CLUSTER_KEY => "zookeeper-01:2181:/hbase_01"
> remove_peer 'test'
> After finding that the peer had been added incorrectly, I removed it, but the RegionServer crashed.
> The log information is as follows:
> 2022-07-18 13:26:11,016 ERROR [ReadOnlyZKClient-zookeeper-01:2181@0x44281bff-SendThread(zookeeper-01:2181)] client.StaticHostProvider: Unable to resolve address: zookeeper-01/<unresolved>:2181
> java.net.UnknownHostException: zookeeper-01
>     at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:800)
>     at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1507)
>     at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1366)
>     at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1300)
>     at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:92)
>     at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:147)
>     at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:375)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1137)
> 2022-07-18 13:26:11,016 WARN [ReadOnlyZKClient-zookeeper-01:2181@0x44281bff-SendThread(zookeeper-01:2181)] zookeeper.ClientCnxn: Session 0x0 for server zookeeper-01/<unresolved>:2181, unexpected error, closing socket connection and attempting reconnect
> java.lang.IllegalArgumentException: Unable to canonicalize address zookeeper-01/<unresolved>:2181 because it's not resolvable
>     at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:71)
>     at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:39)
>     at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1087)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1139)
> 2022-07-18 13:26:11,116 WARN [ReadOnlyZKClient-zookeeper-01:2181@0x44281bff] zookeeper.ReadOnlyZKClient: 0x44281bff to zookeeper-01:2181 failed for get of /hbase_01/hbaseid, code = CONNECTIONLOSS, retries = 48
> 2022-07-18 13:26:11,119 WARN [regionserver/ip1:16020.logRoller] regionserver.ReplicationSource: peerId=test, WAL group ip1%2C16020%2C1658118295598.ip1%2C16020%2C1658118295598.regiongroup-2 queue size: 11 exceeds value of replication.source.log.queue.warn 2
> 2022-07-18 13:26:12,055 INFO [MemStoreFlusher.1] regionserver.HRegion: Flushing 31bbfb9b76b6795e5d44fabd113174c0 1/2 column families, dataSize=245.67 MB heapSize=257.48 MB; f1={dataSize=245.67 MB, heapSize=257.48 MB, offHeapSize=0 B}
> 2022-07-18 13:26:12,116 ERROR [ReadOnlyZKClient-zookeeper-01:2181@0x44281bff-SendThread(zookeeper-01:2181)] client.StaticHostProvider: Unable to resolve address: zookeeper-01/<unresolved>:2181
> java.net.UnknownHostException: zookeeper-01
>     at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:800)
>     at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1507)
>     at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1366)
>     at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1300)
>     at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:92)
>     at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:147)
>     at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:375)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1137)
> 2022-07-18 13:26:30,270 INFO [RS_REFRESH_PEER-regionserver/ip1:16020-1] regionserver.RefreshPeerCallable: Received a peer change event, peerId=test, type=REMOVE_PEER
> 2022-07-18 13:26:30,270 INFO [RS_REFRESH_PEER-regionserver/ip1:16020-1] regionserver.ReplicationSourceManager: Number of deleted recovered sources for test: 0
> 2022-07-18 13:26:30,270 INFO [RS_REFRESH_PEER-regionserver/ip1:16020-1] regionserver.ReplicationSource: peerId=test, Closing source test because: Replication stream was removed by a user
> 2022-07-18 13:26:30,271 WARN [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] client.ConnectionImplementation: Retrieve cluster id failed
> java.lang.InterruptedException
>     at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:385)
>     at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2063)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:583)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:316)
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:64)
>     at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
>     at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
>     at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:230)
>     at java.base/java.security.AccessController.doPrivileged(AccessController.java:691)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:425)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1830)
>     at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:347)
>     at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:228)
>     at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:128)
>     at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.createConnection(HBaseInterClusterReplicationEndpoint.java:140)
>     at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.init(HBaseInterClusterReplicationEndpoint.java:172)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initAndStartReplicationEndpoint(ReplicationSource.java:340)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:557)
>     at java.base/java.lang.Thread.run(Thread.java:832)
> 2022-07-18 13:26:30,271 INFO [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] zookeeper.RecoverableZooKeeper: Process identifier=connection to cluster: test connecting to ZooKeeper ensemble=zookeeper-01:2181
> 2022-07-18 13:26:30,271 INFO [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] zookeeper.ZooKeeper: Initiating client connection, connectString=zookeeper-01:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@70ad0136
> 2022-07-18 13:26:30,271 INFO [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] zookeeper.ClientCnxnSocket: jute.maxbuffer value is 67108864 Bytes
> 2022-07-18 13:26:30,272 INFO [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] zookeeper.ClientCnxn: zookeeper.request.timeout value is 0. feature enabled=
> 2022-07-18 13:26:30,272 ERROR [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test-SendThread()] client.StaticHostProvider: Unable to resolve address: zookeeper-01/<unresolved>:2181
> java.net.UnknownHostException: zookeeper-01
>     at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:800)
>     at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1507)
>     at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1366)
>     at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1300)
>     at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:92)
>     at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:147)
>     at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:375)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1137)
> 2022-07-18 13:26:30,272 ERROR [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] regionserver.ReplicationSource: Unexpected exception in RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test currentPath=null
> java.lang.IllegalStateException: Source should be active.
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:581)
>     at java.base/java.lang.Thread.run(Thread.java:832)
> 2022-07-18 13:26:30,272 WARN [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test-SendThread(zookeeper-01:2181)] zookeeper.ClientCnxn: Session 0x0 for server zookeeper-01/<unresolved>:2181, unexpected error, closing socket connection and attempting reconnect
> java.lang.IllegalArgumentException: Unable to canonicalize address zookeeper-01/<unresolved>:2181 because it's not resolvable
>     at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:71)
>     at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:39)
>     at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1087)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1139)
> 2022-07-18 13:26:30,274 ERROR [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] regionserver.HRegionServer: ***** ABORTING region server ip1,16020,1658118295598: Unexpected exception in RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test *****
> java.lang.IllegalStateException: Source should be active.
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:581)
>     at java.base/java.lang.Thread.run(Thread.java:832)
> 2022-07-18 13:26:30,275 ERROR [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.replication.regionserver.ReplicationObserver]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)