[ https://issues.apache.org/jira/browse/HBASE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guanghao Zhang updated HBASE-18111:
-----------------------------------
    Attachment: HBASE-18111.patch

> Replication stuck when cluster connection is closed
> ---------------------------------------------------
>
>                 Key: HBASE-18111
>                 URL: https://issues.apache.org/jira/browse/HBASE-18111
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.10
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>         Attachments: HBASE-18111.patch
>
>
> Log:
> {code}
> 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
> 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] org.apache.hadoop.hbase.client.HConnectionImplementation: hconnection-0x1148dd9b-0x35b6b4d4ca999c6, quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000, baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 received auth failed from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed
> 	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425)
> 	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper sessionid=0x35b6b4d4ca999c6
> 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint: Replicate edites to peer cluster failed.
> java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local exception: java.io.IOException: Connection closed
> {code}
> jstack
> {code}
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
> 	at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
> 	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
> 	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
> {code}
> The cluster connection was aborted when the ZooKeeperWatcher received an AuthFailed event. After that, the replicate() method of HBaseInterClusterReplicationEndpoint gets stuck in a while loop: every attempt to ship edits fails with "Connection closed", so it keeps calling sleepForRetries() and retrying on the same closed connection.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
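The stuck retry loop described in this issue can be illustrated with a small, self-contained model. This is a hypothetical sketch, not HBase code and not the attached patch: `PeerConnection`, `FakeConnection`, `ShippingLoop`, and the reconnect factory are invented names. It assumes that recovery means replacing a closed connection before retrying. The loop is capped at `maxAttempts` so the sketch terminates, whereas the loop in the report keeps retrying.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical model of the shipping retry loop; none of these types are HBase APIs. */
public class ReplicationRetrySketch {

  /** Hypothetical stand-in for the shared cluster connection to the peer. */
  interface PeerConnection {
    boolean isClosed();
    void ship(List<String> edits) throws IOException;
  }

  /** Fake connection that fails every call once closed, like an aborted connection. */
  static final class FakeConnection implements PeerConnection {
    private boolean closed;

    void close() { closed = true; }

    @Override public boolean isClosed() { return closed; }

    @Override public void ship(List<String> edits) throws IOException {
      if (closed) {
        throw new IOException("Connection closed");
      }
    }
  }

  /** Retry loop; a null reconnect factory reproduces the "stuck" behavior. */
  static final class ShippingLoop {
    private PeerConnection conn;
    private final Supplier<PeerConnection> reconnect;

    ShippingLoop(PeerConnection conn, Supplier<PeerConnection> reconnect) {
      this.conn = conn;
      this.reconnect = reconnect;
    }

    /** Returns the attempt number that succeeded, or -1 after maxAttempts failures. */
    int replicate(List<String> edits, int maxAttempts) {
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        // Assumed mitigation: replace a closed connection instead of
        // retrying on it indefinitely.
        if (reconnect != null && conn.isClosed()) {
          conn = reconnect.get();
        }
        try {
          conn.ship(edits);
          return attempt;
        } catch (IOException e) {
          // The real endpoint calls sleepForRetries() here; omitted to keep the sketch fast.
        }
      }
      return -1;
    }
  }

  public static void main(String[] args) {
    List<String> edits = List.of("edit-1", "edit-2");

    FakeConnection aborted = new FakeConnection();
    aborted.close(); // simulates the abort after the ZooKeeper AuthFailed event
    System.out.println("no reconnect: " + new ShippingLoop(aborted, null).replicate(edits, 5));

    FakeConnection aborted2 = new FakeConnection();
    aborted2.close();
    System.out.println("reconnect: "
        + new ShippingLoop(aborted2, FakeConnection::new).replicate(edits, 5));
  }
}
```

Running `main` prints `no reconnect: -1` (every attempt fails on the closed connection) and `reconnect: 1`. Checking `isClosed()` before each attempt keeps the recovery decision in the loop that owns the connection, but whether the real endpoint should recreate the connection, or abort the replication source instead, is a design question this sketch does not settle.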