[ https://issues.apache.org/jira/browse/HBASE-25583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Kyle Purtell resolved HBASE-25583.
-----------------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Handle the NoNode exception in remove log replication and avoid RS crash
> ------------------------------------------------------------------------
>
>                 Key: HBASE-25583
>                 URL: https://issues.apache.org/jira/browse/HBASE-25583
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sandeep Pal
>            Assignee: Sandeep Pal
>            Priority: Critical
>             Fix For: 1.7.0
>
>
> The region server should not crash if there is a NoNode exception while
> removing the log.
> We should inspect the exception, and if it is NoNode we shouldn't crash.
> It is possible that the node was already deleted as part of peer tear down.
> {code:java}
> @Override
> public void removeLog(String queueId, String filename) {
>   try {
>     String znode = ZKUtil.joinZNode(this.myQueuesZnode, queueId);
>     znode = ZKUtil.joinZNode(znode, filename);
>     ZKUtil.deleteNode(this.zookeeper, znode);
>   } catch (KeeperException e) {
>     this.abortable.abort("Failed to remove wal from queue (queueId=" + queueId
>       + ", filename=" + filename + ")", e);
>   }
> }
> {code}
> This was the exception observed on region servers:
> {code:java}
> 2021-02-16 20:11:58,567 FATAL [95922885,xyz_peer] regionserver.HRegionServer - ABORTING region server regionserver-111,60020,1613495922885: Failed to remove wal from queue (queueId=xyz_peer, filename=regionserver-111%2C60020%2C1613495922885.1613505863058)
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/regionserver-111,60020,1613495922885/xyz_peer/regionserver-111%2C60020%2C1613495922885.1613505863058
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:114)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>     at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:890)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:238)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1341)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1330)
>     at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeLog(ReplicationQueuesZKImpl.java:142)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:232)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:222)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:198)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.updateLogPosition(ReplicationSource.java:831)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.shipEdits(ReplicationSource.java:746)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.run(ReplicationSource.java:650)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
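
For illustration, the behaviour requested in the issue description (treat NoNode as "already removed" and abort only on other ZooKeeper errors) might look roughly like the sketch below. This is not the committed patch. To keep the example self-contained, `KeeperException`, `NoNodeException`, `FakeZk` and `RecordingAbortable` are hypothetical stand-ins defined here, not the real ZooKeeper or HBase classes, and the znode path is joined with a plain "/" instead of `ZKUtil.joinZNode`.

```java
import java.util.HashSet;
import java.util.Set;

public class RemoveLogSketch {

    // Hypothetical stand-in for org.apache.zookeeper.KeeperException.
    public static class KeeperException extends Exception {
        public KeeperException(String message) { super(message); }
    }

    // Hypothetical stand-in for KeeperException.NoNodeException.
    public static class NoNodeException extends KeeperException {
        public NoNodeException(String path) {
            super("KeeperErrorCode = NoNode for " + path);
        }
    }

    // Hypothetical in-memory replacement for the ZooKeeper client.
    public static class FakeZk {
        public final Set<String> nodes = new HashSet<>();
        public boolean failWithOtherError = false;

        public void deleteNode(String path) throws KeeperException {
            if (failWithOtherError) {
                throw new KeeperException("simulated non-NoNode failure for " + path);
            }
            if (!nodes.remove(path)) {
                throw new NoNodeException(path);
            }
        }
    }

    // Hypothetical stand-in for the Abortable passed to the queue implementation.
    public static class RecordingAbortable {
        public boolean aborted = false;

        public void abort(String why, Throwable cause) {
            aborted = true;
            System.err.println("ABORT: " + why + " (" + cause.getMessage() + ")");
        }
    }

    public static class ReplicationQueues {
        private final FakeZk zookeeper;
        private final RecordingAbortable abortable;
        private final String myQueuesZnode;

        public ReplicationQueues(FakeZk zookeeper, RecordingAbortable abortable,
                                 String myQueuesZnode) {
            this.zookeeper = zookeeper;
            this.abortable = abortable;
            this.myQueuesZnode = myQueuesZnode;
        }

        public void removeLog(String queueId, String filename) {
            String znode = myQueuesZnode + "/" + queueId + "/" + filename;
            try {
                zookeeper.deleteNode(znode);
            } catch (NoNodeException e) {
                // The znode is already gone, for example after peer tear down.
                // The desired end state is reached, so warn and keep running.
                System.err.println("WARN: " + znode + " already deleted, ignoring");
            } catch (KeeperException e) {
                // Any other ZooKeeper failure still aborts, as in the original code.
                abortable.abort("Failed to remove wal from queue (queueId=" + queueId
                    + ", filename=" + filename + ")", e);
            }
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        RecordingAbortable abortable = new RecordingAbortable();
        ReplicationQueues queues = new ReplicationQueues(zk, abortable, "/rs/server-1");

        zk.nodes.add("/rs/server-1/xyz_peer/wal.1");
        queues.removeLog("xyz_peer", "wal.1"); // deletes the znode
        queues.removeLog("xyz_peer", "wal.1"); // NoNode: warns, does not abort
        System.out.println("aborted after NoNode: " + abortable.aborted);
    }
}
```

The more specific `catch (NoNodeException e)` clause has to come before `catch (KeeperException e)`, because Java rejects a subclass catch placed after its superclass.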