surendra singh lilhore created HDFS-3386:
--------------------------------------------

             Summary: Namenode is not deleting his lock entry '/ledgers/lock/lock-0000X', when fails to acquire the lock
                 Key: HDFS-3386
                 URL: https://issues.apache.org/jira/browse/HDFS-3386
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
            Reporter: surendra singh lilhore
            Priority: Minor
             Fix For: 0.23.0

When a Standby NN becomes Active, it first creates its sequential lock entry lock-000X in ZK and then tries to acquire the lock, as shown below:

{quote}
myznode = zkc.create(lockpath + "/lock-", new byte[] {'0'},
                     Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
// Excerpt: 'nodes' is presumably the sorted list of children under lockpath;
// how it is obtained is not shown here.
if ((lockpath + "/" + nodes.get(0)).equals(myznode)) {
  if (LOG.isTraceEnabled()) {
    LOG.trace("Lock acquired - " + myznode);
  }
  lockCount.set(1);
  zkc.exists(myznode, this);
  return;
} else {
  LOG.error("Failed to acquire lock with " + myznode
            + ", " + nodes.get(0) + " already has it");
  throw new IOException("Could not acquire lock");
}
{quote}

If the transition fails to acquire the lock, the exception is thrown and the NN shuts down. The problem is that the lock entry lock-000X is not deleted: it remains in ZK until the session expires, and a subsequent start-up will not be able to acquire the lock.
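
One possible direction, sketched here under assumptions and not taken from any patch on this issue: delete the node's own lock entry before throwing when it turns out not to be first in line. ZkLike and InMemoryZk are hypothetical stand-ins for the ZooKeeper client so that the example runs without a server; in the real code the cleanup would presumably be a delete call on myznode through zkc. The way the children are listed and sorted is also an assumption, since the excerpt above does not show it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class LockCleanupSketch {

    /** Hypothetical stand-in for the few ZooKeeper operations used here. */
    interface ZkLike {
        String createSequential(String prefix);
        List<String> getChildren(String parent);
        void delete(String path);
    }

    /** In-memory fake so the sketch runs without a ZooKeeper server. */
    static class InMemoryZk implements ZkLike {
        private final TreeSet<String> nodes = new TreeSet<>();
        private int seq = 0;

        public String createSequential(String prefix) {
            // Mimics sequential znodes: a zero-padded counter is appended.
            String path = String.format("%s%010d", prefix, seq++);
            nodes.add(path);
            return path;
        }

        public List<String> getChildren(String parent) {
            List<String> out = new ArrayList<>();
            for (String n : nodes) {
                if (n.startsWith(parent + "/")) {
                    out.add(n.substring(parent.length() + 1));
                }
            }
            return out;
        }

        public void delete(String path) {
            nodes.remove(path);
        }
    }

    /** Creates a lock entry and removes it again if the lock is not won. */
    static String acquire(ZkLike zk, String lockpath) throws IOException {
        String myznode = zk.createSequential(lockpath + "/lock-");
        boolean acquired = false;
        try {
            // Assumption: the lowest-numbered child is the lock holder.
            List<String> nodes = new ArrayList<>(zk.getChildren(lockpath));
            Collections.sort(nodes);
            if ((lockpath + "/" + nodes.get(0)).equals(myznode)) {
                acquired = true;
                return myznode;
            }
            throw new IOException("Could not acquire lock with " + myznode
                    + ", " + nodes.get(0) + " already has it");
        } finally {
            if (!acquired) {
                // Do not leave our own entry behind until session expiry.
                zk.delete(myznode);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InMemoryZk zk = new InMemoryZk();
        System.out.println("Lock acquired - " + acquire(zk, "/ledgers/lock"));
        try {
            acquire(zk, "/ledgers/lock"); // second contender loses
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("Remaining entries: " + zk.getChildren("/ledgers/lock"));
    }
}
```

With this shape, a failed attempt leaves only the holder's entry under the lock path, so a later start-up is not affected by a leftover entry from the failed attempt. Against a real ZooKeeper ensemble the delete itself can fail (for example on connection loss), so session expiry would still be the fallback.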