[ https://issues.apache.org/jira/browse/HADOOP-11722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron T. Myers updated HADOOP-11722:
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.7.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2. Thanks very much for the contribution, Arun.

> Some Instances of Services using ZKDelegationTokenSecretManager go down when old token cannot be deleted
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11722
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>             Fix For: 2.7.0
>
>         Attachments: HADOOP-11722.1.patch, HADOOP-11722.2.patch
>
>
> The delete node code in {{ZKDelegationTokenSecretManager}} is as follows :
> {noformat}
>    while(zkClient.checkExists().forPath(nodeRemovePath) != null){
>      zkClient.delete().guaranteed().forPath(nodeRemovePath);
>    }
> {noformat}
> When instances of a Service using {{ZKDelegationTokenSecretManager}} try
> deleting a node simultaneously, it is possible that all of them enter the
> while loop, in which case all peers will try to delete the node. Only one will
> succeed and the rest will throw an exception, which brings down that service
> instance.
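The check-then-delete race described above can be sketched without a ZooKeeper cluster. The following is a minimal illustration only, not the content of the attached patches: `FakeStore`, `NoNodeException` and `deleteIfPresent` are hypothetical names, with `FakeStore` standing in for the ZooKeeper namespace and `NoNodeException` standing in for `KeeperException.NoNodeException`. It shows one common way to make such a delete idempotent, namely treating "node already gone" as success rather than as a fatal error.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentDeleteSketch {

    /** Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("NoNode for " + path);
        }
    }

    /** Hypothetical in-memory stand-in for the ZooKeeper namespace. */
    static class FakeStore {
        private final Set<String> nodes = ConcurrentHashMap.newKeySet();

        void create(String path) {
            nodes.add(path);
        }

        boolean exists(String path) {
            return nodes.contains(path);
        }

        /** Like a ZooKeeper delete: fails if the node is already gone. */
        void delete(String path) throws NoNodeException {
            if (!nodes.remove(path)) {
                throw new NoNodeException(path);
            }
        }
    }

    /**
     * Removes the node, treating "already deleted by a peer" as success.
     * Returns true if this caller performed the delete, false otherwise.
     */
    static boolean deleteIfPresent(FakeStore store, String path) {
        try {
            store.delete(path);
            return true;
        } catch (NoNodeException e) {
            // A peer won the race; the desired end state already holds.
            return false;
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        String path = "/zkdtsm/ZKDTSMRoot/ZKDTSMTokensRoot/DT_28";
        store.create(path);

        // Two peers both observed the node and both attempt the delete.
        boolean first = deleteIfPresent(store, path);
        boolean second = deleteIfPresent(store, path);

        // prints "true false false"
        System.out.println(first + " " + second + " " + store.exists(path));
    }
}
```

In this sketch the second caller simply observes that the node is gone instead of propagating the exception. With the real Curator client the analogous step would presumably be a catch around the `delete()` call, but that mapping is an assumption here rather than something stated in this message.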
> The Exception is as follows :
> {noformat}
> 2015-03-15 10:24:54,000 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover thread received unexpected exception
> java.lang.RuntimeException: Could not remove Stored Token ZKDTSMDelegationToken_28
>     at org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.removeStoredToken(ZKDelegationTokenSecretManager.java:770)
>     at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:605)
>     at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54)
>     at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /zkdtsm/ZKDTSMRoot/ZKDTSMTokensRoot/DT_28
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
>     at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:238)
>     at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:233)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
>     at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:214)
>     at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:41)
>     at org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.removeStoredToken(ZKDelegationTokenSecretManager.java:764)
>     ... 4 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)