Albert Shau created TWILL-152:
---------------------------------

             Summary: Zookeeper NodeExistsException on AM restarts
                 Key: TWILL-152
                 URL: https://issues.apache.org/jira/browse/TWILL-152
             Project: Apache Twill
          Issue Type: Bug
            Reporter: Albert Shau


If the AM fails and is restarted (for example, due to expiration of the AMRM 
token), it fails to start up again because its ZooKeeper nodes already 
exist:

{code}

java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /c072c759-d7bf-488a-a8ca-782a3656392f/runnables
        at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-13.0.1.jar:na]
        at org.apache.twill.internal.ServiceMain.doMain(ServiceMain.java:94) ~[org.apache.twill.twill-yarn-0.5.0-incubating.jar:0.5.0-incubating]
        at org.apache.twill.internal.appmaster.ApplicationMasterMain.main(ApplicationMasterMain.java:77) [org.apache.twill.twill-yarn-0.5.0-incubating.jar:0.5.0-incubating]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_75]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_75]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_75]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_75]
        at org.apache.twill.launcher.TwillLauncher.main(TwillLauncher.java:86) [launcher.e5147f31-88e3-486c-a6e2-8d33bdc30ebb.jar:na]
java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /c072c759-d7bf-488a-a8ca-782a3656392f/runnables
        at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-13.0.1.jar:na]
        at org.apache.twill.internal.appmaster.ApplicationMasterService.doStart(ApplicationMasterService.java:222) ~[org.apache.twill.twill-yarn-0.5.0-incubating.jar:0.5.0-incubating]
        at org.apache.twill.internal.AbstractTwillService.startUp(AbstractTwillService.java:171) ~[org.apache.twill.twill-core-0.5.0-incubating.jar:0.5.0-incubating]
        at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_75]
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /c072c759-d7bf-488a-a8ca-782a3656392f/runnables
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[org.apache.zookeeper.zookeeper-3.4.5.jar:3.4.5-1392090]
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[org.apache.zookeeper.zookeeper-3.4.5.jar:3.4.5-1392090]
        at org.apache.twill.internal.zookeeper.DefaultZKClientService$Callbacks$1.processResult(DefaultZKClientService.java:500) ~[org.apache.twill.twill-zookeeper-0.5.0-incubating.jar:0.5.0-incubating]
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:602) ~[org.apache.zookeeper.zookeeper-3.4.5.jar:3.4.5-1392090]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) ~[org.apache.zookeeper.zookeeper-3.4.5.jar:3.4.5-1392090]
{code}

This happens because the restarted AM has the same run id, so it tries to 
create ZooKeeper nodes under a path that already exists.
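
To make the failure mode concrete, here is a minimal, self-contained sketch. It 
does not use Twill or the ZooKeeper client: FakeZk, NodeExistsException, 
strictStart and tolerantStart are hypothetical names, and an in-memory set 
stands in for the ZooKeeper namespace that outlives the first AM attempt. 
Treating an existing node as success is only one possible way to tolerate a 
restart, not necessarily the fix that should go into Twill.

```java
import java.util.HashSet;
import java.util.Set;

public class RunIdNodeSketch {

    /** Hypothetical stand-in for ZooKeeper's NodeExistsException. */
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) {
            super("NodeExists for " + path);
        }
    }

    /** In-memory stand-in for the ZooKeeper namespace; it outlives an AM attempt. */
    static class FakeZk {
        private final Set<String> nodes = new HashSet<>();

        /** Creates the node, failing if it is already present. */
        void create(String path) throws NodeExistsException {
            if (!nodes.add(path)) {
                throw new NodeExistsException(path);
            }
        }
    }

    /** Path layout taken from the trace above: /<run id>/runnables. */
    static String runnablesPath(String runId) {
        return "/" + runId + "/runnables";
    }

    /** Strict startup: any existing node is an error (the behavior reported here). */
    static void strictStart(FakeZk zk, String runId) throws NodeExistsException {
        zk.create(runnablesPath(runId));
    }

    /**
     * Tolerant startup (an assumption, not Twill's code): an existing node is
     * treated as success. Returns true only if this call created the node.
     */
    static boolean tolerantStart(FakeZk zk, String runId) {
        try {
            zk.create(runnablesPath(runId));
            return true;
        } catch (NodeExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        String runId = "c072c759-d7bf-488a-a8ca-782a3656392f";

        // First AM attempt creates the node.
        System.out.println("first attempt created node: " + tolerantStart(zk, runId));

        // Restarted AM with the same run id hits the existing node.
        try {
            strictStart(zk, runId);
        } catch (NodeExistsException e) {
            System.out.println("strict restart failed: " + e.getMessage());
        }
        System.out.println("tolerant restart created node: " + tolerantStart(zk, runId));
    }
}
```

In this sketch the restart succeeds only when the existing node is tolerated; 
giving each AM attempt a distinct path would avoid the collision as well.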



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
