This can happen when there all instances (participants/controllers) were not shutdown in the previous run. From your logs, it looks like there was a CONTROLLER that was still running when the new yarn application was started.
On Tue, Feb 16, 2016 at 11:08 PM, Joel Baranick (JIRA) <[email protected]> wrote: > Joel Baranick created HELIX-628: > ----------------------------------- > > Summary: ZKHelixAdmin silently fails to fully cleanup the ZK > structure > Key: HELIX-628 > URL: https://issues.apache.org/jira/browse/HELIX-628 > Project: Apache Helix > Issue Type: Bug > Affects Versions: 0.6.x > Reporter: Joel Baranick > > > For some reason, the ZKHelixAdmin silently fails to fully cleanup the ZK > structure corresponding to the Helix cluster instance even if it is > configured to do the cleanup before everything else starts up. This causes > the Yarn application to fail to start. > > {code:title=Shutdown|borderStyle=solid} > 2016-02-17 06:25:01 UTC INFO [Thread-4] > gobblin.yarn.GobblinYarnAppLauncher 301 - Stopping the > GobblinYarnAppLauncher > 2016-02-17 06:25:01 UTC INFO [Thread-4] > org.apache.helix.messaging.DefaultMessagingService 84 - Send 1 messages > with criteria instanceName=%resourceName=%partitionName=%partitionState=% > 2016-02-17 06:25:02 UTC INFO [LogCopier STOPPING] > gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: > java.util.concurrent.ScheduledThreadPoolExecutor@73240b61[Shutting down, > pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1862] > 2016-02-17 06:25:02 UTC INFO [LogCopier STOPPING] > gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: > java.util.concurrent.ScheduledThreadPoolExecutor@73240b61[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1862] > 2016-02-17 06:25:02 UTC INFO [JobExecutionInfoServer STOPPING] > gobblin.rest.JobExecutionInfoServer 94 - Stopping the job execution > information server > Shutting down > 2016-02-17 06:25:02 UTC INFO [AdminWebServer STOPPING] > org.eclipse.jetty.server.AbstractConnector 306 - Stopped > ServerConnector@35e0c350{HTTP/1.1}{localhost:8280} > 2016-02-17 06:25:02 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 125 > - Attempting to shutdown ExecutorService: > java.util.concurrent.Executors$DelegatedScheduledExecutorService@185aaf1f > 2016-02-17 06:25:02 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 144 > - Successfully shutdown ExecutorService: > java.util.concurrent.Executors$DelegatedScheduledExecutorService@185aaf1f > 2016-02-17 06:25:02 UTC INFO [Thread-4] > org.apache.helix.manager.zk.ZKHelixManager 546 - disconnect > ip-169-0-0-1(SPECTATOR) from GobblinYarn > 2016-02-17 06:25:02 UTC INFO [Thread-4] > org.apache.helix.messaging.handling.HelixTaskExecutor 679 - Shutting down > HelixTaskExecutor > 2016-02-17 06:25:02 UTC INFO [Thread-4] > org.apache.helix.messaging.handling.HelixTaskExecutor 443 - Reset > HelixTaskExecutor > 2016-02-17 06:25:02 UTC INFO [Thread-4] > org.apache.helix.messaging.handling.HelixTaskExecutor 453 - Reset exectuor > for msgType: TASK_REPLY, pool: > java.util.concurrent.ThreadPoolExecutor@3f197a46[Running, pool size = 0, > active threads = 0, queued tasks = 0, completed tasks = 0] > 2016-02-17 06:25:02 UTC INFO [Thread-4] > org.apache.helix.messaging.handling.HelixTaskExecutor 397 - Shutting down > pool: java.util.concurrent.ThreadPoolExecutor@3f197a46[Running, pool size > = 0, active threads = 0, queued tasks = 0, completed tasks = 0] > 2016-02-17 06:25:02 UTC INFO [Thread-4] > org.apache.helix.messaging.handling.HelixTaskExecutor 684 - Shutdown > HelixTaskExecutor finished > 2016-02-17 06:25:02 UTC INFO [Thread-4] > org.apache.helix.manager.zk.ZkClient 130 - Closing zkclient: > State:CONNECTED Timeout:30000 sessionid:0xd452eb397b640065 local:/ > 169.0.0.1:51319 remoteserver:ip-138-0-0-1.ec2.internal/138.0.0.2:2181 > lastZxid:60129782948 xid:17 sent:140 recv:140 queuedpkts:0 pendingresp:0 > queuedevents:0 > 2016-02-17 06:25:02 UTC INFO [ZkClient-EventThread-17-zk.server:2181] > org.I0Itec.zkclient.ZkEventThread 82 - Terminate ZkClient event thread. > 2016-02-17 06:25:02 UTC INFO [main-EventThread] > org.apache.zookeeper.ClientCnxn$EventThread 512 - EventThread shut down > 2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.zookeeper.ZooKeeper > 684 - Session: 0xd452eb397b640065 closed > 2016-02-17 06:25:02 UTC INFO [Thread-4] > org.apache.helix.manager.zk.ZkClient 157 - Closed zkclient > 2016-02-17 06:25:02 UTC INFO [Thread-4] > org.apache.helix.manager.zk.ZKHelixManager 570 - Cluster manager: > ip-169-0-0-1 disconnected > 2016-02-17 06:25:02 UTC INFO [Thread-4] > gobblin.yarn.GobblinYarnAppLauncher 722 - Deleting application working > directory hdfs:// > ec2-145-0-0-1.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1455654714320_0004 > {code} > > {code:title=Startup|borderStyle=solid} > 2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 > - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr > /lib64:/lib64:/lib:/usr/lib > 2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 > - Client environment:java.io.tmpdir=/tmp > 2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 > - Client environment:java.compiler=<NA> > 2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 > - Client environment:os.name=Linux > 2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 > - Client environment:os.arch=amd64 > 2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 > - Client environment:os.version=3.19.0-49-generic > 2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 > - Client environment:user.name=yarn > 2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 > - Client environment:user.home=/home/yarn > 2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 > - Client environment:user.dir=/opt/app/gobblin/00046-cfdc834 > 2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.ZooKeeper 438 - > Initiating client connection, connectString=zk.server:2181 > sessionTimeout=30000 watcher=org.apache.helix.manager.zk.ZkClient@35e52059 > 2016-02-17 06:51:32 UTC INFO > [main-SendThread(ip-169-0-0-1.ec2.internal:2181)] > org.apache.zookeeper.ClientCnxn$SendThread 975 - Opening socket connection > to server ip-169-0-0-1.ec2.internal/169.0.0.1:2181. Will not attempt to > authenticate using SASL (unknown error) > 2016-02-17 06:51:32 UTC INFO > [main-SendThread(ip-169-0-0-1.ec2.internal:2181)] > org.apache.zookeeper.ClientCnxn$SendThread 852 - Socket connection > established to ip-169-0-0-1.ec2.internal/169.0.0.1:2181, initiating > session > 2016-02-17 06:51:32 UTC INFO > [main-SendThread(ip-169-0-0-1.ec2.internal:2181)] > org.apache.zookeeper.ClientCnxn$SendThread 1235 - Session establishment > complete on server ip-169-0-0-1.ec2.internal/169.0.0.1:2181, sessionid = > 0x5b52eb397b640080, negotiated timeout = 30000 > 2016-02-17 06:51:32 UTC INFO [main-EventThread] > org.I0Itec.zkclient.ZkClient 449 - zookeeper state changed (SyncConnected) > 2016-02-17 06:51:32 UTC WARN [main] > org.apache.helix.manager.zk.ZKHelixAdmin 495 - Root directory > exists.Cleaning the root directory:/GobblinYarn > Exception in thread "main" org.I0Itec.zkclient.exception.ZkException: > org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = > Directory not empty for /GobblinYarn/CONTROLLER > at > org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:68) > at > org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) > at org.apache.helix.manager.zk.ZkClient.delete(ZkClient.java:348) > at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:516) > at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:511) > at > org.apache.helix.manager.zk.ZKHelixAdmin.addCluster(ZKHelixAdmin.java:496) > at > org.apache.helix.tools.ClusterSetup.addCluster(ClusterSetup.java:154) > at > gobblin.yarn.YarnHelixUtils.createGobblinYarnHelixCluster(YarnHelixUtils.java:67) > at > gobblin.yarn.GobblinYarnAppLauncher.launch(GobblinYarnAppLauncher.java:243) > at > gobblin.yarn.GobblinYarnAppLauncher.main(GobblinYarnAppLauncher.java:784) > Caused by: org.apache.zookeeper.KeeperException$NotEmptyException: > KeeperErrorCode = Directory not empty for /GobblinYarn/CONTROLLER > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:125) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873) > at org.I0Itec.zkclient.ZkConnection.delete(ZkConnection.java:91) > at org.apache.helix.manager.zk.ZkClient$8.call(ZkClient.java:352) > at > org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) > ... 8 more > 2016-02-17 06:51:33 UTC INFO [Thread-4] > gobblin.yarn.GobblinYarnAppLauncher 301 - Stopping the > GobblinYarnAppLauncher > 2016-02-17 06:51:33 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 125 > - Attempting to shutdown ExecutorService: > java.util.concurrent.Executors$DelegatedScheduledExecutorService@2c68b710 > 2016-02-17 06:51:33 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 144 > - Successfully shutdown ExecutorService: > java.util.concurrent.Executors$DelegatedScheduledExecutorService@2c68b710 > {code} > > > > -- > This message was sent by Atlassian JIRA > (v6.3.4#6332) >
