Joel Baranick created HELIX-628:
-----------------------------------
Summary: ZKHelixAdmin silently fails to fully cleanup the ZK structure
Key: HELIX-628
URL: https://issues.apache.org/jira/browse/HELIX-628
Project: Apache Helix
Issue Type: Bug
Affects Versions: 0.6.x
Reporter: Joel Baranick
For some reason, ZKHelixAdmin silently fails to fully clean up the ZK
structure corresponding to the Helix cluster instance, even though it is
configured to do the cleanup before anything else starts up. As a result, the
YARN application fails to start. The shutdown log from the previous run and
the startup log showing the failure follow.
{code:title=Shutdown|borderStyle=solid}
2016-02-17 06:25:01 UTC INFO [Thread-4] gobblin.yarn.GobblinYarnAppLauncher
301 - Stopping the GobblinYarnAppLauncher
2016-02-17 06:25:01 UTC INFO [Thread-4]
org.apache.helix.messaging.DefaultMessagingService 84 - Send 1 messages with
criteria instanceName=%resourceName=%partitionName=%partitionState=%
2016-02-17 06:25:02 UTC INFO [LogCopier STOPPING] gobblin.util.ExecutorsUtils
125 - Attempting to shutdown ExecutorService:
java.util.concurrent.ScheduledThreadPoolExecutor@73240b61[Shutting down, pool
size = 1, active threads = 0, queued tasks = 0, completed tasks = 1862]
2016-02-17 06:25:02 UTC INFO [LogCopier STOPPING] gobblin.util.ExecutorsUtils
144 - Successfully shutdown ExecutorService:
java.util.concurrent.ScheduledThreadPoolExecutor@73240b61[Terminated, pool size
= 0, active threads = 0, queued tasks = 0, completed tasks = 1862]
2016-02-17 06:25:02 UTC INFO [JobExecutionInfoServer STOPPING]
gobblin.rest.JobExecutionInfoServer 94 - Stopping the job execution
information server
Shutting down
2016-02-17 06:25:02 UTC INFO [AdminWebServer STOPPING]
org.eclipse.jetty.server.AbstractConnector 306 - Stopped
ServerConnector@35e0c350{HTTP/1.1}{localhost:8280}
2016-02-17 06:25:02 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 125 -
Attempting to shutdown ExecutorService:
java.util.concurrent.Executors$DelegatedScheduledExecutorService@185aaf1f
2016-02-17 06:25:02 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 144 -
Successfully shutdown ExecutorService:
java.util.concurrent.Executors$DelegatedScheduledExecutorService@185aaf1f
2016-02-17 06:25:02 UTC INFO [Thread-4]
org.apache.helix.manager.zk.ZKHelixManager 546 - disconnect
ip-169-0-0-1(SPECTATOR) from GobblinYarn
2016-02-17 06:25:02 UTC INFO [Thread-4]
org.apache.helix.messaging.handling.HelixTaskExecutor 679 - Shutting down
HelixTaskExecutor
2016-02-17 06:25:02 UTC INFO [Thread-4]
org.apache.helix.messaging.handling.HelixTaskExecutor 443 - Reset
HelixTaskExecutor
2016-02-17 06:25:02 UTC INFO [Thread-4]
org.apache.helix.messaging.handling.HelixTaskExecutor 453 - Reset exectuor for
msgType: TASK_REPLY, pool:
java.util.concurrent.ThreadPoolExecutor@3f197a46[Running, pool size = 0, active
threads = 0, queued tasks = 0, completed tasks = 0]
2016-02-17 06:25:02 UTC INFO [Thread-4]
org.apache.helix.messaging.handling.HelixTaskExecutor 397 - Shutting down
pool: java.util.concurrent.ThreadPoolExecutor@3f197a46[Running, pool size = 0,
active threads = 0, queued tasks = 0, completed tasks = 0]
2016-02-17 06:25:02 UTC INFO [Thread-4]
org.apache.helix.messaging.handling.HelixTaskExecutor 684 - Shutdown
HelixTaskExecutor finished
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.helix.manager.zk.ZkClient
130 - Closing zkclient: State:CONNECTED Timeout:30000
sessionid:0xd452eb397b640065 local:/169.0.0.1:51319
remoteserver:ip-138-0-0-1.ec2.internal/138.0.0.2:2181 lastZxid:60129782948
xid:17 sent:140 recv:140 queuedpkts:0 pendingresp:0 queuedevents:0
2016-02-17 06:25:02 UTC INFO [ZkClient-EventThread-17-zk.server:2181]
org.I0Itec.zkclient.ZkEventThread 82 - Terminate ZkClient event thread.
2016-02-17 06:25:02 UTC INFO [main-EventThread]
org.apache.zookeeper.ClientCnxn$EventThread 512 - EventThread shut down
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.zookeeper.ZooKeeper 684 -
Session: 0xd452eb397b640065 closed
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.helix.manager.zk.ZkClient
157 - Closed zkclient
2016-02-17 06:25:02 UTC INFO [Thread-4]
org.apache.helix.manager.zk.ZKHelixManager 570 - Cluster manager: ip-169-0-0-1
disconnected
2016-02-17 06:25:02 UTC INFO [Thread-4] gobblin.yarn.GobblinYarnAppLauncher
722 - Deleting application working directory
hdfs://ec2-145-0-0-1.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1455654714320_0004
{code}
{code:title=Startup|borderStyle=solid}
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 -
Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr
/lib64:/lib64:/lib:/usr/lib
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 -
Client environment:java.io.tmpdir=/tmp
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 -
Client environment:java.compiler=<NA>
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 -
Client environment:os.name=Linux
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 -
Client environment:os.arch=amd64
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 -
Client environment:os.version=3.19.0-49-generic
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 -
Client environment:user.name=yarn
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 -
Client environment:user.home=/home/yarn
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 -
Client environment:user.dir=/opt/app/gobblin/00046-cfdc834
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.ZooKeeper 438 -
Initiating client connection, connectString=zk.server:2181 sessionTimeout=30000
watcher=org.apache.helix.manager.zk.ZkClient@35e52059
2016-02-17 06:51:32 UTC INFO [main-SendThread(ip-169-0-0-1.ec2.internal:2181)]
org.apache.zookeeper.ClientCnxn$SendThread 975 - Opening socket connection to
server ip-169-0-0-1.ec2.internal/169.0.0.1:2181. Will not attempt to
authenticate using SASL (unknown error)
2016-02-17 06:51:32 UTC INFO [main-SendThread(ip-169-0-0-1.ec2.internal:2181)]
org.apache.zookeeper.ClientCnxn$SendThread 852 - Socket connection established
to ip-169-0-0-1.ec2.internal/169.0.0.1:2181, initiating session
2016-02-17 06:51:32 UTC INFO [main-SendThread(ip-169-0-0-1.ec2.internal:2181)]
org.apache.zookeeper.ClientCnxn$SendThread 1235 - Session establishment
complete on server ip-169-0-0-1.ec2.internal/169.0.0.1:2181, sessionid =
0x5b52eb397b640080, negotiated timeout = 30000
2016-02-17 06:51:32 UTC INFO [main-EventThread] org.I0Itec.zkclient.ZkClient
449 - zookeeper state changed (SyncConnected)
2016-02-17 06:51:32 UTC WARN [main] org.apache.helix.manager.zk.ZKHelixAdmin
495 - Root directory exists.Cleaning the root directory:/GobblinYarn
Exception in thread "main" org.I0Itec.zkclient.exception.ZkException:
org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode =
Directory not empty for /GobblinYarn/CONTROLLER
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:68)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
at org.apache.helix.manager.zk.ZkClient.delete(ZkClient.java:348)
at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:516)
at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:511)
at
org.apache.helix.manager.zk.ZKHelixAdmin.addCluster(ZKHelixAdmin.java:496)
at org.apache.helix.tools.ClusterSetup.addCluster(ClusterSetup.java:154)
at
gobblin.yarn.YarnHelixUtils.createGobblinYarnHelixCluster(YarnHelixUtils.java:67)
at
gobblin.yarn.GobblinYarnAppLauncher.launch(GobblinYarnAppLauncher.java:243)
at
gobblin.yarn.GobblinYarnAppLauncher.main(GobblinYarnAppLauncher.java:784)
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException:
KeeperErrorCode = Directory not empty for /GobblinYarn/CONTROLLER
at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.I0Itec.zkclient.ZkConnection.delete(ZkConnection.java:91)
at org.apache.helix.manager.zk.ZkClient$8.call(ZkClient.java:352)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
... 8 more
2016-02-17 06:51:33 UTC INFO [Thread-4] gobblin.yarn.GobblinYarnAppLauncher
301 - Stopping the GobblinYarnAppLauncher
2016-02-17 06:51:33 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 125 -
Attempting to shutdown ExecutorService:
java.util.concurrent.Executors$DelegatedScheduledExecutorService@2c68b710
2016-02-17 06:51:33 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 144 -
Successfully shutdown ExecutorService:
java.util.concurrent.Executors$DelegatedScheduledExecutorService@2c68b710
{code}
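In the Startup trace, ZkClient.deleteRecursive fails with NotEmptyException while deleting /GobblinYarn/CONTROLLER. One possible explanation, which is an assumption and not confirmed by this report, is that another client creates a child under CONTROLLER after its children were removed but before the node itself is deleted. The sketch below reproduces that sequence against an in-memory stand-in for the znode tree and shows a bounded retry around the recursive delete. ZnodeTreeSketch, its beforeDelete hook, and the child names are hypothetical and are not part of ZooKeeper, ZkClient, or Helix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

/** In-memory stand-in for a znode tree. This is NOT the ZooKeeper, ZkClient, or Helix API. */
final class ZnodeTreeSketch {

    /** Mimics the ZooKeeper rule that a znode with children cannot be deleted. */
    static final class NotEmptyException extends RuntimeException {
        NotEmptyException(String path) {
            super("Directory not empty for " + path);
        }
    }

    private final Set<String> paths = new TreeSet<>();

    /** Hypothetical hook, called after a node's children are gone and before the node is deleted. */
    Consumer<String> beforeDelete = path -> { };

    void create(String path) { paths.add(path); }

    boolean exists(String path) { return paths.contains(path); }

    /** Direct children only: paths of the form parent + "/" + name. */
    List<String> children(String parent) {
        String prefix = parent + "/";
        List<String> result = new ArrayList<>();
        for (String p : paths) {
            if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                result.add(p);
            }
        }
        return result;
    }

    void delete(String path) {
        if (!children(path).isEmpty()) {
            throw new NotEmptyException(path);
        }
        paths.remove(path);
    }

    /** Children first, then the node itself. Not atomic, so a writer can slip in between. */
    void deleteRecursive(String path) {
        for (String child : children(path)) {
            deleteRecursive(child);
        }
        beforeDelete.accept(path);
        delete(path);
    }

    /** Retries the whole recursive delete and returns the number of attempts used. */
    int deleteRecursiveWithRetry(String path, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                deleteRecursive(path);
                return attempt;
            } catch (NotEmptyException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }

    /** Builds a small tree whose CONTROLLER node gains one late child during the first delete. */
    static ZnodeTreeSketch demoTree() {
        ZnodeTreeSketch zk = new ZnodeTreeSketch();
        zk.create("/GobblinYarn");
        zk.create("/GobblinYarn/CONTROLLER");
        zk.create("/GobblinYarn/CONTROLLER/existing-child");
        boolean[] fired = {false};
        zk.beforeDelete = path -> {
            if (!fired[0] && path.equals("/GobblinYarn/CONTROLLER")) {
                fired[0] = true;
                zk.create(path + "/late-child"); // simulated concurrent writer (assumption)
            }
        };
        return zk;
    }

    public static void main(String[] args) {
        ZnodeTreeSketch zk = demoTree();
        int attempts = zk.deleteRecursiveWithRetry("/GobblinYarn", 3);
        // prints "attempts=2 rootExists=false"
        System.out.println("attempts=" + attempts + " rootExists=" + zk.exists("/GobblinYarn"));
    }
}
```

A retry like this would only help if the concurrent writer eventually stops. If a previous controller or participant is still connected and keeps writing under the cluster root, the delete can keep failing, and the remaining session would have to be closed or expire first.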