Hi Safder,

It looks like the Kafka error occurs because the AM is not running (and hence the Kafka server is not running either, since it runs inside the AM).
The AM failed because a node in ZooKeeper already exists. I think this happens when YARN's first attempt to launch the AM fails, and the error then arises on YARN's second attempt (see https://issues.apache.org/jira/browse/TWILL-61). Can you confirm whether that’s the case? Even if it is, that doesn’t solve your problem, since the root cause is whatever made the first attempt fail. Are you able to retrieve the AM logs from the first attempt?

Terence

On Mar 20, 2014, at 6:06 PM, safder <[email protected]> wrote:

> Hi Guys,
>
> Needed help with Twill. I am trying to run a simple Distributed Shell
> application on a single node cluster. When I run it, in the standard out logs
> I get a ton of kafka related errors. I tee’ed the logs, but each run was
> making 25MBs of it. The only main exception I see is this
>
>
> 20:57:42.382 [YarnTwillRunnerService STARTING-SendThread(localhost.localdomain:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x144e1a859d40052, packet:: clientPath:/MY_BASE_APP/c47fd263-a5c1-48ef-8c76-a91cf8009431/state serverPath:/MY_BASE_APP/c47fd263-a5c1-48ef-8c76-a91cf8009431/state finished:false header:: 15,4 replyHeader:: 15,652,0 request:: '/MY_BASE_APP/c47fd263-a5c1-48ef-8c76-a91cf8009431/state,T response:: #7b227374617465223a2253544f5050494e47227d,s{627,652,1395363459875,1395363462375,3,0,0,0,20,0,627}
> 20:57:42.639 [Kafka-Consumer-log-0] INFO o.a.t.i.k.client.SimpleKafkaConsumer - Exception when fetching message on TopicPartition{topic=log, partition=0}.
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method) ~[na:1.7.0_45]
>     at sun.nio.ch.Net.connect(Net.java:465) ~[na:1.7.0_45]
>     at sun.nio.ch.Net.connect(Net.java:457) ~[na:1.7.0_45]
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:666) ~[na:1.7.0_45]
>     at kafka.network.BlockingChannel.connect(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.consumer.SimpleConsumer.connect(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.consumer.SimpleConsumer.reconnect(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.consumer.SimpleConsumer.liftedTree1$1(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.metrics.KafkaTimer.time(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.metrics.KafkaTimer.time(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.consumer.SimpleConsumer.fetch(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at kafka.javaapi.consumer.SimpleConsumer.fetch(Unknown Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>     at org.apache.twill.internal.kafka.client.SimpleKafkaConsumer$ConsumerThread.fetchMessages(SimpleKafkaConsumer.java:419) ~[twill-core-0.2.0-incubating-SNAPSHOT.jar:0.2.0-incubating-SNAPSHOT]
>     at org.apache.twill.internal.kafka.client.SimpleKafkaConsumer$ConsumerThread.run(SimpleKafkaConsumer.java:355) ~[twill-core-0.2.0-incubating-SNAPSHOT.jar:0.2.0-incubating-SNAPSHOT]
> 20:57:42.642 [Kafka-Consumer-log-0] INFO o.a.t.i.k.client.SimpleKafkaConsumer - Exception when fetching message on TopicPartition{topic=log, partition=0}.
> java.net.ConnectException: Connection refused
>
>
> I also attached the application logs on the yarn end. That is showing a
> different exception.
>
> [main] ERROR o.apache.twill.internal.ServiceMain - Exception when starting service org.apache.twill.internal.appmaster.ApplicationMasterService@1d16eaf2.
> java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /c47fd263-a5c1-48ef-8c76-a91cf8009431/state
>     at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[guava-13.0.1.jar:na]
>     at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[guava-13.0.1.jar:na]
>     at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-13.0.1.jar:na]
>     at org.apache.twill.internal.ServiceMain.doMain(ServiceMain.java:80) ~[twill-yarn-0.2.0-incubating-SNAPSHOT.jar:0.2.0-incubating-SNAPSHOT]
>     at org.apache.twill.internal.appmaster.ApplicationMasterMain.main(ApplicationMasterMain.java:69) [twill-yarn-0.2.0-incubating-SNAPSHOT.jar:0.2.0-incubating-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_45]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_45]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_45]
>     at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_45]
>     at org.apache.twill.launcher.TwillLauncher.main(TwillLauncher.java:86) [launcher.71cb0f5e-fc14-43e7-8149-71e57defd89f.jar:na]
> java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /c47fd263-a5c1-48ef-8c76-a91cf8009431/state
>
>
>
> Please help!
>
> Safder
>
>
