Hi Safder,

It looks like the Kafka error is caused by the AM not running (hence the Kafka 
server is not running either, since it runs inside the AM).

The AM failed because a node in ZooKeeper already exists. I think this would 
happen if YARN's first attempt to launch the AM failed; the error would then 
arise on YARN's second attempt (see 
https://issues.apache.org/jira/browse/TWILL-61). Can you confirm whether 
that's the case?

Even if that's the case, it doesn't solve your problem, as the root cause is 
why the first attempt failed. Are you able to retrieve the AM logs from the 
first attempt?
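
As a side note, the "response::" field in the client-side ZooKeeper DEBUG 
line quoted below looks like the content of the state node, hex-encoded. 
Assuming that is how the ZooKeeper client logs it, a few lines of plain Java 
can decode it. This is only a sketch; the class and method names are made up 
for illustration and are not part of Twill or ZooKeeper:

```java
import java.nio.charset.StandardCharsets;

public class HexPayload {
    // Decodes a hex string (without the leading '#') into UTF-8 text.
    static String decode(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte of the node data.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Payload copied from the "response::" field in the quoted log.
        System.out.println(decode("7b227374617465223a2253544f5050494e47227d"));
        // prints {"state":"STOPPING"}
    }
}
```

If that reading is right, the state node already exists and holds a STOPPING 
state at the time of that log line, which would be consistent with the 
NodeExists error in the AM log.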

Terence

On Mar 20, 2014, at 6:06 PM, safder <[email protected]> wrote:

> Hi Guys,
> 
> I need help with Twill. I am trying to run a simple Distributed Shell 
> application on a single-node cluster. When I run it, I get a ton of 
> Kafka-related errors in the standard out logs. I tee'd the logs, but each 
> run produced about 25 MB of output. The main exception I see is this:
> 
> 
> 20:57:42.382 [YarnTwillRunnerService 
> STARTING-SendThread(localhost.localdomain:2181)] DEBUG 
> org.apache.zookeeper.ClientCnxn - Reading reply 
> sessionid:0x144e1a859d40052, packet:: 
> clientPath:/MY_BASE_APP/c47fd263-a5c1-48ef-8c76-a91cf8009431/state 
> serverPath:/MY_BASE_APP/c47fd263-a5c1-48ef-8c76-a91cf8009431/state 
> finished:false header:: 15,4  replyHeader:: 15,652,0  
> request:: '/MY_BASE_APP/c47fd263-a5c1-48ef-8c76-a91cf8009431/state,T  
> response:: 
> #7b227374617465223a2253544f5050494e47227d,s{627,652,1395363459875,1395363462375,3,0,0,0,20,0,627}
> 20:57:42.639 [Kafka-Consumer-log-0] INFO  
> o.a.t.i.k.client.SimpleKafkaConsumer - Exception when fetching message on 
> TopicPartition{topic=log, partition=0}.
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.Net.connect0(Native Method) ~[na:1.7.0_45]
>        at sun.nio.ch.Net.connect(Net.java:465) ~[na:1.7.0_45]
>        at sun.nio.ch.Net.connect(Net.java:457) ~[na:1.7.0_45]
>        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:666) 
> ~[na:1.7.0_45]
>        at kafka.network.BlockingChannel.connect(Unknown Source) 
> ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at kafka.consumer.SimpleConsumer.connect(Unknown Source) 
> ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at kafka.consumer.SimpleConsumer.reconnect(Unknown Source) 
> ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at kafka.consumer.SimpleConsumer.liftedTree1$1(Unknown Source) 
> ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(Unknown
>  Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Unknown
>  Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown
>  Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown
>  Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at kafka.metrics.KafkaTimer.time(Unknown Source) 
> ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(Unknown 
> Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown 
> Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown 
> Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at kafka.metrics.KafkaTimer.time(Unknown Source) 
> ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at kafka.consumer.SimpleConsumer.fetch(Unknown Source) 
> ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at kafka.javaapi.consumer.SimpleConsumer.fetch(Unknown Source) 
> ~[kafka_2.10-0.8.0.jar:0.8.0]
>        at 
> org.apache.twill.internal.kafka.client.SimpleKafkaConsumer$ConsumerThread.fetchMessages(SimpleKafkaConsumer.java:419)
>  ~[twill-core-0.2.0-incubating-SNAPSHOT.jar:0.2.0-incubating-SNAPSHOT]
>        at 
> org.apache.twill.internal.kafka.client.SimpleKafkaConsumer$ConsumerThread.run(SimpleKafkaConsumer.java:355)
>  ~[twill-core-0.2.0-incubating-SNAPSHOT.jar:0.2.0-incubating-SNAPSHOT]
> 20:57:42.642 [Kafka-Consumer-log-0] INFO  
> o.a.t.i.k.client.SimpleKafkaConsumer - Exception when fetching message on 
> TopicPartition{topic=log, partition=0}.
> java.net.ConnectException: Connection refused
> 
> 
> I have also attached the application logs from the YARN side, which show a 
> different exception:
> 
> [main] ERROR o.apache.twill.internal.ServiceMain - Exception when starting 
> service org.apache.twill.internal.appmaster.ApplicationMasterService@1d16eaf2.
> java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /c47fd263-a5c1-48ef-8c76-a91cf8009431/state
>        at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294)
>  ~[guava-13.0.1.jar:na]
>        at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281)
>  ~[guava-13.0.1.jar:na]
>        at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-13.0.1.jar:na]
>        at org.apache.twill.internal.ServiceMain.doMain(ServiceMain.java:80) 
> ~[twill-yarn-0.2.0-incubating-SNAPSHOT.jar:0.2.0-incubating-SNAPSHOT]
>        at 
> org.apache.twill.internal.appmaster.ApplicationMasterMain.main(ApplicationMasterMain.java:69)
>  [twill-yarn-0.2.0-incubating-SNAPSHOT.jar:0.2.0-incubating-SNAPSHOT]
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.7.0_45]
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> ~[na:1.7.0_45]
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.7.0_45]
>        at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_45]
>        at org.apache.twill.launcher.TwillLauncher.main(TwillLauncher.java:86) 
> [launcher.71cb0f5e-fc14-43e7-8149-71e57defd89f.jar:na]
> java.util.concurrent.ExecutionException: 
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /c47fd263-a5c1-48ef-8c76-a91cf8009431/state
> 
> 
> 
> Please help!
> 
> Safder
> 
> 
