Hi -

Thanks in advance for your help.

I have been following this guide
http://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html trying
to prove that my Samza cluster runs. I get as far as having a running YARN
task, as the tutorial specifies, but the task doesn't actually do anything. None
of the logs I've checked (the application master, YARN ResourceManager, and
NodeManager logs, as well as the stderr and stdout userlogs on the resource
manager nodes) show any kind of error or warning; they simply stop growing
after the initial setup with:


2015-03-30 20:27:48 SamzaAppMasterTaskManager [INFO] Requesting 1 containers
2015-03-30 20:27:48 SamzaAppMasterTaskManager [INFO] Requesting 1 container(s) with 850mb of memory

The app doesn't die, but I never see any data from the Wikipedia feed flowing
through Kafka.

On the Kafka side, the logs show something very similar to the logs here: 
https://issues.apache.org/jira/browse/KAFKA-1393, suggesting that Samza is 
creating and closing many connections in sequence (though I have no idea why). 
Excerpt:


[2015-03-30 22:45:59,561] INFO Closing socket connection to /172.31.11.241. (kafka.network.Processor)
[2015-03-30 22:45:59,592] INFO Closing socket connection to /172.31.11.241. (kafka.network.Processor)
[2015-03-30 22:49:29,927] INFO Closing socket connection to /172.31.11.206. (kafka.network.Processor)

*.241 is the single ResourceManager node in my YARN cluster and *.206 is the 
single Kafka broker itself, the box on which I viewed this log. Then I see this 
error:


[2015-03-30 22:49:49,261] ERROR Closing socket for /172.31.11.206 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at kafka.utils.Utils$.read(Utils.scala:375)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Processor.read(SocketServer.scala:347)
        at kafka.network.Processor.run(SocketServer.scala:245)
        at java.lang.Thread.run(Thread.java:745)

As far as I can tell, this suggests that Samza reset the connection. The only
other weirdness in the logs is in the ApplicationMaster's garbage collection
log, which looks like this:


2015-03-30T22:46:00.670+0000: 4.674: [GC (Allocation Failure)  16244K->7692K(31808K), 0.0029805 secs]
2015-03-30T22:46:00.721+0000: 4.725: [GC (Allocation Failure)  16516K->8128K(31808K), 0.0025949 secs]
2015-03-30T22:46:00.818+0000: 4.822: [GC (Allocation Failure)  16960K->7890K(31808K), 0.0021872 secs]
2015-03-30T22:46:01.042+0000: 5.046: [GC (Allocation Failure)  16722K->8642K(31808K), 0.0032969 secs]
2015-03-30T22:51:56.920+0000: 360.924: [GC (Allocation Failure)  17474K->8476K(31808K), 0.0029685 secs]
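
To put these collections in perspective, the pause times in the excerpt can be
totalled. Below is a minimal Python sketch, assuming every entry follows the
single-line "[GC (cause)  before->after(heap), pause secs]" format shown above;
the GC_PATTERN regex and the summarize helper are hypothetical, written only for
this excerpt:

```python
import re

# GC entries copied from the ApplicationMaster log excerpt above,
# with each wrapped entry rejoined onto one line.
GC_LINES = [
    "2015-03-30T22:46:00.670+0000: 4.674: [GC (Allocation Failure)  16244K->7692K(31808K), 0.0029805 secs]",
    "2015-03-30T22:46:00.721+0000: 4.725: [GC (Allocation Failure)  16516K->8128K(31808K), 0.0025949 secs]",
    "2015-03-30T22:46:00.818+0000: 4.822: [GC (Allocation Failure)  16960K->7890K(31808K), 0.0021872 secs]",
    "2015-03-30T22:46:01.042+0000: 5.046: [GC (Allocation Failure)  16722K->8642K(31808K), 0.0032969 secs]",
    "2015-03-30T22:51:56.920+0000: 360.924: [GC (Allocation Failure)  17474K->8476K(31808K), 0.0029685 secs]",
]

# Assumed format: "<uptime>: [GC (<cause>)  <before>K-><after>K(<heap>K), <pause> secs]"
GC_PATTERN = re.compile(
    r"(?P<uptime>[\d.]+): \[GC \((?P<cause>[^)]*)\)\s+"
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<heap>\d+)K\), (?P<pause>[\d.]+) secs\]"
)


def summarize(lines):
    """Count GC entries and report total and longest pause, skipping non-matching lines."""
    pauses = []
    for line in lines:
        match = GC_PATTERN.search(line)
        if match is None:
            continue  # not a line in the assumed format
        pauses.append(float(match.group("pause")))
    return {
        "count": len(pauses),
        "total_pause_secs": sum(pauses),
        "max_pause_secs": max(pauses, default=0.0),
    }


if __name__ == "__main__":
    s = summarize(GC_LINES)
    print(f"{s['count']} collections, {s['total_pause_secs'] * 1000:.1f} ms total pause, "
          f"{s['max_pause_secs'] * 1000:.1f} ms longest")
    # prints: 5 collections, 14.0 ms total pause, 3.3 ms longest
```

The same function can be fed every line of the full GC log file to check
whether any longer pauses occur outside this excerpt.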

Is it possible that the garbage collection cycles are causing Samza to rapidly
recreate its connections to ZooKeeper/Kafka? ZooKeeper's logs also suggest that
consumers are being created and deleted rapidly:


2015-03-30 22:53:54,371 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x2 zxid:0xad txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/ids Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/ids
2015-03-30 22:53:54,374 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x3 zxid:0xae txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758
2015-03-30 22:53:54,678 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x17 zxid:0xb2 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/owners/test Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/owners/test
2015-03-30 22:53:54,681 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x18 zxid:0xb3 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/owners Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/owners
2015-03-30 22:53:57,223 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:setData cxid:0x23 zxid:0xb8 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test/0 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets/test/0
2015-03-30 22:53:57,229 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x24 zxid:0xb9 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets
2015-03-30 22:53:57,255 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:setData cxid:0x28 zxid:0xbd txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test/1 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets/test/1
2015-03-30 22:53:57,257 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x29 zxid:0xbe txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test Error:KeeperErrorCode = NodeExists for /consumers/console-consumer-43758/offsets/test
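
To check how many distinct consumers these entries actually involve, the session
IDs and consumer-group directories can be tallied. Below is a minimal Python
sketch; the ZK_PATTERN regex and the tally helper are hypothetical, written only
for the PrepRequestProcessor lines in this excerpt, which are trimmed here to
their sessionid, type, and Error Path fields:

```python
import re
from collections import Counter

# ZooKeeper entries from the excerpt above, trimmed to the session, request type,
# and error-path fields, with each wrapped entry rejoined onto one line.
ZK_LINES = [
    "sessionid:0x14c6ccff26d0013 type:create cxid:0x2 zxid:0xad txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/ids",
    "sessionid:0x14c6ccff26d0013 type:create cxid:0x3 zxid:0xae txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758",
    "sessionid:0x14c6ccff26d0013 type:create cxid:0x17 zxid:0xb2 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/owners/test",
    "sessionid:0x14c6ccff26d0013 type:create cxid:0x18 zxid:0xb3 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/owners",
    "sessionid:0x14c6ccff26d0013 type:setData cxid:0x23 zxid:0xb8 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test/0",
    "sessionid:0x14c6ccff26d0013 type:create cxid:0x24 zxid:0xb9 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets",
    "sessionid:0x14c6ccff26d0013 type:setData cxid:0x28 zxid:0xbd txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test/1",
    "sessionid:0x14c6ccff26d0013 type:create cxid:0x29 zxid:0xbe txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test",
]

# Extracts the ZooKeeper session ID and the consumer-group directory under /consumers.
ZK_PATTERN = re.compile(
    r"sessionid:(?P<session>0x[0-9a-f]+) type:(?P<op>\w+).*?"
    r"Error Path:/consumers/(?P<group>[^/\s]+)"
)


def tally(lines):
    """Count entries per ZooKeeper session and per consumer group."""
    sessions, groups = Counter(), Counter()
    for line in lines:
        match = ZK_PATTERN.search(line)
        if match is None:
            continue  # skip entries that don't touch /consumers
        sessions[match.group("session")] += 1
        groups[match.group("group")] += 1
    return sessions, groups


if __name__ == "__main__":
    sessions, groups = tally(ZK_LINES)
    print(f"{len(sessions)} session(s): {dict(sessions)}")
    # prints: 1 session(s): {'0x14c6ccff26d0013': 8}
    print(f"{len(groups)} consumer group(s): {dict(groups)}")
    # prints: 1 consumer group(s): {'console-consumer-43758': 8}
```

Running the same tally over the full ZooKeeper log would show whether new
sessions and consumer groups keep appearing over time.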

Any help would be greatly appreciated. I'm really stuck on this one.

Thanks,
[Helix Education]<http://www.helixeducation.com/>
Andrew Sannier
Software Engineer, Big Data

C: 480-284-1048

www.helixeducation.com<http://www.helixeducation.com/>
