Hi, and thanks in advance for your help.
I have been following this guide (http://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html) to try to prove that my Samza cluster runs. I get as far as having a Running YARN task, as the tutorial specifies, but this task doesn’t actually do anything. None of the logs I’ve checked (the application master, YARN resource manager, and node manager logs, as well as the stderr and stdout userlogs on the resource manager nodes) shows any kind of error or warning; they simply stop growing after the initial setup, ending with:

    2015-03-30 20:27:48 SamzaAppMasterTaskManager [INFO] Requesting 1 containers
    2015-03-30 20:27:48 SamzaAppMasterTaskManager [INFO] Requesting 1 container(s) with 850mb of memory

The app doesn't die, but I never see any data flowing through Kafka from the Wikipedia feed.

On the Kafka side, the logs look very similar to the ones in https://issues.apache.org/jira/browse/KAFKA-1393, suggesting that Samza is creating and closing many connections in sequence (though I have no idea why). Excerpt:

    [2015-03-30 22:45:59,561] INFO Closing socket connection to /172.31.11.241. (kafka.network.Processor)
    [2015-03-30 22:45:59,592] INFO Closing socket connection to /172.31.11.241. (kafka.network.Processor)
    [2015-03-30 22:49:29,927] INFO Closing socket connection to /172.31.11.206. (kafka.network.Processor)

*.241 is the single ResourceManager node in my YARN cluster and *.206 is the single Kafka broker itself, the box on which I viewed this log.
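To see which hosts account for the close messages, a small tally along these lines could help. It is only a sketch: the sample lines are copied from the excerpt above, the function name `count_closed_connections` is my own (hypothetical), and the pattern assumes every close message has the same shape as those three lines.

```python
import re
from collections import Counter

# Sample lines copied from the Kafka broker log excerpt above.
SAMPLE_LOG = """\
[2015-03-30 22:45:59,561] INFO Closing socket connection to /172.31.11.241. (kafka.network.Processor)
[2015-03-30 22:45:59,592] INFO Closing socket connection to /172.31.11.241. (kafka.network.Processor)
[2015-03-30 22:49:29,927] INFO Closing socket connection to /172.31.11.206. (kafka.network.Processor)
"""

# Captures the IPv4 address that follows "Closing socket connection to /".
CLOSE_RE = re.compile(r"Closing socket connection to /(\d+\.\d+\.\d+\.\d+)\.")


def count_closed_connections(log_text):
    """Return a Counter mapping client IP -> number of close messages seen."""
    return Counter(CLOSE_RE.findall(log_text))


if __name__ == "__main__":
    # Print the hosts with the most closed connections first.
    for ip, n in count_closed_connections(SAMPLE_LOG).most_common():
        print(ip, n)
```

Run against the full broker log instead of `SAMPLE_LOG`, this should show whether the churn comes mostly from the ResourceManager box or from the broker itself.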
Then I see this error:

    [2015-03-30 22:49:49,261] ERROR Closing socket for /172.31.11.206 because of error (kafka.network.Processor)
    java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at kafka.utils.Utils$.read(Utils.scala:375)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Processor.read(SocketServer.scala:347)
        at kafka.network.Processor.run(SocketServer.scala:245)
        at java.lang.Thread.run(Thread.java:745)

As far as I can tell, this suggests that Samza reset the connection.

The only other oddity in the logs is in the application master’s garbage collection log, which looks like this:

    2015-03-30T22:46:00.670+0000: 4.674: [GC (Allocation Failure) 16244K->7692K(31808K), 0.0029805 secs]
    2015-03-30T22:46:00.721+0000: 4.725: [GC (Allocation Failure) 16516K->8128K(31808K), 0.0025949 secs]
    2015-03-30T22:46:00.818+0000: 4.822: [GC (Allocation Failure) 16960K->7890K(31808K), 0.0021872 secs]
    2015-03-30T22:46:01.042+0000: 5.046: [GC (Allocation Failure) 16722K->8642K(31808K), 0.0032969 secs]
    2015-03-30T22:51:56.920+0000: 360.924: [GC (Allocation Failure) 17474K->8476K(31808K), 0.0029685 secs]

Is it possible that the garbage collection cycles are causing Samza to rapidly recreate connections to ZooKeeper/Kafka?
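To put numbers on the GC question, the pause times in those lines can be totalled with something like the sketch below. The regular expression assumes every entry has exactly the shape of the five GC lines above (JVM uptime, a `[GC (...)` cause, heap sizes in K, and a pause in seconds). The names `GC_SAMPLE` and `summarize_gc` are hypothetical and mine, not part of Samza or the JVM.

```python
import re

# The five GC entries copied from the garbage collection log above.
GC_SAMPLE = """\
2015-03-30T22:46:00.670+0000: 4.674: [GC (Allocation Failure) 16244K->7692K(31808K), 0.0029805 secs]
2015-03-30T22:46:00.721+0000: 4.725: [GC (Allocation Failure) 16516K->8128K(31808K), 0.0025949 secs]
2015-03-30T22:46:00.818+0000: 4.822: [GC (Allocation Failure) 16960K->7890K(31808K), 0.0021872 secs]
2015-03-30T22:46:01.042+0000: 5.046: [GC (Allocation Failure) 16722K->8642K(31808K), 0.0032969 secs]
2015-03-30T22:51:56.920+0000: 360.924: [GC (Allocation Failure) 17474K->8476K(31808K), 0.0029685 secs]
"""

# Group 1: JVM uptime in seconds. Group 2: pause duration in seconds.
GC_RE = re.compile(
    r"(\d+\.\d+): \[GC \([^)]*\) \d+K->\d+K\(\d+K\), (\d+\.\d+) secs\]"
)


def summarize_gc(log_text):
    """Summarize pause times and the spacing between GC events."""
    events = [(float(up), float(pause)) for up, pause in GC_RE.findall(log_text)]
    uptimes = [up for up, _ in events]
    pauses = [pause for _, pause in events]
    # Time elapsed between consecutive collections, by JVM uptime.
    gaps = [later - earlier for earlier, later in zip(uptimes, uptimes[1:])]
    return {
        "count": len(events),
        "total_pause_secs": sum(pauses),
        "max_pause_secs": max(pauses, default=0.0),
        "longest_gap_secs": max(gaps, default=0.0),
    }


if __name__ == "__main__":
    for key, value in summarize_gc(GC_SAMPLE).items():
        print(key, value)
```

On the five entries above, each pause is about 2 to 3 milliseconds, and there is a gap of nearly six minutes between the fourth and fifth collections. On its own that seems short for connection timeouts, though I may be misreading the log.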
ZooKeeper’s logs also suggest that consumers are being created and deleted rapidly:

    2015-03-30 22:53:54,371 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x2 zxid:0xad txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/ids Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/ids
    2015-03-30 22:53:54,374 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x3 zxid:0xae txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758
    2015-03-30 22:53:54,678 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x17 zxid:0xb2 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/owners/test Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/owners/test
    2015-03-30 22:53:54,681 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x18 zxid:0xb3 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/owners Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/owners
    2015-03-30 22:53:57,223 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:setData cxid:0x23 zxid:0xb8 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test/0 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets/test/0
    2015-03-30 22:53:57,229 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x24 zxid:0xb9 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets
    2015-03-30 22:53:57,255 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:setData cxid:0x28 zxid:0xbd txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test/1 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets/test/1
    2015-03-30 22:53:57,257 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x29 zxid:0xbe txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test Error:KeeperErrorCode = NodeExists for /consumers/console-consumer-43758/offsets/test

Any help will be greatly appreciated. I’m really stuck on this one.

Thanks,
Andrew Sannier
Software Engineer, Big Data
Helix Education
www.helixeducation.com
