Re: Problem occurred when running job with >1 worker.
Hi Kaya,

Below is the worker's log.

WARN org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: Channel failed with remote address kanha-Vostro-1014/127.0.1.1:30002
java.nio.channels.ClosedChannelException
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:674)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:642)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:98)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:385)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:256)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:40,161 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server kanha-Vostro-1014/127.0.1.1:22181
2014-01-20 12:29:40,106 WARN org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: Channel failed with remote address null
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:404)
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:366)
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:40,297 WARN org.apache.zookeeper.ClientCnxn: Session 0x143ae2e202a0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-01-20 12:29:40,044 WARN org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: Channel failed with remote address /127.0.0.1:43641
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
    at sun.nio.ch.IOUtil.read(IOUtil.java:193)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:63)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:385)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:256)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:40,074 WARN org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: Channel failed with remote address kanha-Vostro-1014/127.0.1.1:30002
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
    at sun.nio.ch.IOUtil.read(IOUtil.java:193)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:63)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:385)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:256)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
2014-01-20 12:29:40,044 WARN org.apache.giraph.comm.netty.NettyClient: getNextChannel: Failed to reconnect to kanha-Vostro-1014/127.0.1.1:30002 on attempt 1 out of 1000 max attempts, sleeping for 5 secs
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.
Re: Problem occurred when running job with >1 worker.
Hi Jyoti,

I assume this is the master's log. It seems the master cannot reach a worker for some reason. Did you also check the worker's log? Maybe you can share it too.

Sertug

On 20-01-2014 09:22, Jyoti Yadav wrote:
*h.master.MasterThread: masterThread: Master algorithm failed with ArrayIndexOutOfBoundsException java.lang.ArrayIndexOutOfBoundsException: -1*
Problem occurred when running job with >1 worker.
Hi folks,

When I run an algorithm on a single-system cluster with 1 worker, it works fine. But when I increase the number of workers to more than 1, the following error is thrown at run time:

*ERROR org.apache.giraph.master.BspServiceMaster: superstepChosenWorkerAlive: Missing chosen worker Worker(hostname=kanha-Vostro-1014, MRtaskID=2, port=30002) on superstep 17
2014-01-20 12:27:36,451 INFO org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 17 took 1414.576 seconds ended with state WORKER_FAILURE and is now on superstep 17
2014-01-20 12:28:02,723 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException: -1*
    at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1276)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
2014-01-20 12:28:06,059 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.ArrayIndexOutOfBoundsException: -1, exiting...
java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:185)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1276)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
2014-01-20 12:28:36,993 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
2014-01-20 12:28:36,993 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
2014-01-20 12:29:08,015 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).

Any ideas?

Thanks,
Jyoti
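
A note on reading the master trace: an ArrayIndexOutOfBoundsException with index -1 is what Java raises when code reads the last element of an empty array via array[array.length - 1]. Whether that is what happens inside getLastGoodCheckpoint (for example because no checkpoint was ever written before the worker failed) is an assumption; the log does not confirm it. The sketch below only illustrates that failure pattern and a guarded alternative. The class LastElementDemo and its methods are hypothetical names and are not part of Giraph.

```java
import java.util.Arrays;

public class LastElementDemo {
    /** Unguarded access: throws ArrayIndexOutOfBoundsException when the array is empty. */
    public static long lastUnchecked(long[] supersteps) {
        // For an empty array, length - 1 is -1, which is an invalid index.
        return supersteps[supersteps.length - 1];
    }

    /** Guarded access: returns the fallback value when there is nothing to pick from. */
    public static long lastOrDefault(long[] supersteps, long fallback) {
        if (supersteps == null || supersteps.length == 0) {
            return fallback;
        }
        long[] sorted = supersteps.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        return sorted[sorted.length - 1];   // largest value after sorting
    }

    public static void main(String[] args) {
        long[] none = new long[0];
        try {
            lastUnchecked(none);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exact message text depends on the Java version.
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println(lastOrDefault(none, -1L));
        System.out.println(lastOrDefault(new long[] {4L, 16L, 8L}, -1L));
    }
}
```

If this pattern does apply here, the -1 would be a secondary symptom of the recovery path, and the lost worker on port 30002 would still be the failure to investigate first.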