Hi,

I am trying to run the PageRank algorithm using Giraph on a 3.5 billion node web graph on a relatively small Hadoop cluster (6 nodes with 225GB RAM total). I set giraph.useOutOfCoreGraph and giraph.useOutOfCoreMessages to true, but the application was killed after some time.

I am running the giraph job using this command:

yarn jar giraph-examples-1.2.0-for-hadoop-2.6.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner \
  -Dgiraph.yarn.task.heap.mb=58880 \
  -Dgiraph.isStaticGraph=true \
  -Dgiraph.useOutOfCoreGraph=true \
  -Dgiraph.useOutOfCoreMessages=true \
  org.apache.giraph.examples.PageRankComputation \
  -vif org.apache.giraph.examples.LongDoubleNullTextInputFormat \
  -vip /user/darshan/AdjList/ \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/darshan/giraph_3.5B_ooc/ \
  -w 8 \
  -mc org.apache.giraph.examples.RandomWalkVertexMasterCompute \
  -wc org.apache.giraph.examples.RandomWalkWorkerContext \
  -ca org.apache.giraph.examples.RandomWalkVertex.teleportationProbability=0.15f \
  -ca org.apache.giraph.examples.RandomWalkVertex.maxSupersteps=21

Here is a log from the zookeeper:

2017-07-12 08:08:35,026 WARN [netty-client-worker-1] org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: Channel failed with remote address <url>/<ip>:30006<http://hdpbcn-01.lv.ntent.com/10.100.21.118:30006>
java.lang.ArrayIndexOutOfBoundsException: 1075052547
    at org.apache.giraph.comm.flow_control.NoOpFlowControl.getAckSignalFlag(NoOpFlowControl.java:52)
    at org.apache.giraph.comm.netty.NettyClient.messageReceived(NettyClient.java:796)
    at org.apache.giraph.comm.netty.handler.ResponseClientHandler.channelRead(ResponseClientHandler.java:87)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
    at org.apache.giraph.comm.netty.InboundByteCounter.channelRead(InboundByteCounter.java:74)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:126)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
    at java.lang.Thread.run(Thread.java:745)

I think this issue is related to the messaging stack rather than the algorithm. If not, can someone please help me with this or at least point me in the right direction?

Cheers,
Darshan
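
For reference, a rough sanity check of the memory the command above asks for. This is only a sketch: it assumes giraph.yarn.task.heap.mb applies to each of the 8 workers separately, and it ignores the master task and any YARN container overhead. The function name and constants are illustrative and not part of Giraph.

```python
# Values copied from the post; how they combine is an assumption.
WORKERS = 8                # -w 8
HEAP_MB_PER_TASK = 58880   # -Dgiraph.yarn.task.heap.mb=58880
CLUSTER_RAM_GB = 225       # "6 nodes with 225GB RAM total"


def requested_heap_gb(workers: int, heap_mb: int) -> float:
    """Total heap in GB, assuming every worker gets the full per-task heap."""
    return workers * heap_mb / 1024


total_gb = requested_heap_gb(WORKERS, HEAP_MB_PER_TASK)
ratio = total_gb / CLUSTER_RAM_GB

print(f"requested worker heap: {total_gb:.1f} GB")
print(f"cluster RAM:           {CLUSTER_RAM_GB} GB")
print(f"ratio:                 {ratio:.2f}x")
```

Under these assumptions the 8 workers together would request about 460 GB of heap against 225 GB of cluster RAM, so it may be worth checking how the containers were actually sized when the job ran.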
