Hi,

giraph.useOutOfCoreMessages is no longer in use.
The main problem here is that you are using the default flow control mechanism (NoOpFlowControl), which allows a large number of outstanding/received messages to build up. As a consequence, memory fills up very quickly and the job can fail for various reasons. Please use the following options instead:

  -Dgiraph.isStaticGraph=false
  -Dgiraph.useOutOfCoreGraph=true
  -Dgiraph.waitForPerWorkerRequests=true

Note: the static graph has a known bug with the out-of-core mechanism, which is why isStaticGraph is set to false here.

Hope it helps,
Hassan

On Sun, Jul 16, 2017 at 1:54 PM, Darshan Mallenahalli Shankaralingappa <dshankaralinga...@ntent.com> wrote:

> Hi,
>
> I am trying to run the page rank algorithm using giraph on a 3.5 billion
> node web graph on a relatively smaller Hadoop cluster (6 nodes with 225GB
> RAM total).
> I set the giraph.useOutOfCoreGraph and giraph.useOutOfCoreMessages to true
> and the application killed after some time.
>
> I am running the giraph job using this command:
> yarn jar giraph-examples-1.2.0-for-hadoop-2.6.0-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner -Dgiraph.yarn.task.heap.mb=58880
> -Dgiraph.isStaticGraph=true -Dgiraph.useOutOfCoreGraph=true
> -Dgiraph.useOutOfCoreMessages=true
> org.apache.giraph.examples.PageRankComputation
> -vif org.apache.giraph.examples.LongDoubleNullTextInputFormat -vip
> /user/darshan/AdjList/ -vof
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat
> -op /user/darshan/giraph_3.5B_ooc/ -w 8 -mc
> org.apache.giraph.examples.RandomWalkVertexMasterCompute
> -wc org.apache.giraph.examples.RandomWalkWorkerContext -ca
> org.apache.giraph.examples.RandomWalkVertex.teleportationProbability=0.15f
> -ca org.apache.giraph.examples.RandomWalkVertex.maxSupersteps=21
>
> Here is a log from the zookeeper:
>
> 2017-07-12 08:08:35,026 WARN [netty-client-worker-1]
> org.apache.giraph.comm.netty.handler.ResponseClientHandler:
> exceptionCaught: Channel failed with remote address <url>/<ip>:30006
> <http://hdpbcn-01.lv.ntent.com/10.100.21.118:30006>
>
> java.lang.ArrayIndexOutOfBoundsException: 1075052547
>     at org.apache.giraph.comm.flow_control.NoOpFlowControl.getAckSignalFlag(NoOpFlowControl.java:52)
>     at org.apache.giraph.comm.netty.NettyClient.messageReceived(NettyClient.java:796)
>     at org.apache.giraph.comm.netty.handler.ResponseClientHandler.channelRead(ResponseClientHandler.java:87)
>     at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
>     at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153)
>     at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
>     at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
>     at org.apache.giraph.comm.netty.InboundByteCounter.channelRead(InboundByteCounter.java:74)
>     at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
>     at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:126)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
>     at java.lang.Thread.run(Thread.java:745)
>
>
> I think this issue is related to the messaging stack rather than the
> algorithm.
> If not, can someone please help me with this or at least point me in the
> right direction?
>
> Cheers,
> Darshan
>
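
P.S. For illustration, the quoted command with the three options suggested above substituted in might look like the sketch below. Everything other than those three -D options (jar name, heap size, paths, worker count, classes, and -ca arguments) is copied unchanged from the quoted message, and giraph.useOutOfCoreMessages is simply dropped because it is no longer in use. This has not been run against this cluster, so treat it as a starting point rather than a verified invocation.

```shell
# Sketch only: the original invocation with the suggested options swapped in.
# Changed relative to the quoted command:
#   -Dgiraph.isStaticGraph=true        -> false (static graph has a known out-of-core bug)
#   -Dgiraph.useOutOfCoreMessages=true -> removed (option no longer in use)
#   -Dgiraph.waitForPerWorkerRequests=true added (as suggested in the reply)
yarn jar giraph-examples-1.2.0-for-hadoop-2.6.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.yarn.task.heap.mb=58880 \
  -Dgiraph.isStaticGraph=false \
  -Dgiraph.useOutOfCoreGraph=true \
  -Dgiraph.waitForPerWorkerRequests=true \
  org.apache.giraph.examples.PageRankComputation \
  -vif org.apache.giraph.examples.LongDoubleNullTextInputFormat \
  -vip /user/darshan/AdjList/ \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/darshan/giraph_3.5B_ooc/ \
  -w 8 \
  -mc org.apache.giraph.examples.RandomWalkVertexMasterCompute \
  -wc org.apache.giraph.examples.RandomWalkWorkerContext \
  -ca org.apache.giraph.examples.RandomWalkVertex.teleportationProbability=0.15f \
  -ca org.apache.giraph.examples.RandomWalkVertex.maxSupersteps=21
```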