Sebastian, I had a look at your VertexInputFormat, and I think there might be a bug. Why are you caching and reusing the ID? This way every vertex parsed by the VertexReader will share the same ID object, and hence have the same ID. I think this is broken. You should instantiate a new ID object in preprocessLine(). Can you try it like that?
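To make the aliasing problem concrete, here is a minimal standalone sketch. It deliberately does not use Giraph's API. MutableLongId (a stand-in for a mutable Writable such as LongWritable), Vertex, readWithCachedId and readWithFreshId are all hypothetical names. The sketch assumes that the vertex keeps a reference to the ID object it is handed rather than copying it, which is what the diagnosis above implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedIdDemo {

  /** Hypothetical stand-in for a mutable Writable ID (e.g. LongWritable). */
  static final class MutableLongId {
    private long value;

    MutableLongId() { }

    MutableLongId(long value) { this.value = value; }

    void set(long value) { this.value = value; }

    long get() { return value; }
  }

  /** Hypothetical vertex that stores a reference to its ID object (no copy). */
  static final class Vertex {
    final MutableLongId id;

    Vertex(MutableLongId id) { this.id = id; }
  }

  /** Buggy reader: one cached ID object is overwritten for every input line. */
  static List<Vertex> readWithCachedId(List<String> lines) {
    MutableLongId cachedId = new MutableLongId();
    List<Vertex> vertices = new ArrayList<>();
    for (String line : lines) {
      cachedId.set(Long.parseLong(line.split("\\s+")[0]));
      // Every vertex points at the same object, so all end up with the last ID.
      vertices.add(new Vertex(cachedId));
    }
    return vertices;
  }

  /** Fixed reader: a fresh ID object per line (the suggested preprocessLine fix). */
  static List<Vertex> readWithFreshId(List<String> lines) {
    List<Vertex> vertices = new ArrayList<>();
    for (String line : lines) {
      MutableLongId id = new MutableLongId(Long.parseLong(line.split("\\s+")[0]));
      vertices.add(new Vertex(id));
    }
    return vertices;
  }

  /** Reads back the ID value currently seen by each vertex. */
  static long[] ids(List<Vertex> vertices) {
    long[] out = new long[vertices.size()];
    for (int i = 0; i < out.length; i++) {
      out[i] = vertices.get(i).id.get();
    }
    return out;
  }

  public static void main(String[] args) {
    // Adjacency lines: "<vertexId> <neighbor> <neighbor> ..."
    List<String> lines = Arrays.asList("1 2 3", "2 3", "3 1");
    System.out.println(Arrays.toString(ids(readWithCachedId(lines)))); // [3, 3, 3]
    System.out.println(Arrays.toString(ids(readWithFreshId(lines))));  // [1, 2, 3]
  }
}
```

Reusing a single Writable is a common Hadoop idiom when a value is serialized immediately. It only becomes a problem once the object reference itself is retained, as it is here, where each vertex holds on to its ID.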
On Thu, Feb 13, 2014 at 9:50 PM, Sebastian Schelter <s...@apache.org> wrote:

> Hi Armando,
>
> I uploaded my test code to GitHub at:
>
> https://github.com/sscdotopen/giraph/tree/hyperball64-ooc
>
> I'm working on an algorithm to estimate the neighborhood function of the
> graph (similar to [1]). I'm running this on the transposed adjacency matrix
> of a snapshot of the Twitter follower graph [2]. For this graph out-of-core
> is not necessary, but I would like to run my algorithm on another, larger
> graph that no longer fits into the aggregated main memory of the cluster.
>
> I think for testing purposes, you can run it on any large graph in
> adjacency form.
>
> Our cluster consists of 25 machines with 32GB RAM, 8 cores and 4 disks per
> machine. I use the following options to run the algorithm:
>
> hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
>   org.apache.giraph.GiraphRunner
>   org.apache.giraph.examples.hyperball.HyperBall
>   --vertexInputFormat org.apache.giraph.examples.hyperball.HyperBallTextInputFormat
>   --vertexInputPath hdfs:///ssc/twitter-negative/
>   --vertexOutputFormat org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>   --outputPath hdfs:///ssc/tmp-123/
>   --combiner org.apache.giraph.comm.messages.HyperLogLogCombiner
>   --outEdges org.apache.giraph.edge.LongNullArrayEdges
>   --workers 24
>   --customArguments
>     giraph.oneToAllMsgSending=true,
>     giraph.isStaticGraph=true,
>     giraph.numComputeThreads=15,
>     giraph.numInputThreads=15,
>     giraph.numOutputThreads=15,
>     giraph.maxNumberOfSupersteps=30,
>     giraph.useOutOfCoreGraph=true,
>     giraph.maxPartitionsInMemory=20
>
> Best,
> Sebastian
>
> [1] http://arxiv.org/abs/1308.2144
> [2] http://konect.uni-koblenz.de/networks/twitter_mpi
>
> On 02/12/2014 04:21 PM, Armando Miraglia wrote:
>
>> Hi Sebastian,
>>
>> On Wed, Feb 12, 2014 at 02:59:20PM +0100, Sebastian Schelter wrote:
>>
>>> No. Should I have done that?
>>
>> Could you please provide me with the test you ran, together with the
>> variables you set for the computation? This would help me a lot.
>>
>> Cheers,
>> Armando

-- 
Claudio Martella