I don't know, maybe I'm missing something, or there's a bug there as well.
I agree that this is spooky. Armando has also tested it with the
WattsStrogatzInputFormat, which creates a different type of graph. From what
I understand, the topology should not cause this. I think we should just try
to replicate this behavior, hopefully with a graph small enough that
debugging isn't difficult.
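
Maybe we can even reproduce it in a unit test with InternalVertexRunner,
forcing out-of-core on a tiny graph. Something along these lines (untested
sketch; the adjacency lines are made up and would have to match whatever
HyperBallTextInputFormat actually expects):

import org.apache.giraph.conf.GiraphConfiguration;
import org.apache.giraph.utils.InternalVertexRunner;

public class OutOfCoreReproTest {
  public static void main(String[] args) throws Exception {
    GiraphConfiguration conf = new GiraphConfiguration();
    // Classes taken from the command line further down in this thread
    conf.setComputationClass(org.apache.giraph.examples.hyperball.HyperBall.class);
    conf.setVertexInputFormatClass(
        org.apache.giraph.examples.hyperball.HyperBallTextInputFormat.class);
    conf.setVertexOutputFormatClass(
        org.apache.giraph.io.formats.IdWithValueTextOutputFormat.class);
    conf.setOutEdgesClass(org.apache.giraph.edge.LongNullArrayEdges.class);
    // Force out-of-core even though the graph is tiny
    conf.setBoolean("giraph.useOutOfCoreGraph", true);
    conf.setInt("giraph.maxPartitionsInMemory", 1);

    // Made-up adjacency lines: "<source> <target> <target> ..."
    String[] graph = { "1 2 3", "2 3", "3 1", "4 1 2 3" };

    for (String line : InternalVertexRunner.run(conf, graph)) {
      System.out.println(line);
    }
  }
}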


On Sat, Feb 15, 2014 at 8:42 PM, Sebastian Schelter <s...@apache.org> wrote:

> I copied the caching from o.a.g.io.formats.IntIntNullTextInputFormat and
> it worked well during my tests (I did not see all vertices ending up with
> the same id).
>
> I'm happy to remove this and rerun the tests. It's strange that
> out-of-core works with PageRank on a generated graph, but not with
> HyperBall on the twitter graph. The generated graph has a uniform degree
> distribution, while the twitter graph's degree distribution is heavily
> skewed. Could that influence the behavior of out-of-core?
>
> Best,
> Sebastian
>
>
>
> On 02/15/2014 08:32 PM, Claudio Martella wrote:
>
>> Sebastian, I had a look at your VertexInputFormat. I think there might be
>> a bug. Why are you caching/reusing the id? This way every vertex parsed
>> by the VertexReader will share the same ID object, and hence have the
>> same ID. I think this is broken: you should instantiate a new ID object
>> in preprocessLine. Can you try it like that?
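>>
>> Roughly what I mean, as an untested sketch (assuming long ids and the
>> generic TextVertexReaderFromEachLineProcessed helper, so the method name
>> may not match your code exactly):
>>
>> @Override
>> protected LongWritable getId(String[] tokens) throws IOException {
>>   // If the reader keeps a single cached LongWritable and returns it here,
>>   // every parsed vertex ends up holding a reference to that one object.
>>   // Returning a fresh instance per line avoids that:
>>   return new LongWritable(Long.parseLong(tokens[0]));
>> }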
>>
>>
>> On Thu, Feb 13, 2014 at 9:50 PM, Sebastian Schelter <s...@apache.org>
>> wrote:
>>
>>> Hi Armando,
>>>
>>> I uploaded my test code to github at:
>>>
>>> https://github.com/sscdotopen/giraph/tree/hyperball64-ooc
>>>
>>> I'm working on an algorithm to estimate the neighborhood function of the
>>> graph (similar to [1]). I'm running this on the transposed adjacency
>>> matrix of a snapshot of the twitter follower graph [2]. For this graph,
>>> out-of-core is not necessary, but I would like to run my algorithm on
>>> another, larger graph that no longer fits into the aggregated main
>>> memory of the cluster.
>>>
>>> I think for testing purposes, you can run it on any large graph in
>>> adjacency form.
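>>>
>>> Conceptually, the per-superstep step looks roughly like this (a
>>> simplified sketch, not the code from the branch; HyperLogLogWritable and
>>> its add() and union() methods are placeholders for the actual counter
>>> implementation):
>>>
>>> import org.apache.giraph.graph.BasicComputation;
>>> import org.apache.giraph.graph.Vertex;
>>> import org.apache.hadoop.io.LongWritable;
>>> import org.apache.hadoop.io.NullWritable;
>>>
>>> public class HyperBallSketch extends BasicComputation<
>>>     LongWritable, HyperLogLogWritable, NullWritable, HyperLogLogWritable> {
>>>
>>>   @Override
>>>   public void compute(
>>>       Vertex<LongWritable, HyperLogLogWritable, NullWritable> vertex,
>>>       Iterable<HyperLogLogWritable> messages) {
>>>     HyperLogLogWritable counter = vertex.getValue();
>>>     if (getSuperstep() == 0) {
>>>       counter.add(vertex.getId().get());  // each vertex starts by counting itself
>>>     }
>>>     boolean changed = false;
>>>     for (HyperLogLogWritable message : messages) {
>>>       changed |= counter.union(message);  // merge the neighbours' counters
>>>     }
>>>     if (getSuperstep() == 0 || changed) {
>>>       // the counter grew, so the neighbours need the update next superstep
>>>       sendMessageToAllEdges(vertex, counter);
>>>     } else {
>>>       vertex.voteToHalt();
>>>     }
>>>   }
>>> }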
>>>
>>> Our cluster consists of 25 machines with 32 GB RAM, 8 cores, and 4 disks
>>> per machine. I use the following options to run the algorithm:
>>>
>>> hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar \
>>>   org.apache.giraph.GiraphRunner \
>>>   org.apache.giraph.examples.hyperball.HyperBall \
>>>   --vertexInputFormat org.apache.giraph.examples.hyperball.HyperBallTextInputFormat \
>>>   --vertexInputPath hdfs:///ssc/twitter-negative/ \
>>>   --vertexOutputFormat org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>>   --outputPath hdfs:///ssc/tmp-123/ \
>>>   --combiner org.apache.giraph.comm.messages.HyperLogLogCombiner \
>>>   --outEdges org.apache.giraph.edge.LongNullArrayEdges \
>>>   --workers 24 \
>>>   --customArguments giraph.oneToAllMsgSending=true,giraph.isStaticGraph=true,giraph.numComputeThreads=15,giraph.numInputThreads=15,giraph.numOutputThreads=15,giraph.maxNumberOfSupersteps=30,giraph.useOutOfCoreGraph=true,giraph.maxPartitionsInMemory=20
>>>
>>> Best,
>>> Sebastian
>>>
>>> [1] http://arxiv.org/abs/1308.2144
>>> [2] http://konect.uni-koblenz.de/networks/twitter_mpi
>>>
>>>
>>> On 02/12/2014 04:21 PM, Armando Miraglia wrote:
>>>
>>>
>>>> Hi Sebastian,
>>>>
>>>> On Wed, Feb 12, 2014 at 02:59:20PM +0100, Sebastian Schelter wrote:
>>>>
>>>>  No. Should I have done that?
>>>>>
>>>>>
>>>> could you please provide me with the test you ran, together with the
>>>> variables you set for the computation? This would help me a lot.
>>>>
>>>> Cheers,
>>>> Armando
>>>>
>>>>
>>>>
>>>
>>
>>
>


-- 
   Claudio Martella
