Oh sorry, I had not read the other half of your mail.
I have written a MapReduce preprocessing step for that (yes, MapReduce is the
right answer here), which can be found here:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/crawl/processing/WebGraphProcessingJob.java

It traverses the graph, and the reducer produces output that can be
processed by the job or by the TextToSeq utility as preprocessing for
PageRank. Just have a look at the example package.
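
To illustrate what such a preprocessing step has to achieve, here is a minimal
sketch in plain Java. It is not the code of WebGraphProcessingJob and does not
use Hadoop or Hama; the class and method names are hypothetical. It assumes
whitespace-separated adjacency lines with the key-site in the first column.
The "map" part registers every outlink as a key-site of its own, and the
"reduce" part writes one line per key-site, so dangling targets end up with an
empty adjacency list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Hama.
class AdjacencyListCompleter {

  // "Map" phase: collect the outlinks of every key-site and make sure
  // that every outlink target is also present as a key-site.
  static Map<String, Set<String>> collect(List<String> lines) {
    Map<String, Set<String>> adjacency = new TreeMap<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // skip blank lines
      }
      String[] parts = trimmed.split("\\s+");
      Set<String> outlinks =
          adjacency.computeIfAbsent(parts[0], k -> new LinkedHashSet<>());
      for (int i = 1; i < parts.length; i++) {
        outlinks.add(parts[i]);
        // A target without its own line becomes a dangling key-site.
        adjacency.computeIfAbsent(parts[i], k -> new LinkedHashSet<>());
      }
    }
    return adjacency;
  }

  // "Reduce" phase: emit one tab-separated line per key-site,
  // ordered by the string value of the key.
  static List<String> complete(List<String> lines) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Set<String>> e : collect(lines).entrySet()) {
      StringBuilder sb = new StringBuilder(e.getKey());
      for (String target : e.getValue()) {
        sb.append('\t').append(target);
      }
      out.add(sb.toString());
    }
    return out;
  }

  public static void main(String[] args) {
    // The example graph from this thread, with vertex 3 missing.
    List<String> input = Arrays.asList("0\t1\t2", "1\t1\t2", "2\t1\t2\t3", "5");
    for (String line : complete(input)) {
      System.out.println(line);
    }
  }
}
```

For the example graph from this thread, the sketch adds a line for vertex 3
with no outlinks and leaves the other lines unchanged.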

> It appears that I am not supposed to access the vertices field of
> GraphJobRunner class in some way from within the PageRank.PageRankVertex
> class?
>

Yes, in this new Pregel-like API this falls under information hiding. If you
have a look at the 4.0 release, there is the hardcore version of PageRank
written in plain BSP; there you can access and modify everything you want
(but it is more complicated).

Hope this helps. If you have additional problems, don't hesitate to ask.


On 27 April 2012 at 17:10, Thomas Jungblut <[email protected]> wrote:

> You have to "address" dangling nodes in your adjacency list.
>
> So your input must look like:
>
>
> 0    1    2
> 1    1    2
> 2    1    2    3
> 3 <-- this one was missing, causing the NullPointerException.
> 5
>
> See http://wiki.apache.org/hama/PageRank under "Submit your own Webgraph".
>
>> This piece of text makes Site1 adjacent to Site2 and Site3, and Site2
>> adjacent to Site3; Site3 is a dangling node. As you can see, a site is
>> always on the leftmost side (we call it the key-site), and the outlinks
>> follow as tab-separated (\t) elements.
>> Make sure that every site's outlink can be found somewhere in the file as
>> a key-site. Otherwise it will result in weird NullPointerExceptions.
>>
>
>
> Good luck.
>
> On 27 April 2012 at 16:56, SWP <[email protected]> wrote:
>
>> I am dealing with the PageRank example
>> from hama-dist-0.5.0-incubating-source.tar.gz RC2,
>> which I downloaded from
>> http://people.apache.org/~edwardyoon/dist/
>> a few days ago.
>>
>> My input graph has some "dangling edges", that is, edges pointing to
>> non-existing nodes.
>> Here are the adjacencies of a small example. The format is:
>> source target1 target2 target3 ...
>>
>> 0    1    2
>> 1    1    2
>> 2    1    2    3
>> 5
>>
>> You see that 2 has an edge directed to 3, but there is no adjacency list
>> given for 3.
>>
>> Now, when I run this example through pagerank-text2seq and then the
>> pagerank example, I get a NullPointerException:
>>
>> 12/04/27 16:15:17 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>> java.lang.NullPointerException
>>    at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:96)
>>    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:1)
>>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>    at java.lang.Thread.run(Thread.java:662)
>>
>> The problem appears to be that when GraphJobRunner's bsp() method looks
>> up the vertex to which the message is addressed,
>> it is not found in the vertices map.
>> (By the way, if you replace 5 with 3 in the example, it works - because
>> then the target vertex can be looked up.)
>>
>> See the vertices.get(e.getKey()) statement in the code snippet below.
>> Of course one can avoid the exception by adding a check in
>> GraphJobRunner.java (at about line 95) like this:
>>
>>        if (vertices.containsKey(e.getKey())) {
>>            vertices.get(e.getKey()).compute(msgs.iterator());
>>        } else {
>>            System.out.println("Ignoring message(s) '" + msgs
>>                + "' sent to vertex '" + e.getKey() + "'");
>>        }
>>
>> However, what I really want is:
>> check within PageRank.PageRankVertex's compute() method whether the
>> target vertex exists
>> before sending out a message to it.
>>
>> That is, in PageRank.java (line 60), instead of
>>
>>                sendMessageToNeighbors(new DoubleWritable(this.getValue().get() / numEdges));
>>
>> I would like to send messages only to "existing" vertices, that is,
>> those which have an adjacency list in the input.
>>
>> Any hints how this can be achieved?
>> It appears that I am not supposed to access the vertices field of
>> GraphJobRunner class in some way from within the PageRank.PageRankVertex
>> class?
>>
>> I concede that my example graph may qualify as invalid input ... but on
>> the other hand: how could I add those missing vertices after a first pass
>> through the adjacency lists input?
>>
>> Clemens Gröpl
>>
>> --
>>
>> Semantic Web Project, IT
>>
>> Unister GmbH
>> Barfußgäßchen 11 | 04109 Leipzig
>>
>> Telefon: +49 (0)341 49288 4496
>> [email protected]
>> www.unister.de <http://www.unister.de>
>>
>> Vertretungsberechtigter Geschäftsführer: Thomas Wagner
>> Amtsgericht Leipzig, HRB: 19056
>>
>>
>
>
> --
> Thomas Jungblut
> Berlin <[email protected]>
>



-- 
Thomas Jungblut
Berlin <[email protected]>
