Hi Thomas,
so ... just to make sure I understood you right:
You are saying that:
fiddling with the input in a way that adds a couple of nodes having
empty adjacency lists
is below the level of abstraction that Hama intends to provide, correct?
Sure we will find a way to adjust our data, but I think the question is
somewhat at the borderline.
I will think about a way to add those missing nodes when the input is
read, before the BSP starts.
Thank you very much.
Clemens
Am 27.04.2012 17:24, schrieb Thomas Jungblut:
Oh sorry I have not read the other half of your mail.
I have made a mapreduce preprocessing step (yes mapreduce is the right
answer for that here) for that which can be found here:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/crawl/processing/WebGraphProcessingJob.java
It traverses the graph and the reducer makes the right output which can be
processed by the job or the TextToSeq utility as a preprocessing for
pagerank, just have a look into the example package.
I appears that I am not supposed to access the vertices field of
GraphJobRunner class in some way from within the PageRank.PageRankVertex
class?
Yes in this new Pregel-Like API this falls under information-hiding. If you
have a look into the 4.0 release, there is the hardcore version of pagerank
with plain BSP, there you can access and modify all the stuff you want.
(but it is more complicated)
Hope it helps, if you have additional problems, don't hesitate to ask them.
Am 27. April 2012 17:10 schrieb Thomas Jungblut<
[email protected]>:
You have to "address" dangling nodes on your adjacency list.
So your input must look like:
0 1 2
1 1 2
2 1 2 3
3<-- this one was missing causing the Null Pointer Exception.
5
See http://wiki.apache.org/hama/PageRank under "Submit your own Webgraph".
This piece of text will adjacent Site1 to Site2 and Site3, Site2 to Site3
and Site3 is a dangling node. As you can see a site is always on the
leftmost side (we call it the key-site), and the outlinks are seperated by
tabs (\t) as the following elements.
Make sure that every site's outlink can somewhere be found in the file as
a key-site. Otherwise it will result in weird
NullPointerExceptions<http://wiki.apache.org/hama/NullPointerExceptions>.
Good luck.
Am 27. April 2012 16:56 schrieb SWP<[email protected]>:
I am dealing with the PageRank example
from hama-dist-0.5.0-incubating-**source.tar.gz RC2
which I downloaded from
http://people.apache.org/~**edwardyoon/dist/<http://people.apache.org/%7Eedwardyoon/dist/>
a few days ago.
My input graph has some "dangling edges", that is, edges pointing to
non-existing nodes.
Here are the adjacencies of a small example. The format is:
source target1 target2 target3 ...
0 1 2
1 1 2
2 1 2 3
5
Your see that 2 has an edge directed to 3 but there is no adjacency list
given for 3.
Now, when I run this example through pagerank-text2seq and then the
pagerank examle, I get a NullPointerException:
12/04/27 16:15:17 ERROR bsp.LocalBSPRunner: Exception during BSP
execution!
java.lang.NullPointerException
at org.apache.hama.graph.**GraphJobRunner.bsp(**
GraphJobRunner.java:96)
at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.run(**
LocalBSPRunner.java:256)
at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.call(**
LocalBSPRunner.java:286)
at org.apache.hama.bsp.**LocalBSPRunner$BSPRunner.call(**
LocalBSPRunner.java:1)
at java.util.concurrent.**FutureTask$Sync.innerRun(**
FutureTask.java:303)
at java.util.concurrent.**FutureTask.run(FutureTask.**java:138)
at java.util.concurrent.**ThreadPoolExecutor$Worker.**
runTask(ThreadPoolExecutor.**java:886)
at java.util.concurrent.**ThreadPoolExecutor$Worker.run(**
ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.**java:662)
The problem appears to be that when GraphJobRunner's bsp() method looks
up the vertex to which the message is addressed,
it is not found in the vertices map.
(By the way, if you replace 5 with 3 in the example, it works - because
then the target vertex can be looked up.)
See the vertices.get(e.getKey()) statement in the code snippet below.
Of course one can avoid the exception by adding a check in
GraphJobRunner.java (at line about 95) like this:
if(vertices.containsKey(e.**getKey()))
{
vertices.get(e.getKey()).**compute(msgs.iterator());
} else {
System.out.println("Ignoring message(s) '" + msgs + "' sent to
vertex '" + e.getKey() +"'");
}
However, what I really want is:
check within PageRank.PageRankVertex's compute() method whether the
target vertex exists
before sending out a message to it.
That is, in PageRank.java (line 60) , instead of
sendMessageToNeighbors(new DoubleWritable(this.getValue()*
*.get() / numEdges));
I would like to send messages only to "existing" vertices, that is,
those which have an adjacency list in the input.
Any hints how this can be achieved?
I appears that I am not supposed to access the vertices field of
GraphJobRunner class in some way from within the PageRank.PageRankVertex
class?
I concede that my example graph may qualify as invalid input ... but on
the other hand: how could I add those missing vertices after a first pass
through the adjacency lists input?
Clemens Gröpl
--
Semantic Web Project, IT
Unister GmbH
Barfußgäßchen 11 | 04109 Leipzig
Telefon: +49 (0)341 49288 4496
[email protected]<mailto:%20contact-semweb@**
unister-gmbh.de<[email protected]>>
www.unister.de<http://www.unister.de>
Vertretungsberechtigter Geschäftsführer: Thomas Wagner
Amtsgericht Leipzig, HRB: 19056
--
Thomas Jungblut
Berlin<[email protected]>
--
Semantic Web Project, IT
Unister GmbH
Barfußgäßchen 11 | 04109 Leipzig
Telefon: +49 (0)341 49288 4496
[email protected] <mailto:%[email protected]>
www.unister.de <http://www.unister.de>
Vertretungsberechtigter Geschäftsführer: Thomas Wagner
Amtsgericht Leipzig, HRB: 19056