I am dealing with the PageRank example
from hama-dist-0.5.0-incubating-source.tar.gz RC2
which I downloaded from http://people.apache.org/~edwardyoon/dist/
a few days ago.
My input graph has some "dangling edges", that is, edges pointing to
non-existing nodes.
Here are the adjacencies of a small example. The format is:
source target1 target2 target3 ...
0 1 2
1 1 2
2 1 2 3
5
You see that vertex 2 has an edge directed to 3, but there is no adjacency
list given for 3.
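To make the notion of a dangling edge concrete, here is a minimal sketch in plain Java (not Hama code) that reports every target which never appears as the source of an adjacency line. The class name DanglingTargets and the method findDanglingTargets are hypothetical names chosen for illustration, and the sketch assumes whitespace-separated IDs as in the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DanglingTargets {

    /** Returns all target IDs that have no adjacency line of their own (hypothetical helper). */
    static Set<String> findDanglingTargets(List<String> lines) {
        Set<String> sources = new HashSet<>();
        Set<String> targets = new TreeSet<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens[0].isEmpty()) {
                continue; // skip blank lines
            }
            sources.add(tokens[0]);
            // Every token after the first one is a target of the source vertex.
            targets.addAll(Arrays.asList(tokens).subList(1, tokens.length));
        }
        targets.removeAll(sources); // what remains is pointed to but never defined
        return targets;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(Arrays.asList("0 1 2", "1 1 2", "2 1 2 3", "5"));
        System.out.println(findDanglingTargets(lines)); // prints [3]
    }
}
```

For the four-line example this reports only vertex 3; vertex 5 is not reported because it has an (empty) adjacency list.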
Now, when I run this example through pagerank-text2seq and then the
pagerank example, I get a NullPointerException:
12/04/27 16:15:17 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
java.lang.NullPointerException
    at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:96)
    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
The problem appears to be that when GraphJobRunner's bsp() method looks
up the vertex to which the message is addressed,
it is not found in the vertices map.
(By the way, if you replace 5 with 3 in the example, it works - because
then the target vertex can be looked up.)
See the vertices.get(e.getKey()) statement in the code snippet below.
Of course one can avoid the exception by adding a check in
GraphJobRunner.java (at about line 95) like this:
if (vertices.containsKey(e.getKey())) {
    vertices.get(e.getKey()).compute(msgs.iterator());
} else {
    // No vertex with this ID exists; drop the message(s) instead of failing.
    System.out.println("Ignoring message(s) '" + msgs + "' sent to vertex '" + e.getKey() + "'");
}
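The mechanism can be reproduced outside of Hama with a plain map. The following is only a sketch under assumptions: CountingVertex, deliver, and the inbox layout are hypothetical stand-ins and not the actual GraphJobRunner internals. It shows that Map.get returns null for an unknown vertex ID, and that a single lookup followed by a null check is enough to drop such messages.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class GuardedDelivery {

    /** Hypothetical stand-in for a vertex; it only counts the messages it receives. */
    static class CountingVertex {
        int received = 0;

        void compute(Iterator<Double> msgs) {
            while (msgs.hasNext()) {
                msgs.next();
                received++;
            }
        }
    }

    /** Delivers messages to known vertices and returns how many messages were dropped. */
    static int deliver(Map<String, CountingVertex> vertices, Map<String, List<Double>> inbox) {
        int dropped = 0;
        for (Map.Entry<String, List<Double>> e : inbox.entrySet()) {
            CountingVertex v = vertices.get(e.getKey()); // null if the ID is unknown
            if (v != null) {
                v.compute(e.getValue().iterator());
            } else {
                dropped += e.getValue().size(); // calling v.compute(...) here would throw the NPE
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        Map<String, CountingVertex> vertices = new HashMap<>();
        for (String id : Arrays.asList("0", "1", "2", "5")) {
            vertices.put(id, new CountingVertex());
        }
        Map<String, List<Double>> inbox = new HashMap<>();
        inbox.put("1", Arrays.asList(0.1, 0.2));
        inbox.put("3", Arrays.asList(0.3)); // vertex 3 has no adjacency list
        System.out.println("dropped = " + deliver(vertices, inbox)); // prints dropped = 1
    }
}
```

Note that silently dropping messages also drops the rank they carry, so the ranks would no longer sum to the same total.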
However, what I really want is:
check within PageRank.PageRankVertex's compute() method whether the
target vertex exists
before sending out a message to it.
That is, in PageRank.java (line 60), instead of
sendMessageToNeighbors(new DoubleWritable(this.getValue().get() / numEdges));
I would like to send messages only to "existing" vertices, that is,
those which have an adjacency list in the input.
Any hints on how this can be achieved?
It appears that I am not supposed to access the vertices field of the
GraphJobRunner class in any way from within the PageRank.PageRankVertex
class?
I concede that my example graph may qualify as invalid input ... but on
the other hand: how could I add those missing vertices after a first
pass through the adjacency lists input?
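One conceivable workaround, sketched here in plain Java rather than with any Hama API, is a two-pass preprocessing step on the text input before it goes through pagerank-text2seq: the first pass collects all source IDs, the second pass appends a line with an empty adjacency list for every target that is missing. The names CompleteAdjacency and addMissingVertices are hypothetical, and whether this is the recommended approach is an open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CompleteAdjacency {

    /** Returns the input lines plus one source-only line per missing target (hypothetical helper). */
    static List<String> addMissingVertices(List<String> lines) {
        // First pass: collect every vertex that has its own adjacency line.
        Set<String> sources = new HashSet<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (!tokens[0].isEmpty()) {
                sources.add(tokens[0]);
            }
        }
        // Second pass: collect targets without an adjacency line, in order of first appearance.
        Set<String> missing = new LinkedHashSet<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            for (int i = 1; i < tokens.length; i++) {
                if (!sources.contains(tokens[i])) {
                    missing.add(tokens[i]);
                }
            }
        }
        List<String> completed = new ArrayList<>(lines);
        completed.addAll(missing); // a bare ID is a vertex with an empty adjacency list
        return completed;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("0 1 2", "1 1 2", "2 1 2 3", "5");
        System.out.println(addMissingVertices(lines)); // prints [0 1 2, 1 1 2, 2 1 2 3, 5, 3]
    }
}
```

The appended line has the same shape as the line for vertex 5 in the example, that is, a source without targets.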
Clemens Gröpl
--
Semantic Web Project, IT
Unister GmbH
Barfußgäßchen 11 | 04109 Leipzig
Phone: +49 (0)341 49288 4496
[email protected]
www.unister.de
Authorized managing director: Thomas Wagner
Leipzig District Court, HRB: 19056