Well, no you got it wrong. Hama's graph package is designed to have a easy to develop API for simple graph problems with a vertex-centric paradigm. However, and that is where we distinguish from giraph, we provide the low level access to the underlying BSP methods and implementations.
How to fix the problem you have without the need of preprocessing your data? You can rework the GraphJobRunner class, to check in the setup method if the vertices map is correctly filled. A simple pseudo code could look like this: // after load vertices into the map and edges have been setup > for(Vertex v : adjacencyList) > for(Edge neighbor : v.getOutEdges()) > // do the hash partitioning trick > peer.send(peer.getAllPeerNames()[Math.abs( neighbor.getDestVertexID() > % peer.getNumPeers())], new Text(neighbor.getDestVertexID())); > // sync > peer.sync(); > Text vertexName = null; > while((vertexName = (Text) peer.getCurrentMessage()) != null) > // check if the vertexName is contained in that peer's vertices map > if(!vertices.containsKey(vertexName.get())) > vertices.put(vertexName.get(), new Vertex(vertexName.get())); That is all, and solve the current problems with your dangling nodes. Good question why we haven't implemented it yet, but maybe because it adds overhead at the beginning of a job. And we also assume that the resulting structure in the graph is complete. We want to provide a real framework where you can plug all the stuff arround together and modify single parts of it very easy. Hope you got the gist, if you need a patch or stuff, I can implement it correctly for you. (we should add this anyways as a configurable preprocessing). Thanks for pointing to it! Regards, Thomas Am 27. April 2012 19:34 schrieb SWP <[email protected]>: > Hi Thomas, > > so ... just to make sure I understood you right: > > You are saying that: > fiddling with the input in a way that adds a couple of nodes having empty > adjacency lists > is below the level of abstraction that Hama intends to provide, correct? > > Sure we will find a way to adjust our data, but I think the question is > somewhat at the borderline. > I will think about a way to add those missing nodes when the input is > read, before the BSP starts. > > Thank you very much. > Clemens > > > Am 27.04.2012 17:24, schrieb Thomas Jungblut: > >> Oh sorry I have not read the other half of your mail. >> I have made a mapreduce preprocessing step (yes mapreduce is the right >> answer for that here) for that which can be found here: >> >> https://github.com/**thomasjungblut/thomasjungblut-** >> common/blob/master/src/de/**jungblut/crawl/processing/** >> WebGraphProcessingJob.java<https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/crawl/processing/WebGraphProcessingJob.java> >> It traverses the graph and the reducer makes the right output which can be >> processed by the job or the TextToSeq utility as a preprocessing for >> pagerank, just have a look into the example package. >> >> I appears that I am not supposed to access the vertices field of >> >>> GraphJobRunner class in some way from within the PageRank.PageRankVertex >>> class? >>> >>> Yes in this new Pregel-Like API this falls under information-hiding. If >> you >> have a look into the 4.0 release, there is the hardcore version of >> pagerank >> with plain BSP, there you can access and modify all the stuff you want. >> (but it is more complicated) >> >> Hope it helps, if you have additional problems, don't hesitate to ask >> them. >> >> >> Am 27. April 2012 17:10 schrieb Thomas Jungblut< >> [email protected]**>: >> >> You have to "address" dangling nodes on your adjacency list. >>> >>> So your input must look like: >>> >>> >>> 0 1 2 >>> 1 1 2 >>> 2 1 2 3 >>> 3<-- this one was missing causing the Null Pointer Exception. >>> 5 >>> >>> See >>> http://wiki.apache.org/hama/**PageRank<http://wiki.apache.org/hama/PageRank>under >>> "Submit your own Webgraph". >>> >>> This piece of text will adjacent Site1 to Site2 and Site3, Site2 to >>>> Site3 >>>> and Site3 is a dangling node. As you can see a site is always on the >>>> leftmost side (we call it the key-site), and the outlinks are seperated >>>> by >>>> tabs (\t) as the following elements. >>>> Make sure that every site's outlink can somewhere be found in the file >>>> as >>>> a key-site. Otherwise it will result in weird NullPointerExceptions< >>>> http://**wiki.apache.org/hama/**NullPointerExceptions<http://wiki.apache.org/hama/NullPointerExceptions> >>>> >. >>>> >>>> >>> Good luck. >>> >>> Am 27. April 2012 16:56 schrieb >>> SWP<contact-semweb@unister-**gmbh.de<[email protected]> >>> >: >>> >>> I am dealing with the PageRank example >>> >>>> from hama-dist-0.5.0-incubating-****source.tar.gz RC2 >>>> which I downloaded from >>>> http://people.apache.org/~****edwardyoon/dist/<http://people.apache.org/~**edwardyoon/dist/> >>>> <http://**people.apache.org/%**7Eedwardyoon/dist/<http://people.apache.org/%7Eedwardyoon/dist/> >>>> > >>>> >>>> a few days ago. >>>> >>>> My input graph has some "dangling edges", that is, edges pointing to >>>> non-existing nodes. >>>> Here are the adjacencies of a small example. The format is: >>>> source target1 target2 target3 ... >>>> >>>> 0 1 2 >>>> 1 1 2 >>>> 2 1 2 3 >>>> 5 >>>> >>>> Your see that 2 has an edge directed to 3 but there is no adjacency list >>>> given for 3. >>>> >>>> Now, when I run this example through pagerank-text2seq and then the >>>> pagerank examle, I get a NullPointerException: >>>> >>>> 12/04/27 16:15:17 ERROR bsp.LocalBSPRunner: Exception during BSP >>>> execution! >>>> java.lang.NullPointerException >>>> at org.apache.hama.graph.****GraphJobRunner.bsp(** >>>> GraphJobRunner.java:96) >>>> at org.apache.hama.bsp.****LocalBSPRunner$BSPRunner.run(**** >>>> LocalBSPRunner.java:256) >>>> at org.apache.hama.bsp.****LocalBSPRunner$BSPRunner.call(**** >>>> LocalBSPRunner.java:286) >>>> at org.apache.hama.bsp.****LocalBSPRunner$BSPRunner.call(**** >>>> LocalBSPRunner.java:1) >>>> at java.util.concurrent.****FutureTask$Sync.innerRun(** >>>> FutureTask.java:303) >>>> at java.util.concurrent.****FutureTask.run(FutureTask.****java:138) >>>> at java.util.concurrent.****ThreadPoolExecutor$Worker.** >>>> runTask(ThreadPoolExecutor.****java:886) >>>> at java.util.concurrent.****ThreadPoolExecutor$Worker.run(**** >>>> ThreadPoolExecutor.java:908) >>>> at java.lang.Thread.run(Thread.****java:662) >>>> >>>> >>>> The problem appears to be that when GraphJobRunner's bsp() method looks >>>> up the vertex to which the message is addressed, >>>> it is not found in the vertices map. >>>> (By the way, if you replace 5 with 3 in the example, it works - because >>>> then the target vertex can be looked up.) >>>> >>>> See the vertices.get(e.getKey()) statement in the code snippet below. >>>> Of course one can avoid the exception by adding a check in >>>> GraphJobRunner.java (at line about 95) like this: >>>> >>>> if(vertices.containsKey(e.****getKey())) >>>> { >>>> vertices.get(e.getKey()).****compute(msgs.iterator()); >>>> >>>> } else { >>>> System.out.println("Ignoring message(s) '" + msgs + "' sent >>>> to >>>> vertex '" + e.getKey() +"'"); >>>> } >>>> >>>> However, what I really want is: >>>> check within PageRank.PageRankVertex's compute() method whether the >>>> target vertex exists >>>> before sending out a message to it. >>>> >>>> That is, in PageRank.java (line 60) , instead of >>>> >>>> sendMessageToNeighbors(new DoubleWritable(this.getValue() >>>> *** >>>> >>>> *.get() / numEdges)); >>>> >>>> I would like to send messages only to "existing" vertices, that is, >>>> those which have an adjacency list in the input. >>>> >>>> Any hints how this can be achieved? >>>> I appears that I am not supposed to access the vertices field of >>>> GraphJobRunner class in some way from within the PageRank.PageRankVertex >>>> class? >>>> >>>> I concede that my example graph may qualify as invalid input ... but on >>>> the other hand: how could I add those missing vertices after a first >>>> pass >>>> through the adjacency lists input? >>>> >>>> Clemens Gröpl >>>> >>>> -- >>>> >>>> Semantic Web Project, IT >>>> >>>> Unister GmbH >>>> Barfußgäßchen 11 | 04109 Leipzig >>>> >>>> Telefon: +49 (0)341 49288 4496 >>>> [email protected]**<mailto:%20contact-semweb@** >>>> unister-gmbh.de<20contact-**[email protected]<[email protected]> >>>> >> >>>> >>>> www.unister.de<http://www.**unister.de <http://www.unister.de>> >>>> >>>> Vertretungsberechtigter Geschäftsführer: Thomas Wagner >>>> Amtsgericht Leipzig, HRB: 19056 >>>> >>>> >>>> >>> -- >>> Thomas Jungblut >>> Berlin<thomas.jungblut@gmail.**com <[email protected]>> >>> >>> >> >> > > -- > > Semantic Web Project, IT > > Unister GmbH > Barfußgäßchen 11 | 04109 Leipzig > > Telefon: +49 (0)341 49288 4496 > [email protected] > <mailto:%20contact-semweb@**unister-gmbh.de<[email protected]> > > > www.unister.de <http://www.unister.de> > > Vertretungsberechtigter Geschäftsführer: Thomas Wagner > Amtsgericht Leipzig, HRB: 19056 > > -- Thomas Jungblut Berlin <[email protected]>
