Vertex with no outgoing edges doesn't execute Compute in Superstep 0
Hello all,

I was playing around with the shortest path example. I decided to write my own input format to match SNAP's LiveJournal (LJ) dataset (http://snap.stanford.edu/data/soc-LiveJournal1.html). This is an edge format, so I wrote LongFloatTextEdgeInputFormat (which is similar to IntNullTextEdgeInputFormat in the io/formats directory). To test it I used the following input:

0 1
0 2
0 3
1 2
2 0
2 3

All edges have weight 1. I ran the ShortestPath example, specifying the edge input format and the above input file, and got the following output:

0 2.0
1 0
2 1.0
3 0

Notice that the value for vertex 3 should be 2 (the source is vertex 1). I thought there was some problem with my input format, so I went back to the vertex input format described in the quick start guide. Here is my vertex input in JSON format:

[0,0,[[1,1],[2,1],[3,1]]]
[1,0,[[2,1]]]
[2,0,[[3,1],[0,1]]]

Notice that vertex 3 doesn't have any outgoing edges, so I didn't add an entry for it. Even with this input I got the same output. Then I enabled debugging and found that vertex 3 doesn't execute superstep 0 at all. It only executes supersteps 3 and 4 (in superstep 3 it receives a message from vertex 2). Also, in superstep 3 it shows a vertex value of 0.

Does this mean that vertices with no outgoing edges are not active at the beginning? Is there any way to fix this? A quick and dirty fix would be adding the line [3, 0, []] to the vertex input file, but what if I don't want to use a vertex input format and want to use the edge input format described above?

Thank you.
Sincerely,
Vivek
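A plain-Java model of the symptom may help reason about it. It uses no Giraph classes and every name in it is hypothetical. It assumes what the debug output suggests: a vertex that appears only as an edge target is created when its first message arrives, starts with a default value of 0, and never runs superstep 0, so the example's "set to Double.MAX_VALUE in superstep 0" branch never runs for it. Under that assumption, the six edges above reproduce the reported 0 for vertex 3, while starting such vertices at "infinity" gives 2.0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of BSP single-source shortest paths (no Giraph APIs).
// All edges have weight 1, as in the test input above.
final class MissingVertexSim {

  // lateDefault: value assumed for a vertex that is created only because a
  // message arrived for it, i.e. a vertex that never ran superstep 0.
  static Map<Long, Double> run(long[][] edges, long source, double lateDefault) {
    Map<Long, List<Long>> out = new HashMap<>();
    for (long[] e : edges) {
      out.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
    }
    // Only vertices loaded from the input (edge sources here) exist in
    // superstep 0, where they are initialized to Double.MAX_VALUE.
    Map<Long, Double> value = new TreeMap<>();
    for (Long id : out.keySet()) {
      value.put(id, Double.MAX_VALUE);
    }
    Map<Long, List<Double>> inbox = new HashMap<>();
    boolean superstepZero = true;
    while (superstepZero || !inbox.isEmpty()) {
      // Every vertex computes in superstep 0; afterwards only vertices with
      // messages do (each vertex votes to halt after computing).
      List<Long> active = new ArrayList<>(superstepZero ? value.keySet() : inbox.keySet());
      Map<Long, List<Double>> next = new HashMap<>();
      for (long id : active) {
        value.putIfAbsent(id, lateDefault); // created on first message
        double minDist = (id == source) ? 0.0 : Double.MAX_VALUE;
        for (double m : inbox.getOrDefault(id, Collections.emptyList())) {
          minDist = Math.min(minDist, m);
        }
        if (minDist < value.get(id)) {
          value.put(id, minDist);
          for (long target : out.getOrDefault(id, Collections.emptyList())) {
            next.computeIfAbsent(target, k -> new ArrayList<>()).add(minDist + 1.0);
          }
        }
      }
      inbox = next;
      superstepZero = false;
    }
    return value;
  }

  public static void main(String[] args) {
    long[][] edges = {{0, 1}, {0, 2}, {0, 3}, {1, 2}, {2, 0}, {2, 3}};
    // Default 0, as reported: prints {0=2.0, 1=0.0, 2=1.0, 3=0.0}
    System.out.println(run(edges, 1, 0.0));
    // Late-created vertices start at "infinity": prints {0=2.0, 1=0.0, 2=1.0, 3=2.0}
    System.out.println(run(edges, 1, Double.MAX_VALUE));
  }
}
```

If the model matches Giraph's behaviour, the fix is to make vertices created on demand start at "infinity" rather than 0, for example through whatever creates the default vertex value in your Giraph version. That mechanism is an assumption to verify, not something confirmed in this thread.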
RE: giraph 1.1.0 Execution Error
Hi Xenia,

I think there is some problem with ZooKeeper. Can you make sure that the ZooKeeper server is running? If it is running, is it on port 22181? (Your Giraph job is trying to connect on that port.) If ZooKeeper is running on a different port, try running your Giraph job with -Dgiraph.zkList=<zookeeper server ip>:<zookeeper port>

I'm not sure whether you have to start a ZooKeeper instance separately or whether Giraph will start one for you. I have a separate instance running on my cluster, and I specify the server and port via the -Dgiraph.zkList option.

I hope that works.
Vivek

From: xeniad20 xenia...@gmail.com
Sent: Thursday, August 7, 2014 3:46 PM
To: user@giraph.apache.org
Subject: giraph 1.1.0 Execution Error

Hi experts,

I am trying to run Giraph 1.1.0 on a small cluster, but I get the following errors:

2014-08-07 23:35:46,141 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server DataNode2/10.190.12.33:22181. Will not attempt to authenticate using SASL (unknown error)
2014-08-07 23:35:46,142 WARN org.apache.zookeeper.ClientCnxn: Session 0x147b22ebf420001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2014-08-07 23:35:46,243 WARN org.apache.giraph.zk.ZooKeeperExt: deleteExt: Connection loss on attempt 2, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201408072332_0003/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir/datanode1_1
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:302)
    at org.apache.giraph.worker.BspServiceWorker.unregisterHealth(BspServiceWorker.java:768)
    at org.apache.giraph.worker.BspServiceWorker.failureCleanup(BspServiceWorker.java:782)
    at org.apache.giraph.graph.GraphTaskManager.workerFailureCleanup(GraphTaskManager.java:900)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:100)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2014-08-07 23:35:48,126 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server DataNode2/10.190.12.33:22181. Will not attempt to authenticate using SASL (unknown error)
2014-08-07 23:35:48,127 WARN org.apache.zookeeper.ClientCnxn: Session 0x147b22ebf420001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2014-08-07 23:35:49,368 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread Thread-12, msg = createExt: Failed to create /_hadoopBsp/job_201408072332_0003/_workerProgresses/1 after 3 tries!, exiting...
java.lang.IllegalStateException: createExt: Failed to create /_hadoopBsp/job_201408072332_0003/_workerProgresses/1 after 3 tries!
    at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:182)
    at org.apache.giraph.zk.ZooKeeperExt.createOrSetExt(ZooKeeperExt.java:247)
    at org.apache.giraph.worker.WorkerProgress.writeToZnode(WorkerProgress.java:110)
    at org.apache.giraph.worker.WorkerProgressWriter$1.run(WorkerProgressWriter.java:59)
    at java.lang.Thread.run(Thread.java:724)

However, Giraph 1.0.0 runs without any problems. What might be the solution for the above errors? Any help is appreciated.

Thanks,
Xenia
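To act on the first suggestion (is anything actually listening where the job expects ZooKeeper?), a small stand-alone check can help. This is a hypothetical helper, not part of Giraph or ZooKeeper. It assumes the giraph.zkList value is a comma-separated list of host:port pairs, as in the -Dgiraph.zkList=... option above, and it only tests whether a TCP connection can be opened; it does not speak the ZooKeeper protocol.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (not part of Giraph or ZooKeeper).
final class ZkListCheck {

  // Splits a value such as "zk1:2181,zk2:2181" into (host, port) pairs.
  static List<InetSocketAddress> parse(String zkList) {
    List<InetSocketAddress> servers = new ArrayList<>();
    for (String entry : zkList.split(",")) {
      String trimmed = entry.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      int colon = trimmed.lastIndexOf(':');
      if (colon <= 0 || colon == trimmed.length() - 1) {
        throw new IllegalArgumentException("expected host:port but got '" + trimmed + "'");
      }
      String host = trimmed.substring(0, colon);
      int port = Integer.parseInt(trimmed.substring(colon + 1));
      // createUnresolved avoids a DNS lookup while parsing.
      servers.add(InetSocketAddress.createUnresolved(host, port));
    }
    return servers;
  }

  // True if a TCP connection to the server succeeds within timeoutMs.
  static boolean reachable(InetSocketAddress server, int timeoutMs) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(server.getHostString(), server.getPort()), timeoutMs);
      return true;
    } catch (IOException e) {
      return false; // e.g. "Connection refused", as in the log above
    }
  }

  public static void main(String[] args) {
    // Default to the port the worker in the log was trying to reach.
    String zkList = args.length > 0 ? args[0] : "localhost:22181";
    for (InetSocketAddress server : parse(zkList)) {
      System.out.println(server.getHostString() + ":" + server.getPort()
          + (reachable(server, 2000) ? " accepts connections" : " is NOT reachable"));
    }
  }
}
```

Running it from a worker node with the same value you intend to pass as -Dgiraph.zkList (for example DataNode2:22181) shows whether the "Connection refused" is a network or port problem rather than a Giraph one.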
RE: Setting variable value in Compute class and using it in the next superstep
Hello again,

As Tom and Matthew suggested, I wrote my own custom vertex value class and input format class. I followed Matthew's example to create my own custom vertex class, but now I'm getting the following error when running the program:

java.lang.IllegalStateException: newInstance: Illegal access org.apache.giraph.examples.DeltaVertexWritable
    at org.apache.giraph.utils.ReflectionUtils.newInstance(ReflectionUtils.java:84)
    at org.apache.giraph.utils.WritableUtils.createWritable(WritableUtils.java:68)
    at org.apache.giraph.factories.DefaultVertexValueFactory.newInstance(DefaultVertexValueFactory.java:48)
    at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createVertexValue(ImmutableClassesGiraphConfiguration.java:729)
    at org.apache.giraph.utils.VertexIterator.resetEmptyVertex(VertexIterator.java:69)
    at org.apache.giraph.utils.VertexIterator.init(VertexIterator.java:60)
    at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:108)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:466)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:412)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:241)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Here is my DeltaVertexWritable class: https://gist.github.com/sar-vivek/df09cca17cc3f6b5ac60

I tried digging into it but had no success (at first I didn't even understand the error message!).

Thank you.
Vivek

From: Sardeshmukh, Vivek vivek-sardeshm...@uiowa.edu
Sent: Monday, July 21, 2014 6:06 PM
To: user@giraph.apache.org
Subject: RE: Setting variable value in Compute class and using it in the next superstep

Thank you, Matthew. Now writing a custom vertex class and input format seems doable! Thank you.
-- Vivek

From: Matthew Saltz sal...@gmail.com
Sent: Monday, July 21, 2014 5:50 PM
To: user@giraph.apache.org
Subject: Re: Setting variable value in Compute class and using it in the next superstep

Yeah, that's true. Sorry, I forgot that part. Luckily, it isn't too tricky either, depending on the input format of your graph. Here's another example (https://gist.github.com/saltzm/ab7172c57dec927061be) to get you started, for a very simple input format for edges with no values. I basically took the code straight from here (http://giraph.apache.org/apidocs/org/apache/giraph/io/formats/LongLongNullTextInputFormat.html) and modified it where needed so that it returns the InputFormat my code required. You'll probably be better off digging through some of the InputFormat classes that ship with Giraph and doing something similar, since I'm guessing your input files are different from mine. Take a look at the subclasses of TextVertexInputFormat (http://giraph.apache.org/apidocs/org/apache/giraph/io/formats/TextVertexInputFormat.html), since they handle a lot of common input styles, and see if you can modify their code to work with your custom vertex data format.

Now, the example I gave is also easy because I just use the default constructor of the class. If you need to load additional data from the file into your vertex data and the default constructor isn't appropriate, you may have to do some extra parsing and legwork for that.

Best of luck,
Matthew

On Tue, Jul 22, 2014 at 12:28 AM, Sardeshmukh, Vivek vivek-sardeshm...@uiowa.edu wrote:

Thank you, Matthew, for the example link. It is helpful.
I'll give it a shot. If I have a custom vertex class, isn't it necessary to change the VertexInputFormat class too? This class loads the data into the vertex, and if the vertex has a custom value field, it doesn't know how to load the input. Am I right?

Vivek

From: Schweiger, Tom thschwei...@ebay.com
Sent: Monday, July 21, 2014 5:16 PM
To: user@giraph.apache.org
Subject: RE: Setting variable value in Compute class and using it in the next superstep

For more than one flag, a custom class is necessary (unless you're able to, say, toggle the sign bit to get double usage out of a value). I've started a private thread with Vivek to get a better understanding of what he was trying to solve. And you are also correct that there isn't much
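On the "newInstance: Illegal access org.apache.giraph.examples.DeltaVertexWritable" error reported above: the stack trace shows Giraph creating vertex values reflectively (ReflectionUtils.newInstance called from DefaultVertexValueFactory), and an "illegal access" failure during reflective construction usually means the class or its no-argument constructor is not public. The gist is not reproduced here, so that diagnosis is an assumption. The sketch below uses hypothetical class names and java.io.DataInput/DataOutput directly instead of implementing org.apache.hadoop.io.Writable, so it compiles without Hadoop. It shows a value with a distance and a flag, a public no-arg constructor, write/readFields in matching order, and that reflective construction is refused when the constructor is private.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical vertex value. In a real job it would be a public top-level
// class (in its own file) declared "implements org.apache.hadoop.io.Writable".
class DeltaValueSketch {
  double distance;
  boolean heavyEdgesRelaxed;

  // Public no-arg constructor: the framework creates empty values reflectively.
  public DeltaValueSketch() {
    this(Double.MAX_VALUE, false);
  }

  public DeltaValueSketch(double distance, boolean heavyEdgesRelaxed) {
    this.distance = distance;
    this.heavyEdgesRelaxed = heavyEdgesRelaxed;
  }

  // Serialize the fields in a fixed order...
  public void write(DataOutput out) throws IOException {
    out.writeDouble(distance);
    out.writeBoolean(heavyEdgesRelaxed);
  }

  // ...and read them back in exactly the same order.
  public void readFields(DataInput in) throws IOException {
    distance = in.readDouble();
    heavyEdgesRelaxed = in.readBoolean();
  }
}

// Same idea with a private constructor: reflective creation is refused.
class PrivateCtorValueSketch {
  private PrivateCtorValueSketch() { }
}

final class ValueClassCheck {
  // Mimics "create an empty value by reflection"; false on IllegalAccessException etc.
  static boolean canInstantiate(Class<?> cls) {
    try {
      cls.getDeclaredConstructor().newInstance();
      return true;
    } catch (ReflectiveOperationException e) {
      return false;
    }
  }

  // Serializes a value, then reads it into a reflectively created empty instance.
  static DeltaValueSketch roundTrip(DeltaValueSketch value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      value.write(new DataOutputStream(bytes));
      DeltaValueSketch copy = DeltaValueSketch.class.getDeclaredConstructor().newInstance();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

If the real class is a nested class, it would also need to be static, since a non-static inner class has no usable no-arg constructor.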
Setting variable value in Compute class and using it in the next superstep
Hi, all--

In my algorithm, I need to set a flag if certain conditions hold (locally at a vertex v). If this flag is set, then I execute some other block of code *only once*, and do nothing until some other condition holds.

My question is: can I declare a flag variable in the class where I override the compute function? I defined the flag as a public variable and set it once the conditions are met, but it seems the value is not carried over to the next superstep.

I dug a little in this mailing list and found this: https://www.mail-archive.com/user@giraph.apache.org/msg01266.html
That post also suggests (along with what I described above) having a field in the vertex value itself. For that I would need to change the vertex input format and also create my own custom vertex class. Is that really necessary?

By the way, I am using Giraph 1.1.0 compiled against Hadoop 1.0.3. I was able to run SimpleShortestPathComputation successfully.

Here are more technical details of my algorithm: I am trying to implement the Delta-stepping shortest path algorithm (http://dl.acm.org/citation.cfm?id=740136 or http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.2200), which is mentioned in the Pregel paper. A vertex relaxes its light edges if it belongs to the bucket with the minimum index (found with aggregators, of course!). Once a vertex is done relaxing its light edges, it relaxes its heavy edges once (this is where I need a flag). A vertex may be re-inserted into a newer bucket and may have to execute all of the steps I described here again.

Thanks.
Sincerely,
Vivek
A beginner in Giraph (and Java too!)
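The "relax heavy edges only once" requirement needs per-vertex state. A field on the class that overrides compute is shared by every vertex that object processes, and the object may be re-created between supersteps, so it cannot hold that state. The flag has to live in the vertex value. Below is a hypothetical plain-Java sketch of only that bookkeeping. The names, the bucket formula floor(distance / delta), and the rule that any distance improvement clears the flag (the vertex was re-inserted into a bucket) are assumptions based on the description above, not Giraph code and not the full Delta-stepping algorithm.

```java
// Hypothetical per-vertex state for the Delta-stepping flag (plain Java, no Giraph).
final class DeltaVertexState {
  private final double delta;
  double distance = Double.POSITIVE_INFINITY;
  boolean heavyEdgesRelaxed = false;

  DeltaVertexState(double delta) {
    this.delta = delta;
  }

  // Bucket index of the current tentative distance.
  long bucket() {
    return (long) Math.floor(distance / delta);
  }

  // Applies a tentative distance from a message. An improvement re-inserts the
  // vertex into a (possibly new) bucket, so its heavy edges must be relaxed again.
  boolean offer(double candidate) {
    if (candidate < distance) {
      distance = candidate;
      heavyEdgesRelaxed = false;
      return true;
    }
    return false;
  }

  // True at most once per insertion, and only while the vertex is in the
  // current minimum bucket (which, in Giraph, would come from an aggregator).
  boolean takeHeavyEdgeTurn(long minBucket) {
    if (!heavyEdgesRelaxed && distance != Double.POSITIVE_INFINITY && bucket() == minBucket) {
      heavyEdgesRelaxed = true;
      return true;
    }
    return false;
  }
}
```

In a Giraph computation, distance and heavyEdgesRelaxed would be the fields of the vertex value (serialized in write/readFields), which is what lets them survive from one superstep to the next.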
RE: Setting variable value in Compute class and using it in the next superstep
Thank you, Tom, for your prompt reply. If that is the case, then I might be doing something wrong. I'll take a closer look with debugging enabled and keep you posted. Thank you again.

Vivek

From: Schweiger, Tom thschwei...@ebay.com
Sent: Monday, July 21, 2014 4:37 PM
To: user@giraph.apache.org
Subject: RE: Setting variable value in Compute class and using it in the next superstep

And in answer to:

> This post also suggests (along with what I described above) to have a field in the vertex value itself. For that I need to change the vertex input format and also create my own custom vertex class. Is it really necessary?

No, you don't need a custom vertex class or vertex input format. You can create/initialize the value at the beginning of the first superstep.

From: Sardeshmukh, Vivek [vivek-sardeshm...@uiowa.edu]
Sent: Monday, July 21, 2014 2:05 PM
To: user@giraph.apache.org
Subject: Setting variable value in Compute class and using it in the next superstep
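Tom's remark about toggling the sign bit suggests a way to keep a single flag without a custom value class, provided the real value is never negative (true for shortest-path distances with non-negative weights). A minimal sketch with hypothetical helper names: the flag is stored as the sign of the double, Math.abs recovers the distance, and Math.copySign is used for the test so that a distance of 0 still works (-0.0 compares equal to 0.0, so "packed < 0" would miss it). In a Giraph computation the packed number would go into the existing DoubleWritable value and, following Tom's advice, be initialized in the first superstep.

```java
// Hypothetical helpers: pack one boolean flag into the sign bit of a
// non-negative double such as a shortest-path distance.
final class SignFlag {
  private SignFlag() { }

  // Returns the value to store: the distance, negated when the flag is set.
  static double pack(double distance, boolean flag) {
    if (distance < 0 || Double.isNaN(distance)) {
      throw new IllegalArgumentException("distance must be non-negative: " + distance);
    }
    return flag ? -distance : distance; // -0.0 when distance == 0.0 and flag is set
  }

  // Recovers the distance; Math.abs(-0.0) is 0.0.
  static double distance(double packed) {
    return Math.abs(packed);
  }

  // Reads the flag from the sign bit, which also works for -0.0.
  static boolean flag(double packed) {
    return Math.copySign(1.0, packed) < 0;
  }
}
```

Every comparison in the computation (for example "is the new distance smaller?") then has to go through distance(...), not the raw stored value; with more than one flag this trick runs out, which is where Tom's custom class comes in.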