How to activate checkpoint?
Hi,
I'm using Giraph 1.0.0 and ran RandomMessageBenchmark. In the middle of the job I killed a Hadoop task (a worker). The job then failed with the following exception in the master:
2014-11-29 04:40:18,049 INFO org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 1 took 611.669 seconds ended with state WORKER_FAILURE and is now on superstep 1
2014-11-29 04:40:18,313 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with RuntimeException
java.lang.RuntimeException: restartFromCheckpoint: KeeperException
    at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1185)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:135)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /_hadoopBsp/job_201411290417_0003/_edgeInputSplitDir
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
    at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:307)
    at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1179)
    ... 1 more
2014-11-29 04:40:18,315 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.RuntimeException: restartFromCheckpoint: KeeperException, exiting...
java.lang.IllegalStateException: java.lang.RuntimeException: restartFromCheckpoint: KeeperException
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:181)
Caused by: java.lang.RuntimeException: restartFromCheckpoint: KeeperException
    at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1185)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:135)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /_hadoopBsp/job_201411290417_0003/_edgeInputSplitDir
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
    at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:307)
    at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1179)
Is this a bug in Giraph? From the log, the master tries to run restartFromCheckpoint and fails. How can I enable checkpointing in Giraph?
Thanks
Regards, Vincentius Martin
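For context on the question above: as far as I know, checkpointing in Giraph is controlled by the `giraph.checkpointFrequency` option, which defaults to 0 (disabled). Setting it to N asks for a checkpoint every N supersteps, for example by passing `-Dgiraph.checkpointFrequency=2` on the command line, assuming the job is launched through Hadoop's ToolRunner. The sketch below is only a toy model of such a schedule, not Giraph source. The class and method names are hypothetical, and it assumes the first checkpoint is taken at superstep 0.

```java
// Toy model, not Giraph source: decides at which supersteps a checkpoint is written.
final class CheckpointSchedule {
    /** A frequency of 0 (or less) means checkpointing is disabled. */
    static boolean isCheckpointDue(int checkpointFrequency, long superstep) {
        if (checkpointFrequency <= 0) {
            return false; // disabled: no checkpoint is ever written
        }
        // Assumption: the first checkpoint is taken at superstep 0.
        return superstep >= 0 && superstep % checkpointFrequency == 0;
    }

    public static void main(String[] args) {
        for (int frequency : new int[] {0, 2}) {
            StringBuilder due = new StringBuilder();
            for (long superstep = 0; superstep < 6; superstep++) {
                if (isCheckpointDue(frequency, superstep)) {
                    due.append(superstep).append(' ');
                }
            }
            System.out.println("frequency=" + frequency
                + " -> checkpoints at: " + due.toString().trim());
        }
    }
}
```

In this model a frequency of 0 never produces a checkpoint, while a frequency of 2 produces one at supersteps 0, 2 and 4. Whether the NoNode exception in the log is related to the checkpoint setting is not something this sketch can show.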
When do Giraph vertices receive their messages?
I am curious about how Giraph receives messages before processing them. I know that vertices use the messages they have received in the compute() method of the next superstep, but when do they actually receive them? If it is before the checkpoint process, is there any part of the documentation or code I can read to understand it? Also, what mechanism does Giraph use to store messages before superstep S+1? Are they kept in an in-memory buffer or written to disk first? I still cannot find anything about this. Regards, Vincentius Martin
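To make the question concrete: in the BSP model, a message sent during superstep S must be held somewhere until compute() runs in superstep S+1. One common way to do that is double buffering, with an "incoming" store that fills during the running superstep and a "current" store that compute() reads. The sketch below is a toy illustration of that idea, not Giraph's implementation. All class and method names are hypothetical, and it says nothing about whether Giraph 1.0.0 keeps messages in memory or on disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of BSP message delivery; names are hypothetical, not Giraph classes.
final class DoubleBufferedMessages {
    // Messages that compute() may read in the running superstep.
    private Map<Long, List<String>> current = new HashMap<>();
    // Messages arriving while the running superstep executes.
    private Map<Long, List<String>> incoming = new HashMap<>();

    /** Called when a message arrives during superstep S; it is only buffered. */
    void receive(long targetVertexId, String message) {
        incoming.computeIfAbsent(targetVertexId, id -> new ArrayList<>()).add(message);
    }

    /** What compute() would see for a vertex in the running superstep. */
    List<String> messagesFor(long vertexId) {
        return current.getOrDefault(vertexId, Collections.emptyList());
    }

    /** Barrier between superstep S and S+1: buffered messages become visible. */
    void advanceSuperstep() {
        current = incoming;
        incoming = new HashMap<>();
    }

    public static void main(String[] args) {
        DoubleBufferedMessages store = new DoubleBufferedMessages();
        store.receive(7L, "hello");                 // arrives during superstep S
        System.out.println(store.messagesFor(7L));  // prints [] (not visible yet)
        store.advanceSuperstep();                   // barrier: S -> S+1
        System.out.println(store.messagesFor(7L));  // prints [hello]
    }
}
```

In this model a message is physically received during superstep S but only becomes readable after the barrier, which matches the observation that compute() sees it in the next superstep.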
storeCheckpoint() in a worker can be slow when a slow worker is present
Hi all, I have a question based on my recent experience with Giraph. In the Giraph worker code, I see a line like this: *getServerData().getCurrentMessageStore().writePartition(verticesOutputStream, partition.getId());* To the best of my knowledge, while executing this line a worker writes some of its partitionMap entries to the output stream. However, when a slow worker is present, execution of this line in the other workers also gets slower. Execution speeds up again once the slow worker has finished its storeCheckpoint process. From what I know, each worker uses its own message store and writes to a different output stream. So why can a slow storeCheckpoint process in one worker affect the checkpoint process of another worker? Thanks
Place where Giraph log files are stored
Hi, Sorry if this is a basic question. I'm still using Giraph 1.0.0. How can I enable Giraph metrics? Aside from the logs that MapReduce prints to the console, is there a log directory I can check to see what happened inside the job? Thanks Regards, Vincentius Martin