How to activate checkpointing?

2014-11-29 Thread Vincentius Martin
Hi,

I'm using Giraph 1.0.0 and ran RandomMessageBenchmark.

In the middle of the job I killed a Hadoop task (= a worker). The job then
failed with the following exception in the master:

2014-11-29 04:40:18,049 INFO org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 1 took 611.669 seconds ended with state WORKER_FAILURE and is now on superstep 1
2014-11-29 04:40:18,313 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with RuntimeException
java.lang.RuntimeException: restartFromCheckpoint: KeeperException
    at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1185)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:135)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /_hadoopBsp/job_201411290417_0003/_edgeInputSplitDir
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
    at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:307)
    at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1179)
    ... 1 more
2014-11-29 04:40:18,315 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.RuntimeException: restartFromCheckpoint: KeeperException, exiting...
java.lang.IllegalStateException: java.lang.RuntimeException: restartFromCheckpoint: KeeperException
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:181)
Caused by: java.lang.RuntimeException: restartFromCheckpoint: KeeperException
    at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1185)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:135)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /_hadoopBsp/job_201411290417_0003/_edgeInputSplitDir
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
    at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:307)
    at org.apache.giraph.master.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1179)

Is this a bug in Giraph? From the log, it looks like the master tries to run
restartFromCheckpoint, but the restart itself fails.
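
To illustrate how I read the trace: the delete call inside
restartFromCheckpoint targets a ZooKeeper node that does not exist, and the
resulting NoNodeException aborts the restart. The toy Java model below uses an
in-memory set instead of ZooKeeper. ToyZnodeStore and its methods are made-up
names for illustration only, not Giraph or ZooKeeper APIs, and I am only
assuming that a tolerant delete would be the right behaviour here.

```java
import java.util.HashSet;
import java.util.Set;

// Toy stand-in for a tree of znodes; NOT the ZooKeeper API.
final class ToyZnodeStore {
    private final Set<String> paths = new HashSet<>();

    void create(String path) {
        paths.add(path);
    }

    // Strict delete: fails when the path is absent, like the delete in the trace.
    void delete(String path) {
        if (!paths.remove(path)) {
            throw new IllegalStateException("NoNode for " + path);
        }
    }

    // Tolerant delete: a missing path is treated as already cleaned up.
    // Returns true only if something was actually removed.
    boolean deleteIfExists(String path) {
        return paths.remove(path);
    }

    public static void main(String[] args) {
        ToyZnodeStore store = new ToyZnodeStore();
        store.create("/job/someOtherDir"); // hypothetical path
        String missing = "/_hadoopBsp/job_201411290417_0003/_edgeInputSplitDir";

        System.out.println("tolerant delete removed node: " + store.deleteIfExists(missing));
        try {
            store.delete(missing);
        } catch (IllegalStateException e) {
            System.out.println("strict delete failed: " + e.getMessage());
        }
    }
}
```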

How can I enable checkpointing in Giraph?
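
As far as I understand, checkpointing is governed by the
giraph.checkpointFrequency option, and its default of 0 means that no
checkpoints are written. The sketch below is a simplified model of such a
periodic rule, not Giraph's actual code: the class and method names are
hypothetical, and the real check may differ, for example after a restart.

```java
// Simplified model of a periodic checkpoint rule; names are hypothetical.
final class CheckpointFrequencySketch {
    // Assumption: frequency 0 disables checkpointing, and a frequency of
    // N > 0 checkpoints every N supersteps, starting at superstep 0.
    static boolean shouldCheckpoint(long superstep, int frequency) {
        if (frequency <= 0) {
            return false;
        }
        return superstep >= 0 && superstep % frequency == 0;
    }

    public static void main(String[] args) {
        for (int frequency : new int[] {0, 2}) {
            for (long superstep = 0; superstep < 4; superstep++) {
                System.out.println("frequency=" + frequency
                        + " superstep=" + superstep
                        + " checkpoint=" + shouldCheckpoint(superstep, frequency));
            }
        }
    }
}
```

With GiraphRunner the option would presumably be passed as a custom argument,
something like -ca giraph.checkpointFrequency=2, but I have not verified this
on 1.0.0.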

Thanks

Regards,
Vincentius Martin


When do Giraph vertices receive their messages?

2014-11-09 Thread Vincentius Martin
I am curious about how Giraph vertices receive messages before processing them.

I know that vertices use the messages they have received in the compute()
method of the next superstep, but when are the messages actually received? If
it happens before the checkpoint is written, is there any part of the
documentation or code I can read to understand it?

Also, what mechanism does Giraph use to store messages before superstep
S+1? Are they kept in an in-memory buffer or written to disk first?
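
To make the question concrete, this is the toy model I have in mind, which may
not match what Giraph really does: messages sent during superstep S land in an
"incoming" store and only become visible to compute() after the barrier, when
that store becomes the "current" one. Everything is held in memory here, and
all names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of BSP message delivery; NOT Giraph's implementation.
final class ToyMessageStores {
    // Messages that compute() may read during the current superstep.
    private Map<Integer, List<String>> current = new HashMap<>();
    // Messages arriving during the current superstep, meant for the next one.
    private Map<Integer, List<String>> incoming = new HashMap<>();

    // Called while superstep S is running; the message is not yet visible.
    void send(int targetVertex, String message) {
        incoming.computeIfAbsent(targetVertex, k -> new ArrayList<>()).add(message);
    }

    // What compute() for the given vertex would see in the current superstep.
    List<String> messagesFor(int vertex) {
        return current.getOrDefault(vertex, Collections.<String>emptyList());
    }

    // Called at the superstep barrier: incoming becomes current.
    void advanceSuperstep() {
        current = incoming;
        incoming = new HashMap<>();
    }

    public static void main(String[] args) {
        ToyMessageStores stores = new ToyMessageStores();
        stores.send(7, "hello");
        System.out.println("superstep S:   " + stores.messagesFor(7));
        stores.advanceSuperstep();
        System.out.println("superstep S+1: " + stores.messagesFor(7));
    }
}
```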

I still cannot find anything about this.

Regards,
Vincentius Martin


storeCheckpoint() in a worker can be slow when a slow worker is present

2014-11-08 Thread Vincentius Martin
Hi all,

I have a question related to my last experience using Giraph.

In the Giraph worker code, I see a line like this:

getServerData().getCurrentMessageStore().writePartition(verticesOutputStream, partition.getId());

To the best of my knowledge, while executing this line a worker writes the
data for one of its partitions to the output stream. However, when one of the
workers is slow, the same line also runs more slowly on the other workers.

I see that execution speeds up again once the slow worker has finished its
storeCheckpoint process.

From what I know, each worker uses its own message store and writes to a
different output stream. So why can a slow storeCheckpoint process in one
worker affect the checkpoint process of another worker?
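
One way to check whether the time is really spent inside the write would be to
time each call separately on every worker, roughly as in the sketch below.
WriteTimer and timeWrite are hypothetical names, not part of Giraph, and a
plain byte array stands in for the partition data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for timing a single write; not part of Giraph.
final class WriteTimer {
    // Writes data to out, flushes it, and returns the elapsed time in nanoseconds.
    static long timeWrite(OutputStream out, byte[] data) {
        long start = System.nanoTime();
        try {
            out.write(data);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // An in-memory sink stands in for the real checkpoint stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long nanos = timeWrite(sink, new byte[1024]);
        System.out.println("wrote " + sink.size() + " bytes in " + nanos + " ns");
    }
}
```

If the per-call times stay flat while the overall checkpoint still slows down,
the delay would have to come from somewhere other than the write itself.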

Thanks


Place where Giraph log files are stored

2014-10-16 Thread Vincentius Martin
Hi,

Sorry if this is a basic question.

I'm still using Giraph 1.0.0. How can I enable Giraph metrics?

Aside from the logs that MapReduce prints to the console, is there a log
directory I can check to see what happened inside the job?

Thanks

Regards,
Vincentius Martin