Currently, if a failure occurs, the whole job is killed.
After HAMA-503, only the single failed task will be restarted if it fails at superstep 5.
Yes, the state (the messages) is stored as part of the sync() method.

> 2) What other fault tolerance features are implemented in Hama?

None yet.

> 3) What is check pointing in Hama?

Writing sent messages to HDFS after a computation phase.
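
To make the idea concrete, here is a rough, self-contained sketch of checkpointing and single-task recovery. It is not Hama code: it is a toy simulation in Python, a local temporary directory stands in for HDFS, and every name in it (write_checkpoint, read_checkpoint, last_checkpoint, compute, run, run_with_recovery) is hypothetical. It also assumes a single task whose sent messages become its own input in the next superstep, whereas in a real BSP job the messages go to other peers.

```python
import json
import tempfile
from pathlib import Path


class TaskFailure(Exception):
    """Raised to simulate a task crashing in the middle of a job."""


def write_checkpoint(root: Path, superstep: int, messages: list) -> None:
    # Persist the messages sent in this superstep (local disk stands in for HDFS).
    (root / f"superstep-{superstep}.json").write_text(json.dumps(messages))


def read_checkpoint(root: Path, superstep: int) -> list:
    return json.loads((root / f"superstep-{superstep}.json").read_text())


def last_checkpoint(root: Path):
    # Highest superstep number that has a checkpoint file, or None.
    steps = [int(p.stem.split("-")[1]) for p in root.glob("superstep-*.json")]
    return max(steps) if steps else None


def compute(superstep: int, inbox: list) -> list:
    # Toy computation: send one message, the sum of the inbox plus the step number.
    return [sum(inbox) + superstep]


def run(root, start, end, inbox, fail_at=None):
    for step in range(start, end):
        if step == fail_at:
            raise TaskFailure(f"task failed in superstep {step}")
        outbox = compute(step, inbox)
        write_checkpoint(root, step, outbox)  # written after the computation phase
        inbox = outbox  # sent messages become the next superstep's input
    return inbox


def run_with_recovery(root, supersteps, fail_at):
    try:
        return run(root, 0, supersteps, [0], fail_at=fail_at)
    except TaskFailure:
        last = last_checkpoint(root)
        # Resume after the last completed superstep instead of from step 0.
        start = 0 if last is None else last + 1
        inbox = [0] if last is None else read_checkpoint(root, last)
        return run(root, start, supersteps, inbox)
```

Calling run_with_recovery(Path(tempfile.mkdtemp()), 10, fail_at=5) simulates the case from question 1: supersteps 0 to 4 are checkpointed, the task fails at superstep 5, and the rerun picks up from the checkpoint of superstep 4 rather than starting the job over.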

On 5 April 2012 at 09:10, Praveen Sripati <[email protected]> wrote:

> 1) If a BSPJob has 10 super steps and a task fails at step 5, does the job
> need to be run again? Is Hama-503 the solution? Is the state of the job
> stored in HDFS between super steps?
>
> 2) What other fault tolerance features are implemented in Hama?
>
> 3) What is check pointing in Hama?
>
> Praveen
>



-- 
Thomas Jungblut
Berlin <[email protected]>
