Currently, if a failure occurs, the whole job is killed. After HAMA-503, only the single task that failed will be restarted, for example when it fails at superstep 5. Yes, the state (the messages) is stored in the sync() method.

2) What other fault tolerance features are implemented in Hama?
> None yet.

3) What is check pointing in Hama?
> Writing sent messages to HDFS after a computation phase.

On 5 April 2012 at 09:10, Praveen Sripati <[email protected]> wrote:

> 1) If a BSPJob has 10 super steps and a task fails at step 5, does the job
> need to be run again? Is Hama-503 the solution? Is the state of the job
> stored in HDFS between super steps?
>
> 2) What other fault tolerance features are implemented in Hama?
>
> 3) What is check pointing in Hama?
>
> Praveen
>

--
Thomas Jungblut
Berlin <[email protected]>
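
To make the checkpointing idea from answer 3 a bit more concrete, here is a minimal, self-contained sketch in plain Java. It is not Hama code and uses no Hama API: the class `CheckpointSketch` and the methods `checkpoint`, `restore`, `compute` and `run` are hypothetical names, a local temporary directory stands in for HDFS, and messages are assumed to be single-line strings. It only illustrates the general pattern: persist the sent messages after each computation phase, then resume from the last completed superstep instead of rerunning the whole job.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names are Hama APIs.
public class CheckpointSketch {

    // Persist the messages sent in one superstep (a local file stands in for HDFS).
    static void checkpoint(Path dir, int superstep, List<String> sent) throws IOException {
        Files.write(dir.resolve("superstep-" + superstep + ".ckpt"), sent, StandardCharsets.UTF_8);
    }

    // Read back the messages that were persisted for a given superstep.
    static List<String> restore(Path dir, int superstep) throws IOException {
        return Files.readAllLines(dir.resolve("superstep-" + superstep + ".ckpt"), StandardCharsets.UTF_8);
    }

    // Toy computation phase: emit one outgoing message per incoming message.
    static List<String> compute(int superstep, List<String> inbox) {
        List<String> out = new ArrayList<>();
        for (String m : inbox) {
            out.add(m + "->s" + superstep);
        }
        return out;
    }

    // Run supersteps from..to (inclusive); failAt simulates a task failure (-1 = never fail).
    static List<String> run(Path dir, int from, int to, List<String> inbox, int failAt) throws IOException {
        List<String> current = inbox;
        for (int step = from; step <= to; step++) {
            if (step == failAt) {
                throw new IllegalStateException("simulated task failure at superstep " + step);
            }
            List<String> sent = compute(step, current);
            checkpoint(dir, step, sent); // persist before moving to the next superstep
            current = sent;
        }
        return current;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ckpt");
        List<String> start = Arrays.asList("a", "b");
        try {
            run(dir, 1, 10, start, 5); // fails at superstep 5
        } catch (IllegalStateException e) {
            // Resume from the last completed superstep instead of restarting at 1.
            List<String> inbox = restore(dir, 4);
            List<String> result = run(dir, 5, 10, inbox, -1);
            System.out.println(result.size() + " messages after recovery");
        }
    }
}
```

In this toy version the recovered run produces the same final messages as an uninterrupted run, because superstep 5 is replayed from exactly the messages that superstep 4 sent.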
