Hello community, I noticed a surprising code region in Spark: DAGScheduler --> handleTaskCompletion --> case FetchFailed --> failed += mapStage. (I am on branch-0.8.)
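For concreteness, here is a minimal sketch of the pattern I am asking about. The types below are simplified stand-ins I made up for this email, not Spark's real classes; the point is only that on a FetchFailed result, both the failed stage and its parent map stage get added to the `failed` set for resubmission:

```scala
// Hypothetical simplified types, sketching the branch-0.8 behavior described above.
object FetchFailureSketch {
  case class Stage(id: Int)
  case class FetchFailed(mapStage: Stage) // carries the map stage whose output was lost

  // Set of stages waiting to be resubmitted, mirroring `failed` in DAGScheduler.
  val failed = scala.collection.mutable.HashSet[Stage]()

  def handleTaskCompletion(failedStage: Stage, reason: FetchFailed): Unit = {
    failed += failedStage     // the reduce stage whose task could not fetch its input
    failed += reason.mapStage // the parent map stage that must be rerun
  }

  def main(args: Array[String]): Unit = {
    val mapStage = Stage(1)
    val reduceStage = Stage(2)
    handleTaskCompletion(reduceStage, FetchFailed(mapStage))
    println(failed.map(_.id).toList.sorted) // both stages are queued
  }
}
```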
That is to say, if one task fails to fetch its input from a shuffle dependency, both that task's stage and its parent map stage will be re-executed. If my understanding is correct, then why does Spark need to materialize the intermediate shuffle data at all? This looks like a "restart" fault-tolerance mechanism.

Thanks for your help,
dachuan

--
Dachuan Huang
Cellphone: 614-390-7234
2015 Neil Avenue
Ohio State University
Columbus, Ohio, U.S.A. 43210
