[ https://issues.apache.org/jira/browse/TEZ-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Zhang updated TEZ-1642:
----------------------------
    Fix Version/s:     (was: 0.6.0)
                       0.5.4

> TestAMRecovery sometimes fail
> -----------------------------
>
>                 Key: TEZ-1642
>                 URL: https://issues.apache.org/jira/browse/TEZ-1642
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>             Fix For: 0.5.4
>
>         Attachments: TEZ-1642-2.patch, TEZ-1642-3.patch, TEZ-1642-4.patch,
> TEZ-1642-5.patch, TEZ-1642.patch
>
>
> TestAMRecovery sometimes fails on testVertexPartiallyFinished_XXX.
> The scenario is that we'd like to kill the AM when a vertex is partially
> finished (with 2 tasks, task_0 is finished and task_1 is running). During
> recovery, task_0 should not rerun and task_1 should rerun. (We use the
> recovery log (TaskAttemptFinishedEvent) to judge whether a task is rerun.)
> Currently, we use VertexManager.onSourceTaskCompleted to control when to
> kill the AM, but this is not reliable. VertexManager.onSourceTaskCompleted
> is not invoked at the moment the task attempt finishes (TaskAttempt sends an
> event to Task to say the TaskAttempt is finished, and then Task sends an
> event to Vertex to trigger VM.onSourceTaskCompleted).
> The following case is possible: task_0 finished -> task_1 finished ->
> VM.onSourceTaskCompleted -> VM.onSourceTaskCompleted
> In this case, we treat the vertex as partially completed in the first
> VM.onSourceTaskCompleted, but the vertex is actually fully completed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
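
The ordering problem described in the issue can be illustrated with a small toy model. This is only a sketch and is not Tez code: `simulate`, `dispatch_after`, and the queue are hypothetical stand-ins, assuming that a finished attempt is logged immediately while the matching VM.onSourceTaskCompleted callback is delivered later through an event queue.

```python
from collections import deque


def simulate(finish_order, dispatch_after):
    """Toy model (not Tez code): each attempt is logged as finished right
    away, but its vertex-manager callback is only queued and delivered later.

    dispatch_after: set of 1-based positions in finish_order after which the
    queued callbacks get delivered (a stand-in for dispatcher timing).
    Returns one (callbacks_seen, tasks_logged_finished) pair per callback.
    """
    recovery_log = []   # stands in for logged TaskAttemptFinishedEvent records
    pending = deque()   # callbacks queued but not yet delivered
    observations = []

    def deliver_all():
        # Deliver queued callbacks one by one, recording what each one
        # would observe at the moment it runs.
        while pending:
            pending.popleft()
            callbacks_seen = len(observations) + 1
            observations.append((callbacks_seen, len(recovery_log)))

    for position, task in enumerate(finish_order, start=1):
        recovery_log.append(task)   # the attempt finishes and is logged now
        pending.append(task)        # the callback is only queued
        if position in dispatch_after:
            deliver_all()
    deliver_all()                   # flush anything still queued
    return observations


# Callbacks delivered promptly: the first one sees 1 finished task.
prompt = simulate(["task_0", "task_1"], dispatch_after={1, 2})

# Callbacks delayed until both tasks finished: the first callback has seen
# only 1 completion, yet 2 tasks are already logged as finished.
delayed = simulate(["task_0", "task_1"], dispatch_after={2})
```

In the delayed run, a test that kills the AM inside the first callback would believe the vertex is partially finished even though both tasks are already recorded as finished in the log, which matches the failure described in the issue.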