[ https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529440#comment-14529440 ]
Hitesh Shah commented on TEZ-2404:
----------------------------------

The basic assumption in recovery is that all events are written to the recovery log before a task is marked as completed. I do not believe recovery even cares about distinguishing events generated by different attempts, as it holds on to all of them and routes them on recovery. TEZ-2325 seems to have changed the above assumption.

> Handle DataMovementEvent before its TaskAttemptCompletedEvent
> -------------------------------------------------------------
>
> Key: TEZ-2404
> URL: https://issues.apache.org/jira/browse/TEZ-2404
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: TEZ-2404-1.patch, TEZ-2404-2.patch
>
>
> TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but this
> causes a recovery issue. Recovery needs the DataMovementEvent to be handled
> before the TaskAttemptCompletedEvent; otherwise the DataMovementEvent may be
> lost during recovery, causing its dependent tasks to hang.
> There are 2 ways to fix this issue:
> 1. Still route TaskAttemptCompletedEvent in Vertex
> 2. Route DataMovementEvent before TaskAttemptCompletedEvent in
> TezTaskAttemptListener

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
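
The ordering requirement discussed in this issue can be illustrated with a toy model. This is only a sketch and not Tez code: `Event`, `route_order`, and `lost_on_recovery` are hypothetical names, and the recovery log is modeled as a plain list that a crash may cut off at any point. It assumes, as the comment above states, that recovery treats an attempt as done once its completion event is in the log.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    attempt_id: str  # hypothetical attempt identifier
    kind: str        # "DATA_MOVEMENT" or "ATTEMPT_COMPLETED"
    seq: int         # distinguishes events of the same kind and attempt


def route_order(events: List[Event]) -> List[Event]:
    """Move completion events after all other events in the batch.

    sorted() is stable, so the relative order inside each group is kept.
    """
    return sorted(events, key=lambda e: e.kind == "ATTEMPT_COMPLETED")


def lost_on_recovery(log_prefix: List[Event], all_events: List[Event]) -> List[Event]:
    """Return data movement events that a recovery from log_prefix would miss.

    An event counts as lost when its attempt is already marked completed in
    the surviving log prefix, but the event itself was never written.
    """
    completed = {e.attempt_id for e in log_prefix if e.kind == "ATTEMPT_COMPLETED"}
    logged = set(log_prefix)
    return [
        e for e in all_events
        if e.kind == "DATA_MOVEMENT"
        and e.attempt_id in completed
        and e not in logged
    ]


# A batch in which the completion event happens to arrive first.
batch = [
    Event("attempt_0", "ATTEMPT_COMPLETED", 0),
    Event("attempt_0", "DATA_MOVEMENT", 1),
    Event("attempt_0", "DATA_MOVEMENT", 2),
]
ordered = route_order(batch)
```

With `ordered`, no prefix of the log can mark `attempt_0` as completed without also containing its data movement events. With the original `batch` order, a log cut off after the first entry would record the completion and miss both data movement events, which is the situation the issue describes as leaving dependent tasks hanging.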