[ https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated TEZ-3102:
----------------------------
    Attachment: TEZ-3102.001.patch

Attaching a patch that does sufficient processing of the kill event for the task that lost the speculation race to prevent the task state machine from thinking it still has outstanding attempts.

> Fetch failure of a speculated task causes job hang
> --------------------------------------------------
>
>                 Key: TEZ-3102
>                 URL: https://issues.apache.org/jira/browse/TEZ-3102
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: TEZ-3102.001.patch
>
>
> If a task speculates then succeeds, one task will be marked successful and
> the other killed. Then if the task retroactively fails due to fetch failures
> the Tez AM will fail to reschedule another task. This results in a hung job.
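
For readers following the issue, the failure sequence described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Tez code or the attached patch: `ToyTask`, its state names, the `outstanding` set, and the `handle_kill_after_success` flag are all hypothetical, and the assumption that the kill event is dropped once the task is marked successful is one plausible reading of the description.

```python
class ToyTask:
    """Toy model of a task with speculative attempts (hypothetical, not Tez's implementation)."""

    def __init__(self, handle_kill_after_success):
        # Flag standing in for "patched" vs "unpatched" behaviour (an assumption).
        self.handle_kill_after_success = handle_kill_after_success
        self.state = "RUNNING"
        self.outstanding = set()  # attempts the task believes are still live
        self.next_id = 0

    def schedule_attempt(self):
        attempt = self.next_id
        self.next_id += 1
        self.outstanding.add(attempt)
        self.state = "RUNNING"
        return attempt

    def on_attempt_succeeded(self, attempt):
        self.outstanding.discard(attempt)
        self.state = "SUCCEEDED"

    def on_attempt_killed(self, attempt):
        if self.state == "SUCCEEDED" and not self.handle_kill_after_success:
            # Modelled bug: the kill of the attempt that lost the race is ignored,
            # so it stays in the outstanding set forever.
            return
        self.outstanding.discard(attempt)

    def on_retroactive_failure(self, attempt):
        """The successful attempt's output is lost, e.g. through fetch failures.

        Returns the id of a newly scheduled attempt, or None if the task
        believes another attempt is still running and schedules nothing.
        """
        if self.outstanding:
            self.state = "RUNNING"  # waiting on an attempt that no longer exists
            return None
        return self.schedule_attempt()


def run_scenario(handle_kill_after_success):
    task = ToyTask(handle_kill_after_success)
    original = task.schedule_attempt()
    speculative = task.schedule_attempt()
    task.on_attempt_succeeded(speculative)  # speculative attempt wins the race
    task.on_attempt_killed(original)        # loser is killed
    return task.on_retroactive_failure(speculative)
```

In this model, `run_scenario(False)` schedules nothing after the retroactive failure, which corresponds to the hung job, while `run_scenario(True)` schedules a fresh attempt because the kill of the losing attempt was accounted for.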