[ https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated TEZ-3102:
----------------------------
    Attachment: TEZ-3102.003.patch

Thanks for the reviews, Bikas!

testTaskSucceedAndRetroActiveFailure doesn't cover the change since it's using the failed transition rather than the killed transition, so I added a test that explicitly kills a successful attempt to verify it reverts back to scheduling a new attempt. The reported test failures appear to be unrelated, as they pass for me locally.

> Fetch failure of a speculated task causes job hang
> --------------------------------------------------
>
>                 Key: TEZ-3102
>                 URL: https://issues.apache.org/jira/browse/TEZ-3102
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch, TEZ-3102.003.patch
>
>
> If a task speculates then succeeds, one task will be marked successful and
> the other killed. Then if the task retroactively fails due to fetch failures
> the Tez AM will fail to reschedule another task. This results in a hung job.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)