[ https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated TEZ-3102:
----------------------------
    Attachment: TEZ-3102.003.patch

Thanks for the reviews, Bikas!

testTaskSucceedAndRetroActiveFailure doesn't cover the change because it uses 
the failed transition rather than the killed transition, so I added a test that 
explicitly kills a successful attempt and verifies that the task reverts to 
scheduling a new attempt.
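The scenario from the issue and the behavior the new test checks can be illustrated with a deliberately simplified toy model. Everything in it (`ToyTask`, `AttemptState`, `TaskState`, `successful_attempt_lost`, `is_hung`) is hypothetical and invented for this sketch. It is not Tez's task state machine and does not reproduce the patch. What it shows: once the successful attempt is lost (failed or killed), the speculative sibling is already KILLED, so unless the task schedules a fresh attempt nothing is left running and the job cannot finish.

```python
from enum import Enum, auto


class AttemptState(Enum):
    RUNNING = auto()
    SUCCEEDED = auto()
    FAILED = auto()
    KILLED = auto()


class TaskState(Enum):
    RUNNING = auto()
    SUCCEEDED = auto()


class ToyTask:
    """Hypothetical, simplified task model for illustration; not Tez's task implementation."""

    def __init__(self):
        self.attempts = []              # attempt id (list index) -> AttemptState
        self.successful_attempt = None  # id of the attempt whose output is used
        self.state = TaskState.RUNNING
        self._schedule_attempt()

    def _schedule_attempt(self):
        # Start a new attempt and mark the task as running again.
        self.attempts.append(AttemptState.RUNNING)
        self.state = TaskState.RUNNING
        return len(self.attempts) - 1

    def speculate(self):
        # Launch a second, concurrent attempt of the same task.
        return self._schedule_attempt()

    def attempt_succeeded(self, attempt_id):
        self.attempts[attempt_id] = AttemptState.SUCCEEDED
        self.successful_attempt = attempt_id
        self.state = TaskState.SUCCEEDED
        # The losing speculative attempt(s) are killed.
        for i, s in enumerate(self.attempts):
            if s is AttemptState.RUNNING:
                self.attempts[i] = AttemptState.KILLED

    def successful_attempt_lost(self, attempt_id, new_state):
        """The successful attempt is retroactively failed (e.g. fetch failures) or killed."""
        if self.state is not TaskState.SUCCEEDED or attempt_id != self.successful_attempt:
            raise ValueError("attempt is not the task's successful attempt")
        if new_state not in (AttemptState.FAILED, AttemptState.KILLED):
            raise ValueError("new_state must be FAILED or KILLED")
        self.attempts[attempt_id] = new_state
        self.successful_attempt = None
        # Expected behavior in this sketch: for either transition, schedule a fresh
        # attempt. The earlier speculative attempt is already KILLED and will never
        # produce output, so it cannot be relied on to finish the task.
        return self._schedule_attempt()

    def has_running_attempt(self):
        return any(s is AttemptState.RUNNING for s in self.attempts)

    def is_hung(self):
        # An unfinished task with no running attempt can never complete.
        return self.state is not TaskState.SUCCEEDED and not self.has_running_attempt()


if __name__ == "__main__":
    task = ToyTask()
    task.speculate()           # attempts 0 and 1 run concurrently
    task.attempt_succeeded(0)  # attempt 0 wins, attempt 1 is killed
    new_id = task.successful_attempt_lost(0, AttemptState.KILLED)
    print(new_id, task.state.name, task.is_hung())  # 2 RUNNING False
```

In this toy model, a variant of `successful_attempt_lost` that set the task back to RUNNING without calling `_schedule_attempt()` would make `is_hung()` return True, which corresponds to the hang described in the issue.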

The reported test failures appear to be unrelated, as they pass for me locally.

> Fetch failure of a speculated task causes job hang
> --------------------------------------------------
>
>                 Key: TEZ-3102
>                 URL: https://issues.apache.org/jira/browse/TEZ-3102
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch, 
> TEZ-3102.003.patch
>
>
> If a task speculates then succeeds, one task will be marked successful and 
> the other killed. Then if the task retroactively fails due to fetch failures 
> the Tez AM will fail to reschedule another task. This results in a hung job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
