[ 
https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated TEZ-3102:
----------------------------
    Attachment: TEZ-3102.001.patch

Attaching a patch that does sufficient processing of the kill event for the 
task that lost the speculation race to prevent the task state machine from 
thinking it still has outstanding attempts.

> Fetch failure of a speculated task causes job hang
> --------------------------------------------------
>
>                 Key: TEZ-3102
>                 URL: https://issues.apache.org/jira/browse/TEZ-3102
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: TEZ-3102.001.patch
>
>
> If a task speculates then succeeds, one task will be marked successful and 
> the other killed. Then if the task retroactively fails due to fetch failures 
> the Tez AM will fail to reschedule another task. This results in a hung job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to