[jira] [Commented] (TEZ-814) Improve heuristic for determining a task has failed outputs

Rajesh Balamohan (JIRA) Thu, 17 Sep 2015 19:34:12 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804876#comment-14804876
 ]


Rajesh Balamohan commented on TEZ-814:
--------------------------------------

lgtm. +1. Even when tez.task.max.allowed.output.failures & 
tez.task.max.allowed.output.failures.fraction are not converging, this would 
end up restarting the producer after 300 seconds in case of an output read error. 
Should this be backported to 0.6 and 0.5 as well?
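
A minimal sketch of the kind of heuristic discussed here, assuming the producer is re-run as soon as any one of three conditions holds: an absolute count of failed consumers, a fraction of failed consumers, or a time limit since the first read error. The class, method, and parameter names are hypothetical and are not taken from the Tez patch; only the 25% fraction and the 300-second limit come from this thread, and the count threshold of 10 is an illustrative value.

```java
public class OutputFailureHeuristic {
    // Hypothetical stand-ins for tez.task.max.allowed.output.failures,
    // tez.task.max.allowed.output.failures.fraction, and the time limit.
    private final int maxAllowedFailures;
    private final double maxAllowedFailureFraction;
    private final long maxAllowedTimeMillis;

    public OutputFailureHeuristic(int maxAllowedFailures,
                                  double maxAllowedFailureFraction,
                                  long maxAllowedTimeMillis) {
        this.maxAllowedFailures = maxAllowedFailures;
        this.maxAllowedFailureFraction = maxAllowedFailureFraction;
        this.maxAllowedTimeMillis = maxAllowedTimeMillis;
    }

    /**
     * Decide whether the producer's output should be treated as failed.
     *
     * @param failedConsumers  distinct consumers that reported a read error
     * @param totalConsumers   consumers that read from this producer
     * @param firstErrorMillis time of the first reported read error
     * @param nowMillis        current time
     */
    public boolean shouldRerunProducer(int failedConsumers, int totalConsumers,
                                       long firstErrorMillis, long nowMillis) {
        if (failedConsumers <= 0) {
            return false; // no error reports, nothing to decide
        }
        if (failedConsumers >= maxAllowedFailures) {
            return true; // absolute count reached
        }
        double fraction = totalConsumers > 0
                ? (double) failedConsumers / totalConsumers
                : 1.0;
        if (fraction >= maxAllowedFailureFraction) {
            return true; // enough of the consumers agree
        }
        // Fallback: neither threshold was reached, but the error has been
        // outstanding too long (e.g. a last consumer whose source is lost).
        return nowMillis - firstErrorMillis >= maxAllowedTimeMillis;
    }
}
```

With thresholds of 10 failures, 0.25, and 300000 ms, a single failing consumer out of 100 would not trigger a re-run at first, but would once 300 seconds have passed since its first report.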

> Improve heuristic for determining a task has failed outputs
> -----------------------------------------------------------
>
>                 Key: TEZ-814
>                 URL: https://issues.apache.org/jira/browse/TEZ-814
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>             Fix For: 0.7.1
>
>         Attachments: TEZ-814.1.patch, TEZ-814.2.patch
>
>
> Currently 25% of consumers need to report failure. However, we may not always 
> have that many error reports. E.g., this is the last consumer and its 
> source is lost. Or some consumers are cut off from the source. The job may 
> hang on those consumers waiting for a re-run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
