[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437909#comment-16437909
 ] 

Kuhu Shukla commented on TEZ-3817:
----------------------------------

Thank you for the review comments, Jon! I have attached a revised patch, and the 
test now checks for the appropriate end state. One thing to note here is that 
the end state is FAILED and not ERROR, since we catch Exception and let the 
{{finalState}} passed to the {{finished()}} call decide the DAG's internal state. 
The AM is still notified of the DAG error.
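
For anyone skimming the thread, here is a rough, self-contained sketch of the 
behaviour described above. It is not taken from the patch: {{SketchDag}}, 
{{DagState}}, {{transitionToError()}} and the notification list are 
hypothetical names made up for illustration, and the real DAG code is more 
involved. It only shows the idea that a second exception is caught, the state 
handed to {{finished()}} becomes the end state, and the AM is still told about 
the error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical states; the real DAG has many more.
enum DagState { RUNNING, FAILED, ERROR }

// Illustrative stand-in for a DAG with a state machine. Not the Tez class.
class SketchDag {
    private DagState state = DagState.RUNNING;
    // Stand-in for events sent to the AM (assumption for this sketch).
    private final List<String> amNotifications = new ArrayList<>();
    private final boolean errorTransitionThrows;

    SketchDag(boolean errorTransitionThrows) {
        this.errorTransitionThrows = errorTransitionThrows;
    }

    void handle(String event) {
        try {
            doTransition(event);
        } catch (RuntimeException first) {
            // First uncaught exception: the AM is notified of the DAG error.
            amNotifications.add("DAG_ERROR: " + first.getMessage());
            try {
                transitionToError();
            } catch (Exception second) {
                // The error transition itself failed. Without this catch the
                // DAG would stay in its last "sane" state (RUNNING) forever.
                state = finished(DagState.FAILED);
            }
        }
    }

    // Always throws here, to simulate the first runtime exception.
    private void doTransition(String event) {
        throw new IllegalStateException("bad transition on " + event);
    }

    // Optionally throws, to simulate the second runtime exception.
    private void transitionToError() {
        if (errorTransitionThrows) {
            throw new IllegalStateException("error transition failed");
        }
        state = finished(DagState.ERROR);
    }

    // The finalState passed in decides the DAG's internal end state.
    private DagState finished(DagState finalState) {
        return finalState;
    }

    DagState getState() { return state; }

    List<String> getAmNotifications() { return amNotifications; }
}
```

In this sketch, {{new SketchDag(true)}} followed by {{handle(...)}} ends in 
FAILED with one AM notification recorded, while {{new SketchDag(false)}} ends 
in ERROR.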

> DAGs can hang after more than one uncaught Exception during doTransition.
> -------------------------------------------------------------------------
>
>                 Key: TEZ-3817
>                 URL: https://issues.apache.org/jira/browse/TEZ-3817
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.1, 0.9.0
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: TEZ-3817.001.patch, TEZ-3817.002.patch, 
> TEZ-3817.003.patch, TEZ-3817.004.patch, TEZ-3817.test.patch
>
>
> A Tez DAG can hang in the last "sane" state if 
> statemachine.doTransition() throws a runtime exception more than once. If the 
> transition for the Error state itself throws an exception, the DAG hangs. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
