[
https://issues.apache.org/jira/browse/TEZ-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947328#comment-13947328
]
Siddharth Seth commented on TEZ-973:
------------------------------------
{code}
endState = DAGState.ERROR;
{code}
DAGImpl - during recovery, this sets the state to ERROR. Should that just be FAILED?
DAGCommitStartedEvent - the failure reason gets reported as COMMIT_FAILED.
Should this be INTERNAL_ERROR, to be consistent with VertexCommitFailure?
The same applies to VertexGroupCommitStarted / VertexGroupCommitFinished.
A vertex history event write failure puts the DAG into the ERROR state. FAILED
with a different cause seems more appropriate - that's consistent with critical
summary failures causing the DAG to stay in its current state or be marked as
FAILED/KILLED. INTERNAL_ERROR remains as a means of indicating likely bugs in
the state machines.
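A rough sketch of the distinction suggested above, not the actual DAGImpl code: operational failures (recovery, history event writes) end in FAILED with a specific cause, while ERROR / INTERNAL_ERROR is reserved for likely state machine bugs. All type and constant names here (TerminalStateSketch, Problem, Cause, Outcome, resolve) are hypothetical and only illustrate the idea.

```java
// Hypothetical sketch: these types are illustrative, not Tez's DAGImpl classes.
public class TerminalStateSketch {
    // Terminal states relevant to this discussion.
    enum DagState { FAILED, ERROR }

    // Assumed termination causes; the names are made up for illustration.
    enum Cause { RECOVERY_FAILURE, HISTORY_EVENT_WRITE_FAILURE, INTERNAL_ERROR }

    // Kinds of problems the DAG might hit.
    enum Problem { RECOVERY_FAILED, HISTORY_EVENT_WRITE_FAILED, INVALID_TRANSITION }

    static final class Outcome {
        final DagState state;
        final Cause cause;

        Outcome(DagState state, Cause cause) {
            this.state = state;
            this.cause = cause;
        }
    }

    // Operational failures map to FAILED with a distinct cause;
    // anything else is treated as a likely state machine bug.
    static Outcome resolve(Problem problem) {
        switch (problem) {
            case RECOVERY_FAILED:
                return new Outcome(DagState.FAILED, Cause.RECOVERY_FAILURE);
            case HISTORY_EVENT_WRITE_FAILED:
                return new Outcome(DagState.FAILED, Cause.HISTORY_EVENT_WRITE_FAILURE);
            default:
                return new Outcome(DagState.ERROR, Cause.INTERNAL_ERROR);
        }
    }

    public static void main(String[] args) {
        for (Problem problem : Problem.values()) {
            Outcome outcome = resolve(problem);
            System.out.println(problem + " -> " + outcome.state + " (" + outcome.cause + ")");
        }
    }
}
```

With this split, a client that sees ERROR can treat it as a probable bug, whereas FAILED always carries a cause explaining what went wrong.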
> Abort additional attempts if recovery fails.
> --------------------------------------------
>
> Key: TEZ-973
> URL: https://issues.apache.org/jira/browse/TEZ-973
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Hitesh Shah
> Attachments: TEZ-973.1.patch, TEZ-973.2.patch, TEZ-973.3.patch
>
>