[ https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533094#comment-14533094 ]

Siddharth Seth commented on TEZ-2426:
-------------------------------------

[~bikassaha] - do you have additional logs, specifically the entire AM log?
There also seems to be a discrepancy between the AM and task log timestamps;
I'm assuming the clocks on the nodes are out of sync.

I can see how the exception happens during execution of the next task, since
we don't join on the eventRouter thread.
However, I'm not sure how this would cause the FAILED message to go through for
the previous attempt. It should have gone through for the currently running
task. If it had been sent for the previous attempt, the AM should have thrown an
error about an invalid taskAttemptId. That leads me to believe something else
is broken at the same time.
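To make the ordering concern concrete, here is a minimal Python sketch, assuming a per-task event-router thread that drains a queue of events toward the AM. The class `EventRouter`, the function `run_task`, the list standing in for the AM channel, and the event names are all hypothetical and are not Tez's actual API. Joining the router thread before reporting completion guarantees that every event produced by attempt A is delivered, tagged with A's attempt ID, before A's completion is reported and before attempt B starts in the same container.

```python
import queue
import threading


class EventRouter:
    """Hypothetical per-attempt event router; not Tez's real class."""

    def __init__(self, attempt_id, sink):
        self.attempt_id = attempt_id
        self.sink = sink                # list standing in for the AM heartbeat channel
        self.events = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            event = self.events.get()
            if event is None:           # shutdown sentinel: stop draining
                return
            # Every event is tagged with the attempt that produced it.
            self.sink.append((self.attempt_id, event))

    def send(self, event):
        self.events.put(event)

    def close(self):
        self.events.put(None)
        # Wait until all queued events have been delivered.
        self.thread.join()


def run_task(attempt_id, events, sink):
    """Hypothetical task runner: route events, then report completion."""
    router = EventRouter(attempt_id, sink)
    for event in events:
        router.send(event)
    router.close()                      # join BEFORE reporting completion
    sink.append((attempt_id, "TASK_COMPLETE"))


sink = []
run_task("attempt_A", ["INPUT_READ_ERROR"], sink)
run_task("attempt_B", ["DATA_MOVEMENT"], sink)
print(sink)
```

Without the `join()` in `close()`, a slow router thread could still be emitting events for attempt A while attempt B is already running in the container, which is the window described above.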

> Task input not complete before sending Task completed event
> -----------------------------------------------------------
>
>                 Key: TEZ-2426
>                 URL: https://issues.apache.org/jira/browse/TEZ-2426
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Priority: Critical
>         Attachments: am.log, container.log
>
>
> Sequence of events
> 1) Task A starts in a container
> 2) Task A complete event comes to AM
> 3) Task B starts in the same container
> 4) Task A's input calls some method on its context. Crashes with NPE
> 5) The crash sends an input failed event for Task A to the AM
> 6) Task A state machine crashes saying cannot handle failed after success
> In some cases, a status update event may also be sent after
> completion, though I'm not sure if it's related to the failed event being sent.
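Step 6 can be illustrated with a deliberately simplified attempt state machine. This Python sketch assumes three hypothetical states and a small transition table; Tez's real task attempt state machine has many more states and events, and the names here (`TaskAttempt`, `InvalidStateTransition`, `replay`, the event strings) are illustrative only. Once an attempt reaches SUCCEEDED, no transition is defined for a late FAILED or STATUS_UPDATE event, so handling it raises, which mirrors the "cannot handle failed after success" crash.

```python
class InvalidStateTransition(Exception):
    """Raised when an event has no transition from the current state."""


# Hypothetical, heavily simplified transitions: (state, event) -> new state.
TRANSITIONS = {
    ("RUNNING", "STATUS_UPDATE"): "RUNNING",
    ("RUNNING", "DONE"): "SUCCEEDED",
    ("RUNNING", "FAILED"): "FAILED",
}


class TaskAttempt:
    def __init__(self, attempt_id):
        self.attempt_id = attempt_id
        self.state = "RUNNING"

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            # Terminal states define no outgoing transitions, so late events fail here.
            raise InvalidStateTransition(
                f"{self.attempt_id}: cannot handle {event} in state {self.state}"
            )
        self.state = TRANSITIONS[key]
        return self.state


def replay(attempt_id, events):
    """Feed a sequence of events to a fresh attempt and return its final state."""
    attempt = TaskAttempt(attempt_id)
    for event in events:
        attempt.handle(event)
    return attempt.state
```

For example, `replay("A", ["STATUS_UPDATE", "DONE"])` ends in SUCCEEDED, while `replay("A", ["DONE", "FAILED"])` raises `InvalidStateTransition`, matching the sequence in the description.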



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
