[ https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533094#comment-14533094 ]
Siddharth Seth commented on TEZ-2426:
-------------------------------------

[~bikassaha] - do you have additional logs - the entire AM log specifically. There seems to be a discrepancy in the AM / task log times as well. Assuming the nodes are out of sync.

I can see how the exception happens during execution of the next task - since we don't join on the eventRouter thread. However, I'm not sure how the FAILED message will go through for the previous attempt as a result of this. It should have gone through for the currently running task. If it went for the previous task - the AM should have thrown an error related to an invalid taskAttemptId. That leads me to believe something else is broken at the same time.

> Task input not complete before sending Task completed event
> -----------------------------------------------------------
>
>                 Key: TEZ-2426
>                 URL: https://issues.apache.org/jira/browse/TEZ-2426
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Priority: Critical
>         Attachments: am.log, container.log
>
>
> Sequence of events
> 1) Task A starts in a container
> 2) Task A complete event comes to AM
> 3) Task B starts in the same container
> 4) Task A's input calls some method on its context. Crashes with NPE
> 5) The crash sends an input failed event for Task A to the AM
> 6) Task A state machine crashes saying cannot handle failed after success
> In some cases, it could be that status update event is also sent after
> completion, though not sure if its related to the failed event being sent.
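To illustrate the point about not joining on the eventRouter thread, here is a minimal, self-contained sketch of the ordering problem. It is not Tez code: the class name, the `runAttempt` method, and the event strings are all hypothetical stand-ins, and the latch only exists to make the late input failure deterministic. It shows that if completion is reported before the router thread has finished, a failure event can arrive after success; joining first lets the failure be seen before the outcome is reported.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch; none of these names come from the Tez code base.
public class EventRouterJoinSketch {

    // Returns the events in the order a simulated AM would receive them.
    static List<String> runAttempt(boolean joinRouterBeforeReporting) {
        List<String> amEvents = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch routerMayProceed = new CountDownLatch(1);

        // Stand-in for the event router thread: it still has input work
        // pending when the processor finishes, and that work fails.
        Thread eventRouter = new Thread(() -> {
            try {
                routerMayProceed.await();
                amEvents.add("INPUT_FAILED");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "eventRouter");
        eventRouter.start();

        if (joinRouterBeforeReporting) {
            // Let the router drain, wait for it, then decide the outcome.
            routerMayProceed.countDown();
            joinQuietly(eventRouter);
            boolean failed = amEvents.contains("INPUT_FAILED");
            amEvents.add(failed ? "TASK_FAILED" : "TASK_SUCCEEDED");
        } else {
            // Report success first; the router fails afterwards.
            amEvents.add("TASK_SUCCEEDED");
            routerMayProceed.countDown();
            // Joined here only so the sketch returns a complete trace.
            joinQuietly(eventRouter);
        }
        return new ArrayList<>(amEvents);
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no join: " + runAttempt(false));
        System.out.println("join:    " + runAttempt(true));
    }
}
```

In the no-join case the trace ends with a failure after success, which mirrors step 6 of the reported sequence. The sketch does not address the open question in the comment, namely why the failure was attributed to the previous attempt rather than the currently running one.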