[ https://issues.apache.org/jira/browse/MAPREDUCE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060228#comment-15060228 ]
Jason Lowe commented on MAPREDUCE-4788:
---------------------------------------

I'm confused as to why increasing the sleep period is an appropriate fix for this. Even if the AM doesn't stick around, the job client should be redirected to the history server if the AM has already exited. Is the job history not correct in this state as well?

Normally for a job to fail, at least one task fails (ignoring the cases where we fail during job init or job commit). Can someone explain the sequence of events that allows the job to be marked failed due to task failure while no tasks are in the FAILED state? Normally a job will fail because a task reported failure, and at that point that task should be in the FAILED state. Is there an AM log or some other evidence that shows the sequence of state transitions that leads to this problem?

> Job are marking as FAILED even if there are no failed tasks in it
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4788
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4788
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.6.0
>            Reporter: Devaraj K
>         Attachments: MAPREDUCE-4788.patch
>
>
> Sometimes jobs are marked as FAILED while some of their tasks are marked as
> KILLED.
> In MRAppMaster, a JobFinishEvent is triggered and the AM then waits for
> 5000 ms. If any task's final state is still unknown by that time, the task
> is marked as KILLED and the job state is marked as FAILED.
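To make the question about state transitions concrete, here is a toy Python model of the behavior described in the issue: after the job-finish event the AM waits a fixed period (5000 ms per the description), and any task whose final state has not arrived by then is recorded as KILLED. This is a sketch under stated assumptions, not MRAppMaster code: the names `settle_task_states` and `failed_job_is_explained`, and the FAILED report arriving at 6000 ms, are hypothetical. Whether such a late report actually occurs is exactly what the requested AM log would have to show.

```python
from typing import Dict, Optional, Tuple

# Toy model with hypothetical names; it is not the MRAppMaster implementation.
GRACE_MS = 5000  # wait after JobFinishEvent, as quoted in the issue description

Report = Optional[Tuple[str, int]]  # (final state, ms after JobFinishEvent) or None


def settle_task_states(reports: Dict[str, Report], grace_ms: int = GRACE_MS) -> Dict[str, str]:
    """Record each task's final state; anything not reported within grace_ms becomes KILLED."""
    settled = {}
    for task_id, report in reports.items():
        if report is not None and report[1] <= grace_ms:
            settled[task_id] = report[0]
        else:
            settled[task_id] = "KILLED"
    return settled


def failed_job_is_explained(job_state: str, task_states: Dict[str, str]) -> bool:
    """A job failed by task failure (not init/commit) should leave at least one task FAILED."""
    if job_state != "FAILED":
        return True
    return any(state == "FAILED" for state in task_states.values())


reports = {
    "task_0": ("SUCCEEDED", 0),
    "task_1": ("FAILED", 6000),  # hypothetical report arriving after the wait period
    "task_2": None,              # hypothetical task that never reports
}
states = settle_task_states(reports)
print(states)  # {'task_0': 'SUCCEEDED', 'task_1': 'KILLED', 'task_2': 'KILLED'}
print(failed_job_is_explained("FAILED", states))  # False
print(failed_job_is_explained("FAILED", settle_task_states(reports, grace_ms=10000)))  # True
```

Under this assumed interleaving, a longer wait only moves the window: any report that lands after a fixed deadline still yields a FAILED job with no FAILED tasks, which is why a larger sleep looks like a mitigation rather than a fix for the underlying ordering.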