[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060228#comment-15060228
 ] 

Jason Lowe commented on MAPREDUCE-4788:
---------------------------------------

I'm confused as to why increasing the sleep period is an appropriate fix for 
this.  Even if the AM doesn't stick around, the job client should be redirected 
to the history server once the AM has exited.  Is the state recorded in the job 
history incorrect as well?

Normally for a job to fail at least one task fails (ignoring the cases where we 
fail during job init or job commit).  Can someone explain the sequence of 
events that allows the job to be marked failed due to task failure but no tasks 
are in the FAILED state?  Normally a job will fail because a task reported 
failure, and at that point that task should be in the FAILED state.  Is there 
an AM log or some other evidence that shows the sequence of state transitions 
that leads to this problem?

> Job are marking as FAILED even if there are no failed tasks in it
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4788
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4788
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.6.0
>            Reporter: Devaraj K
>         Attachments: MAPREDUCE-4788.patch
>
>
> Sometimes jobs are marked as FAILED while some of their tasks are marked as 
> KILLED. 
> In MRAppMaster, JobFinishEvent is triggered and then waits for 5000 millis. 
> If the final state of any task is still unknown by that time, those tasks are 
> marked as KILLED and the job state is marked as FAILED.
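
To make the reported behavior concrete, here is a minimal toy model of it. 
It is only a sketch of the description quoted above, not MRAppMaster code: 
the class, enum, and method names are all hypothetical, and the rule that a 
job succeeds only when every task succeeded is an assumption made for 
illustration.  It shows how a job could end up FAILED with KILLED tasks and 
no task in the FAILED state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: none of these names are taken from MRAppMaster. */
final class GraceWindowSketch {

    enum TaskState { RUNNING, SUCCEEDED, FAILED, KILLED }

    enum JobState { SUCCEEDED, FAILED }

    /**
     * Models the step the report describes after the wait (5000 millis in
     * the report) has elapsed: any task whose final state is still unknown
     * (RUNNING here) is marked KILLED. Other states are left unchanged.
     */
    static Map<String, TaskState> resolveAfterWait(Map<String, TaskState> tasks) {
        Map<String, TaskState> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, TaskState> e : tasks.entrySet()) {
            TaskState s = e.getValue();
            resolved.put(e.getKey(), s == TaskState.RUNNING ? TaskState.KILLED : s);
        }
        return resolved;
    }

    /** Assumed rule for this sketch: the job succeeds only if every task succeeded. */
    static JobState jobState(Map<String, TaskState> resolved) {
        for (TaskState s : resolved.values()) {
            if (s != TaskState.SUCCEEDED) {
                return JobState.FAILED;
            }
        }
        return JobState.SUCCEEDED;
    }

    /** True if at least one task is in the FAILED state. */
    static boolean hasFailedTask(Map<String, TaskState> resolved) {
        return resolved.containsValue(TaskState.FAILED);
    }

    public static void main(String[] args) {
        Map<String, TaskState> tasks = new LinkedHashMap<>();
        tasks.put("task_0", TaskState.SUCCEEDED);
        tasks.put("task_1", TaskState.RUNNING); // final state not yet known

        Map<String, TaskState> resolved = resolveAfterWait(tasks);
        System.out.println("tasks = " + resolved);
        System.out.println("job = " + jobState(resolved)
                + ", any FAILED task = " + hasFailedTask(resolved));
    }
}
```

In this model the job is reported as FAILED although no task is FAILED, which 
matches the symptom in the summary.  Whether the real AM reaches that state 
the same way is exactly what the AM log would need to confirm.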



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
