[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168220#comment-13168220
 ] 

Mac Fang commented on MAPREDUCE-3362:
-------------------------------------

I think the two scenarios are different.

The scenario in this issue is that the Map/Reduce tasks in the job are done,
but the job still stays pending. The root cause is the
ConcurrentModificationException: when the exception happens, the
finished-task counter ends up wrong.
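
To illustrate the failure mode, here is a minimal single-threaded sketch. It
is not the actual JobTracker code: the class HistoryLogSketch, its fields and
the string "writers" are hypothetical stand-ins. It only shows how removing
an element from an ArrayList while iterating over it throws
ConcurrentModificationException, and how the uncaught exception skips the
counter update that follows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical stand-in for the job bookkeeping; not Hadoop code.
class HistoryLogSketch {
    // Stand-in for the finished-map counter discussed in this issue.
    int finishedMapTasks = 0;

    // Stand-in for the list of history writers; "broken" plays the writer
    // whose history folder has been removed.
    final List<String> writers =
        new ArrayList<>(Arrays.asList("w1", "broken", "w3", "w4"));

    // Removes the failed writer while still iterating over the same list.
    // The next step of the iterator detects the structural change and throws.
    void log() {
        for (String w : writers) {
            if (w.equals("broken")) {
                writers.remove(w);
            }
        }
    }

    // The counter update comes after the logging call, so an exception
    // thrown by log() jumps over it.
    void completedTask() {
        log();
        finishedMapTasks++;
    }

    public static void main(String[] args) {
        HistoryLogSketch job = new HistoryLogSketch();
        try {
            job.completedTask();
        } catch (ConcurrentModificationException e) {
            System.out.println("caught far away from the counter: " + e);
        }
        System.out.println("finishedMapTasks = " + job.finishedMapTasks);
    }
}
```

In the real issue the modification comes from a second thread rather than
from the loop body, but the effect on the counter is the same.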
                
> Job always stay at 'Pending' status and cannot finish several days
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3362
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3362
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, jobtracker
>    Affects Versions: 0.20.2
>            Reporter: Denny Ye
>            Priority: Critical
>              Labels: jobtracker
>
> Our jobs stay in 'pending' status for several days. We checked the
> jobtracker log and found that one task (attempt) failed because it could not
> store job history to HDFS.
> The issue starts when another job removes the folder to which this job is
> writing its history log. In this case, a
> 'ConcurrentModificationException' is thrown at
> JobHistory#log(ArrayList<PrintWriter> writers, RecordTypes recordType,
> Keys[] keys, String[] values, JobID id). One thread checked for output
> errors and removed the output whose history folder on HDFS had been removed,
> while another thread got a 'ConcurrentModificationException' once the
> current 'writers' list was empty.
> Unfortunately, nothing catches this exception, and it goes through to the
> TaskTracker (it jumps over the part that updates 'finishedMapTask').
> Then another task (attempt) from 'failedMap' runs successfully, but the total
> 'finishedMapTask' count does not cover all finished map tasks. JobCleanupTask
> cannot start and the job stays in 'pending' status.
> The root cause:
> The first task (attempt) fails with the exception; this task is added to
> 'failedMap' and the 'finishedMap' counter is decreased. The next task
> (attempt) runs successfully and increases 'finishedMap' by one. Because of
> the failure, the total 'finishedMap' is less than the actual number of
> finished maps, so the cleanup task cannot run.
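
One possible way to avoid this kind of counter drift is sketched below. This
is an assumption on my side, not the fix applied in Hadoop: the class
SafeHistoryLogSketch and its members are hypothetical. The idea is to keep
the writers in a CopyOnWriteArrayList, whose iterators work on a snapshot and
so tolerate removal during iteration, and to treat history logging as
best-effort so that a logging failure cannot skip the counter update.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of a mitigation; not the actual Hadoop patch.
class SafeHistoryLogSketch {
    // Stand-in for the finished-map counter.
    int finishedMapTasks = 0;

    // Iterators of CopyOnWriteArrayList traverse a snapshot of the list,
    // so removing an element during iteration does not throw.
    final List<String> writers =
        new CopyOnWriteArrayList<>(Arrays.asList("w1", "broken", "w3", "w4"));

    // Drops the failed writer while iterating over the snapshot.
    void log() {
        for (String w : writers) {
            if (w.equals("broken")) {
                writers.remove(w);
            }
        }
    }

    // History logging is treated as best-effort: a runtime failure is
    // reported but does not prevent the counter update.
    void completedTask() {
        try {
            log();
        } catch (RuntimeException e) {
            System.err.println("history logging failed: " + e);
        }
        finishedMapTasks++;
    }

    public static void main(String[] args) {
        SafeHistoryLogSketch job = new SafeHistoryLogSketch();
        job.completedTask();
        System.out.println("writers = " + job.writers);
        System.out.println("finishedMapTasks = " + job.finishedMapTasks);
    }
}
```

CopyOnWriteArrayList copies the backing array on every write, which should
be acceptable here if writers are removed rarely compared to how often they
are iterated.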
