[ https://issues.apache.org/jira/browse/MAPREDUCE-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163548#comment-13163548 ]
Subroto Sanyal commented on MAPREDUCE-3362:
-------------------------------------------

Hi Denny,

By any chance, are you running into this scenario?
https://issues.apache.org/jira/browse/MAPREDUCE-2129?focusedCommentId=13081564&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13081564

> Job always stay at 'Pending' status and cannot finish several days
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3362
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3362
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, jobtracker
>    Affects Versions: 0.20.2
>            Reporter: Denny Ye
>            Priority: Critical
>              Labels: jobtracker
>
> Our jobs stay in 'pending' status for several days. We checked the
> JobTracker log and found that one task attempt failed because it could not
> store the job history to HDFS.
> The problem begins when another job removes the folder to which this job is
> writing its history log. In this case, a 'ConcurrentModificationException'
> is thrown in JobHistory#log(ArrayList<PrintWriter> writers, RecordTypes
> recordType, Keys[] keys, String[] values, JobID id). One thread checks for
> output errors and, because the history folder on HDFS has been removed,
> removes the writers; another thread then gets a
> 'ConcurrentModificationException' because 'writers' has meanwhile been
> emptied.
> Unfortunately, nothing catches this exception, so it propagates to the
> TaskTracker and skips the code that increments 'finishedMapTask'.
> Another attempt of the task is then run from 'failedMap' and succeeds, but
> the 'finishedMapTask' total no longer counts every finished map task.
> JobCleanupTask cannot start, and the job stays in 'pending' status.
> The root cause:
> The first task attempt fails with the exception; the task is added to
> 'failedMap' and the 'finishedMap' counter is decremented. The next attempt
> runs successfully and increments 'finishedMap' by one.
> Because of the failure, the total 'finishedMap' count is less than the
> actual number of finished maps, so the cleanup task cannot run.
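The 'ConcurrentModificationException' described above most likely comes from the fail-fast iterators of java.util collections such as ArrayList: once the list is structurally modified (here, emptied), the next iterator step throws. Below is a minimal, single-threaded sketch of that behavior. It is a generic illustration, not the JobHistory code. The class FailFastDemo, the method clearDuringIteration, and the writer labels are hypothetical. In the reported case the modification comes from another thread, where fail-fast detection is only best-effort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class FailFastDemo {
    // Hypothetical helper: walks the list and empties it on the first element,
    // mimicking an error path that leaves 'writers' blank while it is in use.
    // Returns true if the iterator detected the modification and threw.
    static boolean clearDuringIteration(List<String> writers) {
        try {
            for (String w : writers) {
                writers.clear();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // ArrayList iterators are fail-fast: the step after clear() throws.
        List<String> plain = new ArrayList<>(Arrays.asList("writer-1", "writer-2"));
        System.out.println("ArrayList threw CME: " + clearDuringIteration(plain));

        // CopyOnWriteArrayList iterates over a snapshot and never throws CME.
        List<String> cow = new CopyOnWriteArrayList<>(Arrays.asList("writer-1", "writer-2"));
        System.out.println("CopyOnWriteArrayList threw CME: " + clearDuringIteration(cow));
    }
}
```

Running main prints `true` for the ArrayList and `false` for the CopyOnWriteArrayList. A snapshot-iterating or externally synchronized collection is one common way to avoid this exception under concurrent modification; whether that is the right fix for JobHistory is not established here.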