[ https://issues.apache.org/jira/browse/MAPREDUCE-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413840#comment-13413840 ]
Robert Joseph Evans commented on MAPREDUCE-4428:
------------------------------------------------

You should not need to restart all of YARN to update the counters max. You should be able to set it on a per-application basis, assuming that you do not have it marked as final in mapred-site.xml, although you may get similar errors in the History Server if you do that.

Could you please file a separate JIRA for the counter limit issue? We should have a cleaner way to deal with the counter limit being exceeded.

I agree with you that this is a fix that needs to happen. Sadly, it is just not a simple fix. I will talk with some co-workers about this to see what we can come up with.

> A failed job is not available under job history if the job is killed right
> around the time job is notified as failed
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4428
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4428
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, jobtracker
>    Affects Versions: 2.0.0-alpha
>            Reporter: Rahul Jain
>         Attachments: am_failed_counter_limits.txt, appMaster_bad.txt,
>                      appMaster_good.txt, resrcmgr_bad.txt
>
>
> We have observed this issue consistently running hadoop CDH4 version (based
> upon 2.0 alpha release):
> In case our hadoop client code gets a notification for a completed job
> (using the RunningJob object job, with job.isComplete() &&
> job.isSuccessful()==false), the hadoop client code does an unconditional
> job.killJob() to terminate the job.
> With earlier hadoop versions (verified on hadoop 0.20.2 version), we still
> have full access to job logs afterwards through hadoop console. However, when
> using MapReduceV2, the failed hadoop job no longer shows up under jobhistory
> server. Also, the tracking URL of the job still points to the non-existent
> Application master http port.
> Once we removed the call to job.killJob() for failed jobs from our hadoop
> client code, we were able to access the job in job history with MapReduce V2
> as well. Therefore this appears to be a race condition in the job management
> wrt. job history for failed jobs.
> We do have the application master and node manager logs collected for this
> scenario if that'll help isolate the problem and the fix better.
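
For illustration only, the client-side workaround described in the report (not calling job.killJob() once the job has already completed) could be sketched roughly as follows. This is a minimal sketch, not the reporter's actual code and not a fix for the underlying race: JobHandle is a hypothetical stand-in that mirrors only the three RunningJob methods named in the report, and GuardedKill, killIfStillRunning, and FakeJob are made-up names so the example compiles without Hadoop on the classpath.

```java
import java.io.IOException;

public class GuardedKill {

    /** Hypothetical stand-in for the RunningJob methods named in the report. */
    interface JobHandle {
        boolean isComplete() throws IOException;
        boolean isSuccessful() throws IOException;
        void killJob() throws IOException;
    }

    /**
     * Issues a kill only while the job is still running.
     * Returns true if killJob() was called, false if the job had already completed.
     */
    static boolean killIfStillRunning(JobHandle job) throws IOException {
        if (job.isComplete()) {
            // Already finished (successfully or not): skip the kill, as in the
            // reporter's workaround, so no kill request follows the failure.
            return false;
        }
        job.killJob();
        return true;
    }

    /** In-memory fake, used only to exercise the sketch. */
    static final class FakeJob implements JobHandle {
        private final boolean complete;
        private final boolean successful;
        int killCalls = 0;

        FakeJob(boolean complete, boolean successful) {
            this.complete = complete;
            this.successful = successful;
        }

        @Override public boolean isComplete() { return complete; }
        @Override public boolean isSuccessful() { return successful; }
        @Override public void killJob() { killCalls++; }
    }

    public static void main(String[] args) throws IOException {
        FakeJob failed = new FakeJob(true, false);    // completed, not successful
        FakeJob running = new FakeJob(false, false);  // still in progress

        System.out.println("kill issued for failed job:  " + killIfStillRunning(failed));
        System.out.println("kill issued for running job: " + killIfStillRunning(running));
    }
}
```

With a real client, a thin adapter around RunningJob would take the place of FakeJob. The guard only avoids sending a kill after completion has been observed; a job that fails between the isComplete() check and the killJob() call could still hit the race described above.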