[ https://issues.apache.org/jira/browse/MAPREDUCE-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rahul Jain updated MAPREDUCE-4428:
----------------------------------

    Attachment: appMaster_good.txt
                appMaster_bad.txt

Included both a good case from the web interface (appMaster_good.txt), where no kill was done on the job, and the bad-case logs collected by sifting through HDFS (appMaster_bad.txt).

> A failed job is not available under job history if the job is killed right around the time job is notified as failed
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4428
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4428
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, jobtracker
>    Affects Versions: 2.0.0-alpha
>            Reporter: Rahul Jain
>         Attachments: appMaster_bad.txt, appMaster_good.txt
>
>
> We have observed this issue consistently when running the hadoop CDH4 version (based upon the 2.0 alpha release):
> When our hadoop client code gets a notification for a completed job (using a RunningJob object job, with job.isComplete() && job.isSuccessful()==false), the hadoop client code does an unconditional job.killJob() to terminate the job.
> With earlier hadoop versions (verified on hadoop 0.20.2), we still have full access to the job logs afterwards through the hadoop console. However, when using MapReduceV2, the failed hadoop job no longer shows up under the jobhistory server. Also, the tracking URL of the job still points to the non-existent Application master http port.
> Once we removed the call to job.killJob() for failed jobs from our hadoop client code, we were able to access the job in job history with MapReduceV2 as well. Therefore this appears to be a race condition in the job management wrt. job history for failed jobs.
> We do have the application master and node manager logs collected for this scenario if that'll help isolate the problem and the fix better.
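For illustration, the client-side workaround described in the report (not calling killJob() on a job that has already completed) might look roughly like the sketch below. This is an assumption-laden sketch, not the reporter's actual code: JobHandle, FakeJob, FailedJobCleanup and killIfStillRunning are hypothetical names, and JobHandle only mirrors the three RunningJob methods named in the report so that the example compiles without Hadoop on the classpath.

```java
/** Sketch of the client-side workaround described in the report. */
class FailedJobCleanup {

    /**
     * Hypothetical stand-in for the three RunningJob methods mentioned in
     * the report. The real org.apache.hadoop.mapred.RunningJob methods
     * declare IOException; that is omitted here to keep the sketch short.
     */
    interface JobHandle {
        boolean isComplete();
        boolean isSuccessful();
        void killJob();
    }

    /**
     * Calls killJob() only while the job is still running. A job that has
     * already completed (successfully or not) is left alone, which avoids
     * the kill that the report associates with the missing history entry.
     *
     * @return true if killJob() was called
     */
    static boolean killIfStillRunning(JobHandle job) {
        if (job.isComplete()) {
            // The original client called killJob() at this point whenever
            // isSuccessful() was false; the workaround skips that call.
            return false;
        }
        job.killJob();
        return true;
    }

    /** In-memory fake (hypothetical) used only for the demonstration. */
    static final class FakeJob implements JobHandle {
        private final boolean complete;
        private final boolean successful;
        int killCalls = 0;

        FakeJob(boolean complete, boolean successful) {
            this.complete = complete;
            this.successful = successful;
        }

        @Override public boolean isComplete() { return complete; }
        @Override public boolean isSuccessful() { return successful; }
        @Override public void killJob() { killCalls++; }
    }

    public static void main(String[] args) {
        FakeJob failed = new FakeJob(true, false);   // completed, not successful
        FakeJob running = new FakeJob(false, false); // still in progress
        System.out.println(killIfStillRunning(failed));  // false
        System.out.println(killIfStillRunning(running)); // true
    }
}
```

With a real RunningJob, the same guard would sit wherever the client currently reacts to the completion notification. It only sidesteps the suspected race on the client side and does not address the underlying job history behaviour.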