[jira] [Commented] (MAPREDUCE-4428) A failed job is not available under job history if the job is killed right around the time job is notified as failed

Robert Joseph Evans (JIRA) Fri, 13 Jul 2012 09:01:39 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413840#comment-13413840
 ]


Robert Joseph Evans commented on MAPREDUCE-4428:
------------------------------------------------

You should not need to restart all of YARN to raise the counter limit.  You 
should be able to set it on a per-application basis, provided it is not 
marked as final in mapred-site.xml, although you may then get similar errors 
in the History Server.
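
To illustrate why the final marker matters, here is a minimal sketch of how a 
per-job override interacts with a site-level value.  It is a simplified model, 
not Hadoop's Configuration class: the SiteEntry type, the resolve helper, and 
the values 120 and 500 are made up for illustration, and the property name 
mapreduce.job.counters.max is my understanding of the MRv2 key and should be 
checked against your release.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of final vs. overridable properties; not Hadoop's Configuration. */
public class CounterLimitSketch {
    // Assumed MRv2 property name; verify against your release.
    static final String COUNTERS_MAX = "mapreduce.job.counters.max";

    /** One site-level entry (e.g. from mapred-site.xml): a value and its final flag. */
    static final class SiteEntry {
        final String value;
        final boolean isFinal;

        SiteEntry(String value, boolean isFinal) {
            this.value = value;
            this.isFinal = isFinal;
        }
    }

    /** Returns the value a job would see: the job override wins unless the site entry is final. */
    static String resolve(Map<String, SiteEntry> site, Map<String, String> job, String key) {
        SiteEntry s = site.get(key);
        if (s != null && s.isFinal) {
            return s.value; // final site value cannot be overridden per job
        }
        if (job.containsKey(key)) {
            return job.get(key); // per-application setting applies
        }
        return s == null ? null : s.value;
    }

    public static void main(String[] args) {
        Map<String, SiteEntry> site = new HashMap<>();
        Map<String, String> job = new HashMap<>();
        job.put(COUNTERS_MAX, "500"); // hypothetical per-job limit

        site.put(COUNTERS_MAX, new SiteEntry("120", false));
        System.out.println(resolve(site, job, COUNTERS_MAX)); // job override applies

        site.put(COUNTERS_MAX, new SiteEntry("120", true));
        System.out.println(resolve(site, job, COUNTERS_MAX)); // final site value wins
    }
}
```

In a real job the override would normally be supplied through the job's own 
configuration at submission time rather than through a helper like this.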

Could you please file a separate JIRA for the counter limit issue?  We should 
have a cleaner way to deal with the counter limit being exceeded.  

I agree with you that this is a fix that needs to happen.  Sadly, it is just 
not a simple fix.  I will talk with some co-workers about this to see what we 
can come up with.
                
> A failed job is not available under job history if the job is killed right 
> around the time job is notified as failed 
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4428
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4428
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, jobtracker
>    Affects Versions: 2.0.0-alpha
>            Reporter: Rahul Jain
>         Attachments: am_failed_counter_limits.txt, appMaster_bad.txt, 
> appMaster_good.txt, resrcmgr_bad.txt
>
>
> We have observed this issue consistently running hadoop CDH4 version (based 
> upon 2.0 alpha release):
> When our hadoop client code gets a notification for a completed job (using 
> a RunningJob object job, with job.isComplete() && 
> job.isSuccessful()==false), 
> the hadoop client code does an unconditional job.killJob() to terminate the 
> job.
> With earlier hadoop versions (verified on hadoop 0.20.2 version), we still  
> have full access to job logs afterwards through hadoop console. However, when 
> using MapReduceV2, the failed hadoop job no longer shows up under jobhistory 
> server. Also, the tracking URL of the job still points to the non-existent 
> Application master http port.
> Once we removed the call to job.killJob() for failed jobs from our hadoop 
> client code, we were able to access the job in job history with mapreduce V2 
> as well. Therefore this appears to be a race condition in the job management 
> wrt. job history for failed jobs.
> We do have the application master and node manager logs collected for this 
> scenario if that'll help isolate the problem and the fix better.
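
The client-side workaround described in the quoted report (not calling 
killJob() on a job that has already completed) can be sketched roughly as 
follows.  This is only an illustration: JobHandle is a hypothetical stand-in 
for the three RunningJob methods named in the report, and killIfStillRunning 
is a made-up helper, not part of Hadoop.

```java
public class KillGuardSketch {
    /** Hypothetical stand-in for the RunningJob methods mentioned in the report. */
    interface JobHandle {
        boolean isComplete();
        boolean isSuccessful();
        void killJob();
    }

    /** Kills the job only while it is still running; returns true if a kill was issued. */
    static boolean killIfStillRunning(JobHandle job) {
        if (job.isComplete()) {
            // Already finished (succeeded or failed): skip the kill so it
            // cannot race with the job's own shutdown.
            return false;
        }
        job.killJob();
        return true;
    }

    public static void main(String[] args) {
        // A job that has already completed unsuccessfully.
        JobHandle failed = new JobHandle() {
            public boolean isComplete() { return true; }
            public boolean isSuccessful() { return false; }
            public void killJob() { throw new IllegalStateException("should not be killed"); }
        };
        System.out.println(killIfStillRunning(failed)); // no kill is issued
    }
}
```

This only avoids the race from the client side; it does not address the 
underlying job history problem.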

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
