[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413840#comment-13413840
 ] 

Robert Joseph Evans commented on MAPREDUCE-4428:
------------------------------------------------

You should not need to restart all of yarn to update the counters max.  You 
should be able to set it on a per application basis assuming that you do not 
have it marked as final in mapred-site.xml, although you may get similar errors 
in the History Server if you do that.

Could you please file a separate JIRA for the counter's limit issue.  We should 
have a cleaner way to deal with the counter's limit being exceeded.  

I agree with you that this is a fix that needs to happen, Sadly it is just not 
a simple fix.  I will talk with some co-workers about this to see that we can 
come up with.
                
> A failed job is not available under job history if the job is killed right 
> around the time job is notified as failed 
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4428
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4428
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, jobtracker
>    Affects Versions: 2.0.0-alpha
>            Reporter: Rahul Jain
>         Attachments: am_failed_counter_limits.txt, appMaster_bad.txt, 
> appMaster_good.txt, resrcmgr_bad.txt
>
>
> We have observed this issue consistently running hadoop CDH4 version (based 
> upon 2.0 alpha release):
> In case our hadoop client code gets a notification for a completed job ( 
> using RunningJob object job, with (job.isComplete() && 
> job.isSuccessful()==false)
> the hadoop client code does an unconditional job.killJob() to terminate the 
> job.
> With earlier hadoop versions (verified on hadoop 0.20.2 version), we still  
> have full access to job logs afterwards through hadoop console. However, when 
> using MapReduceV2, the failed hadoop job no longer shows up under jobhistory 
> server. Also, the tracking URL of the job still points to the non-existent 
> Application master http port.
> Once we removed the call to job.killJob() for failed jobs from our hadoop 
> client code, we were able to access the job in job history with mapreduce V2 
> as well. Therefore this appears to be a race condition in the job management 
> wrt. job history for failed jobs.
> We do have the application master and node manager logs collected for this 
> scenario if that'll help isolate the problem and the fix better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to