[jira] [Commented] (MAPREDUCE-4443) MR AM and job history server should be resilient to jobs that exceed counter limits

Hadoop QA (JIRA) Wed, 17 Apr 2013 01:11:20 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633886#comment-13633886
 ]


Hadoop QA commented on MAPREDUCE-4443:
--------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12579104/MAPREDUCE-4443-trunk-2.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3532//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3532//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3532//console

This message is automatically generated.
                
> MR AM and job history server should be resilient to jobs that exceed counter 
> limits 
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4443
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4443
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Rahul Jain
>            Assignee: Mayank Bansal
>              Labels: usability
>         Attachments: am_failed_counter_limits.txt, 
> MAPREDUCE-4443-trunk-1.patch, MAPREDUCE-4443-trunk-2.patch, 
> MAPREDUCE-4443-trunk-draft.patch
>
>
> We saw this problem while migrating applications to MapReduceV2:
> Our applications use Hadoop counters extensively (1000+ counters for certain 
> jobs). While this may not be a recommended best practice in Hadoop, the 
> real issue here is the reliability of the framework when applications exceed 
> counter limits.
> The Hadoop servers (YARN, history server) were originally brought up with 
> mapreduce.job.counters.max=1000 in core-site.xml.
> We then ran a map-reduce job under an application using its own job-specific 
> overrides, with mapreduce.job.counters.max=10000.
> All the tasks for the job finished successfully; however, the overall job 
> still failed because the AM encountered exceptions such as:
> {code}
> 2012-07-12 17:31:43,485 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 71
> 2012-07-12 17:31:43,502 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 1001 max=1000
>         at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:58)
>         at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:65)
>         at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:77)
>         at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:94)
>         at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:105)
>         at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:202)
>         at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:337)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1212)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1198)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1179)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:711)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.checkJobCompleteSuccess(JobImpl.java:737)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.checkJobForCompletion(JobImpl.java:1360)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.transition(JobImpl.java:1340)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.transition(JobImpl.java:1323)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:380)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:666)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:113)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:890)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:886)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-07-12 17:31:43,502 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> 2012-07-12 17:31:43,503 INFO [Thread-1] org.apache.had
> {code}
> The overall job failed, and the job history wasn't accessible at the 
> end of the job either (it didn't show up in the job history server).
> We were able to work around the issue by raising the limits in 
> core-site.xml and restarting the YARN servers. However, that forced us to 
> raise the global counter limit to the highest value any individual 
> application might use, which is hard to predict.
> The original job then succeeded with the new global limits.
> However, since we didn't restart the job history server, it was unable to 
> display the job history page for the successful job at all, as it still hit 
> the counter-limit exception. Restarting the job history server finally made 
> the application available under job history.
> I'll also attach AM logs to help debug the issue.
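
The failure mode in the description (a server configured with max=1000 aggregating counters from a job that ran with a higher limit) can be illustrated with a small standalone sketch. It does not use Hadoop classes and is not necessarily what the attached patches do: `CounterLimitSketch`, `Counters`, `mergeStrict`, `mergeLenient`, and the local `LimitExceededException` are hypothetical names chosen only to contrast a merge that aborts on the first counter over the limit with one that drops the excess and keeps going.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterLimitSketch {

    // Hypothetical stand-in for a limit violation; not Hadoop's class.
    static class LimitExceededException extends RuntimeException {
        LimitExceededException(String message) {
            super(message);
        }
    }

    // Hypothetical counter store with a fixed cap on distinct counter names.
    static final class Counters {
        private final int max;
        private final Map<String, Long> values = new LinkedHashMap<>();

        Counters(int max) {
            this.max = max;
        }

        // Adds delta to a counter; creating a new counter past the cap throws.
        void increment(String name, long delta) {
            if (!values.containsKey(name) && values.size() + 1 > max) {
                throw new LimitExceededException(
                        "Too many counters: " + (values.size() + 1) + " max=" + max);
            }
            values.merge(name, delta, Long::sum);
        }

        // Strict merge: the first counter over the cap aborts the whole merge.
        void mergeStrict(Map<String, Long> other) {
            for (Map.Entry<String, Long> e : other.entrySet()) {
                increment(e.getKey(), e.getValue());
            }
        }

        // Lenient merge: counters over the cap are skipped and counted as dropped.
        int mergeLenient(Map<String, Long> other) {
            int dropped = 0;
            for (Map.Entry<String, Long> e : other.entrySet()) {
                try {
                    increment(e.getKey(), e.getValue());
                } catch (LimitExceededException ex) {
                    dropped++;
                }
            }
            return dropped;
        }

        int size() {
            return values.size();
        }
    }

    public static void main(String[] args) {
        // A job that produced 1001 distinct counters, one more than the cap.
        Map<String, Long> jobCounters = new LinkedHashMap<>();
        for (int i = 0; i < 1001; i++) {
            jobCounters.put("counter-" + i, 1L);
        }

        Counters strict = new Counters(1000);
        try {
            strict.mergeStrict(jobCounters);
        } catch (LimitExceededException e) {
            System.out.println("strict merge failed: " + e.getMessage());
        }

        Counters lenient = new Counters(1000);
        int dropped = lenient.mergeLenient(jobCounters);
        System.out.println("lenient merge kept " + lenient.size() + ", dropped " + dropped);
    }
}
```

In this sketch the lenient path trades completeness of the counters for availability: the job summary would still be produced, with the number of dropped counters available for reporting.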

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
