[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151089#comment-14151089 ]

Rajesh Veeranki commented on MAPREDUCE-5903:

Hi Dapeng Sun,

Here's a snippet of my yarn-site.xml:

{code:title=yarn-site.xml|borderStyle=solid}
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.log-aggregation.compression-type</name>
  <value>gz</value>
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/etc/hadoop/conf/health_check</value>
</property>
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
{code}

Here's my container-executor.cfg:

{code:title=container-executor.cfg}
yarn.nodemanager.local-dirs=/grid/hadoop/yarn/local
yarn.nodemanager.linux-container-executor.group=yarn
yarn.nodemanager.log-dirs=/var/log/hadoop/yarn
banned.users=hfds,bin,0
min.user.id=0
{code}

If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
Key: MAPREDUCE-5903
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.4.0
Environment: hadoop: 2.4.0.2.1.2.0
Reporter: Victor Kim
Priority: Critical
Labels: shuffle

I have a 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers, Kerberos is enabled, and I have hdfs, yarn, and mapred principals/keytabs. ResourceManager and NodeManager run as the yarn user, using the yarn Kerberos principal.

Use case 1: WordCount, submitting the job with the yarn UGI (i.e.
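Two entries in the container-executor.cfg above are worth flagging on a Kerberized cluster: banned.users reads "hfds" (which looks like a typo for "hdfs") and min.user.id=0 allows containers to run as any system user, including root. A hedged sketch of the values usually recommended for secure clusters (the exact user list and threshold are illustrative assumptions, not the reporter's verified fix):

{code:title=container-executor.cfg (illustrative)}
yarn.nodemanager.local-dirs=/grid/hadoop/yarn/local
yarn.nodemanager.linux-container-executor.group=yarn
yarn.nodemanager.log-dirs=/var/log/hadoop/yarn
# ban the service accounts themselves, spelled correctly
banned.users=hdfs,yarn,mapred,bin
# forbid low-numbered system users; 1000 is a common choice
min.user.id=1000
{code}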
superuser, the one with the Kerberos principal on all boxes). Result: the job completes successfully.

Use case 2: WordCount, submitting the job using LDAP user impersonation via the yarn UGI. Result: map tasks complete successfully, but the reduce task fails with a ShuffleError: Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).

The use case with user impersonation used to work on earlier versions, without YARN (with JT/TT). I found a similar issue with Kerberos auth involved here: https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it's marked as resolved, which is not the case when Kerberos authentication is enabled.

The exception trace from the YarnChild JVM:

{code}
2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
	at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
	at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
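The bail-out in the trace comes from ShuffleSchedulerImpl.checkReducerHealth, which counts how many distinct map-output hosts the reducer has failed to fetch from. A simplified, self-contained sketch of that accounting (the threshold of 5 and the class/field names here are illustrative, not Hadoop's actual implementation):

```java
import java.util.HashSet;
import java.util.Set;

// Simplified sketch of the reducer-health bail-out seen in the trace above.
// The threshold is an illustrative assumption, not Hadoop's exact tuning.
public class FetchFailureSketch {
    static final int MAX_FAILED_UNIQUE_FETCHES = 5; // illustrative value

    static final Set<String> failedUniqueHosts = new HashSet<>();

    // Record a failed fetch from a map-output host; bail out once too many
    // distinct hosts have failed, mirroring "Exceeded MAX_FAILED_UNIQUE_FETCHES".
    static void copyFailed(String host) {
        failedUniqueHosts.add(host);
        if (failedUniqueHosts.size() >= MAX_FAILED_UNIQUE_FETCHES) {
            throw new RuntimeException(
                "Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.");
        }
    }

    public static void main(String[] args) {
        // Failures on four distinct hosts: reducer still considered healthy.
        for (int i = 1; i <= 4; i++) {
            copyFailed("node" + i);
        }
        try {
            copyFailed("node5"); // fifth unique host trips the check
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the Kerberos case reported here, every fetch from every host fails authentication, so the unique-host count climbs to the limit quickly and the reducer gives up.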
[jira] [Updated] (MAPREDUCE-6096) SummarizedJob class NPEs with some jhist files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6096:
Hadoop Flags: (was: Reviewed)

SummarizedJob class NPEs with some jhist files
Key: MAPREDUCE-6096
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6096
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver
Affects Versions: trunk
Reporter: zhangyubiao
Labels: easyfix, patch
Attachments: MAPREDUCE-6096-v2.patch, MAPREDUCE-6096.patch

When I parse a job history file, I use the hadoop-mapreduce-client-core project's org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser class and HistoryViewer$SummarizedJob to parse the file (e.g. job_1408862281971_489761-1410883171851_XXX.jhist), and it throws an exception like:

{code}
Exception in thread "pool-1-thread-1" java.lang.NullPointerException
	at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer$SummarizedJob.<init>(HistoryViewer.java:626)
	at com.jd.hadoop.log.parse.ParseLogService.getJobDetail(ParseLogService.java:70)
{code}

After reading the SummarizedJob class I found that attempt.getTaskStatus() is null, so I changed attempt.getTaskStatus().equals(TaskStatus.State.FAILED.toString()) to TaskStatus.State.FAILED.toString().equals(attempt.getTaskStatus()) and it works well. So I wonder if we can change all comparisons to put TaskStatus.State.XXX.toString() before attempt.getTaskStatus()?
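The fix described above is the standard constant-first ("Yoda") equals pattern. A minimal self-contained sketch of why reversing the order is null-safe (the null status is simulated here, not read from a real jhist file):

```java
// Minimal sketch of the null-safe comparison from the description above.
// A null status simulates what SummarizedJob sees for some jhist files.
public class TaskStatusCheck {
    public static void main(String[] args) {
        String taskStatus = null; // simulates attempt.getTaskStatus() returning null

        // Constant-first order: the String literal is never null, so no NPE;
        // equals(null) simply returns false.
        boolean failed = "FAILED".equals(taskStatus);
        System.out.println("failed = " + failed); // prints "failed = false"

        // The original order dereferences null, matching the reported NPE.
        try {
            boolean b = taskStatus.equals("FAILED");
            System.out.println(b);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as at HistoryViewer.java:626");
        }
    }
}
```

Java 7+ also offers java.util.Objects.equals(a, b), which is null-safe regardless of argument order.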
[jira] [Updated] (MAPREDUCE-6096) SummarizedJob class NPEs with some jhist files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6096:
Assignee: (was: Allen Wittenauer)
[jira] [Commented] (MAPREDUCE-6096) SummarizedJob class NPEs with some jhist files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151203#comment-14151203 ]

Allen Wittenauer commented on MAPREDUCE-6096:

Your test code needs to generate some input and then try to read it. Also:
* Don't set the Reviewed flag.
* Don't assign this to anyone but yourself.

Thanks.