[
https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146144#comment-14146144
]
Dapeng Sun commented on MAPREDUCE-5903:
---------------------------------------
I run {{ResourceManager}} and {{NodeManager}} as the user {{yarn}}. Here is my
{{container-executor.cfg}}:
{noformat}
yarn.nodemanager.linux-container-executor.group=yarn
banned.users=bin
min.user.id=500
{noformat}
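As far as I understand, with {{LinuxContainerExecutor}} each container is launched as the user the job runs as, so an impersonated LDAP user has to pass the {{banned.users}} and {{min.user.id}} checks on every NodeManager. Below is a rough Python sketch of those two checks against the settings above. The helper names ({{parse_cfg}}, {{user_allowed}}), the sample user {{ldapuser}} and its uid are hypothetical, and the script only imitates the checks; it is not the logic of the real {{container-executor}} binary.

```python
# Hypothetical helpers, not part of Hadoop: they imitate the two user checks
# that container-executor.cfg configures (banned.users and min.user.id).

def parse_cfg(text):
    """Parse key=value lines into a dict, skipping blanks and # comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def user_allowed(cfg, username, uid):
    """Return (allowed, reason) for a user under the parsed settings."""
    banned = [u.strip() for u in cfg.get("banned.users", "").split(",") if u.strip()]
    if username in banned:
        return False, "user is listed in banned.users"
    # Assumption: a missing min.user.id is treated as 0 here; the real
    # binary may apply a different default.
    min_uid = int(cfg.get("min.user.id", "0"))
    if uid < min_uid:
        return False, "uid %d is below min.user.id %d" % (uid, min_uid)
    return True, "ok"


# The configuration quoted in this comment.
CFG_TEXT = """\
yarn.nodemanager.linux-container-executor.group=yarn
banned.users=bin
min.user.id=500
"""

if __name__ == "__main__":
    cfg = parse_cfg(CFG_TEXT)
    # "ldapuser" and the uids are made-up sample values.
    for name, uid in [("bin", 1), ("ldapuser", 1001), ("ldapuser", 499)]:
        print(name, uid, user_allowed(cfg, name, uid))
```

To try it against a real node, replace the uid with the output of {{id -u <user>}} taken on each NodeManager host.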
Please also check the following configuration in {{yarn-site.xml}}:
{code:xml}
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property>
{code}
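If it helps to compare a node's {{yarn-site.xml}} against the four properties above, here is a minimal Python sketch. It assumes the usual {{<configuration>}} root element; the function names ({{read_properties}}, {{find_mismatches}}) are hypothetical, and the expected values are simply the ones from my setup, not required values for every cluster.

```python
import xml.etree.ElementTree as ET

# Expected values taken from the configuration quoted in this comment.
EXPECTED = {
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",
    "yarn.nodemanager.aux-services.mapreduce_shuffle.class":
        "org.apache.hadoop.mapred.ShuffleHandler",
    "yarn.nodemanager.container-executor.class":
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor",
    "yarn.nodemanager.linux-container-executor.group": "yarn",
}


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def find_mismatches(props, expected=EXPECTED):
    """Return {name: (expected, actual)} for missing or different values."""
    return {
        name: (want, props.get(name))
        for name, want in expected.items()
        if props.get(name) != want
    }


# Sample input: the snippet above wrapped in an assumed <configuration> root.
SAMPLE_XML = "<configuration>%s</configuration>" % "".join(
    "<property><name>%s</name><value>%s</value></property>" % item
    for item in EXPECTED.items()
)

if __name__ == "__main__":
    mismatches = find_mismatches(read_properties(SAMPLE_XML))
    for name, (want, got) in sorted(mismatches.items()):
        print("%s: expected %r, found %r" % (name, want, got))
    print("%d mismatch(es)" % len(mismatches))
```

To check a real file, pass the contents of the node's {{yarn-site.xml}} to {{read_properties}} in place of {{SAMPLE_XML}}.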
> If Kerberos Authentication is enabled, MapReduce job is failing on reducer
> phase
> --------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5903
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.4.0
> Environment: hadoop: 2.4.0.2.1.2.0
> Reporter: Victor Kim
> Priority: Critical
> Labels: shuffle
>
> I have a 3-node cluster: 1 ResourceManager and 3 NodeManagers.
> Kerberos is enabled, with hdfs, yarn, and mapred principals/keytabs.
> ResourceManager and NodeManager run under the yarn user, using the yarn
> Kerberos principal.
> Use case 1: WordCount, submit job using yarn UGI (i.e. superuser, the one
> having Kerberos principal on all boxes). Result: job successfully completed.
> Use case 2: WordCount, submit job using LDAP user impersonation via yarn UGI.
> Result: Map tasks complete successfully, but the Reduce task fails with
> ShuffleError Caused by: java.io.IOException: Exceeded
> MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
> The use case with user impersonation used to work on earlier versions,
> without YARN (with JT&TT).
> I found a similar issue involving Kerberos authentication here:
> https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
> There is also https://issues.apache.org/jira/browse/MAPREDUCE-4030, which is
> marked as resolved, but the problem still occurs when Kerberos authentication
> is enabled.
> The exception trace from YarnChild JVM:
> 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
> 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)