[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase

Vinod Kumar Vavilapalli (JIRA) Mon, 29 Sep 2014 11:48:10 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152049#comment-14152049 ]


Vinod Kumar Vavilapalli commented on MAPREDUCE-5903:
----------------------------------------------------

bq. This is not going to work in case of user impersonation, when john is not 
presented on local box (e.g. he comes from ActiveDirectory), because the file 
from local FS cannot be read.
If you are using secure mode (Kerberos + LinuxContainerExecutor), you need to 
have user accounts on the individual machines; that is the only supported setup 
today. Running it without setting up user accounts on all nodes will not work.
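
A minimal sketch of a per-node check for the requirement above, assuming Python 
is available on each NodeManager host. The function name 
`missing_local_accounts` is hypothetical and not part of Hadoop. It relies on 
`pwd.getpwnam`, which goes through the system's account lookup, so it reports 
whether the operating system on that node can resolve the user at all.

```python
import pwd
import sys


def missing_local_accounts(usernames, lookup=pwd.getpwnam):
    """Return the usernames that do not resolve to an account on this machine.

    `lookup` defaults to pwd.getpwnam, which raises KeyError for an unknown
    user. It is a parameter only so the function can be exercised without
    touching the real account database.
    """
    missing = []
    for name in usernames:
        try:
            lookup(name)
        except KeyError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_accounts.py john yarn
    for name in missing_local_accounts(sys.argv[1:]):
        print(f"no account for {name!r} on this node")
```

Run it on every node with the names of the users who submit jobs; any name it 
prints would need an account on that node before containers can be launched as 
that user.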

> If Kerberos Authentication is enabled, MapReduce job is failing on reducer 
> phase
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5903
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>         Environment: hadoop: 2.4.0.2.1.2.0
>            Reporter: Victor Kim
>            Priority: Critical
>              Labels: shuffle
>
> I have a 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers. 
> Kerberos is enabled, with hdfs, yarn, and mapred principals/keytabs. 
> ResourceManager and NodeManager run as the yarn user, using the yarn Kerberos 
> principal. 
> Use case 1: WordCount, job submitted using the yarn UGI (i.e. the superuser, 
> the one having a Kerberos principal on all boxes). Result: job completed 
> successfully.
> Use case 2: WordCount, job submitted using LDAP user impersonation via the 
> yarn UGI. Result: Map tasks complete successfully, but the Reduce task fails 
> with ShuffleError Caused by: java.io.IOException: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
> The use case with user impersonation used to work on earlier versions, 
> without YARN (with JT & TT).
> I found a similar issue involving Kerberos auth here: 
> https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
> And https://issues.apache.org/jira/browse/MAPREDUCE-4030 is marked as 
> resolved, which is not the case when Kerberos authentication is enabled.
> The exception trace from YarnChild JVM:
> 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
> 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:416)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
