[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151707#comment-14151707 ]

Rajesh Veeranki commented on MAPREDUCE-5903:

Yeah, I am using the FQDN. It connects to the ResourceManager and the map tasks complete, but the reduce phase just hangs forever.

> If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
>
>                 Key: MAPREDUCE-5903
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>         Environment: hadoop: 2.4.0.2.1.2.0
>            Reporter: Victor Kim
>            Priority: Critical
>              Labels: shuffle
>
> I have a 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers. Kerberos is enabled, and I have hdfs, yarn, mapred principals\keytabs.
> ResourceManager and NodeManager are run as the yarn user, using the yarn Kerberos principal.
> Use case 1: WordCount, submit job using yarn UGI (i.e. superuser, the one having a Kerberos principal on all boxes). Result: job completed successfully.
> Use case 2: WordCount, submit job using LDAP user impersonation via yarn UGI. Result: Map tasks complete successfully, Reduce task fails with ShuffleError Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
> The use case with user impersonation used to work on earlier versions, without YARN (with JT&TT).
> I found a similar issue with Kerberos AUTH involved here: https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
> And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it's marked as resolved, which is not the case when Kerberos Authentication is enabled.
> The exception trace from YarnChild JVM:
> 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
> 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:416)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2503) Job submission fails if mapreduce.cluster.local.dir is given URIs
[ https://issues.apache.org/jira/browse/MAPREDUCE-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Veeranki updated MAPREDUCE-2503:

    Summary: Job submission fails if mapreduce.cluster.local.dir is given URIs  (was: Jub submission fails if mapreduce.cluster.local.dir is given URIs)

> Job submission fails if mapreduce.cluster.local.dir is given URIs
>
>                 Key: MAPREDUCE-2503
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2503
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>            Reporter: Eli Collins
>            Assignee: Ke Zhu
>            Priority: Minor
>
> Job submission (specifically TaskTracker#localizeJobJarFile) fails if {{mapreduce.cluster.local.dir}} has a URI with a scheme (eg file:///home/eli/hadoop-dirs1/mr1) vs just the path component. MR configuration parameters should accept full URIs to be consistent with common and HDFS.
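To illustrate the distinction MAPREDUCE-2503 describes, here is a minimal Java sketch of how a local-dir entry could be accepted either as a bare path or as a `file:` URI. It is not the Hadoop fix and does not use any Hadoop class; the class name `LocalDirs` and both method names are hypothetical, and only `java.net.URI` from the JDK is used.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: normalizes local-dir entries.
class LocalDirs {

    /** Returns the path component of one entry, whether it is a bare path or a file: URI. */
    public static String toLocalPath(String entry) {
        String trimmed = entry.trim();
        try {
            URI uri = new URI(trimmed);
            if (uri.getScheme() == null) {
                return trimmed; // no scheme: already a plain path
            }
            if (!"file".equalsIgnoreCase(uri.getScheme())) {
                // Assumption of this sketch: local dirs must live on the local filesystem.
                throw new IllegalArgumentException("Not a local directory: " + trimmed);
            }
            return uri.getPath(); // file:///home/x -> /home/x
        } catch (URISyntaxException e) {
            return trimmed; // not parseable as a URI, treat it as a plain path
        }
    }

    /** Splits a comma-separated configuration value and normalizes every entry. */
    public static List<String> parse(String value) {
        List<String> paths = new ArrayList<>();
        for (String entry : value.split(",")) {
            if (!entry.trim().isEmpty()) {
                paths.add(toLocalPath(entry));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // The URI below is the example given in the issue description.
        System.out.println(parse("file:///home/eli/hadoop-dirs1/mr1,/home/eli/hadoop-dirs1/mr1"));
    }
}
```

Both entries in `main` resolve to the same path, which is the behavior the issue asks for. Rejecting non-`file` schemes is a design choice of this sketch only.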
[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151089#comment-14151089 ]

Rajesh Veeranki commented on MAPREDUCE-5903:

Hi Dapeng Sun,

Here's a snippet of my yarn-site.xml:

{code:title=yarn-site.xml|borderStyle=solid}
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.log-aggregation.compression-type</name>
  <value>gz</value>
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/etc/hadoop/conf/health_check</value>
</property>
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
{code}

Here's my container-executor.cfg:

{code:title=container-executor.cfg}
yarn.nodemanager.local-dirs=/grid/hadoop/yarn/local
yarn.nodemanager.linux-container-executor.group=yarn
yarn.nodemanager.log-dirs=/var/log/hadoop/yarn
banned.users=hfds,bin,0
min.user.id=0
{code}
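The yarn-site.xml snippet in the comment above lists one auxiliary service, `mapreduce_shuffle`, but carries two `.class` entries, one of them under the dotted name `mapreduce.shuffle`. As a rough aid for reviewing such a snippet, the Java sketch below checks that every service named in `yarn.nodemanager.aux-services` has a matching `yarn.nodemanager.aux-services.<name>.class` key and flags `.class` keys whose service is not listed. The class `AuxServiceCheck` is hypothetical and works on a plain name-to-value map rather than Hadoop's own configuration classes. Whether the extra key has anything to do with the shuffle failure in this issue is not established here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical consistency check over a flat name -> value view of yarn-site.xml.
class AuxServiceCheck {
    static final String SERVICES_KEY = "yarn.nodemanager.aux-services";
    static final String CLASS_SUFFIX = ".class";

    /** Returns human-readable warnings; an empty list means the entries are consistent. */
    public static List<String> check(Map<String, String> conf) {
        List<String> warnings = new ArrayList<>();

        // Collect the service names from the comma-separated list.
        List<String> services = new ArrayList<>();
        for (String name : conf.getOrDefault(SERVICES_KEY, "").split(",")) {
            if (!name.trim().isEmpty()) {
                services.add(name.trim());
            }
        }

        // Every listed service should have its own .class entry.
        for (String name : services) {
            String classKey = SERVICES_KEY + "." + name + CLASS_SUFFIX;
            if (!conf.containsKey(classKey)) {
                warnings.add("missing " + classKey);
            }
        }

        // Every .class entry should belong to a listed service.
        String prefix = SERVICES_KEY + ".";
        for (String key : conf.keySet()) {
            boolean isClassKey = key.startsWith(prefix) && key.endsWith(CLASS_SUFFIX)
                    && key.length() > prefix.length() + CLASS_SUFFIX.length();
            if (isClassKey) {
                String name = key.substring(prefix.length(), key.length() - CLASS_SUFFIX.length());
                if (!services.contains(name)) {
                    warnings.add("unused " + key + " (service '" + name + "' is not listed)");
                }
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Values copied from the yarn-site.xml snippet in the comment.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("yarn.nodemanager.aux-services", "mapreduce_shuffle");
        conf.put("yarn.nodemanager.aux-services.mapreduce_shuffle.class",
                "org.apache.hadoop.mapred.ShuffleHandler");
        conf.put("yarn.nodemanager.aux-services.mapreduce.shuffle.class",
                "org.apache.hadoop.mapred.ShuffleHandler");
        for (String warning : check(conf)) {
            System.out.println(warning);
        }
    }
}
```

For the values in the comment, the sketch reports only the dotted `mapreduce.shuffle.class` key as unused.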
[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150796#comment-14150796 ]

Rajesh Veeranki commented on MAPREDUCE-5903:

Hi, I seem to have encountered this bug. How can I see the exception trace of YarnChild?
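On the question of how to see the YarnChild exception trace: the task's container log is the usual place to look. With log aggregation enabled it can typically be fetched with `yarn logs -applicationId <application ID>`; otherwise it should be in the container's `syslog` file under the NodeManager log directories. Once the log text is available, a small filter helps isolate the trace. The Java sketch below is a hypothetical helper, not a Hadoop tool. It keeps FATAL, ERROR and WARN lines together with the `at ...` and `Caused by:` lines that follow them, and assumes the log line layout shown in the issue description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical log filter: pulls exception traces out of a container log.
class TraceScan {

    /** Keeps FATAL/ERROR/WARN log lines plus the stack-trace lines that directly follow them. */
    public static List<String> extractTraces(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inTrace = false;
        for (String line : lines) {
            String t = line.trim();
            boolean levelLine = t.contains(" FATAL ") || t.contains(" ERROR ") || t.contains(" WARN ");
            boolean traceLine = t.startsWith("at ") || t.startsWith("Caused by:");
            if (levelLine) {
                out.add(t);      // start of a possible trace
                inTrace = true;
            } else if (inTrace && traceLine) {
                out.add(t);      // frame or cause belonging to the current trace
            } else {
                inTrace = false; // any other line ends the trace
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TraceScan <path to container syslog>");
            return;
        }
        for (String line : extractTraces(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(line);
        }
    }
}
```

The string matching is deliberately simple. A log written with a different layout would need the level check adjusted.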