[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase

2014-09-29 Thread Rajesh Veeranki (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151707#comment-14151707
 ] 

Rajesh Veeranki commented on MAPREDUCE-5903:


Yeah, I am using FQDN.Its connecting to resource manager and maps are 
completed, but in reduce phase it just hangs forever.

> If Kerberos Authentication is enabled, MapReduce job is failing on reducer 
> phase
> 
>
> Key: MAPREDUCE-5903
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.4.0
> Environment: hadoop: 2.4.0.2.1.2.0
>Reporter: Victor Kim
>Priority: Critical
>  Labels: shuffle
>
> I have 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers, 
> Kerberos is enabled, have hdfs, yarn, mapred principals\keytabs. 
> ResourceManager and NodeManager are ran under yarn user, using yarn Kerberos 
> principal. 
> Use case 1: WordCount, submit job using yarn UGI (i.e. superuser, the one 
> having Kerberos principal on all boxes). Result: job successfully completed.
> Use case 2: WordCount, submit job using LDAP user impersonation via yarn UGI. 
> Result: Map tasks are completed SUCCESSfully, Reduce task fails with 
> ShuffleError Caused by: java.io.IOException: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
> The use case with user impersonation used to work on earlier versions, 
> without YARN (with JT&TT).
> I found similar issue with Kerberos AUTH involved here: 
> https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
> And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it's marked as 
> resolved, which is not the case when Kerberos Authentication is enabled.
> The exception trace from YarnChild JVM:
> 2014-05-21 12:49:35,687 FATAL [fetcher#3] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed 
> with too many fetch failures and insufficient progress!
> 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in fetcher#3
> at 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; 
> bailing-out.
> at 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
> at 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-2503) Job submission fails if mapreduce.cluster.local.dir is given URIs

2014-09-29 Thread Rajesh Veeranki (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Veeranki updated MAPREDUCE-2503:
---
Summary: Job submission fails if mapreduce.cluster.local.dir is given URIs  
(was: Jub submission fails if mapreduce.cluster.local.dir is given URIs)

> Job submission fails if mapreduce.cluster.local.dir is given URIs
> -
>
> Key: MAPREDUCE-2503
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2503
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Reporter: Eli Collins
>Assignee: Ke Zhu
>Priority: Minor
>
> Job submission (specifically TaskTracker#localizeJobJarFile) fails if 
> {{mapreduce.cluster.local.dir}} has a URI with a scheme (eg 
> file:///home/eli/hadoop-dirs1/mr1) vs just the path component. MR 
> configuration parameters should accept full URIs to be consistent with common 
> and HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase

2014-09-28 Thread Rajesh Veeranki (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151089#comment-14151089
 ] 

Rajesh Veeranki commented on MAPREDUCE-5903:


Hi Dapeng Sun,
Here's a snippet of my yarn-site.xml:
{code:title=yarn-site.xml|borderStyle=solid}
 
yarn.nodemanager.aux-services
mapreduce_shuffle
  

yarn.nodemanager.aux-services.mapreduce_shuffle.class
org.apache.hadoop.mapred.ShuffleHandler
  

yarn.nodemanager.log-aggregation.compression-type
gz
  

yarn.nodemanager.health-checker.script.path
/etc/hadoop/conf/health_check
  

yarn.nodemanager.container-executor.class

org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
  

  yarn.nodemanager.linux-container-executor.group
  yarn

 
yarn.nodemanager.aux-services.mapreduce.shuffle.class
org.apache.hadoop.mapred.ShuffleHandler
  
{code}
Here's my container-executor.cfg
{code:title=container-executor.cfg}
yarn.nodemanager.local-dirs=/grid/hadoop/yarn/local
yarn.nodemanager.linux-container-executor.group=yarn
yarn.nodemanager.log-dirs=/var/log/hadoop/yarn
banned.users=hfds,bin,0
min.user.id=0
{code}

> If Kerberos Authentication is enabled, MapReduce job is failing on reducer 
> phase
> 
>
> Key: MAPREDUCE-5903
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.4.0
> Environment: hadoop: 2.4.0.2.1.2.0
>Reporter: Victor Kim
>Priority: Critical
>  Labels: shuffle
>
> I have 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers, 
> Kerberos is enabled, have hdfs, yarn, mapred principals\keytabs. 
> ResourceManager and NodeManager are ran under yarn user, using yarn Kerberos 
> principal. 
> Use case 1: WordCount, submit job using yarn UGI (i.e. superuser, the one 
> having Kerberos principal on all boxes). Result: job successfully completed.
> Use case 2: WordCount, submit job using LDAP user impersonation via yarn UGI. 
> Result: Map tasks are completed SUCCESSfully, Reduce task fails with 
> ShuffleError Caused by: java.io.IOException: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
> The use case with user impersonation used to work on earlier versions, 
> without YARN (with JT&TT).
> I found similar issue with Kerberos AUTH involved here: 
> https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
> And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it's marked as 
> resolved, which is not the case when Kerberos Authentication is enabled.
> The exception trace from YarnChild JVM:
> 2014-05-21 12:49:35,687 FATAL [fetcher#3] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed 
> with too many fetch failures and insufficient progress!
> 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in fetcher#3
> at 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; 
> bailing-out.
> at 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
> at 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase

2014-09-27 Thread Rajesh Veeranki (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150796#comment-14150796
 ] 

Rajesh Veeranki commented on MAPREDUCE-5903:


Hi,
I seem to have encountered this bug.How i can see the exception trace of 
YarnChild?

> If Kerberos Authentication is enabled, MapReduce job is failing on reducer 
> phase
> 
>
> Key: MAPREDUCE-5903
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.4.0
> Environment: hadoop: 2.4.0.2.1.2.0
>Reporter: Victor Kim
>Priority: Critical
>  Labels: shuffle
>
> I have 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers, 
> Kerberos is enabled, have hdfs, yarn, mapred principals\keytabs. 
> ResourceManager and NodeManager are ran under yarn user, using yarn Kerberos 
> principal. 
> Use case 1: WordCount, submit job using yarn UGI (i.e. superuser, the one 
> having Kerberos principal on all boxes). Result: job successfully completed.
> Use case 2: WordCount, submit job using LDAP user impersonation via yarn UGI. 
> Result: Map tasks are completed SUCCESSfully, Reduce task fails with 
> ShuffleError Caused by: java.io.IOException: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
> The use case with user impersonation used to work on earlier versions, 
> without YARN (with JT&TT).
> I found similar issue with Kerberos AUTH involved here: 
> https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
> And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it's marked as 
> resolved, which is not the case when Kerberos Authentication is enabled.
> The exception trace from YarnChild JVM:
> 2014-05-21 12:49:35,687 FATAL [fetcher#3] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed 
> with too many fetch failures and insufficient progress!
> 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in fetcher#3
> at 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; 
> bailing-out.
> at 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
> at 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)