[jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase

2014-09-28 Thread Rajesh Veeranki (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151089#comment-14151089
 ] 

Rajesh Veeranki commented on MAPREDUCE-5903:


Hi Dapeng Sun,
Here's a snippet of my yarn-site.xml:
{code:title=yarn-site.xml|borderStyle=solid}
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.log-aggregation.compression-type</name>
  <value>gz</value>
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/etc/hadoop/conf/health_check</value>
</property>
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
{code}
Here's my container-executor.cfg:
{code:title=container-executor.cfg}
yarn.nodemanager.local-dirs=/grid/hadoop/yarn/local
yarn.nodemanager.linux-container-executor.group=yarn
yarn.nodemanager.log-dirs=/var/log/hadoop/yarn
banned.users=hfds,bin,0
min.user.id=0
{code}

 If Kerberos Authentication is enabled, MapReduce job is failing on reducer 
 phase
 

 Key: MAPREDUCE-5903
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
 Environment: hadoop: 2.4.0.2.1.2.0
Reporter: Victor Kim
Priority: Critical
  Labels: shuffle

 I have a 3-node cluster: 1 ResourceManager and 3 NodeManagers. Kerberos is 
 enabled, with principals/keytabs for hdfs, yarn, and mapred. 
 The ResourceManager and NodeManagers run as the yarn user, using the yarn 
 Kerberos principal. 
 Use case 1: WordCount, job submitted using the yarn UGI (i.e. the superuser, 
 the one that has a Kerberos principal on all boxes). Result: the job 
 completes successfully.
 Use case 2: WordCount, job submitted using LDAP user impersonation via the 
 yarn UGI. Result: the map tasks complete successfully, but the reduce task 
 fails with a ShuffleError caused by java.io.IOException: Exceeded 
 MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
 The use case with user impersonation used to work on earlier versions, 
 without YARN (with JT/TT).
 I found a similar issue involving Kerberos authentication here: 
 https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
 https://issues.apache.org/jira/browse/MAPREDUCE-4030 is marked as resolved, 
 but the problem still occurs when Kerberos authentication is enabled.
 The exception trace from the YarnChild JVM:
 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:416)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
 Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6096) SummarizedJob class NPEs with some jhist files

2014-09-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6096:

Hadoop Flags:   (was: Reviewed)

 SummarizedJob class NPEs with some jhist files
 --

 Key: MAPREDUCE-6096
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6096
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: trunk
Reporter: zhangyubiao
  Labels: easyfix, patch
 Attachments: MAPREDUCE-6096-v2.patch, MAPREDUCE-6096.patch


 When I parse a job history file (for example 
 job_1408862281971_489761-1410883171851_XXX.jhist) using the 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser class and 
 HistoryViewer$SummarizedJob from Hadoop's map-reduce-client-core project, 
 it throws an exception like this:
 Exception in thread "pool-1-thread-1" java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer$SummarizedJob.<init>(HistoryViewer.java:626)
    at com.jd.hadoop.log.parse.ParseLogService.getJobDetail(ParseLogService.java:70)
 After looking at the SummarizedJob class, I found that 
 attempt.getTaskStatus() is null. So I changed the order of 
 attempt.getTaskStatus().equals(TaskStatus.State.FAILED.toString()) to 
 TaskStatus.State.FAILED.toString().equals(attempt.getTaskStatus()), 
 and it works well.
 So I wonder whether we can reorder all such comparisons so that 
 TaskStatus.State.XXX.toString() is the receiver and attempt.getTaskStatus() 
 is the argument?
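
 The reordering described above can be sketched in plain Java. This is only an 
 illustration under stated assumptions: it does not use Hadoop, the State enum 
 is a hypothetical stand-in for TaskStatus.State, and the class and method 
 names are invented for the example. It is not the attached patch.

```java
final class NullSafeStatusCheck {
    // Hypothetical stand-in for Hadoop's TaskStatus.State (assumption, not the real enum).
    enum State { SUCCEEDED, FAILED, KILLED }

    // Original order: the possibly-null status is the receiver,
    // so a null status throws NullPointerException.
    static boolean isFailedUnsafe(String status) {
        return status.equals(State.FAILED.toString());
    }

    // Reordered: the constant is the receiver, so a null status
    // simply compares unequal and the method returns false.
    static boolean isFailedSafe(String status) {
        return State.FAILED.toString().equals(status);
    }

    public static void main(String[] args) {
        System.out.println(isFailedSafe("FAILED")); // prints true
        System.out.println(isFailedSafe(null));     // prints false
        try {
            isFailedUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from original order");
        }
    }
}
```

 Putting the constant first works because String.equals(null) returns false 
 rather than throwing, so a missing task status is treated as "not FAILED".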





[jira] [Updated] (MAPREDUCE-6096) SummarizedJob class NPEs with some jhist files

2014-09-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6096:

Assignee: (was: Allen Wittenauer)



[jira] [Commented] (MAPREDUCE-6096) SummarizedJob class NPEs with some jhist files

2014-09-28 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151203#comment-14151203
 ] 

Allen Wittenauer commented on MAPREDUCE-6096:

Your test code needs to generate some input and then try to read it.

Also:
* Don't set the reviewed flag.
* Don't assign this to anyone but yourself.

Thanks.
