[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-07-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642801#comment-14642801
 ] 

Sangjin Lee commented on MAPREDUCE-6361:


The patch applies to 2.6.0 cleanly.

> NPE issue in shuffle caused by concurrent issue between copySucceeded() in
> one thread and copyFailed() in another thread on the same host
> ---------------------------------------------------------------------------
>
>              Key: MAPREDUCE-6361
>              URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
>          Project: Hadoop Map/Reduce
>       Issue Type: Bug
> Affects Versions: 2.7.0
>         Reporter: Junping Du
>         Assignee: Junping Du
>         Priority: Critical
>           Labels: 2.6.1-candidate
>          Fix For: 2.7.1
>
>      Attachments: MAPREDUCE-6361-v1.patch
>
>
> The failure in log:
> 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#25
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
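
For readers unfamiliar with this failure mode, the sketch below illustrates the general pattern the issue title describes: one fetcher's success clears shared per-host state that another fetcher's failure path then dereferences. It is a hypothetical, simplified model and not the code in ShuffleSchedulerImpl or in MAPREDUCE-6361-v1.patch. The class HostFailureTracker and its methods (hostFailed, tooManyFailuresUnsafe, tooManyFailuresSafe) are invented for illustration, and the two threads' interleaving is replayed sequentially so the outcome is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a per-host failure tracker.
// It is NOT Hadoop's ShuffleSchedulerImpl and NOT the MAPREDUCE-6361 patch.
class HostFailureTracker {
    private final Map<String, Integer> hostFailures = new HashMap<>();
    private final int maxHostFailures;

    HostFailureTracker(int maxHostFailures) {
        this.maxHostFailures = maxHostFailures;
    }

    // Record one more failure against the host.
    synchronized void hostFailed(String host) {
        hostFailures.merge(host, 1, Integer::sum);
    }

    // A successful copy clears the host's failure history.
    synchronized void copySucceeded(String host) {
        hostFailures.remove(host);
    }

    // Fragile: assumes the entry still exists. If copySucceeded() ran first,
    // get() returns null and unboxing it throws NullPointerException.
    synchronized boolean tooManyFailuresUnsafe(String host) {
        return hostFailures.get(host) > maxHostFailures;
    }

    // Defensive: a missing entry is treated as "no recorded failures".
    synchronized boolean tooManyFailuresSafe(String host) {
        Integer failures = hostFailures.get(host);
        return failures != null && failures > maxHostFailures;
    }

    public static void main(String[] args) {
        HostFailureTracker tracker = new HostFailureTracker(0);
        tracker.hostFailed("host-a");     // fetcher 1 records a failure
        tracker.copySucceeded("host-a");  // fetcher 2 succeeds; entry is removed
        System.out.println(tracker.tooManyFailuresSafe("host-a"));
        try {
            tracker.tooManyFailuresUnsafe("host-a");
        } catch (NullPointerException e) {
            System.out.println("unsafe lookup threw NullPointerException");
        }
    }
}
```

Note that every method here is synchronized, so the problem in this model is one of ordering between calls rather than unsynchronized access: the guard in tooManyFailuresSafe tolerates the entry having been removed by an earlier success on the same host.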



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552599#comment-14552599
 ] 

Hudson commented on MAPREDUCE-6361:
---

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/])
Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 
8ca1dfeebb660741aa6e5b137cd1088815b614cf)
* hadoop-mapreduce-project/CHANGES.txt


> NPE issue in shuffle caused by concurrent issue between copySucceeded() in
> one thread and copyFailed() in another thread on the same host
> ---------------------------------------------------------------------------
>
>              Key: MAPREDUCE-6361
>              URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
>          Project: Hadoop Map/Reduce
>       Issue Type: Bug
> Affects Versions: 2.7.0
>         Reporter: Junping Du
>         Assignee: Junping Du
>         Priority: Critical
>          Fix For: 2.7.1
>
>      Attachments: MAPREDUCE-6361-v1.patch


[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552555#comment-14552555
 ] 

Hudson commented on MAPREDUCE-6361:
---

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/])
Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 
8ca1dfeebb660741aa6e5b137cd1088815b614cf)
* hadoop-mapreduce-project/CHANGES.txt




[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552443#comment-14552443
 ] 

Hudson commented on MAPREDUCE-6361:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/])
Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 
8ca1dfeebb660741aa6e5b137cd1088815b614cf)
* hadoop-mapreduce-project/CHANGES.txt




[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552393#comment-14552393
 ] 

Hudson commented on MAPREDUCE-6361:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/])
Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 
8ca1dfeebb660741aa6e5b137cd1088815b614cf)
* hadoop-mapreduce-project/CHANGES.txt




[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552210#comment-14552210
 ] 

Hudson commented on MAPREDUCE-6361:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #933 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/933/])
Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 
8ca1dfeebb660741aa6e5b137cd1088815b614cf)
* hadoop-mapreduce-project/CHANGES.txt




[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552188#comment-14552188
 ] 

Hudson commented on MAPREDUCE-6361:
---

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #202 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/202/])
Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 
8ca1dfeebb660741aa6e5b137cd1088815b614cf)
* hadoop-mapreduce-project/CHANGES.txt




[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550997#comment-14550997
 ] 

Hudson commented on MAPREDUCE-6361:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #7866 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7866/])
Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 
8ca1dfeebb660741aa6e5b137cd1088815b614cf)
* hadoop-mapreduce-project/CHANGES.txt




[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542155#comment-14542155
 ] 

Hudson commented on MAPREDUCE-6361:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2142 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2142/])
MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between 
copySucceeded() in one thread and copyFailed() in another thread on the same 
host. Contributed by Junping Du. (ozawa: rev 
f4e2b3cc0b1f4e49c306bc09a90495225bb2)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java


> NPE issue in shuffle caused by concurrent issue between copySucceeded() in
> one thread and copyFailed() in another thread on the same host
> ---------------------------------------------------------------------------
>
>              Key: MAPREDUCE-6361
>              URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
>          Project: Hadoop Map/Reduce
>       Issue Type: Bug
> Affects Versions: 2.7.0
>         Reporter: Junping Du
>         Assignee: Junping Du
>         Priority: Critical
>          Fix For: 2.8.0
>
>      Attachments: MAPREDUCE-6361-v1.patch


[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542110#comment-14542110
 ] 

Hudson commented on MAPREDUCE-6361:
---

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #194 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/194/])
MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between 
copySucceeded() in one thread and copyFailed() in another thread on the same 
host. Contributed by Junping Du. (ozawa: rev 
f4e2b3cc0b1f4e49c306bc09a90495225bb2)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
* hadoop-mapreduce-project/CHANGES.txt




[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541991#comment-14541991
 ] 

Hudson commented on MAPREDUCE-6361:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #184 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/184/])
MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between 
copySucceeded() in one thread and copyFailed() in another thread on the same 
host. Contributed by Junping Du. (ozawa: rev 
f4e2b3cc0b1f4e49c306bc09a90495225bb2)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java




[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541971#comment-14541971
 ] 

Hudson commented on MAPREDUCE-6361:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #2124 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2124/])
MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between 
copySucceeded() in one thread and copyFailed() in another thread on the same 
host. Contributed by Junping Du. (ozawa: rev 
f4e2b3cc0b1f4e49c306bc09a90495225bb2)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java




[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541793#comment-14541793
 ] 

Hudson commented on MAPREDUCE-6361:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #926 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/926/])
MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between 
copySucceeded() in one thread and copyFailed() in another thread on the same 
host. Contributed by Junping Du. (ozawa: rev 
f4e2b3cc0b1f4e49c306bc09a90495225bb2)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java




[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541771#comment-14541771
 ] 

Hudson commented on MAPREDUCE-6361:
---

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #195 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/195/])
MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between 
copySucceeded() in one thread and copyFailed() in another thread on the same 
host. Contributed by Junping Du. (ozawa: rev 
f4e2b3cc0b1f4e49c306bc09a90495225bb2)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java
* hadoop-mapreduce-project/CHANGES.txt


> NPE issue in shuffle caused by concurrent issue between copySucceeded() in 
> one thread and copyFailed() in another thread on the same host
> -
>
> Key: MAPREDUCE-6361
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6361-v1.patch
>
>


[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540062#comment-14540062
 ] 

Hudson commented on MAPREDUCE-6361:
---

FAILURE: Integrated in Hadoop-trunk-Commit #7806 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7806/])
MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between 
copySucceeded() in one thread and copyFailed() in another thread on the same 
host. Contributed by Junping Du. (ozawa: rev 
f4e2b3cc0b1f4e49c306bc09a90495225bb2)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java


> NPE issue in shuffle caused by concurrent issue between copySucceeded() in 
> one thread and copyFailed() in another thread on the same host
> -
>
> Key: MAPREDUCE-6361
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6361-v1.patch
>
>


[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-12 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540032#comment-14540032
 ] 

Tsuyoshi Ozawa commented on MAPREDUCE-6361:
---

+1. I confirmed that this is a concurrency bug between threads running 
Fetcher#run, as Junping mentioned. Committing this shortly.

> NPE issue in shuffle caused by concurrent issue between copySucceeded() in 
> one thread and copyFailed() in another thread on the same host
> -
>
> Key: MAPREDUCE-6361
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: MAPREDUCE-6361-v1.patch
>
>


[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539726#comment-14539726
 ] 

Hadoop QA commented on MAPREDUCE-6361:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 57s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 45s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 9 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 13s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | mapreduce tests | 1m 36s | Tests passed in hadoop-mapreduce-client-core. |
| | |  38m 22s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12732215/MAPREDUCE-6361-v1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8badd82 |
| whitespace | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5714/artifact/patchprocess/whitespace.txt |
| hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5714/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5714/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5714/console |


This message was automatically generated.

> NPE issue in shuffle caused by concurrent issue between copySucceeded() in 
> one thread and copyFailed() in another thread on the same host
> -
>
> Key: MAPREDUCE-6361
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: MAPREDUCE-6361-v1.patch
>
>


[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539594#comment-14539594
 ] 

Junping Du commented on MAPREDUCE-6361:
---

There are basically two ways to fix the race condition here:
1. Extract the following code into a synchronized method, so that 
copySucceeded() is blocked until copyFailed() has finished.
{code}
scheduler.hostFailed(host.getHostName());
for(TaskAttemptID left: failedTasks) {
  scheduler.copyFailed(left, host, true, false);
}
{code}
This is likely to have a larger performance impact on shuffle, because a 
failure to fetch map output in one thread would block copySucceeded() in other 
threads for a longer time.

2. Update copyFailed() to allow for the host's entry in hostFailures having 
been cleaned up by another thread. In that case, add the host back to 
hostFailures, as if this were the first failure for that host.

I prefer the second option, which is more lightweight. I will deliver a quick 
patch soon.
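
As a rough illustration of the second option, here is a minimal sketch. It is not the attached patch: TolerantSchedulerSketch, failureCount(), the simplified method signatures, and the use of AtomicInteger as the counter type are all assumptions made for this example. It only shows copyFailed() tolerating an entry that another thread has already removed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for the scheduler; not the Hadoop class.
final class TolerantSchedulerSketch {
  // Per-host failure counters (AtomicInteger is an assumed stand-in type).
  private final Map<String, AtomicInteger> hostFailures = new HashMap<>();
  private final int maxHostFailures;

  TolerantSchedulerSketch(int maxHostFailures) {
    this.maxHostFailures = maxHostFailures;
  }

  // Records one more failure for the host.
  synchronized void hostFailed(String hostname) {
    hostFailures.computeIfAbsent(hostname, h -> new AtomicInteger()).incrementAndGet();
  }

  // Assumed behavior: a successful copy clears the failure record for the host.
  synchronized void copySucceeded(String hostname) {
    hostFailures.remove(hostname);
  }

  // Returns true if the host has exceeded the allowed number of failures.
  synchronized boolean copyFailed(String hostname) {
    AtomicInteger failures = hostFailures.get(hostname);
    if (failures == null) {
      // The entry was removed by another thread between hostFailed() and
      // copyFailed(); add the host back as if this were its first failure.
      failures = new AtomicInteger(1);
      hostFailures.put(hostname, failures);
    }
    return failures.get() > maxHostFailures;
  }

  // Helper for inspection only; returns 0 when the host has no record.
  synchronized int failureCount(String hostname) {
    AtomicInteger failures = hostFailures.get(hostname);
    return failures == null ? 0 : failures.get();
  }
}
```

With this shape, each method still holds the lock only for its own duration, so a fetch failure in one thread does not block copySucceeded() in other threads across the whole loop, which is the trade-off that favors this option over the first.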

> NPE issue in shuffle caused by concurrent issue between copySucceeded() in 
> one thread and copyFailed() in another thread on the same host
> -
>
> Key: MAPREDUCE-6361
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>


[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-11 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538071#comment-14538071
 ] 

Junping Du commented on MAPREDUCE-6361:
---

The NPE is thrown in copyFailed() at ShuffleSchedulerImpl.java:267:
{code}
boolean hostFail = hostFailures.get(hostname).get() > getMaxHostFailures() ? true : false;
{code}
This means hostFailures does not contain the hostname that just failed. That is 
unexpected, because we always call hostFailed() to put the host into 
hostFailures before calling copyFailed():
{code}
scheduler.hostFailed(host.getHostName());
for(TaskAttemptID left: failedTasks) {
  scheduler.copyFailed(left, host, true, false);
}
{code}
Although hostFailed() and copyFailed() are both synchronized methods (as is 
copySucceeded()), the NPE is still possible. The only apparent cause is that 
another thread calls copySucceeded() on the same host (for a different map 
output) between this thread's calls to hostFailed() and copyFailed() while it 
is handling one map output failure.
We need to fix this concurrency issue to get rid of the NPE, which fails the 
map output copy directly without any retry.
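
To make the interleaving concrete, here is a small sketch that replays it sequentially so the outcome is deterministic. RacySchedulerSketch and replayInterleaving() are hypothetical names, the method signatures are simplified, AtomicInteger is an assumed stand-in for the counter type, and the removal in copySucceeded() is assumed from the analysis above rather than copied from the Hadoop source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for the scheduler; not the Hadoop class.
final class RacySchedulerSketch {
  private final Map<String, AtomicInteger> hostFailures = new HashMap<>();
  private final int maxHostFailures;

  RacySchedulerSketch(int maxHostFailures) {
    this.maxHostFailures = maxHostFailures;
  }

  // Records one more failure for the host.
  synchronized void hostFailed(String hostname) {
    hostFailures.computeIfAbsent(hostname, h -> new AtomicInteger()).incrementAndGet();
  }

  // Assumed behavior: a successful copy clears the failure record for the host.
  synchronized void copySucceeded(String hostname) {
    hostFailures.remove(hostname);
  }

  // Mirrors the failing expression: get(hostname) returns null once the
  // entry has been removed, so the chained get() throws.
  synchronized boolean copyFailed(String hostname) {
    return hostFailures.get(hostname).get() > maxHostFailures;
  }

  // Replays the problematic ordering on one thread; returns true if the
  // NullPointerException is observed.
  static boolean replayInterleaving() {
    RacySchedulerSketch scheduler = new RacySchedulerSketch(3);
    scheduler.hostFailed("host-a");     // fetcher A, first call
    scheduler.copySucceeded("host-a");  // fetcher B, runs in between
    try {
      scheduler.copyFailed("host-a");   // fetcher A, second call
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }
}
```

Each method is individually synchronized, yet the two calls made by fetcher A are not atomic as a pair, which is why the lock on each method does not prevent the failure.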

> NPE issue in shuffle caused by concurrent issue between copySucceeded() in 
> one thread and copyFailed() in another thread on the same host
> -
>
> Key: MAPREDUCE-6361
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>