[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642801#comment-14642801 ] Sangjin Lee commented on MAPREDUCE-6361: The patch applies to 2.6.0 cleanly. > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Labels: 2.6.1-candidate > Fix For: 2.7.1 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552599#comment-14552599 ] Hudson commented on MAPREDUCE-6361: --- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 8ca1dfeebb660741aa6e5b137cd1088815b614cf) * hadoop-mapreduce-project/CHANGES.txt > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.1 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552555#comment-14552555 ] Hudson commented on MAPREDUCE-6361: --- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 8ca1dfeebb660741aa6e5b137cd1088815b614cf) * hadoop-mapreduce-project/CHANGES.txt > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.1 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552443#comment-14552443 ] Hudson commented on MAPREDUCE-6361: --- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #191 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/191/]) Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 8ca1dfeebb660741aa6e5b137cd1088815b614cf) * hadoop-mapreduce-project/CHANGES.txt > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.1 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552393#comment-14552393 ] Hudson commented on MAPREDUCE-6361: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #2131 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2131/]) Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 8ca1dfeebb660741aa6e5b137cd1088815b614cf) * hadoop-mapreduce-project/CHANGES.txt > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.1 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552210#comment-14552210 ] Hudson commented on MAPREDUCE-6361: --- SUCCESS: Integrated in Hadoop-Yarn-trunk #933 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/933/]) Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 8ca1dfeebb660741aa6e5b137cd1088815b614cf) * hadoop-mapreduce-project/CHANGES.txt > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.1 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552188#comment-14552188 ] Hudson commented on MAPREDUCE-6361: --- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #202 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/202/]) Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 8ca1dfeebb660741aa6e5b137cd1088815b614cf) * hadoop-mapreduce-project/CHANGES.txt > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.1 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550997#comment-14550997 ] Hudson commented on MAPREDUCE-6361: --- SUCCESS: Integrated in Hadoop-trunk-Commit #7866 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7866/]) Moving MAPREDUCE-6361 to 2.7.1 CHANGES.txt (junping_du: rev 8ca1dfeebb660741aa6e5b137cd1088815b614cf) * hadoop-mapreduce-project/CHANGES.txt > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.1 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542155#comment-14542155 ] Hudson commented on MAPREDUCE-6361: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2142 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2142/]) MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host. Contributed by Junping Du. (ozawa: rev f4e2b3cc0b1f4e49c306bc09a90495225bb2) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542110#comment-14542110 ] Hudson commented on MAPREDUCE-6361: --- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #194 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/194/]) MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host. Contributed by Junping Du. (ozawa: rev f4e2b3cc0b1f4e49c306bc09a90495225bb2) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java * hadoop-mapreduce-project/CHANGES.txt > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541991#comment-14541991 ] Hudson commented on MAPREDUCE-6361: --- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #184 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/184/]) MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host. Contributed by Junping Du. (ozawa: rev f4e2b3cc0b1f4e49c306bc09a90495225bb2) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541971#comment-14541971 ] Hudson commented on MAPREDUCE-6361: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #2124 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2124/]) MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host. Contributed by Junping Du. (ozawa: rev f4e2b3cc0b1f4e49c306bc09a90495225bb2) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541793#comment-14541793 ] Hudson commented on MAPREDUCE-6361: --- SUCCESS: Integrated in Hadoop-Yarn-trunk #926 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/926/]) MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host. Contributed by Junping Du. (ozawa: rev f4e2b3cc0b1f4e49c306bc09a90495225bb2) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541771#comment-14541771 ] Hudson commented on MAPREDUCE-6361: --- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #195 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/195/]) MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host. Contributed by Junping Du. (ozawa: rev f4e2b3cc0b1f4e49c306bc09a90495225bb2) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java * hadoop-mapreduce-project/CHANGES.txt > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540062#comment-14540062 ] Hudson commented on MAPREDUCE-6361: --- FAILURE: Integrated in Hadoop-trunk-Commit #7806 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7806/]) MAPREDUCE-6361. NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host. Contributed by Junping Du. (ozawa: rev f4e2b3cc0b1f4e49c306bc09a90495225bb2) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540032#comment-14540032 ] Tsuyoshi Ozawa commented on MAPREDUCE-6361: --- +1, I confirmed that this is a bug about concurrency between threads of Fether#run as Junping mentioned. Committing this shortly. > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539726#comment-14539726 ] Hadoop QA commented on MAPREDUCE-6361: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 57s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 45s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 9 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 13s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | mapreduce tests | 1m 36s | Tests passed in hadoop-mapreduce-client-core. | | | | 38m 22s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732215/MAPREDUCE-6361-v1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 8badd82 | | whitespace | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5714/artifact/patchprocess/whitespace.txt | | hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5714/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5714/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5714/console | This message was automatically generated. > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539594#comment-14539594 ] Junping Du commented on MAPREDUCE-6361: --- There are basically two ways to fix the race condition here: 1. abstract following code into a synchronized method, so copySucceeded() would get blocked until copyFailed() finished. {code} scheduler.hostFailed(host.getHostName()); for(TaskAttemptID left: failedTasks) { scheduler.copyFailed(left, host, true, false); } {code} This sounds like more performance impact on shuffle as failure in fetching map output on one thread will block copySucceeded() for other threads with longer time. 2. Update copyFailed() to have assumption that hostFailures could be cleanup in the other thread. In case of that, adding back host to hostFailed as the first time host failed. Prefer the 2nd option which sounds more lightweight. Will deliver a quick patch soon. > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538071#comment-14538071 ] Junping Du commented on MAPREDUCE-6361: --- NPE get throw in copyFailed() in ShuffleSchedulerImpl.java:267: {code} "boolean hostFail = hostFailures.get(hostname).get() > getMaxHostFailures() ? true : false;" {code} It means hostFailures doesn't include hostname that just failed, which is not expected because we call hostFailed() to put host into hostFailures before anytime to call copyFailed(): {code} scheduler.hostFailed(host.getHostName()); for(TaskAttemptID left: failedTasks) { scheduler.copyFailed(left, host, true, false); } {code} Although hostFailed() and copyFailed() are both synchronized method (so as copySucceeded()), it is still possible (like the only reason) to cause this NPE for the other thread calls copySucceeded() on the same host (for other map output) between we call hostFailed() and copyFailed() in this thread when taking care of one map output failure. We need to fix this concurrent issue to get rid of NPE issue which failed map output copy directly without any retry. > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)