[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2017-01-05 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6361:
--
Fix Version/s: 2.8.0

> NPE issue in shuffle caused by concurrent issue between copySucceeded() in 
> one thread and copyFailed() in another thread on the same host
> -
>
> Key: MAPREDUCE-6361
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.1, 3.0.0-alpha1
>
> Attachments: MAPREDUCE-6361-v1.patch
>
>
> The failure in log:
> 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#25
>  at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>  at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267)
>  at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308)
>  at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
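
The trace only shows that copyFailed() hit a null reference; the title attributes it to copySucceeded() running in another thread for the same host. The following is a minimal sketch of how that kind of race can arise, not the actual ShuffleSchedulerImpl code and not the attached patch. It assumes a per-host failure counter that a success clears and a failure path later reads; the class HostFailureSketch and all of its method names are hypothetical. Each method is synchronized on its own, yet the sequence "record failure, then read the count" is not atomic, so a success from another fetcher can remove the entry in between. The interleaving is replayed on one thread to keep it deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of per-host failure bookkeeping (not Hadoop code). */
public class HostFailureSketch {
    // Failure count per host name; every access holds this object's monitor.
    private final Map<String, Integer> hostFailures = new HashMap<>();

    /** Records one failed copy from the given host. */
    public synchronized void hostFailed(String host) {
        hostFailures.merge(host, 1, Integer::sum);
    }

    /** Assumed behaviour: a successful copy clears the host's failure history. */
    public synchronized void copySucceeded(String host) {
        hostFailures.remove(host);
    }

    /** Unsafe read: assumes the entry still exists; unboxing a null Integer throws NPE. */
    public synchronized boolean tooManyFailuresUnsafe(String host, int max) {
        int failures = hostFailures.get(host);
        return failures > max;
    }

    /** Null-safe read: a missing entry is treated as zero failures. */
    public synchronized boolean tooManyFailuresSafe(String host, int max) {
        return hostFailures.getOrDefault(host, 0) > max;
    }

    public static void main(String[] args) {
        HostFailureSketch scheduler = new HostFailureSketch();

        // Interleaving of two fetchers working on the same host:
        scheduler.hostFailed("host-a");     // fetcher A records a failure
        scheduler.copySucceeded("host-a");  // fetcher B succeeds and removes the entry

        // Fetcher A continues its failure handling and reads the count.
        System.out.println("safe check: " + scheduler.tooManyFailuresSafe("host-a", 3));
        try {
            scheduler.tooManyFailuresUnsafe("host-a", 3);
        } catch (NullPointerException e) {
            System.out.println("unsafe check threw NullPointerException");
        }
    }
}
```

Two general remedies for this pattern are to make the read tolerate a missing entry, as tooManyFailuresSafe does, or to perform the update and the read inside a single critical section. Which approach the attached patch takes is not stated in this thread.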



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-09-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6361:
---
Fix Version/s: 2.6.1

Pulled this into 2.6.1, after fixing a minor merge conflict in 
TestShuffleScheduler.

Ran compilation and TestShuffleScheduler before the push.



[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-07-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6361:
---
Labels: 2.6.1-candidate  (was: )



[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-19 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6361:
--
Target Version/s: 2.7.1  (was: 2.8.0)
   Fix Version/s: 2.7.1  (was: 2.8.0)

Thanks [~ozawa] for reviewing and committing the patch! Moved the commit from 
2.8 to 2.7.1, as we need this fix ASAP.



[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-12 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6361:
--
Attachment: MAPREDUCE-6361-v1.patch

Uploaded a patch implementing the 2nd solution proposed above, with a unit test.



[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-12 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6361:
--
Status: Patch Available  (was: Open)



[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-12 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated MAPREDUCE-6361:
--
Affects Version/s: 2.7.0



[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-12 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated MAPREDUCE-6361:
--
   Resolution: Fixed
Fix Version/s: 2.8.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed this to trunk and branch-2. Thanks [~djp] for your report and 
contribution!



[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-05-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6361:
--
Priority: Critical  (was: Major)
