[jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()

2012-09-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464854#comment-13464854
 ] 

Hadoop QA commented on MAPREDUCE-4464:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12546872/MAPREDUCE-4464.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2886//console

This message is automatically generated.

 Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()
 -

 Key: MAPREDUCE-4464
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4464
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 1.0.0
Reporter: Clint Heath
Assignee: Clint Heath
Priority: Minor
 Attachments: MAPREDUCE-4464_new.patch, MAPREDUCE-4464.patch, 
 MAPREDUCE-4464.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 If DNS does not resolve hostnames properly, reduce tasks can fail with a very 
 misleading exception.
 As per my peer Ahmed's diagnosis:
 In ReduceTask, it seems that event.getTaskTrackerHttp() returns a malformed 
 URI, and so host from:
 {code}
 String host = u.getHost();
 {code}
 evaluates to null, and the NullPointerException is thrown afterwards in 
 ConcurrentHashMap.get().
 I have written a patch that checks for a null hostname when getHost() is 
 called in the getMapCompletionEvents method and prints an intelligible warning 
 message right away, rather than letting the null propagate until it surfaces 
 later as a confusing and misleading exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()

2012-09-27 Thread Clint Heath (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464860#comment-13464860
 ] 

Clint Heath commented on MAPREDUCE-4464:


Thanks Harsh!  I look forward to contributing much more too.

 Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()
 -

 Key: MAPREDUCE-4464
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4464
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 1.0.0
Reporter: Clint Heath
Assignee: Clint Heath
Priority: Minor
 Fix For: 1.2.0

 Attachments: MAPREDUCE-4464_new.patch, MAPREDUCE-4464.patch, 
 MAPREDUCE-4464.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()

2012-09-26 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463833#comment-13463833
 ] 

Harsh J commented on MAPREDUCE-4464:


Hi Clint,

Sorry for the delay here!

I noticed that the line:

bq. String host = u.getHost();

which is the one that ends up carrying a null, is then used in the lookup as:

bq. List<MapOutputLocation> loc = mapLocations.get(host);

Hence, I think the ideal fix would be to throw an exception, because the 
code that follows relies heavily on host:

{code}
  URI u = URI.create(event.getTaskTrackerHttp());
  String host = u.getHost();
  TaskAttemptID taskId = event.getTaskAttemptId();
  URL mapOutputLocation = new URL(event.getTaskTrackerHttp() +
                          "/mapOutput?job=" + taskId.getJobID() +
                          "&map=" + taskId +
                          "&reduce=" + getPartition());
  List<MapOutputLocation> loc = mapLocations.get(host);
  if (loc == null) {
    loc = Collections.synchronizedList
      (new LinkedList<MapOutputLocation>());
    mapLocations.put(host, loc);
  }
  loc.add(new MapOutputLocation(taskId, host, mapOutputLocation));
  numNewMaps ++;
{code}

As its usage shows, if host cannot be determined and is consistently null, 
we cannot really work with it, so throwing an IOException makes sense.
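
The fail-fast idea can be sketched as follows. This is only an illustration under stated assumptions, not the attached patch: the class HostCheckSketch, the helpers requireHost and rejects, the exception message, and the example URLs are all hypothetical. Only java.net.URI and java.io.IOException are real APIs.

```java
import java.io.IOException;
import java.net.URI;

public class HostCheckSketch {

  // Hypothetical helper: extract the host from a tracker URL and fail
  // immediately with a descriptive IOException if it cannot be determined,
  // instead of letting a null reach a later map lookup.
  static String requireHost(String trackerHttp) throws IOException {
    URI u = URI.create(trackerHttp);
    String host = u.getHost();
    if (host == null) {
      throw new IOException(
          "Could not determine host from tracker location '" + trackerHttp + "'");
    }
    return host;
  }

  // Convenience wrapper for the demo: true if requireHost rejects the URL.
  static boolean rejects(String trackerHttp) {
    try {
      requireHost(trackerHttp);
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Made-up example locations; the second has an underscore in the host.
    System.out.println(rejects("http://goodhost:50060"));
    System.out.println(rejects("http://bad_host:50060"));
  }
}
```

Because the enclosing method would already declare throws IOException, a check of this kind would not change its signature.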

I'm currently running test-patch on your patch for branch-1; depending on the 
results, I'll either commit it or post some further comments.

MR2 may be similarly affected on the netty side, but it may already be failing 
properly. I don't have the time to verify that at the moment (perhaps another 
JIRA), so I'll just focus on the MR1 side for now.

Thanks for the patch!

 Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()
 -

 Key: MAPREDUCE-4464
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4464
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 1.0.0
Reporter: Clint Heath
Assignee: Clint Heath
Priority: Minor
 Attachments: MAPREDUCE-4464_new.patch, MAPREDUCE-4464.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()

2012-09-26 Thread Clint Heath (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463842#comment-13463842
 ] 

Clint Heath commented on MAPREDUCE-4464:


Thanks Harsh!  I'll take a look at yarn and see if a similar situation is 
present there.



[jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()

2012-09-26 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463873#comment-13463873
 ] 

Harsh J commented on MAPREDUCE-4464:


Clint,

Thanks for looking at YARN (do file a new JIRA even if it's just for 
investigation). You will need to look at the ShuffleHandler class downwards.



[jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()

2012-09-26 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463991#comment-13463991
 ] 

Harsh J commented on MAPREDUCE-4464:


From test-patch on branch-1:

{code}
 [exec] -1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec] Please justify why no tests are needed for this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 220 new Findbugs (version 2.0.1-rc3) warnings.
{code}

The 220 new Findbugs warnings (version 2.0.1-rc3) are what we also get when 
the findbugs target is run on the unpatched branch-1, so they do not come from 
this patch. From an initial look, there don't appear to be any existing 
test cases that cover this. Also, the method in which we'll be throwing this 
exception already declares throws IOException.

I ran -Dtestcase=TestMR* and -Dtestcase=TestMap* to run some MR tests over 
branch-1 and they seem to pass with this applied.

+1 for committing. Just gonna run one job over a cluster instance with a 
hostname with an underscore before doing so, to make sure this is working 
reliably well.



[jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()

2012-08-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436522#comment-13436522
 ] 

Hadoop QA commented on MAPREDUCE-4464:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12537745/MAPREDUCE-4464_new.patch
  against trunk revision .

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2747//console

This message is automatically generated.





[jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()

2012-07-19 Thread Clint Heath (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418643#comment-13418643
 ] 

Clint Heath commented on MAPREDUCE-4464:


Sorry, I should have supplied the exception that we encountered when this issue 
happened.  As it turned out, the host names in the cluster all had illegal DNS 
characters in them (the underscore _), so when the getHost() call was made, 
null was returned and we saw the following.

Mappers get about 80% complete when the reducers all begin to throw the 
following exceptions and then die almost immediately...eventually the whole job 
dies:

{noformat}
2012-06-26 15:56:02,326 FATAL org.apache.hadoop.mapred.Task: attempt_201206251823_0004_r_36_1 GetMapEventsThread Ignoring exception : java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2835)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2756)

2012-06-26 15:56:02,356 FATAL org.apache.hadoop.mapred.Task: attempt_201206251823_0004_r_36_1 GetMapEventsThread Ignoring exception : org.apache.hadoop.ipc.RemoteException: java.io.IOException: JvmValidate Failed. Ignoring request from task: attempt_201206251823_0004_r_36_1, with JvmId: jvm_201206251823_0004_r_-396118293
    at org.apache.hadoop.mapred.TaskTracker.validateJVM(TaskTracker.java:3468)
    at org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:3731)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.getMapCompletionEvents(Unknown Source)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2798)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2756)

2012-06-26 15:56:02,361 FATAL org.apache.hadoop.mapred.Task: Failed to contact the tasktracker
org.apache.hadoop.ipc.RemoteException: java.io.IOException: JvmValidate Failed. Ignoring request from task: attempt_201206251823_0004_r_36_1, with JvmId: jvm_201206251823_0004_r_-396118293
    at org.apache.hadoop.mapred.TaskTracker.validateJVM(TaskTracker.java:3468)
    at org.apache.hadoop.mapred.TaskTracker.fatalError(TaskTracker.java:3714)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.fatalError(Unknown Source)
    at org.apache.hadoop.mapred.Task.reportFatalError(Task.java:294)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2781)
{noformat}
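
The first exception in the trace can be reproduced outside Hadoop. The following is a minimal sketch, not code from ReduceTask: the class NullHostDemo, its helper methods, and the example host names and port are made up for illustration. It relies only on standard behaviour of java.net.URI (getHost() returns null when the authority cannot be parsed as a server-based one, as happens with an underscore in the host name) and of ConcurrentHashMap (null keys are rejected).

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullHostDemo {

  // Same pattern as the snippet quoted in this issue: parse the location, take the host.
  static String hostOf(String trackerHttp) {
    return URI.create(trackerHttp).getHost();
  }

  // Returns true if looking the given host up in a ConcurrentHashMap
  // throws a NullPointerException.
  static boolean lookupThrowsNpe(String host) {
    Map<String, List<String>> mapLocations =
        new ConcurrentHashMap<String, List<String>>();
    try {
      mapLocations.get(host);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Made-up tracker location with an underscore in the host name.
    String host = hostOf("http://bad_host:50060");
    System.out.println("host = " + host);
    System.out.println("lookup throws NPE = " + lookupThrowsNpe(host));
  }
}
```

Note that URI.create() itself does not reject such a location; the problem only shows up later, when the null host is used as a map key.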

 Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()
 -

 Key: MAPREDUCE-4464
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4464
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 1.0.0
Reporter: Clint Heath
Priority: Minor
 Attachments: MAPREDUCE-4464.patch

   Original Estimate: 1h
  Remaining Estimate: 1h


[jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()

2012-07-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418642#comment-13418642
 ] 

Karthik Kambatla commented on MAPREDUCE-4464:
-

Clint, thanks a lot for looking into this issue. 

Minor comment: would it be better to throw an IOException carrying your 
message, so that we can avoid the subsequent NullPointerException?





[jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()

2012-07-19 Thread Clint Heath (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418649#comment-13418649
 ] 

Clint Heath commented on MAPREDUCE-4464:


Karthik,

  I'm fine with that as long as it doesn't interrupt the overall flow and 
process of what's supposed to happen when a task fails.  In our case, every 
reduce task failed and therefore the entire job, but I can see a situation 
where only one TT machine had a bad hostname and therefore only a subset of 
reduce tasks would fail and the overall job may still complete.  I just want to 
make sure we are informative in the logs and that the tasks are allowed to be 
re-tried if applicable, etc.  I haven't thought through all the logic far 
enough yet to know the ramifications of throwing an IOE right there.  Harsh and 
I chatted about the same idea earlier, though.  I'll vet that out...
