[jira] [Commented] (MAPREDUCE-50) NPE in heartbeat when the configured topology script doesn't exist

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069293#comment-14069293
 ] 

Allen Wittenauer commented on MAPREDUCE-50:
---

We should verify whether this condition still exists in 2.x and, if so, fix it.

 NPE in heartbeat when the configured topology script doesn't exist
 --

 Key: MAPREDUCE-50
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-50
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.0.3
Reporter: Vinod Kumar Vavilapalli
Assignee: Ivan Mitic
  Labels: newbie





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-50) NPE in heartbeat when the configured topology script doesn't exist

2014-05-11 Thread Andrew Liles (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993529#comment-13993529
 ] 

Andrew Liles commented on MAPREDUCE-50:
---

To help others who encounter this:

This problem can also occur when myPath/topologyProgram exists but cannot be 
executed because of an operating system restriction.  

On a standard Ubuntu Server, execution is prohibited on the /run mount point.  
This is solved by executing:
sudo mount -o remount,exec /run

See 
https://groups.google.com/a/cloudera.org/forum/#!msg/scm-users/mDlnpWSfXMI/39cDl2C7gSUJ

The Cloudera installer should detect this and warn about or remedy it.  But 
that's another ticket...



[jira] [Commented] (MAPREDUCE-50) NPE in heartbeat when the configured topology script doesn't exist

2013-05-01 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646670#comment-13646670
 ] 

Ivan Mitic commented on MAPREDUCE-50:
-

Hi Steve, Vinod,

I've run into a problem similar to this one. In my case, the JobTracker started 
failing jobs because network topology resolution started failing for a 
single node in the cluster:
{code}
2013-04-27 08:33:08,204 ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed:
java.lang.NullPointerException
    at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:3205)
    at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:550)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:734)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4214)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
{code}

What happens is that some input split blocks are located on the datanode with 
the same IP/hostname as the TT. As a side effect, many of the customer jobs 
fail during initialization.

The NN, on the other hand, has fallback logic that defaults to /default-rack, 
and this inconsistency actually makes the problem more severe :)
{code}
2013-04-27 04:36:47,185 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The resolve call returned null! Using /default-rack for host [100.64.34.3]
2013-04-27 04:36:47,185 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/100.64.34.3:50010
{code}

In terms of the fix, my proposal would be to add the same fallback logic to the 
JobTracker. In our case, we had a network topology script that worked fine for 
a year or so and has now started failing for a single node, for a reason we 
cannot explain yet.
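
To make the proposal concrete, here is a minimal sketch of such a fallback. It is not the JobTracker code: the class RackFallback, the method resolveOrDefault and the Function-based resolver are hypothetical stand-ins for the real topology lookup; only the /default-rack value comes from the NN log above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class RackFallback {
    static final String DEFAULT_RACK = "/default-rack";

    /**
     * Resolves the hosts with the given resolver (a stand-in for the topology
     * lookup) and substitutes DEFAULT_RACK wherever the resolver returns null,
     * either for the whole list or for a single entry.
     */
    static List<String> resolveOrDefault(List<String> hosts,
                                         Function<List<String>, List<String>> resolver) {
        List<String> resolved = resolver.apply(hosts);
        List<String> racks = new ArrayList<>(hosts.size());
        for (int i = 0; i < hosts.size(); i++) {
            String rack = (resolved != null && i < resolved.size()) ? resolved.get(i) : null;
            racks.add(rack != null ? rack : DEFAULT_RACK);
        }
        return racks;
    }

    public static void main(String[] args) {
        // A resolver that always fails, as a broken topology script might.
        List<String> hosts = Collections.singletonList("100.64.34.3");
        System.out.println(resolveOrDefault(hosts, h -> null));
    }
}
```

With a guard like this, a failed lookup degrades to a single default rack instead of failing job initialization.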

Let me know what you think. I'll take up this Jira if you don't mind.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-50) NPE in heartbeat when the configured topology script doesn't exist

2012-02-09 Thread Steve Loughran (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204667#comment-13204667
 ] 

Steve Loughran commented on MAPREDUCE-50:
-

I've seen logs in emails that indicate this can still arise in the 0.20.20x 
series.

The problem is that the {{ScriptBasedMapping.RawScriptMapping.resolve()}} method 
behaves as follows:
 # It returns /default-rack if there is no script.
 # It returns null if the script fails.
 # {{CachedDNSToSwitchMapping.resolve()}} (which invokes this inner resolve) 
sees the null pointer, doesn't add those entries to its cache, then builds a 
list of results for each supplied argument, and for unknown hosts returns null.
 # The JobTracker doesn't expect null.
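
The chain above can be modelled in a few lines. This is a simplified sketch and not the Hadoop source: the class CachedMappingModel and its Function-based raw resolver are hypothetical, and returning null for the whole list when any host is unknown is my reading of step 3.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Simplified model of the behaviour described above; not the Hadoop source. */
final class CachedMappingModel {
    private final Map<String, String> cache = new HashMap<>();
    // Stand-in for the raw script mapping: returns null when the script fails.
    private final Function<List<String>, List<String>> raw;

    CachedMappingModel(Function<List<String>, List<String>> raw) {
        this.raw = raw;
    }

    List<String> resolve(List<String> hosts) {
        // Ask the raw mapping only about hosts that are not cached yet.
        List<String> uncached = new ArrayList<>();
        for (String host : hosts) {
            if (!cache.containsKey(host)) {
                uncached.add(host);
            }
        }
        if (!uncached.isEmpty()) {
            List<String> racks = raw.apply(uncached);
            // A null (failed) result is not cached.
            if (racks != null && racks.size() == uncached.size()) {
                for (int i = 0; i < uncached.size(); i++) {
                    cache.put(uncached.get(i), racks.get(i));
                }
            }
        }
        // Build the result list; an unknown host makes the result null (assumption).
        List<String> result = new ArrayList<>(hosts.size());
        for (String host : hosts) {
            String rack = cache.get(host);
            if (rack == null) {
                return null;
            }
            result.add(rack);
        }
        return result;
    }

    public static void main(String[] args) {
        CachedMappingModel failing = new CachedMappingModel(h -> null);
        // The caller receives null rather than a list of racks.
        System.out.println(failing.resolve(Collections.singletonList("host1")));
    }
}
```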

What to do?
 * The WARN could be raised to a higher level so that it doesn't get lost to 
the user, or the error message could be changed to be more meaningful to users 
who don't want to delve into the source. 
 * The JT (and its YARN successor) needs a policy for handling nulls.
 * Better preflight checking of the shell script.
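
As an illustration of the last point, a preflight check could look roughly like the sketch below. The class TopologyScriptPreflight and its check method are hypothetical, and a script can pass these checks and still fail when it is actually run, so this complements rather than replaces null handling.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical preflight helper for illustration only; not part of Hadoop. */
final class TopologyScriptPreflight {

    /**
     * Returns null if the configured script looks usable (or none is configured),
     * otherwise a message describing the problem.
     */
    static String check(String scriptPath) {
        if (scriptPath == null || scriptPath.trim().isEmpty()) {
            return null; // no script configured, nothing to check
        }
        Path script = Paths.get(scriptPath);
        if (!Files.exists(script)) {
            return "topology script does not exist: " + script;
        }
        if (!Files.isRegularFile(script)) {
            return "topology script is not a regular file: " + script;
        }
        if (!Files.isExecutable(script)) {
            return "topology script is not executable: " + script;
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = check(args.length > 0 ? args[0] : "");
        System.out.println(problem == null ? "preflight ok" : problem);
    }
}
```

Running such a check at startup would let the daemon report a clear error message before the first heartbeat, instead of a later NPE.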

Unit tests could verify how the Namenode, JT and RM handle null entries by 
using a switch mapping that always returns null; this is trivial to write.
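
A sketch of what such a test could be built on, without depending on Hadoop: the nested SwitchMapping interface is a hypothetical stand-in that only mirrors the shape of a resolve(List) call, and toleratesNull is an illustrative helper, not an existing test utility.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical test scaffold for illustration only; not part of Hadoop. */
final class NullMappingCheck {

    /** Stand-in for a switch mapping: host names in, rack names out. */
    interface SwitchMapping {
        List<String> resolve(List<String> names);
    }

    /** A mapping that returns null for every request. */
    static final SwitchMapping ALWAYS_NULL = names -> null;

    /** Returns true if the consumer copes with the always-null mapping without an NPE. */
    static boolean toleratesNull(Function<SwitchMapping, String> consumer) {
        try {
            consumer.apply(ALWAYS_NULL);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> hosts = Collections.singletonList("host1");
        // Unguarded consumer: dereferences the result directly.
        boolean unguarded = toleratesNull(m -> m.resolve(hosts).get(0));
        // Guarded consumer: falls back to a default rack on null.
        boolean guarded = toleratesNull(m -> {
            List<String> racks = m.resolve(hosts);
            return racks == null ? "/default-rack" : racks.get(0);
        });
        System.out.println("unguarded tolerates null: " + unguarded);
        System.out.println("guarded tolerates null: " + guarded);
    }
}
```

In a real test the consumer would be the component under test, wired to the always-null mapping.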





[jira] [Commented] (MAPREDUCE-50) NPE in heartbeat when the configured topology script doesn't exist

2012-02-09 Thread Steve Loughran (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204676#comment-13204676
 ] 

Steve Loughran commented on MAPREDUCE-50:
-

Note that the {{DNSToSwitchMapping}} interface says nothing about returning 
null on any problem; the behaviour of the implementation is not consistent with 
the javadocs.
