[jira] [Commented] (MAPREDUCE-50) NPE in heartbeat when the configured topology script doesn't exist
[ https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069293#comment-14069293 ]

Allen Wittenauer commented on MAPREDUCE-50:

We should verify whether this condition still exists in 2.x and, if so, fix it.

NPE in heartbeat when the configured topology script doesn't exist
--
Key: MAPREDUCE-50
URL: https://issues.apache.org/jira/browse/MAPREDUCE-50
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.0.3
Reporter: Vinod Kumar Vavilapalli
Assignee: Ivan Mitic
Labels: newbie

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[ https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993529#comment-13993529 ]

Andrew Liles commented on MAPREDUCE-50:

To help others who encounter this: the problem can also occur when the myPath/topologyProgram script exists but cannot be executed because of an operating system restriction. On a standard Ubuntu Server, for example, execution is prohibited on the /run mount point. This is simply solved by remounting it with execute permission:

{{sudo mount -o remount,exec /run}}

See https://groups.google.com/a/cloudera.org/forum/#!msg/scm-users/mDlnpWSfXMI/39cDl2C7gSUJ

The Cloudera installer should detect this and warn about or remedy it, but that's another ticket...
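Since "the script does not exist" and "the script exists but cannot be executed" produce the same symptom, a preflight check that tells them apart could save some digging. The sketch below is hypothetical (the class TopologyScriptCheck and its Status values are illustrative, not part of Hadoop) and relies only on java.nio.file.Files. Files.isExecutable asks the operating system whether this process may execute the file, so on Linux it may also catch a noexec mount, but treat that as an assumption to verify on your own platform.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical preflight check for a configured topology script.
 * Distinguishes a missing script from one that is present but not executable,
 * so an operator gets a specific message instead of a later NPE.
 */
final class TopologyScriptCheck {

    enum Status { OK, MISSING, NOT_A_FILE, NOT_EXECUTABLE }

    private TopologyScriptCheck() {}

    static Status check(String scriptPath) {
        Path p = Paths.get(scriptPath);
        if (!Files.exists(p)) {
            return Status.MISSING;
        }
        if (!Files.isRegularFile(p)) {
            return Status.NOT_A_FILE;
        }
        // Asks the OS whether this process may execute the file; a file can
        // have its x bit set and still be refused (e.g. by mount options).
        if (!Files.isExecutable(p)) {
            return Status.NOT_EXECUTABLE;
        }
        return Status.OK;
    }

    /** Human-readable message suitable for a startup WARN/ERROR log line. */
    static String describe(String scriptPath) {
        switch (check(scriptPath)) {
            case MISSING:
                return "Topology script " + scriptPath + " does not exist";
            case NOT_A_FILE:
                return "Topology script " + scriptPath + " is not a regular file";
            case NOT_EXECUTABLE:
                return "Topology script " + scriptPath + " exists but cannot be executed"
                    + " (check file permissions and noexec mount options)";
            default:
                return "Topology script " + scriptPath + " looks usable";
        }
    }
}
```

Running such a check once at daemon startup, rather than on every resolve, keeps it off the heartbeat path.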
[ https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646670#comment-13646670 ]

Ivan Mitic commented on MAPREDUCE-50:

Hi Steve, Vinod,

I've run into a problem similar to this one. In my case, the JobTracker started failing jobs because network topology resolution began failing for a single node in the cluster:
{code}
2013-04-27 08:33:08,204 ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed:
java.lang.NullPointerException
	at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:3205)
	at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:550)
	at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:734)
	at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4214)
	at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
{code}
What happens is that some input split blocks are located on the datanode with the same IP/hostname as the TT whose topology resolution is failing. As a side effect, many customer jobs fail during initialization. The NN, on the other hand, has fallback logic that defaults to /default-rack, and this inconsistency actually makes the problem more severe :)
{code}
2013-04-27 04:36:47,185 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The resolve call returned null! Using /default-rack for host [100.64.34.3]
2013-04-27 04:36:47,185 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/100.64.34.3:50010
{code}
In terms of the fix, my proposal would be to add the same fallback logic to the JobTracker.
In our case, we actually had a network topology script that had worked fine for a year or so and then started failing for a single node, for a reason we cannot yet explain. Let me know what you think. I'll take up this Jira if you don't mind.
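The proposed JobTracker-side fallback could look roughly like the sketch below. It is not the actual patch: RackResolver is a hypothetical stand-in with the same shape as Hadoop's DNSToSwitchMapping#resolve(List<String>), and RackFallback is an illustrative name. Any host the resolver cannot place (a null entry, or a null or short result list) is assigned to /default-rack and logged, mirroring the NameNode log line above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical stand-in with the same shape as DNSToSwitchMapping#resolve. */
interface RackResolver {
    List<String> resolve(List<String> hosts);
}

/** Sketch of a NameNode-style /default-rack fallback for the JobTracker. */
final class RackFallback {
    static final String DEFAULT_RACK = "/default-rack";
    private static final Logger LOG = Logger.getLogger(RackFallback.class.getName());

    private RackFallback() {}

    /** Returns one non-null rack per host, in the same order as the input. */
    static List<String> resolveWithDefault(RackResolver resolver, List<String> hosts) {
        List<String> raw = resolver.resolve(hosts);
        List<String> racks = new ArrayList<>(hosts.size());
        for (int i = 0; i < hosts.size(); i++) {
            // Treat a null list, a short list, or a null entry as "unresolved".
            String rack = (raw != null && i < raw.size()) ? raw.get(i) : null;
            if (rack == null) {
                LOG.severe("Rack resolution returned null; using " + DEFAULT_RACK
                    + " for host [" + hosts.get(i) + "]");
                rack = DEFAULT_RACK;
            }
            racks.add(rack);
        }
        return racks;
    }
}
```

Logging every fallback at a high level keeps the underlying script failure visible even though jobs no longer fail on it.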
[ https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204667#comment-13204667 ]

Steve Loughran commented on MAPREDUCE-50:

I've seen logs in emails that indicate this can still arise in the 0.20.20x series.

The problem is that the {{ScriptBasedMapping.RawScriptMapping.resolve()}} method behaves as follows:
# it returns /default-rack if there is no script
# it returns null if the script fails
# {{CachedDNSToSwitchMapping.resolve()}} (which invokes this inner resolve) sees the null, doesn't add those entries to its cache, then builds a list of results for each supplied argument, returning null for unknown hosts.
# The JobTracker doesn't expect null.

What to do?
* The WARN could be raised to a higher level so that it doesn't get lost, or the error message could be made more meaningful to users who don't want to delve into the source.
* The JT (and its YARN successor) needs a policy for handling null results.
* Better preflight checking of the shell script.

Unit tests could verify how the NameNode, JT and RM handle null entries, using a switch mapping that always returns null; that would be trivial to write.
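The null path in steps 2 and 3 can be reproduced with the kind of test double suggested above. This is a simplified model, not the real Hadoop source: SwitchMapping is a hypothetical stand-in for DNSToSwitchMapping, AlwaysNullMapping plays the role of a failing script, and CachingMapping only approximates the caching behaviour described in step 3 (failures are not cached, and unknown hosts come back as null entries).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simplified stand-in for DNSToSwitchMapping. */
interface SwitchMapping {
    List<String> resolve(List<String> names);
}

/** Test double: behaves like a topology script that always fails. */
final class AlwaysNullMapping implements SwitchMapping {
    @Override
    public List<String> resolve(List<String> names) {
        return null; // step 2: the raw mapping returns null on script failure
    }
}

/** Simplified model of the caching wrapper described in step 3. */
final class CachingMapping implements SwitchMapping {
    private final SwitchMapping raw;
    private final Map<String, String> cache = new HashMap<>();

    CachingMapping(SwitchMapping raw) {
        this.raw = raw;
    }

    @Override
    public List<String> resolve(List<String> names) {
        // Only ask the raw mapping about hosts not already cached.
        List<String> uncached = new ArrayList<>();
        for (String n : names) {
            if (!cache.containsKey(n)) {
                uncached.add(n);
            }
        }
        if (!uncached.isEmpty()) {
            List<String> resolved = raw.resolve(uncached);
            // A null result (or null entry) is simply not cached...
            if (resolved != null) {
                for (int i = 0; i < uncached.size() && i < resolved.size(); i++) {
                    String rack = resolved.get(i);
                    if (rack != null) {
                        cache.put(uncached.get(i), rack);
                    }
                }
            }
        }
        // ...and every host missing from the cache is returned as null,
        // which is what a caller like the JobTracker then dereferences.
        List<String> result = new ArrayList<>(names.size());
        for (String n : names) {
            result.add(cache.get(n));
        }
        return result;
    }
}
```

A unit test can then feed this list into whichever null-handling policy the NameNode, JT or RM adopts and assert that no NPE escapes.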
[ https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204676#comment-13204676 ]

Steve Loughran commented on MAPREDUCE-50:

Note that the {{DNSToSwitchMapping}} interface says nothing about returning null on failure; the implementation's behaviour is not consistent with its javadocs.
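One way to make the implicit contract explicit is a checking wrapper that fails fast with a message naming the offending host, instead of letting a null surface later as an NPE in a caller. This is a hypothetical sketch: TopologyMapping stands in for DNSToSwitchMapping, and the assumed contract (one non-null rack per input name, in order) is inferred from how callers use the result, not quoted from the javadocs.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical simplified stand-in for DNSToSwitchMapping#resolve. */
interface TopologyMapping {
    List<String> resolve(List<String> names);
}

/** Enforces an assumed contract: one non-null rack per input name, in order. */
final class ContractCheckingMapping implements TopologyMapping {
    private final TopologyMapping delegate;

    ContractCheckingMapping(TopologyMapping delegate) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
    }

    @Override
    public List<String> resolve(List<String> names) {
        List<String> racks = delegate.resolve(names);
        if (racks == null) {
            throw new IllegalStateException("Topology mapping returned null for " + names
                + "; check that the topology script exists and is executable");
        }
        if (racks.size() != names.size()) {
            throw new IllegalStateException("Topology mapping returned " + racks.size()
                + " racks for " + names.size() + " hosts");
        }
        for (int i = 0; i < racks.size(); i++) {
            if (racks.get(i) == null) {
                throw new IllegalStateException("No rack resolved for host " + names.get(i));
            }
        }
        return racks;
    }
}
```

Failing fast and falling back to /default-rack are two different policies for the same null; which one each daemon should apply is exactly the open question in this thread.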