[ https://issues.apache.org/jira/browse/MAPREDUCE-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551584#comment-13551584 ]
Joep Rottinghuis commented on MAPREDUCE-4939:
---------------------------------------------

Thinking about it, we may want to consider not caching entries that are mapped to unresolved.
Otherwise (faulty) inputSplit implementations can cause an infinite number of bad entries to be cached (eventually they would map to NetworkTopology.DEFAULT_RACK).
The downside is that if there in fact is a small set of incorrect hosts being requested, the topology script would get called over and over again instead of getting the cached NetworkTopology.DEFAULT_RACK.

> Faulty getLocations input split passed on blindly to topology script
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4939
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4939
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.205.0, 2.0.3-alpha
>            Reporter: Joep Rottinghuis
>
> When users implement a (faulty) InputSplit that returns splits that do not map to hostnames, they get passed on to the topology script.
> Call stack on 0.20 equivalent code:
> o.a.h.net.NetworkTopology.add
> o.a.h.mapred.JobTracker.addHostToNodeMapping
> o.a.h.mapred.JobTracker.resolveAndAddToTopology
> o.a.h.mapred.JobInProgress.createCache
> o.a.h.mapred.JobInProgress.initTasks
> o.apache.hadoop.mapred.JobTracker.initJob
> The CachedDNSToSwitchMapping wraps RawScriptBasedMapping (if a topology script is configured).
> The arguments (input splits) are simply passed to the topology script as arguments without any further checks.
> The input split could incorrectly return a comma separated list of hosts, or in the worst case something like
> {noformat}
> topology fake-hostname; rm -rf *
> {noformat}
> In 2.0 something similar happens in ScriptBasedMapping.runResolveCommand.
> That method should check the input values before executing the command.
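To make the two ideas above concrete (check the input before it reaches the topology script, and do not cache names that stay unresolved), here is a minimal sketch. It is not the Hadoop code or a proposed patch: the class ValidatingRackResolver, the method isPlausibleHost, the hostname pattern, and the convention that the raw resolver returns null for an unresolved name are all assumptions made for illustration, and the DEFAULT_RACK constant is only a stand-in for NetworkTopology.DEFAULT_RACK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

/** Hypothetical wrapper around a raw rack resolver; not part of Hadoop. */
final class ValidatingRackResolver {
    /** Stand-in for NetworkTopology.DEFAULT_RACK (assumed value). */
    static final String DEFAULT_RACK = "/default-rack";

    // Conservative assumption: letters, digits, dots and hyphens only,
    // starting and ending with a letter or digit, at most 253 characters.
    private static final Pattern HOST =
        Pattern.compile("[A-Za-z0-9]([A-Za-z0-9.-]{0,251}[A-Za-z0-9])?");

    // Stand-in for the call into the topology script; assumed to return
    // null when the name cannot be resolved.
    private final Function<String, String> rawResolver;
    private final Map<String, String> cache = new HashMap<>();

    ValidatingRackResolver(Function<String, String> rawResolver) {
        this.rawResolver = rawResolver;
    }

    /** True only if the name contains nothing but hostname characters. */
    static boolean isPlausibleHost(String name) {
        return name != null && HOST.matcher(name).matches();
    }

    String resolve(String name) {
        if (!isPlausibleHost(name)) {
            // Rejected before the raw resolver is called, and never cached.
            return DEFAULT_RACK;
        }
        String cached = cache.get(name);
        if (cached != null) {
            return cached;
        }
        String rack = rawResolver.apply(name);
        if (rack == null) {
            // Unresolved: fall back to the default rack without caching it.
            return DEFAULT_RACK;
        }
        cache.put(name, rack);
        return rack;
    }

    int cacheSize() {
        return cache.size();
    }

    public static void main(String[] args) {
        ValidatingRackResolver resolver = new ValidatingRackResolver(
            host -> host.startsWith("node") ? "/rack1" : null);
        System.out.println(resolver.resolve("node1.example.com"));
        System.out.println(resolver.resolve("fake-hostname; rm -rf *"));
        System.out.println(resolver.resolve("host1,host2"));
    }
}
```

The trade-off mentioned in the comment shows up here as well: a well-formed but unknown host name reaches the raw resolver on every call because its result is never cached, while malformed input such as a comma separated list never reaches it at all.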