[
https://issues.apache.org/jira/browse/MAPREDUCE-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551584#comment-13551584
]
Joep Rottinghuis commented on MAPREDUCE-4939:
-
Thinking about it, we may want to consider not caching entries that fail to
resolve.
Otherwise (faulty) InputSplit implementations can cause an unbounded number of
bad entries to be cached (eventually they would map to
NetworkTopology.DEFAULT_RACK).
The downside is that if a small set of incorrect hosts really is being
requested, the topology script would get called over and over again instead of
the cached NetworkTopology.DEFAULT_RACK being returned.
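To make the trade-off concrete, here is a minimal sketch of a cache that skips entries resolving to the default rack. It is not Hadoop's CachedDNSToSwitchMapping: the class name SelectiveRackCache, the Function-based resolver, and the "/default-rack" constant (a stand-in for NetworkTopology.DEFAULT_RACK) are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch; not the actual Hadoop caching wrapper. */
class SelectiveRackCache {
    // Stand-in for NetworkTopology.DEFAULT_RACK (assumed value).
    public static final String DEFAULT_RACK = "/default-rack";

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the raw mapping that would invoke the topology script.
    private final Function<String, String> rawResolver;

    SelectiveRackCache(Function<String, String> rawResolver) {
        this.rawResolver = rawResolver;
    }

    /** Resolves a host, caching the result only if it is a real rack. */
    String resolve(String host) {
        String cached = cache.get(host);
        if (cached != null) {
            return cached;
        }
        String rack = rawResolver.apply(host);
        if (rack == null) {
            rack = DEFAULT_RACK;
        }
        // Unresolved names are not cached, so bogus input cannot grow the map.
        if (!DEFAULT_RACK.equals(rack)) {
            cache.put(host, rack);
        }
        return rack;
    }

    int cachedEntries() {
        return cache.size();
    }
}
```

With this policy the map stays bounded by the number of resolvable hosts, but every lookup of an unresolvable name (or of a legitimate host that really sits on the default rack) goes back to the resolver, which is the downside described above.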
Faulty getLocations input split passed on blindly to topology script
Key: MAPREDUCE-4939
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4939
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.205.0, 2.0.3-alpha
Reporter: Joep Rottinghuis
When users implement a (faulty) InputSplit whose getLocations returns values
that do not map to hostnames, those values get passed on to the topology script.
Call stack on 0.20 equivalent code:
o.a.h.net.NetworkTopology.add
o.a.h.mapred.JobTracker.addHostToNodeMapping
o.a.h.mapred.JobTracker.resolveAndAddToTopology
o.a.h.mapred.JobInProgress.createCache
o.a.h.mapred.JobInProgress.initTasks
o.a.h.mapred.JobTracker.initJob
The CachedDNSToSwitchMapping wraps RawScriptBasedMapping (if a topology
script is configured).
The locations returned by the input splits are simply passed to the topology
script as arguments without any further checks.
The input split could incorrectly return a comma separated list of hosts, or
in the worst case something like
{noformat}
topology fake-hostname; rm -rf *
{noformat}
In 2.0 something similar happens in ScriptBasedMapping.runResolveCommand.
That method should check the input values before executing the command.
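As a rough illustration of such a check, the sketch below rejects anything that does not look like a plain hostname or IPv4 address before it would reach the script. The class HostNameValidator and its methods are hypothetical and not part of Hadoop; the pattern follows the usual RFC 1123 label rules and would also reject IPv6 literals, which may or may not be acceptable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of ScriptBasedMapping. */
class HostNameValidator {
    // Dot-separated labels of letters, digits and hyphens, each 1 to 63
    // characters, not starting or ending with a hyphen; 253 characters total.
    private static final Pattern HOST = Pattern.compile(
        "^(?=.{1,253}$)[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
        + "(\\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$");

    /** Returns true only if the whole string is a syntactically valid name. */
    static boolean isValidHostName(String name) {
        return name != null && HOST.matcher(name).matches();
    }

    /** Keeps only the entries that are safe to hand to the topology script. */
    static List<String> filterValid(List<String> names) {
        List<String> valid = new ArrayList<>();
        for (String n : names) {
            if (isValidHostName(n)) {
                valid.add(n);
            }
        }
        return valid;
    }
}
```

Under this sketch, both the comma separated list and the "fake-hostname; rm -rf *" example above would be dropped, since commas, spaces, semicolons and asterisks are not allowed characters. Whether rejected entries should be logged, skipped, or mapped straight to NetworkTopology.DEFAULT_RACK is left open here.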