[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551584#comment-13551584
 ] 

Joep Rottinghuis commented on MAPREDUCE-4939:
---------------------------------------------

On reflection, we may want to avoid caching entries that fail to resolve.
Otherwise (faulty) InputSplit implementations can cause an unbounded number of 
bad entries to be cached (each of them would eventually map to 
NetworkTopology.DEFAULT_RACK).

The downside is that if there is in fact only a small set of incorrect hosts 
being requested, the topology script would get called over and over again 
instead of the lookup returning the cached NetworkTopology.DEFAULT_RACK.
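
To make the trade-off concrete, here is a minimal sketch of a cache that skips 
unresolved entries. It is not the actual CachedDNSToSwitchMapping code: the 
class name SelectiveRackCache, the rawResolver function (a stand-in for the 
topology script) and the cachedEntries helper are hypothetical, and the 
DEFAULT_RACK constant is assumed to match NetworkTopology.DEFAULT_RACK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch; not Hadoop's CachedDNSToSwitchMapping. */
class SelectiveRackCache {
    // Assumed to match NetworkTopology.DEFAULT_RACK.
    static final String DEFAULT_RACK = "/default-rack";

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the topology script; may return null for unknown hosts.
    private final Function<String, String> rawResolver;

    SelectiveRackCache(Function<String, String> rawResolver) {
        this.rawResolver = rawResolver;
    }

    /** Resolves a host to a rack, caching only successfully resolved hosts. */
    String resolve(String host) {
        String cached = cache.get(host);
        if (cached != null) {
            return cached;
        }
        String rack = rawResolver.apply(host);
        if (rack == null) {
            rack = DEFAULT_RACK;
        }
        // Skip the cache for unresolved hosts so that bogus names cannot
        // grow the map without bound.
        if (!DEFAULT_RACK.equals(rack)) {
            cache.put(host, rack);
        }
        return rack;
    }

    /** Number of cached entries, exposed only to illustrate the behaviour. */
    int cachedEntries() {
        return cache.size();
    }
}
```

With this shape, a repeated lookup of a bad name calls rawResolver every time, 
which is exactly the downside described above. A bounded cache for unresolved 
names would be one possible middle ground.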
                
> Faulty getLocations input split passed on blindly to topology script
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4939
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4939
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.205.0, 2.0.3-alpha
>            Reporter: Joep Rottinghuis
>
> When users implement a (faulty) InputSplit whose getLocations returns values 
> that do not map to hostnames, those values get passed on to the topology 
> script.
> Call stack on 0.20 equivalent code:
> o.a.h.net.NetworkTopology.add
> o.a.h.mapred.JobTracker.addHostToNodeMapping
> o.a.h.mapred.JobTracker.resolveAndAddToTopology
> o.a.h.mapred.JobInProgress.createCache
> o.a.h.mapred.JobInProgress.initTasks
> o.a.h.mapred.JobTracker.initJob
> The CachedDNSToSwitchMapping wraps RawScriptBasedMapping (if a topology 
> script is configured).
> The split locations are simply passed to the topology script as arguments 
> without any further checks.
> The input split could incorrectly return a comma separated list of hosts, or 
> in the worst case something like
> {noformat}
> topology fake-hostname; rm -rf *
> {noformat}
> In 2.0 something similar happens in ScriptBasedMapping.runResolveCommand.
> That method should check the input values before executing the command.
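
As a rough illustration of the kind of check suggested in the description, the 
sketch below rejects anything that does not look like a hostname before it 
could reach a script. The class and method names (TopologyArgCheck, 
isPlausibleHost, filterHosts) are hypothetical and are not part of Hadoop; the 
pattern follows the usual letters/digits/hyphen label rules, so dotted IPv4 
addresses pass while IPv6 literals and names with a trailing dot are rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of argument checking; not Hadoop code. */
class TopologyArgCheck {
    // One DNS label: 1-63 letters, digits or hyphens, with no leading or
    // trailing hyphen.
    private static final Pattern LABEL =
        Pattern.compile("[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    /** Returns true if s consists only of dot-separated DNS-style labels. */
    static boolean isPlausibleHost(String s) {
        if (s == null || s.isEmpty() || s.length() > 253) {
            return false;
        }
        // Limit -1 keeps trailing empty labels, so "host." is rejected.
        for (String label : s.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    /** Keeps only the names that pass isPlausibleHost, preserving order. */
    static List<String> filterHosts(List<String> names) {
        List<String> ok = new ArrayList<>();
        for (String name : names) {
            if (isPlausibleHost(name)) {
                ok.add(name);
            }
        }
        return ok;
    }
}
```

Under these assumptions, both the comma separated list and the 
"fake-hostname; rm -rf *" example from the description are dropped, because 
commas, spaces and semicolons are not allowed in a label. Whether rejected 
names should be logged, mapped to the default rack, or fail the job is a 
separate design decision.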

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
