[jira] [Commented] (MAPREDUCE-4939) Faulty getLocations input split passed on blindly to topology script

2013-01-11 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551575#comment-13551575
 ] 

Joep Rottinghuis commented on MAPREDUCE-4939:
-

Filed under MAPREDUCE, although the most obvious place to fix this is in Common, 
where runResolveCommand is implemented.

 Faulty getLocations input split passed on blindly to topology script
 

 Key: MAPREDUCE-4939
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4939
 Project: Hadoop Map/Reduce
 Issue Type: Bug
 Affects Versions: 0.20.205.0, 2.0.3-alpha
 Reporter: Joep Rottinghuis

 When users implement a (faulty) InputSplit whose getLocations() returns values 
 that do not map to hostnames, those values get passed on to the topology script.
 Call stack in the equivalent 0.20 code:
 o.a.h.net.NetworkTopology.add
 o.a.h.mapred.JobTracker.addHostToNodeMapping
 o.a.h.mapred.JobTracker.resolveAndAddToTopology
 o.a.h.mapred.JobInProgress.createCache
 o.a.h.mapred.JobInProgress.initTasks
 o.a.h.mapred.JobTracker.initJob
 CachedDNSToSwitchMapping wraps RawScriptBasedMapping (if a topology 
 script is configured).
 The locations returned by the input split are simply passed to the topology 
 script as arguments without any further checks.
 The input split could incorrectly return a comma-separated list of hosts, or 
 in the worst case something like
 {noformat}
 topology fake-hostname; rm -rf *
 {noformat}
 In 2.0 something similar happens in ScriptBasedMapping.runResolveCommand.
 That method should check the input values before executing the command.
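
 As a rough illustration of the kind of check meant here, the sketch below 
 rejects any argument that does not look like a hostname or IPv4 address before 
 it would be handed to the script. The class and method names 
 (HostArgumentFilter, isPlausibleHost, keepPlausibleHosts) are hypothetical and 
 not part of Hadoop, and the pattern is an assumption based on the usual 
 letters/digits/hyphen/dot hostname rules; IPv6 literals would need separate 
 handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: filters topology script arguments.
final class HostArgumentFilter {
    // One DNS label: 1-63 chars, alphanumeric at both ends, hyphens allowed inside.
    private static final String LABEL =
        "[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?";

    // Dot-separated labels, at most 253 characters in total.
    private static final Pattern HOST =
        Pattern.compile("(?=.{1,253}$)" + LABEL + "(?:\\." + LABEL + ")*");

    private HostArgumentFilter() {}

    // True only if the whole string looks like a hostname or dotted IPv4 address.
    static boolean isPlausibleHost(String name) {
        return name != null && HOST.matcher(name).matches();
    }

    // Returns the names that pass the check, preserving their order.
    static List<String> keepPlausibleHosts(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (isPlausibleHost(name)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

 With this check, values such as "host1,host2" or "fake-hostname; rm -rf *" 
 are dropped because they contain characters outside the allowed set. Whether 
 rejected names should be logged, mapped to the default rack, or fail the job 
 is a separate design decision.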

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4939) Faulty getLocations input split passed on blindly to topology script

2013-01-11 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551584#comment-13551584
 ] 

Joep Rottinghuis commented on MAPREDUCE-4939:
-

Thinking about it, we may want to consider not caching entries that fail to 
resolve.
Otherwise (faulty) InputSplit implementations can cause an unbounded number of 
bad entries to be cached (eventually they would map to 
NetworkTopology.DEFAULT_RACK).

The downside is that if there is in fact only a small set of incorrect hosts 
being requested, the topology script would get called over and over again 
instead of the cached NetworkTopology.DEFAULT_RACK being returned.
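
To make the trade-off concrete, here is a minimal sketch of a cache that 
stores only successful resolutions. It is not the CachedDNSToSwitchMapping 
code: SelectiveRackCache, its resolver function (assumed to return null for an 
unresolved host) and the local DEFAULT_RACK constant (a stand-in for 
NetworkTopology.DEFAULT_RACK) are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch, not Hadoop code: caches only hosts that resolved.
final class SelectiveRackCache {
    // Stand-in for NetworkTopology.DEFAULT_RACK.
    static final String DEFAULT_RACK = "/default-rack";

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Assumed contract: returns the rack path, or null if the host is unresolved.
    private final Function<String, String> rawResolver;

    SelectiveRackCache(Function<String, String> rawResolver) {
        this.rawResolver = rawResolver;
    }

    String resolve(String host) {
        String cached = cache.get(host);
        if (cached != null) {
            return cached;
        }
        String rack = rawResolver.apply(host);
        if (rack == null) {
            // Unresolved: fall back to the default rack but do not cache,
            // so bogus names cannot grow the cache without bound.
            return DEFAULT_RACK;
        }
        cache.put(host, rack);
        return rack;
    }

    int cachedEntries() {
        return cache.size();
    }
}
```

The cost described above is visible in resolve(): every lookup of an 
unresolved name calls the resolver again. A bounded negative cache would be 
one possible middle ground.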

