locating map outputs via random probing is inefficient
------------------------------------------------------
Key: HADOOP-248
URL: http://issues.apache.org/jira/browse/HADOOP-248
Project: Hadoop
Type: Improvement
Components: mapred
Versions: 0.2.1
Reporter: Owen O'Malley
Assigned to: Owen O'Malley
Fix For: 0.3
Currently the ReduceTaskRunner polls the JobTracker about a random list of map
tasks, asking for their output locations. It would be better if the JobTracker
kept an ordered log and the interface were changed to:
  class MapLocationResults {
    public int getTimestamp();
    public MapOutputLocation[] getLocations();
  }
  interface InterTrackerProtocol {
    ...
    MapLocationResults locateMapOutputs(int prevTimestamp);
  }
with the intention that each time a ReduceTaskRunner calls locateMapOutputs, it
passes back the "timestamp" that it got from the previous result. That way,
reduces can easily find the new MapOutputs. This should help the "ramp up" when
the maps first start finishing.
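
To make the proposal concrete, here is a minimal sketch of how the ordered log
could behave. It is only an illustration under stated assumptions, not the
actual JobTracker code: the class name MapOutputLog, the record() method, and
the nested Results class are hypothetical, and plain strings stand in for
MapOutputLocation. The "timestamp" is assumed to be simply the length of the
log at the time of the call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: an append-only log of finished map output locations. */
class MapOutputLog {

    /** One poll result: the new locations plus the timestamp to pass back next time. */
    static final class Results {
        private final int timestamp;
        private final List<String> locations;

        Results(int timestamp, List<String> locations) {
            this.timestamp = timestamp;
            this.locations = locations;
        }

        int getTimestamp() { return timestamp; }

        List<String> getLocations() { return locations; }
    }

    // Entries are only ever appended, so an index into this list can serve
    // as the "timestamp" (an assumption of this sketch).
    private final List<String> log = new ArrayList<>();

    /** Called when a map finishes; a String stands in for MapOutputLocation. */
    synchronized void record(String location) {
        log.add(location);
    }

    /** Returns every location recorded since prevTimestamp, in completion order. */
    synchronized Results locateMapOutputs(int prevTimestamp) {
        // Clamp so that an out-of-range timestamp cannot cause an exception.
        int from = Math.max(0, Math.min(prevTimestamp, log.size()));
        // Copy the tail so the caller does not hold a view of the live log.
        List<String> fresh = new ArrayList<>(log.subList(from, log.size()));
        return new Results(log.size(), fresh);
    }
}
```

A reduce would start with timestamp 0 and feed each returned getTimestamp()
into its next call, so every poll returns only the outputs that finished since
the previous one and no map is asked about twice.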