[jira] Commented: (HADOOP-248) locating map outputs via random probing is inefficient

Devaraj Das (JIRA) Wed, 17 Jan 2007 03:29:01 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465422 ]


Devaraj Das commented on HADOOP-248:
------------------------------------

I propose the following:

1) Modify "TaskCompletionEvent[] getTaskCompletionEvents(String jobid, int 
fromEventId)", defined in InterTrackerProtocol and JobSubmissionProtocol, to 
take a new argument that specifies the maximum number of events to fetch. The 
call may return fewer events, depending on how many have been registered for 
the job. 
The signature becomes: TaskCompletionEvent[] getTaskCompletionEvents(String 
jobid, int fromEventId, int maxEvents)
This should generally be more scalable. For map-output fetches, the TT does 
the same thing it does today, except that the randomness is gone and the TT 
knows exactly which maps have finished.

2) Since the event IDs are numbered in a monotonically increasing sequence for 
a job, we don't need to maintain timestamps (as the original description of 
this bug suggests).

3) Add a "boolean isMapTask()" method to the TaskCompletionEvent class that 
returns true if the event is from a map task and false otherwise.
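
To make the three points concrete, here is a rough Java sketch of how they 
could fit together. It is only an illustration under assumptions, not the 
actual Hadoop code: TaskCompletionEvent below is a simplified stand-in for the 
real class, and JobEventLog, register(), MapEventPoller and poll() are 
hypothetical names. The log is kept per job, so the jobid argument is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for TaskCompletionEvent (fields are assumptions).
final class TaskCompletionEvent {
    private final int eventId;
    private final String taskId;
    private final boolean isMap;

    TaskCompletionEvent(int eventId, String taskId, boolean isMap) {
        this.eventId = eventId;
        this.taskId = taskId;
        this.isMap = isMap;
    }

    int getEventId() { return eventId; }
    String getTaskId() { return taskId; }

    // Point 3: true if the event came from a map task, false otherwise.
    boolean isMapTask() { return isMap; }
}

// Hypothetical per-job event log on the JobTracker side.
final class JobEventLog {
    private final List<TaskCompletionEvent> events = new ArrayList<>();

    // Point 2: event IDs are the position in the log, so they increase
    // monotonically and no timestamp is needed.
    synchronized void register(String taskId, boolean isMap) {
        events.add(new TaskCompletionEvent(events.size(), taskId, isMap));
    }

    // Point 1: return at most maxEvents events starting at fromEventId.
    // Fewer (possibly zero) are returned if fewer have been registered.
    synchronized TaskCompletionEvent[] getTaskCompletionEvents(
            int fromEventId, int maxEvents) {
        if (fromEventId < 0 || fromEventId >= events.size() || maxEvents <= 0) {
            return new TaskCompletionEvent[0];
        }
        int count = Math.min(maxEvents, events.size() - fromEventId);
        return events.subList(fromEventId, fromEventId + count)
                     .toArray(new TaskCompletionEvent[0]);
    }
}

// Hypothetical caller on the TT side: remembers the next event ID to ask
// for and keeps only the map completions from each batch.
final class MapEventPoller {
    private int nextEventId = 0;

    List<String> poll(JobEventLog log, int maxEvents) {
        TaskCompletionEvent[] batch =
            log.getTaskCompletionEvents(nextEventId, maxEvents);
        List<String> finishedMaps = new ArrayList<>();
        for (TaskCompletionEvent e : batch) {
            if (e.isMapTask()) {
                finishedMaps.add(e.getTaskId());
            }
        }
        // Advance past everything seen, including non-map events.
        nextEventId += batch.length;
        return finishedMaps;
    }
}
```

In this sketch the caller never asks about a specific map at random; it just 
resumes from the last event ID it saw, and each call is bounded by maxEvents.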

Comments?

> locating map outputs via random probing is inefficient
> ------------------------------------------------------
>
>                 Key: HADOOP-248
>                 URL: https://issues.apache.org/jira/browse/HADOOP-248
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.2.1
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>
> Currently the ReduceTaskRunner polls the JobTracker for a random list of map 
> tasks asking for their output locations. It would be better if the JobTracker 
> kept an ordered log and the interface was changed to:
> class MapLocationResults {
>    public int getTimestamp();
>    public MapOutputLocation[] getLocations();
> }
> interface InterTrackerProtocol {
>   ...
>   MapLocationResults locateMapOutputs(int prevTimestamp);
> } 
> with the intention that each time a ReduceTaskRunner calls locateMapOutputs, 
> it passes back the "timestamp" that it got from the previous result. That 
> way, reduces can easily find the new MapOutputs. This should help the "ramp 
> up" when the maps first start finishing.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
