[ https://issues.apache.org/jira/browse/HADOOP-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656246#action_12656246 ]

devaraj edited comment on HADOOP-3136 at 12/12/08 8:54 PM:
---------------------------------------------------------------

I guess I am too late in commenting on this, but one thing that might be worth 
doing is to ask for a bunch of tasks as soon as the TT empties its queue of 
tasks (TaskTracker.TaskLauncher.tasksToLaunch is the queue). We would batch the 
request for new tasks on the grounds that the TT has started running all the 
queued tasks and should now backfill the queue. That way, the TT would never 
really be idle. This might be important for GridMix-like applications where 
there are many small tasks...
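
A minimal sketch of that backfill idea, assuming hypothetical names 
(TaskLauncherSketch, tasksToRequest(); the real TaskLauncher is structured 
differently):

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not the actual TaskTracker code: a launcher thread
// drains a task queue and flags the heartbeat to request a whole batch once
// the queue runs empty, so the TT is never left idle between heartbeats.
class TaskLauncherSketch implements Runnable {
  private final BlockingQueue<Runnable> tasksToLaunch = new LinkedBlockingQueue<>();
  private final int maxSlots;
  private volatile boolean askForBatch = false; // read by the heartbeat thread

  TaskLauncherSketch(int maxSlots) { this.maxSlots = maxSlots; }

  void addTask(Runnable task) { tasksToLaunch.add(task); }

  // Called from the heartbeat thread: ask for a full batch after a drain,
  // otherwise fall back to today's one-task-per-heartbeat behavior.
  int tasksToRequest() {
    if (askForBatch) {
      askForBatch = false; // one batched request per drain
      return maxSlots;
    }
    return 1;
  }

  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        Runnable next = tasksToLaunch.take();
        next.run(); // simplified: real code hands the task off to a free slot
        // Queue drained: have the next heartbeat backfill a whole batch.
        if (tasksToLaunch.isEmpty()) {
          askForBatch = true;
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}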

      was (Author: devaraj):
    BTW, I guess I am too late in commenting on this, but one thing that might 
be worth doing is to ask for a bunch of tasks as soon as the TT empties its 
queue of tasks (TaskTracker.TaskLauncher.tasksToLaunch is the queue). That 
way, the status updates about running/completed tasks from the TT might lag 
behind due to the fixed heartbeat interval, but the TT would not really be 
idle.
  
> Assign multiple tasks per TaskTracker heartbeat
> -----------------------------------------------
>
>                 Key: HADOOP-3136
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3136
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-3136_0_20080805.patch, 
> HADOOP-3136_1_20080809.patch, HADOOP-3136_2_20080911.patch, 
> HADOOP-3136_3_20081211.patch, HADOOP-3136_4_20081212.patch
>
>
> In today's logic of finding a new task, we assign only one task per heartbeat.
> We could probably give the TaskTracker multiple tasks, subject to the max 
> number of free slots it has - for maps we could assign it data-local tasks. 
> We could run some logic to decide what to give it if we run out of 
> data-local tasks (e.g., tasks from overloaded racks, tasks that have the 
> least locality, etc.). In addition to maps, if it has free reduce slots, we 
> could give it reduce task(s) as well. Again, for reduces we could run some 
> logic to give more tasks to nodes that are closer to the nodes running most 
> of the maps (assuming the data generated is proportional to the number of 
> maps). For example, if rack1 has 70% of the input splits, and we know that 
> most maps are data/rack local, we try to schedule ~70% of the reducers there.
> Thoughts?
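
A minimal sketch of the multi-task assignment described above, assuming 
hypothetical names (PendingTask, assign(); the locality passes are simplified 
and not the actual scheduler logic):

{code:java}
import java.util.ArrayList;
import java.util.List;

class AssignmentSketch {
  // Hypothetical descriptor: a pending task plus the racks holding its input.
  record PendingTask(String id, boolean isMap, List<String> inputRacks) {}

  // Fill the TT's free map slots with rack-local tasks first, then fall back
  // to the remaining maps; reduces have no input locality here, so they
  // simply fill the free reduce slots.
  static List<PendingTask> assign(String ttRack, int freeMapSlots,
                                  int freeReduceSlots, List<PendingTask> pending) {
    List<PendingTask> assigned = new ArrayList<>();
    // Pass 1: data/rack-local maps.
    for (PendingTask t : pending) {
      if (countMaps(assigned) >= freeMapSlots) break;
      if (t.isMap() && t.inputRacks().contains(ttRack)) assigned.add(t);
    }
    // Pass 2: whatever maps remain (overloaded racks, least locality, etc.).
    for (PendingTask t : pending) {
      if (countMaps(assigned) >= freeMapSlots) break;
      if (t.isMap() && !assigned.contains(t)) assigned.add(t);
    }
    // Reduces: fill the free reduce slots.
    for (PendingTask t : pending) {
      if (assigned.size() - countMaps(assigned) >= freeReduceSlots) break;
      if (!t.isMap()) assigned.add(t);
    }
    return assigned;
  }

  private static int countMaps(List<PendingTask> assigned) {
    return (int) assigned.stream().filter(PendingTask::isMap).count();
  }
}
{code}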

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
