[ https://issues.apache.org/jira/browse/HADOOP-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700550#action_12700550 ]

Devaraj Das edited comment on HADOOP-5632 at 4/18/09 11:40 PM:
---------------------------------------------------------------

If we go the route of lightweight/heavyweight heartbeats, I'd suggest that we 
explicitly call those out as separate RPCs. TaskTrackers make certain 
assumptions about a successful heartbeat, and since a TaskTracker always sends a 
regular (heavyweight) heartbeat, there is a problem with status reporting 
for KILLED/FAILED tasks. Assume that, at a certain TaskTracker node, some task(s) 
fail just before the heartbeat is sent. The TaskTracker sends the statuses of 
those tasks, and the JobTracker processes this heartbeat as a lightweight one 
(and therefore skips the status updates). The TaskTracker removes these tasks 
from the runningTasks map after getting the heartbeat response, and won't 
report their statuses again. The JobTracker will never learn of such task 
failures.
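To make the failure scenario concrete, here is a toy Java model. It is not 
Hadoop code: `ToyJobTracker`, `ToyTaskTracker`, `lightweightHeartbeat` and 
`observedState` are hypothetical names. It assumes, as described above, that 
the TaskTracker drops finished tasks from runningTasks after any successful 
heartbeat response. With a single heartbeat RPC that the JobTracker silently 
treats as lightweight, the failure is lost; with separate RPCs the TaskTracker 
knows it made a lightweight call and keeps the statuses for the next full 
heartbeat.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the lost-status problem; not actual Hadoop code. */
public class HeartbeatLossDemo {

    enum State { RUNNING, FAILED, KILLED, SUCCEEDED }

    static class ToyJobTracker {
        final Map<String, State> known = new HashMap<>();

        // Heavyweight path: every reported status is processed.
        void heartbeat(Map<String, State> statuses) {
            known.putAll(statuses);
        }

        // Lightweight path: status updates are skipped.
        void lightweightHeartbeat(Map<String, State> statuses) {
            // status processing deliberately omitted
        }
    }

    static class ToyTaskTracker {
        final Map<String, State> runningTasks = new HashMap<>();
        // false: one heartbeat RPC, the JT silently decides it is lightweight
        // true:  separate RPCs, so the TT knows which kind it invoked
        final boolean separateRpcs;

        ToyTaskTracker(boolean separateRpcs) {
            this.separateRpcs = separateRpcs;
        }

        void sendHeartbeat(ToyJobTracker jt, boolean lightweight) {
            Map<String, State> report = new HashMap<>(runningTasks);
            if (lightweight) {
                jt.lightweightHeartbeat(report);
            } else {
                jt.heartbeat(report);
            }
            // The TT drops finished tasks after a successful response and never
            // resends them, unless it knows the call was a lightweight one.
            boolean believesProcessed = !(separateRpcs && lightweight);
            if (believesProcessed) {
                runningTasks.values().removeIf(s -> s != State.RUNNING);
            }
        }
    }

    /** A task fails, then one lightweight and one full heartbeat follow. */
    static State observedState(boolean separateRpcs) {
        ToyJobTracker jt = new ToyJobTracker();
        ToyTaskTracker tt = new ToyTaskTracker(separateRpcs);
        tt.runningTasks.put("attempt_0", State.FAILED); // fails just before the heartbeat
        tt.sendHeartbeat(jt, true);  // handled as lightweight
        tt.sendHeartbeat(jt, false); // next regular heartbeat
        return jt.known.get("attempt_0");
    }

    public static void main(String[] args) {
        System.out.println("implicit lightweight: " + observedState(false));
        System.out.println("separate RPCs:        " + observedState(true));
    }
}
```

In the implicit case the JobTracker ends up with no record of attempt_0 (the 
model prints null); with separate RPCs it sees FAILED on the next full 
heartbeat.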

Also, maybe we should process the failed/killed tasks' statuses in the 
lightweight heartbeat as well. The reasoning is that failed/killed tasks should 
be given the same treatment as virgin (never-attempted) tasks. It actually makes 
sense to give failed tasks higher priority during task assignment: if every 
attempt fails deterministically, the job fails fast (after a certain number of 
attempts of the same task), which leads to better cluster utilization.
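A rough sketch of the fail-fast argument, under simplifying assumptions that 
are mine rather than Hadoop's: a single slot, one task (task 0) that fails 
deterministically on every attempt, and a hypothetical `maxAttempts` limit 
after which the job is declared failed. `attemptsUntilJobFails` counts how many 
attempts are launched before the job fails when failed tasks are retried 
before virgin tasks versus after them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy scheduler model; not Hadoop's actual task-assignment code. */
public class FailFastDemo {

    static int attemptsUntilJobFails(int numTasks, int maxAttempts, boolean failedFirst) {
        Deque<Integer> virgin = new ArrayDeque<>(); // never-attempted tasks
        for (int t = 0; t < numTasks; t++) {
            virgin.add(t);
        }
        Deque<Integer> failed = new ArrayDeque<>(); // tasks awaiting a retry
        int[] failures = new int[numTasks];
        int launched = 0;

        while (!virgin.isEmpty() || !failed.isEmpty()) {
            int task;
            if (failedFirst && !failed.isEmpty()) {
                task = failed.poll();   // retry failed tasks first
            } else if (!virgin.isEmpty()) {
                task = virgin.poll();   // otherwise take a virgin task
            } else {
                task = failed.poll();   // only retries are left
            }
            launched++;
            if (task == 0) {            // task 0 fails on every attempt
                failures[task]++;
                if (failures[task] >= maxAttempts) {
                    return launched;    // job declared failed
                }
                failed.add(task);
            }
        }
        return -1; // job succeeded (unreachable while task 0 always fails)
    }

    public static void main(String[] args) {
        System.out.println("virgin-first: " + attemptsUntilJobFails(1000, 4, false));
        System.out.println("failed-first: " + attemptsUntilJobFails(1000, 4, true));
    }
}
```

With 1000 tasks and maxAttempts = 4, virgin-first ordering runs all 999 other 
tasks before exhausting task 0's retries (1003 attempts), while failed-first 
gives up after 4. A real cluster with many slots changes the absolute numbers; 
the sketch only illustrates the ordering effect.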

  
> Jobtracker leaves tasktrackers underutilized
> --------------------------------------------
>
>                 Key: HADOOP-5632
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5632
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
>         Environment: 2x HT 2.8GHz Intel Xeon, 3GB RAM, 4x 250GB HD Linux 
> boxes, 100-node cluster
>            Reporter: Khaled Elmeleegy
>         Attachments: hadoop-khaled-tasktracker.10s.uncompress.timeline.pdf, 
> hadoop-khaled-tasktracker.150ms.uncompress.timeline.pdf, jobtracker.patch, 
> jobtracker20.patch
>
>
> For some workloads, the jobtracker doesn't keep all the slots utilized even 
> under heavy load.

-- 
This message is automatically generated by JIRA.