[ 
https://issues.apache.org/jira/browse/TEZ-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063760#comment-16063760
 ] 

Muhammad Samir Khan commented on TEZ-3772:
------------------------------------------

For example, when there is a single reducer and some of the mappers land on 
slow machines, the reducer can benefit from starting early. I ran a modified 
wordcount with one summation task, in which approximately 10% of the tokenizer 
tasks randomly sleep for 2 minutes. I set the small job threshold to 1. With 
slowstart at 1.0, the total runtime was 360 seconds; with slowstart at 0.0, 
it was 270 seconds (25% shorter).

For smaller jobs, we should be more aggressive with the slow start. I'll change 
the defaults to be 0.2 and 0.5 for min and max respectively.
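
To make the proposal concrete, here is a rough sketch of how separate slowstart 
fractions for small vertices might be selected and applied. This is a 
simplified model, not the code in the patch: the function names, the 
"at most threshold tasks" definition of small, the linear ramp between min and 
max, and the (0.25, 0.75) values for regular vertices are all assumptions made 
for illustration. Only the 0.2 and 0.5 values come from the comment above.

```python
def pick_slowstart(num_tasks, small_threshold,
                   small=(0.2, 0.5), regular=(0.25, 0.75)):
    """Choose (min, max) slowstart fractions for a vertex.

    Hypothetical helper: `regular` is a placeholder pair, and treating
    "small" as num_tasks <= small_threshold is an assumption.
    """
    return small if num_tasks <= small_threshold else regular


def fraction_to_schedule(completed, min_frac, max_frac):
    """Fraction of the vertex's tasks allowed to start, given the fraction
    of source tasks that have completed (simplified linear ramp)."""
    if completed < min_frac:
        return 0.0   # too few sources done: hold everything back
    if completed >= max_frac:
        return 1.0   # enough sources done: release all tasks
    # Between min and max, release tasks proportionally.
    return (completed - min_frac) / (max_frac - min_frac)


if __name__ == "__main__":
    # One summation task with a small job threshold of 1, as in the test run.
    min_frac, max_frac = pick_slowstart(num_tasks=1, small_threshold=1)
    for done in (0.1, 0.35, 0.9):
        print(done, fraction_to_schedule(done, min_frac, max_frac))
```

In this model, slowstart at 1.0 holds the reducer until every mapper has 
finished, while slowstart at 0.0 releases it immediately, which matches the 
two configurations compared in the test run.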


> Allow slowstart for small vertices to be treated differently
> ------------------------------------------------------------
>
>                 Key: TEZ-3772
>                 URL: https://issues.apache.org/jira/browse/TEZ-3772
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Muhammad Samir Khan
>            Assignee: Muhammad Samir Khan
>         Attachments: tez-3772.001.patch
>
>
> If a vertex has a small (configurable) number of reduce tasks, a different 
> slowstart threshold can significantly improve job performance. Yes, the job 
> could specify slowstart as 0.0 instead of the default, but that requires job 
> owners to do something. It would be better if the defaults did something 
> more optimal for both large and small jobs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
