[ 
https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286551#comment-16286551
 ] 

Jason Lowe commented on TEZ-394:
--------------------------------

bq. I believe "max distance to root for any of a vertex's descendants" is equal 
to "furthest distance from leaf".

The two are not equivalent. If they were, there would be no semantic difference 
between v2 and v3 of the patch: v2 implements furthest distance from leaf, while 
v3 implements max distance to root for any child vertex (recursively).  The unit 
tests I added in the v3 patch use a DAG with disconnected sub-DAGs and 
demonstrate that the two algorithms schedule it differently.
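
To make the difference concrete, here is a minimal sketch in Python (not the 
Tez code, and not taken from either patch) that computes both metrics on a 
made-up DAG with two disconnected sub-DAGs.  The vertex names, the 
compute_metrics helper, and the choice to include a vertex's own depth in its 
"max descendant depth" are all assumptions for illustration.  How a scheduler 
turns either metric into priorities is deliberately left out.

```python
from functools import lru_cache

# Hypothetical DAG (vertex -> children): a long chain A1..A4 and an
# unrelated short chain B1->B2. The names are illustrative only.
DAG = {
    "A1": ["A2"], "A2": ["A3"], "A3": ["A4"], "A4": [],
    "B1": ["B2"], "B2": [],
}


def compute_metrics(dag):
    """Return {vertex: (distance_from_leaf, max_descendant_depth)}."""
    # Invert the child lists so each vertex knows its parents.
    parents = {v: [] for v in dag}
    for v, children in dag.items():
        for c in children:
            parents[c].append(v)

    @lru_cache(maxsize=None)
    def depth(v):
        # Longest path from any root (vertex without parents) to v.
        return 0 if not parents[v] else 1 + max(depth(p) for p in parents[v])

    @lru_cache(maxsize=None)
    def distance_from_leaf(v):
        # Longest path from v down to any leaf (vertex without children).
        return 0 if not dag[v] else 1 + max(distance_from_leaf(c) for c in dag[v])

    @lru_cache(maxsize=None)
    def max_descendant_depth(v):
        # Largest distance to root over v and all of its descendants.
        # Including v itself is an assumption made for this sketch.
        return max([depth(v)] + [max_descendant_depth(c) for c in dag[v]])

    return {v: (distance_from_leaf(v), max_descendant_depth(v)) for v in dag}


if __name__ == "__main__":
    for vertex, (from_leaf, desc_depth) in sorted(compute_metrics(DAG).items()):
        print(vertex, "distance from leaf:", from_leaf,
              "max descendant depth:", desc_depth)
```

In this sketch A3 and B1 have the same distance from leaf (1) but different 
max descendant depths (3 vs. 1), so the two metrics cannot be equivalent once 
the DAG has disconnected sub-DAGs of different lengths.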

In the extended example, V2 will not be scheduled early because we're using max 
distance to root across all children (recursively).  V2 would be scheduled just 
before V5 since both have children with the same max depth (V8->V7 vs. V6->V7). 
If we were using furthest distance from leaf, V2 would be scheduled similarly, 
but disconnected DAGs would be scheduled suboptimally, per the above discussion.

> Better scheduling for uneven DAGs
> ---------------------------------
>
>                 Key: TEZ-394
>                 URL: https://issues.apache.org/jira/browse/TEZ-394
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Jason Lowe
>         Attachments: TEZ-394.001.patch, TEZ-394.002.patch, TEZ-394.003.patch
>
>
>   Consider a series of joins or group-bys on dataset A with a few other 
> datasets that takes 10 hours, followed by a final join with a dataset X. The 
> vertex that loads dataset X will be one of the top vertices and will be 
> initialized early even though its output is not consumed until the end, 
> after 10 hours. 
> 1) Could either use delayed start logic for better resource allocation
> 2) Else, if they are started upfront, need to handle failure/recovery cases 
> where the nodes that executed the MapTask might have gone down by the time 
> the final join happens. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
