[ 
https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030365#comment-16030365
 ] 

Bikas Saha commented on TEZ-394:
--------------------------------

Not sure I understood this correctly.

bq.V1->V3->V4->V5
bq.V2->V5
bq.V6->V7

The V2 being lower priority seems to be similar to the original description 
issue of this jira. 

V6 -> V7 being disconnected from the other vertices makes sense. For that using 
current approach of distance from root or distance from leaf, both would give 
V6 high priority. Is the intent to make V6 low priority?

> Better scheduling for uneven DAGs
> ---------------------------------
>
>                 Key: TEZ-394
>                 URL: https://issues.apache.org/jira/browse/TEZ-394
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Jason Lowe
>         Attachments: TEZ-394.001.patch, TEZ-394.002.patch, TEZ-394.003.patch
>
>
>   Consider a series of joins or group by on dataset A with few datasets that 
> takes 10 hours followed by a final join with a dataset X. The vertex that 
> loads dataset X will be one of the top vertexes and initialized early even 
> though its output is not consumed till the end after 10 hours. 
> 1) Could either use delayed start logic for better resource allocation
> 2) Else if they are started upfront, need to handle failure/recovery cases 
> where the nodes which executed the MapTask might have gone down when the 
> final join happens. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to