[ 
https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated TEZ-394:
---------------------------
    Attachment: TEZ-394.003.patch

During an offline discussion, [~rohini] pointed out that simply 
ordering vertices by max distance from a leaf will generate suboptimal 
scheduling for DAGs with disconnected groups of vertices.  For example, if we 
slightly extend the previous sample DAG to this:

V1->V3->V4->V5
V2->V5
V6->V7

The original patch will schedule V6 relatively late since it only has a 
distance of 1 to a leaf.  However, V6 has no inputs and could run right away, 
and delaying it may give much poorer container reuse if all root vertices have 
similar containers (as is the case with Pig).
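
To make the problem concrete, here is a minimal sketch that computes the max 
distance to a leaf for each vertex of the sample DAG above.  It is not the code 
from the patch: the class name LeafDistance, the map-of-children 
representation, and the helper names are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: not taken from the TEZ-394 patch.
class LeafDistance {
    // Sample DAG from this comment: V1->V3->V4->V5, V2->V5, V6->V7.
    // Every vertex appears as a key; leaves map to an empty list.
    static Map<String, List<String>> sampleDag() {
        Map<String, List<String>> children = new LinkedHashMap<>();
        children.put("V1", List.of("V3"));
        children.put("V2", List.of("V5"));
        children.put("V3", List.of("V4"));
        children.put("V4", List.of("V5"));
        children.put("V5", List.of());
        children.put("V6", List.of("V7"));
        children.put("V7", List.of());
        return children;
    }

    // Longest path (in edges) from each vertex down to any leaf.
    static Map<String, Integer> maxDistanceToLeaf(Map<String, List<String>> children) {
        Map<String, Integer> memo = new LinkedHashMap<>();
        for (String v : children.keySet()) {
            distance(v, children, memo);
        }
        return memo;
    }

    private static int distance(String v, Map<String, List<String>> children,
                                Map<String, Integer> memo) {
        Integer cached = memo.get(v);
        if (cached != null) {
            return cached;
        }
        int best = 0; // a leaf is at distance 0 from itself
        for (String child : children.get(v)) {
            best = Math.max(best, 1 + distance(child, children, memo));
        }
        memo.put(v, best);
        return best;
    }

    public static void main(String[] args) {
        System.out.println(maxDistanceToLeaf(sampleDag()));
    }
}
```

In this sketch V1 gets distance 3 and V3 gets 2, while the root V6 gets only 1 
(the same as V2 and V4), so an ordering by this value places V6 behind V1 and 
V3 even though it has no inputs to wait for.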

A better approach to handle disconnected vertex groups is to order each vertex 
by the maximum path to root of its child vertices.  V2 in the above graph 
would normally have a depth of 0 since it's a root vertex.  However, its child 
vertex V5 is at depth 3 in the graph due to another, longer path to root via 
the other parent, V4, so we should demote V2's depth from 0 to 2 so that its 
priority indicates it should run closer to when V5 will be able to run.
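
The demotion described above can be sketched roughly as follows.  This is not 
the code from the patch: the class name ChildDepthOrdering and its helpers are 
hypothetical, and taking the minimum over a vertex's children when it has more 
than one child is an assumption, since the sample DAG only has single-child 
vertices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: not taken from the TEZ-394 patch.
class ChildDepthOrdering {
    // Sample DAG from this comment: V1->V3->V4->V5, V2->V5, V6->V7.
    // Every vertex must appear as a key; leaves map to an empty list.
    static Map<String, List<String>> sampleDag() {
        Map<String, List<String>> children = new LinkedHashMap<>();
        children.put("V1", List.of("V3"));
        children.put("V2", List.of("V5"));
        children.put("V3", List.of("V4"));
        children.put("V4", List.of("V5"));
        children.put("V5", List.of());
        children.put("V6", List.of("V7"));
        children.put("V7", List.of());
        return children;
    }

    // Longest path (in edges) from any root to each vertex.
    static Map<String, Integer> depthFromRoot(Map<String, List<String>> children) {
        Map<String, List<String>> parents = new HashMap<>();
        for (String v : children.keySet()) {
            parents.put(v, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> e : children.entrySet()) {
            for (String child : e.getValue()) {
                parents.get(child).add(e.getKey());
            }
        }
        Map<String, Integer> memo = new HashMap<>();
        for (String v : children.keySet()) {
            depthOf(v, parents, memo);
        }
        return memo;
    }

    private static int depthOf(String v, Map<String, List<String>> parents,
                               Map<String, Integer> memo) {
        Integer cached = memo.get(v);
        if (cached != null) {
            return cached;
        }
        int best = 0; // a root is at depth 0
        for (String parent : parents.get(v)) {
            best = Math.max(best, 1 + depthOf(parent, parents, memo));
        }
        memo.put(v, best);
        return best;
    }

    // Depth used for ordering: a leaf keeps its own depth; any other vertex is
    // placed one level above its shallowest child (assumption for multi-child
    // vertices).
    static Map<String, Integer> schedulingDepth(Map<String, List<String>> children) {
        Map<String, Integer> depth = depthFromRoot(children);
        Map<String, Integer> adjusted = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : children.entrySet()) {
            if (e.getValue().isEmpty()) {
                adjusted.put(e.getKey(), depth.get(e.getKey()));
                continue;
            }
            int shallowestChild = Integer.MAX_VALUE;
            for (String child : e.getValue()) {
                shallowestChild = Math.min(shallowestChild, depth.get(child));
            }
            adjusted.put(e.getKey(), shallowestChild - 1);
        }
        return adjusted;
    }

    public static void main(String[] args) {
        System.out.println(schedulingDepth(sampleDag()));
    }
}
```

For the sample DAG this sketch moves V2 from depth 0 to 2, next to V4, while 
V6 stays at depth 0 alongside V1, so the disconnected root is no longer pushed 
back.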

I updated the patch to implement this new approach to reordering the vertices 
and added the above sample DAG layout as the test case.

> Better scheduling for uneven DAGs
> ---------------------------------
>
>                 Key: TEZ-394
>                 URL: https://issues.apache.org/jira/browse/TEZ-394
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Jason Lowe
>         Attachments: TEZ-394.001.patch, TEZ-394.002.patch, TEZ-394.003.patch
>
>
>   Consider a series of joins or group-bys on dataset A with a few datasets 
> that takes 10 hours, followed by a final join with a dataset X. The vertex 
> that loads dataset X will be one of the top vertices and will be initialized 
> early even though its output is not consumed until the end, after 10 hours. 
> 1) Could either use delayed start logic for better resource allocation
> 2) Else if they are started upfront, need to handle failure/recovery cases 
> where the nodes which executed the MapTask might have gone down when the 
> final join happens. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
