[ https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030718#comment-16030718 ]
Rohini Palaniswamy commented on TEZ-394:
----------------------------------------
bq. Is the intent to make V6 low priority?
No. The intent is for V6 to have higher priority (the same as V1).
bq. For that using current approach of distance from root or distance from
leaf, both would give V6 high priority
Distance from root would put both V1 and V6 at higher priority, since the
distance from root is 0 for both. Distance from leaf did not: V1's distance
from leaf was 3, while V6's was 1. So V1 had higher priority, while V6 had a
priority similar to V4. Since that is not correct, Jason changed the logic.
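
To make the difference concrete, here is a minimal sketch of the two metrics.
The DAG below is hypothetical: it is only chosen so that it reproduces the
numbers above (V1 and V6 are roots, V1 is 3 from the leaf, V6 and V4 are 1
from the leaf) and is not the actual DAG discussed on this issue. The function
names are made up for illustration and are not Tez APIs.

```python
from functools import lru_cache

# Hypothetical DAG (an assumption, not taken from the issue): vertex -> children.
# V1 fans out to V2 and V3, which join in V4; V6 is a second root feeding V5.
EDGES = {
    "V1": ["V2", "V3"],
    "V2": ["V4"],
    "V3": ["V4"],
    "V4": ["V5"],
    "V6": ["V5"],
    "V5": [],
}


def distance_from_leaf(edges):
    """Longest path (in edges) from each vertex down to any leaf."""
    @lru_cache(maxsize=None)
    def depth(v):
        children = edges[v]
        return 0 if not children else 1 + max(depth(c) for c in children)

    return {v: depth(v) for v in edges}


def distance_from_root(edges):
    """Longest path (in edges) from any root down to each vertex."""
    # Invert the child lists so each vertex knows its parents.
    parents = {v: [] for v in edges}
    for v, children in edges.items():
        for c in children:
            parents[c].append(v)

    @lru_cache(maxsize=None)
    def depth(v):
        ps = parents[v]
        return 0 if not ps else 1 + max(depth(p) for p in ps)

    return {v: depth(v) for v in edges}


if __name__ == "__main__":
    leaf = distance_from_leaf(EDGES)
    root = distance_from_root(EDGES)
    for v in sorted(EDGES):
        print(v, "from_root =", root[v], "from_leaf =", leaf[v])
```

With this DAG, distance from root puts V1 and V6 together at 0, while distance
from leaf separates them (V1 = 3, V6 = 1) and groups V6 with V4 instead.
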
> Better scheduling for uneven DAGs
> ---------------------------------
>
> Key: TEZ-394
> URL: https://issues.apache.org/jira/browse/TEZ-394
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Rohini Palaniswamy
> Assignee: Jason Lowe
> Attachments: TEZ-394.001.patch, TEZ-394.002.patch, TEZ-394.003.patch
>
>
> Consider a series of joins or group-bys on dataset A with a few datasets that
> takes 10 hours, followed by a final join with a dataset X. The vertex that
> loads dataset X will be one of the top vertices and will be initialized early,
> even though its output is not consumed until the end, after 10 hours.
> 1) Could either use delayed-start logic for better resource allocation
> 2) Else, if they are started upfront, need to handle failure/recovery cases
> where the nodes that executed the MapTask might have gone down by the time
> the final join happens.