[ https://issues.apache.org/jira/browse/TEZ-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018391#comment-15018391 ]
Bikas Saha commented on TEZ-2956:
---------------------------------

This is another case that should be covered by TEZ-2943, where we overhaul the heuristics. However, this patch adds a test case that should continue to pass after TEZ-2943, so let's take this patch.

On the patch: I don't think we should overrule minPartitions, since this is a user-defined config. Overriding it may break assumptions in the overall DAG plan that users have submitted. Thoughts?
{code}
     if(desiredTaskParallelism < minTaskParallelism) {
-      desiredTaskParallelism = minTaskParallelism;
+      desiredTaskParallelism =
+          (totalNumBipartiteSourceTasks == 0) ? 1: minTaskParallelism;
     }{code}

> Handle auto-reduce parallelism when the totalNumBipartiteSourceTasks is 0
> -------------------------------------------------------------------------
>
>                 Key: TEZ-2956
>                 URL: https://issues.apache.org/jira/browse/TEZ-2956
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2956.1.patch, TEZ-2956_DAG.png, With_Patch.png, Without_Patch.png
>
>
> In certain cases (e.g. M --> R --> R), if the parent vertex has 0 tasks, Tez
> currently does not modify the parallelism factor of the downstream vertices.
> For example:
> {noformat}
> SELECT ss_store_sk,
>        ss_sold_date_sk,
>        ss_quantity,
>        ss_sales_price,
>        LEAD(ss_sales_price, 1) OVER(PARTITION BY ss_store_sk
>                                     ORDER BY ss_quantity)
> FROM store_sales
> WHERE ss_sold_date_sk IS NOT NULL
>   AND ss_quantity IS NOT NULL
>   AND ss_sales_price > 2857684
>   AND ss_sales_price < 2857685
>   AND ss_store_sk > 10234233423
>   AND ss_store_sk < 20234234324
> ORDER BY ss_store_sk,
>          ss_sold_date_sk;
> {noformat}
> This would launch the DAG "M1 (0) --> R2 (156) --> R3 (1)". However, R2 retains
> its parallelism of 156 even though M1 would generate no output.