Rajesh Balamohan created HIVE-22269: ---------------------------------------
Summary: Missing stats in the operator with "hive.optimize.sort.dynamic.partition" (SortedDynPartitionOptimizer) misestimates reducer count Key: HIVE-22269 URL: https://issues.apache.org/jira/browse/HIVE-22269 Project: Hive Issue Type: Bug Components: Statistics Reporter: Rajesh Balamohan {{hive.optimize.sort.dynamic.partition=true}} introduces new stage to reduce number of writes in dynamic partitioning usecase. Earlier {{SortedDynPartitionOptimizer}} added this new operator via {{Optimizer.java}} and the stats for the newly added operator was populated via {{StatsRulesProcFactory$ReduceSinkStatsRule}}. However, with "HIVE-20703" this got changed. This is moved to {{TezCompiler}} for cost based decision. Though the operator gets added correctly, the stats for this does not get added (as it runs after runStatsAnnotation()). This causes reducer count to be mis-estimated in the query. {noformat} e.g For the following query, reducer_2 would be estimated as "2" instead of "1009". This causes huge delay in the runtime. explain from tpcds_xtext_1000.store_sales ss insert overwrite table store_sales partition (ss_sold_date_sk) select ss.ss_sold_time_sk, ss.ss_item_sk, ss.ss_customer_sk, ss.ss_cdemo_sk, ss.ss_hdemo_sk, ss.ss_addr_sk, ss.ss_store_sk, ss.ss_promo_sk, ss.ss_ticket_number, ss.ss_quantity, ss.ss_wholesale_cost, ss.ss_list_price, ss.ss_sales_price, ss.ss_ext_discount_amt, ss.ss_ext_sales_price, ss.ss_ext_wholesale_cost, ss.ss_ext_list_price, ss.ss_ext_tax, ss.ss_coupon_amt, ss.ss_net_paid, ss.ss_net_paid_inc_tax, ss.ss_net_profit, ss.ss_sold_date_sk where ss.ss_sold_date_sk is not null insert overwrite table store_sales partition (ss_sold_date_sk) select ss.ss_sold_time_sk, ss.ss_item_sk, ss.ss_customer_sk, ss.ss_cdemo_sk, ss.ss_hdemo_sk, ss.ss_addr_sk, ss.ss_store_sk, ss.ss_promo_sk, ss.ss_ticket_number, ss.ss_quantity, ss.ss_wholesale_cost, ss.ss_list_price, ss.ss_sales_price, ss.ss_ext_discount_amt, ss.ss_ext_sales_price, ss.ss_ext_wholesale_cost, ss.ss_ext_list_price, ss.ss_ext_tax, ss.ss_coupon_amt, ss.ss_net_paid, ss.ss_net_paid_inc_tax, ss.ss_net_profit, ss.ss_sold_date_sk where ss.ss_sold_date_sk is null distribute by ss.ss_item_sk ; {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)