[ https://issues.apache.org/jira/browse/HIVE-22269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated HIVE-22269: ------------------------------------ Status: Patch Available (was: Open) > Missing stats in the operator with "hive.optimize.sort.dynamic.partition" > (SortedDynPartitionOptimizer) misestimates reducer count > ---------------------------------------------------------------------------------------------------------------------------------- > > Key: HIVE-22269 > URL: https://issues.apache.org/jira/browse/HIVE-22269 > Project: Hive > Issue Type: Bug > Components: Statistics > Reporter: Rajesh Balamohan > Priority: Minor > Attachments: HIVE-22269.1.patch > > > {{hive.optimize.sort.dynamic.partition=true}} introduces new stage to reduce > number of writes in dynamic partitioning usecase. Earlier > {{SortedDynPartitionOptimizer}} added this new operator via > {{Optimizer.java}} and the stats for the newly added operator was populated > via {{StatsRulesProcFactory$ReduceSinkStatsRule}}. > However, with "HIVE-20703" this got changed. This is moved to {{TezCompiler}} > for cost based decision. Though the operator gets added correctly, the stats > for this does not get added (as it runs after runStatsAnnotation()). > This causes reducer count to be mis-estimated in the query. > {noformat} > e.g For the following query, reducer_2 would be estimated as "2" instead of > "1009". This causes huge delay in the runtime. > explain > from tpcds_xtext_1000.store_sales ss > insert overwrite table store_sales partition (ss_sold_date_sk) > select > ss.ss_sold_time_sk, > ss.ss_item_sk, > ss.ss_customer_sk, > ss.ss_cdemo_sk, > ss.ss_hdemo_sk, > ss.ss_addr_sk, > ss.ss_store_sk, > ss.ss_promo_sk, > ss.ss_ticket_number, > ss.ss_quantity, > ss.ss_wholesale_cost, > ss.ss_list_price, > ss.ss_sales_price, > ss.ss_ext_discount_amt, > ss.ss_ext_sales_price, > ss.ss_ext_wholesale_cost, > ss.ss_ext_list_price, > ss.ss_ext_tax, > ss.ss_coupon_amt, > ss.ss_net_paid, > ss.ss_net_paid_inc_tax, > ss.ss_net_profit, > ss.ss_sold_date_sk > where ss.ss_sold_date_sk is not null > insert overwrite table store_sales partition (ss_sold_date_sk) > select > ss.ss_sold_time_sk, > ss.ss_item_sk, > ss.ss_customer_sk, > ss.ss_cdemo_sk, > ss.ss_hdemo_sk, > ss.ss_addr_sk, > ss.ss_store_sk, > ss.ss_promo_sk, > ss.ss_ticket_number, > ss.ss_quantity, > ss.ss_wholesale_cost, > ss.ss_list_price, > ss.ss_sales_price, > ss.ss_ext_discount_amt, > ss.ss_ext_sales_price, > ss.ss_ext_wholesale_cost, > ss.ss_ext_list_price, > ss.ss_ext_tax, > ss.ss_coupon_amt, > ss.ss_net_paid, > ss.ss_net_paid_inc_tax, > ss.ss_net_profit, > ss.ss_sold_date_sk > where ss.ss_sold_date_sk is null > distribute by ss.ss_item_sk > ; > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)