[ https://issues.apache.org/jira/browse/HIVE-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gunther Hagleitner resolved HIVE-8752. -------------------------------------- Resolution: Duplicate > Disjunction cardinality estimation has selectivity of 1 > ------------------------------------------------------- > > Key: HIVE-8752 > URL: https://issues.apache.org/jira/browse/HIVE-8752 > Project: Hive > Issue Type: Bug > Components: CBO > Affects Versions: 0.14.0 > Reporter: Mostafa Mokhtar > Assignee: Laljo John Pullokkaran > Priority: Critical > Fix For: 0.14.0 > > > TPC-DS Q89 has the wrong join order. > Store_sales should be joining with item first then date_dim. > The issue is that the predicate on item shows a selectivity of 1 > {code} > ((i_category in ('Home','Books','Electronics') and > i_class in ('wallpaper','parenting','musical') > ) > or (i_category in ('Shoes','Jewelry','Men') and > i_class in ('womens','birdal','pants') > )) > {code} > {code} > HiveProjectRel(i_item_sk=[$0], i_brand=[$8], i_class=[$10], > i_category=[$12]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, > 0.0 io}, id = 4052 > HiveFilterRel(condition=[OR(AND(in($12, 'Home', > 'Books', 'Electronics'), in($10, 'wallpaper', 'parenting', 'musical')), > AND(in($12, 'Shoes', 'Jewelry', 'Men'), in($10, 'womens', 'birdal', > 'pants')))]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 > io}, id = 4050 > > HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = > 462000.0, cumulative cost = {0}, id = 3818 > {code} > Query > {code} > select * > from( > select i_category, i_class, i_brand, > s_store_name, s_company_name, > d_moy, > sum(ss_sales_price) sum_sales, > avg(sum(ss_sales_price)) over > (partition by i_category, i_brand, s_store_name, s_company_name) > avg_monthly_sales > from item, store_sales, date_dim, store > where store_sales.ss_item_sk = item.i_item_sk and > store_sales.ss_sold_date_sk = date_dim.d_date_sk and > store_sales.ss_store_sk = store.s_store_sk and > d_year in (2000) and > ((i_category in ('Home','Books','Electronics') and > i_class in ('wallpaper','parenting','musical') > ) > or (i_category in ('Shoes','Jewelry','Men') and > i_class in ('womens','birdal','pants') > )) > group by i_category, i_class, i_brand, > s_store_name, s_company_name, d_moy) tmp1 > where case when (avg_monthly_sales <> 0) then (abs(sum_sales - > avg_monthly_sales) / avg_monthly_sales) else null end > 0.1 > order by sum_sales - avg_monthly_sales, s_store_name > limit 100 > {code} > The result of the wrong join order is that the query runs in 335 seconds > compared to 124 seconds with the correct join order. > Removing the disjunction in the item filter produces the correct plan > {code} > i_category in ('Home','Books','Electronics') and > i_class in ('wallpaper','parenting','musical') > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)