Song Jun created SPARK-27280:
--------------------------------

             Summary: infer filters from Join's OR condition
                 Key: SPARK-27280
                 URL: https://issues.apache.org/jira/browse/SPARK-27280
             Project: Spark
          Issue Type: Improvement
          Components: Optimizer, SQL
    Affects Versions: 3.0.0
            Reporter: Song Jun

In some cases, we can infer filters from a join condition that contains OR expressions.

For example, TPC-DS query 48:

{code:java}
select sum (ss_quantity)
from store_sales, store, customer_demographics, customer_address, date_dim
where s_store_sk = ss_store_sk
  and ss_sold_date_sk = d_date_sk
  and d_year = 2000
  and (
    (
      cd_demo_sk = ss_cdemo_sk
      and cd_marital_status = 'S'
      and cd_education_status = 'Secondary'
      and ss_sales_price between 100.00 and 150.00
    )
    or
    (
      cd_demo_sk = ss_cdemo_sk
      and cd_marital_status = 'M'
      and cd_education_status = 'College'
      and ss_sales_price between 50.00 and 100.00
    )
    or
    (
      cd_demo_sk = ss_cdemo_sk
      and cd_marital_status = 'U'
      and cd_education_status = '2 yr Degree'
      and ss_sales_price between 150.00 and 200.00
    )
  )
  and (
    (
      ss_addr_sk = ca_address_sk
      and ca_country = 'United States'
      and ca_state in ('AL', 'OH', 'MD')
      and ss_net_profit between 0 and 2000
    )
    or
    (
      ss_addr_sk = ca_address_sk
      and ca_country = 'United States'
      and ca_state in ('VA', 'TX', 'IA')
      and ss_net_profit between 150 and 3000
    )
    or
    (
      ss_addr_sk = ca_address_sk
      and ca_country = 'United States'
      and ca_state in ('RI', 'WI', 'KY')
      and ss_net_profit between 50 and 25000
    )
  )
;
{code}

From the first OR condition in the join, we can infer two filters:

{code:java}
for customer_demographics:
cd_marital_status in ('S', 'M', 'U')
and cd_education_status in ('Secondary', 'College', '2 yr Degree')

for store_sales:
(ss_sales_price between 100.00 and 150.00
 or ss_sales_price between 50.00 and 100.00
 or ss_sales_price between 150.00 and 200.00)
{code}

We can then push these two filters down to the scans of customer_demographics and store_sales, so that fewer rows reach the join.

A PR will be submitted soon.
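
To make the inference described above concrete, here is a minimal sketch in Python. It is not the Catalyst implementation and not the code of the planned PR; `Pred`, `pred`, and `infer_filter` are hypothetical names invented for this illustration. The idea: for an OR of AND-branches, keep from each branch only the predicates that reference a single table, and OR those per-branch groups together. If any branch has no predicate on that table, nothing can be inferred for it. Note that this produces the per-branch form, which is somewhat stricter than the IN-list form written in the issue; both are implied by the original condition.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Pred(NamedTuple):
    """Hypothetical predicate: the tables it references plus its SQL text."""
    tables: FrozenSet[str]
    sql: str


def pred(sql: str, *tables: str) -> Pred:
    # Convenience constructor for the illustration only.
    return Pred(frozenset(tables), sql)


def infer_filter(branches: List[List[Pred]], table: str) -> Optional[str]:
    """Infer a single-table filter from (b1 AND ...) OR (b2 AND ...) OR ...

    Sound because (A1 AND B1) OR (A2 AND B2) implies A1 OR A2.
    Returns None when some branch does not restrict `table` at all.
    """
    only = frozenset({table})
    parts = []
    for branch in branches:
        local = [p.sql for p in branch if p.tables == only]
        if not local:
            return None  # this branch accepts any row of `table`
        parts.append("(" + " AND ".join(local) + ")")
    return " OR ".join(parts)


CD, SS = "customer_demographics", "store_sales"

# First OR group of the query quoted in the issue.
branches = [
    [pred("cd_demo_sk = ss_cdemo_sk", CD, SS),
     pred("cd_marital_status = 'S'", CD),
     pred("cd_education_status = 'Secondary'", CD),
     pred("ss_sales_price between 100.00 and 150.00", SS)],
    [pred("cd_demo_sk = ss_cdemo_sk", CD, SS),
     pred("cd_marital_status = 'M'", CD),
     pred("cd_education_status = 'College'", CD),
     pred("ss_sales_price between 50.00 and 100.00", SS)],
    [pred("cd_demo_sk = ss_cdemo_sk", CD, SS),
     pred("cd_marital_status = 'U'", CD),
     pred("cd_education_status = '2 yr Degree'", CD),
     pred("ss_sales_price between 150.00 and 200.00", SS)],
]

print(infer_filter(branches, CD))
print(infer_filter(branches, SS))
```

The inferred filter is only an additional, weaker condition: the original join condition has to stay in place, and the new filter merely reduces the rows that reach the join.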