[ https://issues.apache.org/jira/browse/SPARK-27280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-27280:
------------------------------------

    Assignee: Apache Spark

> infer filters from Join's OR condition
> --------------------------------------
>
>                 Key: SPARK-27280
>                 URL: https://issues.apache.org/jira/browse/SPARK-27280
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 3.0.0
>            Reporter: Song Jun
>            Assignee: Apache Spark
>            Priority: Major
>
> In some cases, we can infer filters from a join condition that contains OR expressions.
> For example, TPC-DS query 48:
> {code:java}
> select sum (ss_quantity)
> from store_sales, store, customer_demographics, customer_address, date_dim
> where s_store_sk = ss_store_sk
> and ss_sold_date_sk = d_date_sk and d_year = 2000
> and
> (
>   (
>     cd_demo_sk = ss_cdemo_sk
>     and
>     cd_marital_status = 'S'
>     and
>     cd_education_status = 'Secondary'
>     and
>     ss_sales_price between 100.00 and 150.00
>   )
>   or
>   (
>     cd_demo_sk = ss_cdemo_sk
>     and
>     cd_marital_status = 'M'
>     and
>     cd_education_status = 'College'
>     and
>     ss_sales_price between 50.00 and 100.00
>   )
>   or
>   (
>     cd_demo_sk = ss_cdemo_sk
>     and
>     cd_marital_status = 'U'
>     and
>     cd_education_status = '2 yr Degree'
>     and
>     ss_sales_price between 150.00 and 200.00
>   )
> )
> and
> (
>   (
>     ss_addr_sk = ca_address_sk
>     and
>     ca_country = 'United States'
>     and
>     ca_state in ('AL', 'OH', 'MD')
>     and ss_net_profit between 0 and 2000
>   )
>   or
>   (ss_addr_sk = ca_address_sk
>     and
>     ca_country = 'United States'
>     and
>     ca_state in ('VA', 'TX', 'IA')
>     and ss_net_profit between 150 and 3000
>   )
>   or
>   (ss_addr_sk = ca_address_sk
>     and
>     ca_country = 'United States'
>     and
>     ca_state in ('RI', 'WI', 'KY')
>     and ss_net_profit between 50 and 25000
>   )
> )
> ;
> {code}
> From the first OR condition in the join we can infer two filters:
> {code:java}
> for customer_demographics:
> cd_marital_status in ('S', 'M', 'U') and cd_education_status in ('Secondary', 'College', '2 yr Degree')
> for store_sales:
> (ss_sales_price between 100.00 and 150.00 or ss_sales_price between 50.00 and 100.00 or ss_sales_price between 150.00 and 200.00)
> {code}
> We can then push these two filters down to customer_demographics and store_sales.
> A PR will be submitted soon.
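
The inference described in the issue can be illustrated with a small standalone sketch. This is not Spark's optimizer code and not the proposed PR; it is a hypothetical Python model in which each predicate is tagged with the tables it references (`cd` for customer_demographics, `ss` for store_sales), and the names `atom` and `infer_filter` are invented for illustration. For a given table, it keeps the single-table predicates of each OR branch and ORs the branches back together. If any branch has no predicate on that table, nothing can be inferred.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical model: an atom is (tables it references, SQL text).
# A branch is a conjunction (AND) of atoms; the condition is an OR of branches.
Atom = Tuple[FrozenSet[str], str]
Branch = List[Atom]


def atom(sql: str, *tables: str) -> Atom:
    """Tag a predicate with the tables its columns come from."""
    return (frozenset(tables), sql)


def infer_filter(branches: List[Branch], table: str) -> Optional[str]:
    """Return a filter on `table` implied by OR(branches), or None."""
    per_branch = []
    for branch in branches:
        # Keep only predicates that reference this table and nothing else.
        local = [sql for tables, sql in branch if tables == frozenset({table})]
        if not local:
            # This branch does not constrain the table, so the OR as a
            # whole implies no filter on it.
            return None
        per_branch.append("(" + " AND ".join(local) + ")")
    return " OR ".join(per_branch)


# First OR group of the query in the issue description.
branches = [
    [atom("cd_demo_sk = ss_cdemo_sk", "cd", "ss"),
     atom("cd_marital_status = 'S'", "cd"),
     atom("cd_education_status = 'Secondary'", "cd"),
     atom("ss_sales_price BETWEEN 100.00 AND 150.00", "ss")],
    [atom("cd_demo_sk = ss_cdemo_sk", "cd", "ss"),
     atom("cd_marital_status = 'M'", "cd"),
     atom("cd_education_status = 'College'", "cd"),
     atom("ss_sales_price BETWEEN 50.00 AND 100.00", "ss")],
    [atom("cd_demo_sk = ss_cdemo_sk", "cd", "ss"),
     atom("cd_marital_status = 'U'", "cd"),
     atom("cd_education_status = '2 yr Degree'", "cd"),
     atom("ss_sales_price BETWEEN 150.00 AND 200.00", "ss")],
]

cd_filter = infer_filter(branches, "cd")
ss_filter = infer_filter(branches, "ss")
print("customer_demographics:", cd_filter)
print("store_sales:", ss_filter)
```

The sketch keeps the marital-status and education-status predicates paired per branch, which is at least as selective as the two independent IN lists; either form is implied by the original OR condition, so pushing it below the join does not change the query result.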