[ https://issues.apache.org/jira/browse/SPARK-12032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332721#comment-15332721 ]
Apache Spark commented on SPARK-12032: -------------------------------------- User 'flyson' has created a pull request for this issue: https://github.com/apache/spark/pull/10258 > Filter can't be pushed down to correct Join because of bad order of Join > ------------------------------------------------------------------------ > > Key: SPARK-12032 > URL: https://issues.apache.org/jira/browse/SPARK-12032 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Davies Liu > Assignee: Davies Liu > Priority: Critical > Fix For: 2.0.0 > > > For this query: > {code} > select d.d_year, count(*) cnt > FROM store_sales, date_dim d, customer c > WHERE ss_customer_sk = c.c_customer_sk AND c.c_first_shipto_date_sk = > d.d_date_sk > group by d.d_year > {code} > Current optimized plan is > {code} > == Optimized Logical Plan == > Aggregate [d_year#147], [d_year#147,(count(1),mode=Complete,isDistinct=false) > AS cnt#425L] > Project [d_year#147] > Join Inner, Some(((ss_customer_sk#283 = c_customer_sk#101) && > (c_first_shipto_date_sk#106 = d_date_sk#141))) > Project [d_date_sk#141,d_year#147,ss_customer_sk#283] > Join Inner, None > Project [ss_customer_sk#283] > Relation[] ParquetRelation[store_sales] > Project [d_date_sk#141,d_year#147] > Relation[] ParquetRelation[date_dim] > Project [c_customer_sk#101,c_first_shipto_date_sk#106] > Relation[] ParquetRelation[customer] > {code} > It will join store_sales and date_dim together without any condition, the > condition c.c_first_shipto_date_sk = d.d_date_sk is not pushed to it because > the bad order of joins. > The optimizer should re-order the joins, join date_dim after customer, then > it can pushed down the condition correctly. > The plan should be > {code} > Aggregate [d_year#147], [d_year#147,(count(1),mode=Complete,isDistinct=false) > AS cnt#425L] > Project [d_year#147] > Join Inner, Some((c_first_shipto_date_sk#106 = d_date_sk#141)) > Project [c_first_shipto_date_sk#106] > Join Inner, Some((ss_customer_sk#283 = c_customer_sk#101)) > Project [ss_customer_sk#283] > Relation[store_sales] > Project [c_first_shipto_date_sk#106,c_customer_sk#101] > Relation[customer] > Project [d_year#147,d_date_sk#141] > Relation[date_dim] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org