[jira] [Commented] (SPARK-12085) The join condition hidden in DNF can't be pushed down to join operator
[ https://issues.apache.org/jira/browse/SPARK-12085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036685#comment-15036685 ]

Min Qiu commented on SPARK-12085:
---------------------------------

It looks like the BooleanSimplification rule in Spark 1.5 provides a general way to rewrite the predicate expression. It should cover my cases. I will test the query on Spark 1.5.

> The join condition hidden in DNF can't be pushed down to join operator
> ----------------------------------------------------------------------
>
>                 Key: SPARK-12085
>                 URL: https://issues.apache.org/jira/browse/SPARK-12085
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Min Qiu
>
> TPC-H Q19:
> {quote}
> SELECT sum(l_extendedprice * (1 - l_discount)) AS revenue
> FROM part JOIN lineitem
> WHERE ({color:red}p_partkey = l_partkey{color}
>        AND p_brand = 'Brand#12'
>        AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
>        AND l_quantity >= 1 AND l_quantity <= 1 + 10
>        AND p_size BETWEEN 1 AND 5
>        AND l_shipmode IN ('AIR', 'AIR REG')
>        AND l_shipinstruct = 'DELIVER IN PERSON')
>    OR ({color:red}p_partkey = l_partkey{color}
>        AND p_brand = 'Brand#23'
>        AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
>        AND l_quantity >= 10 AND l_quantity <= 10 + 10
>        AND p_size BETWEEN 1 AND 10
>        AND l_shipmode IN ('AIR', 'AIR REG')
>        AND l_shipinstruct = 'DELIVER IN PERSON')
>    OR ({color:red}p_partkey = l_partkey{color}
>        AND p_brand = 'Brand#34'
>        AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
>        AND l_quantity >= 20 AND l_quantity <= 20 + 10
>        AND p_size BETWEEN 1 AND 15
>        AND l_shipmode IN ('AIR', 'AIR REG')
>        AND l_shipinstruct = 'DELIVER IN PERSON')
> {quote}
> The equality condition {color:red}p_partkey = l_partkey{color} matches the
> join relations, but the optimizer cannot recognize it because it is hidden
> in a disjunctive normal form. As a result, the entire WHERE clause ends up
> in a filter operator on top of the join operator, and the join condition is
> "None" in the optimized plan. Finally, the query planner applies a
> prohibitively expensive cartesian product in the physical plan, which causes
> an OOM exception or very bad performance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
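The rewrite this issue asks for amounts to factoring predicates shared by every disjunct out of the OR, i.e. turning (a AND b) OR (a AND c) into a AND (b OR c), so that the shared equality can serve as the join condition. The Python below is only a minimal sketch of that idea, not Spark's BooleanSimplification rule or the code in the pull request. The function name `factor_common_conjuncts`, the representation of predicates as plain strings, and the abbreviated Q19 disjuncts are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical model: a conjunction is a set of atomic predicate strings,
# and a DNF expression is a list of conjunctions (an OR of ANDs).
Conjunction = FrozenSet[str]


def factor_common_conjuncts(
    dnf: List[Conjunction],
) -> Tuple[Conjunction, List[Conjunction]]:
    """Rewrite (a AND b) OR (a AND c) as a AND (b OR c).

    Returns the predicates shared by every disjunct, plus the disjuncts
    with those shared predicates removed.
    """
    if not dnf:
        raise ValueError("expected at least one disjunct")
    # Predicates that appear in every disjunct can be pulled out of the OR.
    common = frozenset.intersection(*dnf)
    residual = [conj - common for conj in dnf]
    if any(not conj for conj in residual):
        # One disjunct consisted only of shared predicates, so
        # a OR (a AND b) collapses to a: no residual filter is left.
        residual = []
    return common, residual


# Abbreviated version of the three Q19 disjuncts from the issue description.
q19 = [
    frozenset({"p_partkey = l_partkey", "p_brand = 'Brand#12'",
               "p_size BETWEEN 1 AND 5",
               "l_shipinstruct = 'DELIVER IN PERSON'"}),
    frozenset({"p_partkey = l_partkey", "p_brand = 'Brand#23'",
               "p_size BETWEEN 1 AND 10",
               "l_shipinstruct = 'DELIVER IN PERSON'"}),
    frozenset({"p_partkey = l_partkey", "p_brand = 'Brand#34'",
               "p_size BETWEEN 1 AND 15",
               "l_shipinstruct = 'DELIVER IN PERSON'"}),
]

common, residual = factor_common_conjuncts(q19)
print("factored out:", sorted(common))
for conj in residual:
    print("remaining disjunct:", sorted(conj))
```

In this sketch the shared equality `p_partkey = l_partkey` ends up in `common`, where a planner could use it as an equi-join key, while the brand and size predicates stay in the residual OR filter. A real optimizer would compare expression trees semantically rather than by string equality.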
[jira] [Commented] (SPARK-12085) The join condition hidden in DNF can't be pushed down to join operator
[ https://issues.apache.org/jira/browse/SPARK-12085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035146#comment-15035146 ]

Apache Spark commented on SPARK-12085:
--------------------------------------

User 'flyson' has created a pull request for this issue:
https://github.com/apache/spark/pull/10087
[jira] [Commented] (SPARK-12085) The join condition hidden in DNF can't be pushed down to join operator
[ https://issues.apache.org/jira/browse/SPARK-12085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035147#comment-15035147 ]

Min Qiu commented on SPARK-12085:
---------------------------------

Just submitted a [pull request #10087|https://github.com/apache/spark/pull/10087].