[ https://issues.apache.org/jira/browse/SPARK-12085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036685#comment-15036685 ]
Min Qiu commented on SPARK-12085:
---------------------------------

It looks like the BooleanSimplification rule in Spark 1.5 provides a general way to rewrite the predicate expression. It should cover my cases. I will test the query on Spark 1.5.

> The join condition hidden in DNF can't be pushed down to join operator
> -----------------------------------------------------------------------
>
>                 Key: SPARK-12085
>                 URL: https://issues.apache.org/jira/browse/SPARK-12085
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Min Qiu
>
> TPC-H Q19:
> {quote}
> SELECT sum(l_extendedprice * (1 - l_discount)) AS revenue FROM part join lineitem
> WHERE ({color: red}p_partkey = l_partkey {color}
> AND p_brand = 'Brand#12'
> AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
> AND l_quantity >= 1 AND l_quantity <= 1 + 10
> AND p_size BETWEEN 1 AND 5
> AND l_shipmode IN ('AIR', 'AIR REG')
> AND l_shipinstruct = 'DELIVER IN PERSON')
> OR ({color: red}p_partkey = l_partkey{color}
> AND p_brand = 'Brand#23'
> AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
> AND l_quantity >= 10 AND l_quantity <= 10 + 10
> AND p_size BETWEEN 1 AND 10
> AND l_shipmode IN ('AIR', 'AIR REG') AND l_shipinstruct = 'DELIVER IN PERSON')
> OR ({color: red}p_partkey = l_partkey{color}
> AND p_brand = 'Brand#34'
> AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
> AND l_quantity >= 20 AND l_quantity <= 20 + 10
> AND p_size BETWEEN 1 AND 15
> AND l_shipmode IN ('AIR', 'AIR REG') AND l_shipinstruct = 'DELIVER IN PERSON')
> {quote}
> The equality condition {color:red} p_partkey = l_partkey{color} matches the
> join relations, but the optimizer cannot recognize it because it is hidden in
> a disjunctive normal form. As a result, the entire WHERE clause ends up in a
> filter operator on top of the join operator, and the join condition is
> "None" in the optimized plan.
> Finally, the query planner applies a prohibitively expensive Cartesian
> product in the physical plan, which causes an OOM exception or very poor
> performance.
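
To make the rewrite under discussion concrete, here is a minimal Python sketch of factoring the conjuncts shared by every OR branch out of a DNF predicate, so that (a AND b) OR (a AND c) becomes a AND (b OR c). It is only an illustration and not Spark's BooleanSimplification code: the function name `factor_common_conjuncts`, the representation of predicates as sets of strings, and the abbreviated Q19 branches are all assumptions made for this example.

```python
from functools import reduce


def factor_common_conjuncts(disjuncts):
    """Split a DNF predicate into (common conjuncts, per-branch residuals).

    Each element of `disjuncts` is an iterable of conjunct strings that are
    ANDed together; the elements themselves are ORed together.
    Hypothetical helper for illustration only.
    """
    branches = [frozenset(d) for d in disjuncts]
    if not branches:
        return frozenset(), []
    # Conjuncts present in every OR branch can be hoisted out of the OR.
    common = reduce(frozenset.intersection, branches)
    # What remains in each branch stays inside the OR.
    # (If any residual is empty, the whole OR reduces to the common part.)
    residuals = [branch - common for branch in branches]
    return common, residuals


# Abbreviated version of the TPC-H Q19 predicate from the issue
# (the p_container and l_quantity conjuncts are omitted for brevity).
q19_branches = [
    {"p_partkey = l_partkey", "p_brand = 'Brand#12'",
     "p_size BETWEEN 1 AND 5", "l_shipmode IN ('AIR', 'AIR REG')",
     "l_shipinstruct = 'DELIVER IN PERSON'"},
    {"p_partkey = l_partkey", "p_brand = 'Brand#23'",
     "p_size BETWEEN 1 AND 10", "l_shipmode IN ('AIR', 'AIR REG')",
     "l_shipinstruct = 'DELIVER IN PERSON'"},
    {"p_partkey = l_partkey", "p_brand = 'Brand#34'",
     "p_size BETWEEN 1 AND 15", "l_shipmode IN ('AIR', 'AIR REG')",
     "l_shipinstruct = 'DELIVER IN PERSON'"},
]

common, residuals = factor_common_conjuncts(q19_branches)
print("hoisted out of the OR:", sorted(common))
for residual in residuals:
    print("remaining OR branch:", sorted(residual))
```

Once `p_partkey = l_partkey` sits at the top level of the conjunction instead of inside each OR branch, an optimizer can in principle treat it as an equi-join key rather than leaving the whole predicate in a filter above a Cartesian product.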