[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12954 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217605162 LGTM, merging this into master and the 2.0 branch, thanks!
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217605013 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217605015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58047/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217604963 **[Test build #58047 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58047/consoleFull)** for PR 12954 at commit [`e28bbb6`](https://github.com/apache/spark/commit/e28bbb6c90aef0a17caab5db8072327fcf93e59d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217600276 **[Test build #58047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58047/consoleFull)** for PR 12954 at commit [`e28bbb6`](https://github.com/apache/spark/commit/e28bbb6c90aef0a17caab5db8072327fcf93e59d).
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62409983

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/BooleanSimplification.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.catalyst.expressions.{And, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual, Not, Or, PredicateHelper}
+import org.apache.spark.sql.catalyst.expressions.Literal._
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+ * Simplifies boolean expressions:
+ * 1. Simplifies expressions whose answer can be determined without evaluating both sides.
+ * 2. Eliminates / extracts common factors.
+ * 3. Merges identical expressions.
+ * 4. Removes the `Not` operator.
+ */
+object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
--- End diff --

Import it into the analyzer as a temporary fix; we can think of a proper way later.
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62409870 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/BooleanSimplification.scala (same hunk as quoted above) --- End diff -- Ok, what do you suggest then?
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62409566 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/BooleanSimplification.scala (same hunk as quoted above) --- End diff -- Moving this one into analysis is even worse ...
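The four rewrites listed in the `BooleanSimplification` scaladoc quoted above can be illustrated on a toy expression AST. This is a self-contained sketch under stated assumptions, not Catalyst's implementation: the `Expr` hierarchy and `Simplifier.simplify` are hypothetical stand-ins for Catalyst's expression classes and rule.

```scala
// Toy boolean-expression AST; the names shadow nothing from Catalyst.
sealed trait Expr
case class Lit(b: Boolean) extends Expr
case class Sym(name: String) extends Expr
case class Not(e: Expr) extends Expr
case class And(l: Expr, r: Expr) extends Expr
case class Or(l: Expr, r: Expr) extends Expr

object Simplifier {
  def simplify(e: Expr): Expr = e match {
    // 1. Short-circuit: the answer is determinable without evaluating both sides.
    case And(Lit(false), _) | And(_, Lit(false)) => Lit(false)
    case And(Lit(true), r)                       => simplify(r)
    case And(l, Lit(true))                       => simplify(l)
    case Or(Lit(true), _) | Or(_, Lit(true))     => Lit(true)
    case Or(Lit(false), r)                       => simplify(r)
    case Or(l, Lit(false))                       => simplify(l)
    // 3. Merge identical expressions: a && a => a, a || a => a.
    case And(l, r) if l == r                     => simplify(l)
    case Or(l, r) if l == r                      => simplify(l)
    // 4. Remove Not via double negation and De Morgan's laws.
    case Not(Not(inner))                         => simplify(inner)
    case Not(And(l, r))                          => simplify(Or(Not(l), Not(r)))
    case Not(Or(l, r))                           => simplify(And(Not(l), Not(r)))
    case Not(Lit(b))                             => Lit(!b)
    // Otherwise recurse into children.
    case And(l, r)                               => And(simplify(l), simplify(r))
    case Or(l, r)                                => Or(simplify(l), simplify(r))
    case other                                   => other
  }
}
```

A single pass like this can miss simplifications that only become visible after children are rewritten, which is one reason Catalyst runs such rules repeatedly to a fixed point.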
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217580900 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58031/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217580898 Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217580745 **[Test build #58031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58031/consoleFull)** for PR 12954 at commit [`b487925`](https://github.com/apache/spark/commit/b487925dea798a2ae624242f3d9a18aa84f0b6fc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217560774 **[Test build #58031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58031/consoleFull)** for PR 12954 at commit [`b487925`](https://github.com/apache/spark/commit/b487925dea798a2ae624242f3d9a18aa84f0b6fc).
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62391466

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -26,6 +26,7 @@
 import org.apache.spark.sql.catalyst.catalog.{InMemoryCatalog, SessionCatalog}
 import org.apache.spark.sql.catalyst.encoders.OuterScopes
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.optimizer.BooleanSimplification
--- End diff --

I moved the rule in the latest commit.
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62391407

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1645,3 +1646,30 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] {
     }
   }
 }
+
+/**
+ * Rewrite [[Filter]] plans that contain correlated [[ScalarSubquery]] expressions. When these
+ * correlated [[ScalarSubquery]] expressions are wrapped in some Predicate expression, we rewrite
+ * them into [[PredicateSubquery]] expressions.
+ */
+object RewriteScalarSubqueriesInFilter extends Rule[LogicalPlan] {
--- End diff --

The advantage of a semi join is that the subquery can return multiple results for one row without causing correctness problems. My initial approach was to relax the rules for scalar subqueries (allow disjunctive predicates) and prevent possible duplicates by using left semi joins. This didn't work because I was also pulling non-correlated predicates through the aggregate (which invalidates its results). I am not sure which predicates you want to push through the left semi join, since all predicates that should be pushed down to the left-hand side are already in the join condition. But I might be missing something here. I have removed this in my latest commit. I do feel this might be a small improvement over the current situation; let's revisit it after Spark 2.0.
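The duplicate-row concern discussed above (an inner join emits one output row per matching pair, while a left semi join emits each left row at most once) can be modeled on plain Scala collections. This is illustrative only, not Spark's join execution; the data is made up.

```scala
// Left side: the outer query's rows. Right side: the subquery's output,
// where one key ("m1") matches twice.
val left  = Seq(("m1", "itemA"), ("m2", "itemB"))
val right = Seq("m1", "m1", "m3")

// Inner join: one output row per matching PAIR, so "itemA" is duplicated.
val inner = for { (k, v) <- left; rk <- right if rk == k } yield (k, v)

// Left semi join: a left row survives if ANY match exists; no duplication.
val semi = left.filter { case (k, _) => right.contains(k) }
```

This is why rewriting a predicate subquery as an inner join needs extra care (deduplication or a guarantee of at most one match per key), while a left semi join is safe by construction.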
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62388521 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (same hunk as quoted above) --- End diff -- With an inner join the predicate can actually be pushed down into the subquery, but it cannot with a semi join. So it's not obvious which one is better. After looking at it for a long time, I'm not convinced it's correct (or the best way to do it). Could you keep it out of this PR for now?
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62382220 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (same hunk as quoted above) --- End diff -- I'll remove it if you want to.
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62382186 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (same hunk as quoted above) --- End diff -- It is not necessary for this fix; it is left over from the initial attempt. I do think it adds value, since it allows us to use subqueries that might return multiple values for one key. We do have a rule that promotes an outer join into an inner join, but we do not have one that turns an outer/inner join into a semi join.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62377447 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (same hunk as quoted above) --- End diff -- This rule is not necessary for this fix, right? Also, we have a rule to turn an outer join into a semi join if possible; does that one not work in this case?
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217533079 @hvanhovell never mind, I made a mistake in the query
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217527040 I still can't run Q41 after this PR:
```
select distinct(i_product_name)
from item i1
where i_manufact_id between 738 and 738+40
  and (select count(*) as item_cnt
       from item
       where (i_manufact = i1.i_manufact and
              (i_category = 'Women' and (i_color = 'powder' or i_color = 'khaki') and
               (i_units = 'Ounce' or i_units = 'Oz') and (i_size = 'medium' or i_size = 'extra large') ) or
              (i_category = 'Women' and (i_color = 'brown' or i_color = 'honeydew') and
               (i_units = 'Bunch' or i_units = 'Ton') and (i_size = 'N/A' or i_size = 'small') ) or
              (i_category = 'Men' and (i_color = 'floral' or i_color = 'deep') and
               (i_units = 'N/A' or i_units = 'Dozen') and (i_size = 'petite' or i_size = 'large') ) or
              (i_category = 'Men' and (i_color = 'light' or i_color = 'cornflower') and
               (i_units = 'Box' or i_units = 'Pound') and (i_size = 'medium' or i_size = 'extra large') )) or
             (i_manufact = i1.i_manufact and
              ((i_category = 'Women' and (i_color = 'midnight' or i_color = 'snow') and
                (i_units = 'Pallet' or i_units = 'Gross') and (i_size = 'medium' or i_size = 'extra large') ) or
               (i_category = 'Women' and (i_color = 'cyan' or i_color = 'papaya') and
                (i_units = 'Cup' or i_units = 'Dram') and (i_size = 'N/A' or i_size = 'small') ) or
               (i_category = 'Men' and (i_color = 'orange' or i_color = 'frosted') and
                (i_units = 'Each' or i_units = 'Tbl') and (i_size = 'petite' or i_size = 'large') ) or
               (i_category = 'Men' and (i_color = 'forest' or i_color = 'ghost') and
                (i_units = 'Lb' or i_units = 'Bundle') and (i_size = 'medium' or i_size = 'extra large') > 0
order by i_product_name
limit 100

Traceback (most recent call last):
  File "tpcds.py", line 117, in test(sys.argv[1], 2)
  File "tpcds.py", line 83, in test sqlContext.sql(q).explain(False)
  File "/Users/davies/work/spark/python/pyspark/sql/context.py", line 345, in sql return self.sparkSession.sql(sqlQuery)
  File "/Users/davies/work/spark/python/pyspark/sql/session.py", line 505, in sql return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "//anaconda/lib/python2.7/site-packages/py4j/java_gateway.py", line 538, in __call__ self.target_id, self.name)
  File "/Users/davies/work/spark/python/pyspark/sql/utils.py", line 63, in deco raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'The correlated scalar subquery can only contain equality predicates: i_category#215#484 = Women) && (i_manufact#217#486 = i_manufact#217) && ((i_color#220#485 = powder) || (i_color#220#485 = khaki))) && ((i_units#221#488 = Ounce) || (i_units#221#488 = Oz))) && ((i_size#218#487 = medium) || (i_size#218#487 = extra large))) || i_color#220#485 = brown) || (i_color#220#485 = honeydew)) && ((i_units#221#488 = Bunch) || (i_units#221#488 = Ton))) && ((i_size#218#487 = N/A) || (i_size#218#487 = small) || ((i_category#215#484 = Men) && (i_color#220#485 = floral) || (i_color#220#485 = deep)) && ((i_units#221#488 = N/A) || (i_units#221#488 = Dozen))) && ((i_size#218#487 = petite) || (i_size#218#487 = large))) || i_color#220#485 = light) || (i_color#220#485 = cornflower)) && ((i_units#221#488 = Box) || (i_units#221#488 = Pound))) && ((i_size#218#487 = medium) || (i_size#218#487 = extra large)) || ((i_manufact#217#486 = i_manufact#217) && (((i_category#215#484 = Women) && (i_color#220#485 = midnight) || (i_color#220#485 = snow)) && ((i_units#221#488 = Pallet) || (i_units#221#488 = Gross))) && ((i_size#218#487 = medium) || (i_size#218#487 = extra large))) || i_color#220#485 = cyan) || (i_color#220#485 = papaya)) && ((i_units#221#488 = Cup) || (i_units#221#488 = Dram))) && ((i_size#218#487 = N/A) || (i_size#218#487 = small) || ((i_category#215#484 = Men) && (i_color#220#485 = orange) || (i_color#220#485 = frosted)) && ((i_units#221#488 = Each) || (i_units#221#488 = Tbl))) && ((i_size#218#487 = petite) || (i_size#218#487 = large))) || i_color#220#485 = forest) || (i_color#220#485 = ghost)) && ((i_units#221#488 = Lb) || (i_units#221#488 = Bundle))) && ((i_size#218#487 = medium) || (i_size#218#487 = extra large;'
```
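The `AnalysisException` above comes from a check that every top-level conjunct of a correlated scalar subquery's filter is an equality; Q41's predicate is dominated by `OR`s, so the split yields a non-equality conjunct and analysis fails. A minimal, self-contained sketch of that kind of check, on an illustrative AST (not Catalyst's actual `PredicateHelper`):

```scala
// Toy predicate AST standing in for Catalyst expressions.
sealed trait Pred
case class Eq(left: String, right: String) extends Pred // e.g. i_manufact = i1.i_manufact
case class Conj(l: Pred, r: Pred) extends Pred          // AND
case class Disj(l: Pred, r: Pred) extends Pred          // OR

// Split a predicate tree into its top-level AND-ed conjuncts,
// in the spirit of PredicateHelper.splitConjunctivePredicates.
def splitConjuncts(p: Pred): Seq[Pred] = p match {
  case Conj(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
  case other      => Seq(other)
}

// The analysis-time restriction: every conjunct must be an equality.
// A disjunction anywhere at the top level makes this false.
def onlyEqualities(p: Pred): Boolean =
  splitConjuncts(p).forall(_.isInstanceOf[Eq])
```

Under this model, Q41's filter is roughly `Disj(bigBranch1, bigBranch2)`: a single non-equality conjunct, hence the error.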
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62357864 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (same hunk as quoted above) --- End diff -- I totally agree on that. I could move the rule from `optimizer` to `analysis`.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12954#discussion_r62357413 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (same hunk as quoted above) --- End diff -- It's weird to use an optimizer rule in the analyzer. We could have this workaround for now; could you create a JIRA to do a proper fix later?
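One way to picture the workaround discussed above: Catalyst runs rules in batches to a fixed point, and the same rule object can simply be listed in both an analyzer batch and an optimizer batch. Below is a toy fixed-point executor with hypothetical names; it is not Catalyst's `RuleExecutor`, just a sketch of the shape.

```scala
// A "plan" here is just a list of operator names; a rule rewrites it.
type Plan = List[String]
type PlanRule = Plan => Plan

// Example rule: collapse runs of adjacent identical operators into one
// (think of two stacked identical Filters).
def collapseAdjacentDuplicates(plan: Plan): Plan = plan match {
  case a :: b :: rest if a == b => collapseAdjacentDuplicates(a :: rest)
  case a :: rest                => a :: collapseAdjacentDuplicates(rest)
  case Nil                      => Nil
}

// Run a batch of rules repeatedly until the plan stops changing
// (bounded by maxIterations), as a fixed-point batch would.
def runToFixedPoint(rules: Seq[PlanRule], maxIterations: Int)(plan: Plan): Plan = {
  var current = plan
  var iterations = 0
  var changed = true
  while (changed && iterations < maxIterations) {
    val next = rules.foldLeft(current)((p, r) => r(p))
    changed = next != current
    current = next
    iterations += 1
  }
  current
}

// The same rule value can appear in two batches, which is the shape of the
// temporary fix: reuse the optimizer's BooleanSimplification during analysis.
val analyzerBatch: Seq[PlanRule]  = Seq(collapseAdjacentDuplicates)
val optimizerBatch: Seq[PlanRule] = Seq(collapseAdjacentDuplicates)
```

The design question in the thread is not whether this works mechanically (it does), but whether an analysis-phase dependency on an optimizer rule is an acceptable layering, hence the request for a follow-up JIRA.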
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217472088 **[Test build #57994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57994/consoleFull)** for PR 12954 at commit [`f0871c9`](https://github.com/apache/spark/commit/f0871c921285a05602cf566c9f2c23901224d73e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217472331

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57994/
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217472325

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217447455

**[Test build #57994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57994/consoleFull)** for PR 12954 at commit [`f0871c9`](https://github.com/apache/spark/commit/f0871c921285a05602cf566c9f2c23901224d73e).
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/12954#issuecomment-217447302

cc @rxin @davies
[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/12954

[SPARK-15122][SQL] Fix TPC-DS 41 - Normalize predicates before pulling them out

## What changes were proposed in this pull request?

The official TPC-DS 41 query currently fails because it contains a scalar subquery with a disjunctive correlated predicate (the correlated predicates were nested in ORs). This makes the `Analyzer` pull out the entire predicate, which is wrong and causes the following (correct) analysis exception: `The correlated scalar subquery can only contain equality predicates`

This PR fixes this by first simplifying (or normalizing) the correlated predicates before pulling them out of the subquery. I have also added a small optimizer rule that rewrites correlated scalar subqueries into predicate subqueries if they are used in a `Filter` and are wrapped by a predicate. This allows us to use semi joins instead of left outer joins.

## How was this patch tested?

Manual testing on TPC-DS 41, and added a test to SubquerySuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-15122

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12954.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12954

commit f0871c921285a05602cf566c9f2c23901224d73e
Author: Herman van Hovell
Date: 2016-05-06T13:39:43Z

    Fix TPC-DS 41 - normalize predicates before pulling them out.
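The "pulling them out" step the description mentions can be sketched in isolation. The following is an illustrative Python model (hypothetical names, not Catalyst code) of splitting a normalized conjunction into correlated equality predicates and local predicates, raising the same analysis error quoted above when a correlated conjunct is not an equality:

```python
# Conjuncts are modeled as (lhs, op, rhs) triples; a conjunct is
# "correlated" if either side references an attribute of the outer query.
# Hypothetical names chosen for illustration only.

def pull_out_correlated(conjuncts, outer_attrs):
    """Split conjuncts into (correlated equalities, local predicates).

    Mirrors the analyzer's behavior: correlated predicates are pulled
    out of the subquery, but only equalities are accepted.
    """
    correlated, local = [], []
    for lhs, op, rhs in conjuncts:
        if lhs in outer_attrs or rhs in outer_attrs:
            if op != "=":
                raise ValueError(
                    "The correlated scalar subquery can only contain "
                    "equality predicates")
            correlated.append((lhs, op, rhs))
        else:
            local.append((lhs, op, rhs))
    return correlated, local
```

On a disjunctive predicate like TPC-DS 41's, this split only works after normalization has factored the correlated equality up to the top-level conjunction; without it, the whole OR expression would be treated as one correlated, non-equality predicate and rejected.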