[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user yjshen closed the pull request at: https://github.com/apache/spark/pull/8742 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user yjshen commented on the pull request: https://github.com/apache/spark/pull/8742#issuecomment-141873047 Thanks @yhuai, I'll close this one.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8823
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8823#issuecomment-141555821 Merging to master and branch 1.5. Thanks @yjshen
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8823#issuecomment-14163 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8823#issuecomment-14166 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42675/ Test PASSed.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8823#issuecomment-141555402

[Test build #42675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42675/console) for PR 8823 at commit [`1f56d2e`](https://github.com/apache/spark/commit/1f56d2e626ed75d4ce96ffd70929243eed4c8143).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8823#issuecomment-141525663 [Test build #42675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42675/consoleFull) for PR 8823 at commit [`1f56d2e`](https://github.com/apache/spark/commit/1f56d2e626ed75d4ce96ffd70929243eed4c8143).
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8823#issuecomment-141524234 Merged build started.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8823#issuecomment-141524170 Merged build triggered.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8823#issuecomment-141524014 The fix is good. I only added comments. I will merge it to both master and branch 1.5 once it passes Jenkins.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8742#issuecomment-141523674 @yjshen I added the comments and created a new PR (https://github.com/apache/spark/pull/8823). Can you close this one?
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/8823

[SPARK-10539][SQL]Project should not be pushed down through Intersect or Except #8742

Intersect and Except are both set operators, and they use all the columns to compare rows for equality. Pushing their Project parent down changes the relations they operate on, so the rewrite is not an equivalent transformation.

JIRA: https://issues.apache.org/jira/browse/SPARK-10539

I added some comments based on the fix of https://github.com/apache/spark/pull/8742.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yhuai/spark fix_set_optimization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8823.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #8823

commit a9aa64388bae6afffecb27c432d9cf3de77aeba0
Author: Yijie Shen
Date: 2015-09-14T09:05:38Z
fix set optimization by eliminate empty project push down

commit f9a3b70c7f4c73c7f0572c31a90bb7f7b4698ef7
Author: Yijie Shen
Date: 2015-09-14T16:30:38Z
Project should not be pushed down through Intersect or Except

commit 1f56d2e626ed75d4ce96ffd70929243eed4c8143
Author: Yin Huai
Date: 2015-09-18T18:13:05Z
Add comments.
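The reasoning in the pull request description can be illustrated with a minimal sketch that uses plain Scala collections rather than Catalyst plans. The object name, the sample rows, and the `projectA` helper are hypothetical and exist only for this illustration; `Seq.intersect` and `Seq.diff` stand in for Intersect and Except, which is a fair approximation here only because the sample rows are distinct.

```scala
object ProjectPushdownDemo {
  // Rows are (a, b) pairs. Hypothetical sample data, not taken from the PR.
  val left: Seq[(Int, Int)]  = Seq((1, 10), (2, 20))
  val right: Seq[(Int, Int)] = Seq((1, 99), (2, 20))

  // Hypothetical helper: project each row onto column `a`.
  def projectA(rows: Seq[(Int, Int)]): Seq[Int] = rows.map(_._1)

  // Original plan shape: Project(a, Intersect(left, right)).
  // Only (2, 20) appears on both sides, so the result is Seq(2).
  val intersectThenProject: Seq[Int] = projectA(left.intersect(right))

  // Pushed-down plan shape: Intersect(Project(a, left), Project(a, right)).
  // Column `b` no longer takes part in the comparison, so 1 matches too: Seq(1, 2).
  val projectThenIntersect: Seq[Int] = projectA(left).intersect(projectA(right))

  // Same comparison for Except.
  val exceptThenProject: Seq[Int] = projectA(left.diff(right))           // Seq(1)
  val projectThenExcept: Seq[Int] = projectA(left).diff(projectA(right)) // empty

  def main(args: Array[String]): Unit = {
    println(s"intersect: $intersectThenProject vs $projectThenIntersect")
    println(s"except:    $exceptThenProject vs $projectThenExcept")
  }
}
```

In both cases the pushed-down form compares rows on fewer columns than the original, which is why the two plan shapes can return different results.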
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8742#issuecomment-141297061 @yjshen The fix is good. Can you address the comments?
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/8742#discussion_r39815622

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -136,26 +136,12 @@ object SetOperationPushDown extends Rule[LogicalPlan] {
         Filter(condition, left),
         Filter(pushToRight(condition, rewrites), right))
-    // Push down projection into intersect
-    case Project(projectList, i @ Intersect(left, right)) =>
-      val rewrites = buildRewrites(i)
-      Intersect(
-        Project(projectList, left),
-        Project(projectList.map(pushToRight(_, rewrites)), right))
-
     // Push down filter into except
     case Filter(condition, e @ Except(left, right)) =>
       val rewrites = buildRewrites(e)
       Except(
         Filter(condition, left),
         Filter(pushToRight(condition, rewrites), right))
-
-    // Push down projection into except
-    case Project(projectList, e @ Except(left, right)) =>
-      val rewrites = buildRewrites(e)
-      Except(
-        Project(projectList, left),
-        Project(projectList.map(pushToRight(_, rewrites)), right))
--- End diff --

Can you add comments here explaining why we cannot push down projections and why we can push down filters?
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/8742#issuecomment-140935988 ping :)
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/8742#discussion_r39702841

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -907,4 +907,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       assert(row.getDouble(1) - row.getDouble(3) === 0.0 +- 0.001)
     }
   }
+
+  test("SPARK-10539: Project should not be pushed down through Intersect or Except") {
+    val df1 = (1 to 100).map(Tuple1.apply).toDF("i")
--- End diff --

nit: `sqlContext.range(100)`
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user yjshen commented on a diff in the pull request: https://github.com/apache/spark/pull/8742#discussion_r39478700

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -136,26 +136,12 @@ object SetOperationPushDown extends Rule[LogicalPlan] {
         Filter(condition, left),
         Filter(pushToRight(condition, rewrites), right))
-    // Push down projection into intersect
-    case Project(projectList, i @ Intersect(left, right)) =>
-      val rewrites = buildRewrites(i)
-      Intersect(
-        Project(projectList, left),
-        Project(projectList.map(pushToRight(_, rewrites)), right))
-
     // Push down filter into except
     case Filter(condition, e @ Except(left, right)) =>
       val rewrites = buildRewrites(e)
       Except(
         Filter(condition, left),
         Filter(pushToRight(condition, rewrites), right))
-
-    // Push down projection into except
-    case Project(projectList, e @ Except(left, right)) =>
-      val rewrites = buildRewrites(e)
-      Except(
-        Project(projectList, left),
-        Project(projectList.map(pushToRight(_, rewrites)), right))
--- End diff --

@yhuai, thanks for your comment. I didn't consider the effect of non-deterministic filters on pushdown when I wrote this. I will think about it and add comments soon.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/8742#discussion_r39474132

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -136,26 +136,12 @@ object SetOperationPushDown extends Rule[LogicalPlan] {
         Filter(condition, left),
         Filter(pushToRight(condition, rewrites), right))
-    // Push down projection into intersect
-    case Project(projectList, i @ Intersect(left, right)) =>
-      val rewrites = buildRewrites(i)
-      Intersect(
-        Project(projectList, left),
-        Project(projectList.map(pushToRight(_, rewrites)), right))
-
     // Push down filter into except
     case Filter(condition, e @ Except(left, right)) =>
       val rewrites = buildRewrites(e)
       Except(
         Filter(condition, left),
         Filter(pushToRight(condition, rewrites), right))
-
-    // Push down projection into except
-    case Project(projectList, e @ Except(left, right)) =>
-      val rewrites = buildRewrites(e)
-      Except(
-        Project(projectList, left),
-        Project(projectList.map(pushToRight(_, rewrites)), right))
--- End diff --

Can we add comments in this class to explain why we cannot push down projections? For filter pushdown, if the condition contains non-deterministic expressions, pushing the filter down is not safe in some cases. That should not happen here because of https://github.com/apache/spark/pull/7446, but it is still worth thinking about whether there is any case where filter pushdown is not safe. If we determine that filter pushdown is safe, let's add comments to explain the reason.
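As a rough illustration of the filter side of this question, the sketch below checks one deterministic predicate against plain Scala collections. It is an example on hypothetical sample data, not a proof and not Catalyst code: the object name, the rows, and `pred` are assumptions made for this illustration, and `Seq.intersect` and `Seq.diff` stand in for Intersect and Except on distinct rows.

```scala
object FilterPushdownDemo {
  // Rows are (a, b) pairs. Hypothetical sample data.
  val left: Seq[(Int, Int)]  = Seq((1, 10), (2, 20), (3, 30), (5, 50))
  val right: Seq[(Int, Int)] = Seq((2, 20), (3, 30), (4, 40))

  // A deterministic predicate: the same row always gives the same answer.
  def pred(row: (Int, Int)): Boolean = row._1 > 2

  // Filter(cond, Intersect(left, right)) vs. Intersect(Filter(cond, left), Filter(cond, right)).
  val filterAfterIntersect: Seq[(Int, Int)]  = left.intersect(right).filter(pred)
  val filterBeforeIntersect: Seq[(Int, Int)] = left.filter(pred).intersect(right.filter(pred))

  // Filter(cond, Except(left, right)) vs. Except(Filter(cond, left), Filter(cond, right)).
  val filterAfterExcept: Seq[(Int, Int)]  = left.diff(right).filter(pred)
  val filterBeforeExcept: Seq[(Int, Int)] = left.filter(pred).diff(right.filter(pred))

  def main(args: Array[String]): Unit = {
    println(s"intersect: $filterAfterIntersect vs $filterBeforeIntersect")
    println(s"except:    $filterAfterExcept vs $filterBeforeExcept")
  }
}
```

A filter keeps or drops whole rows without changing their columns, so on this data both plan shapes agree. A non-deterministic predicate could evaluate differently on each side, which is the case the comment above asks about.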
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8742#issuecomment-140184801 cc @yhuai for review.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8742#issuecomment-140165971 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8742#issuecomment-140165974 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42432/ Test PASSed.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8742#issuecomment-140165825

[Test build #42432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42432/console) for PR 8742 at commit [`ce6ed80`](https://github.com/apache/spark/commit/ce6ed80f1d3b7138664010a415a8501ea68dcd28).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8742#issuecomment-140136499 [Test build #42432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42432/consoleFull) for PR 8742 at commit [`ce6ed80`](https://github.com/apache/spark/commit/ce6ed80f1d3b7138664010a415a8501ea68dcd28).