[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-195168179 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-195168182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52885/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-195167992 **[Test build #52885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52885/consoleFull)** for PR 11565 at commit [`bd35ee7`](https://github.com/apache/spark/commit/bd35ee7144a9b88281f189748fd044df7971dbf3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11565 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-195167265 nvm, that's my job:) Thanks, merging to master! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-195165526 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-195165530 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52883/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-195165383 **[Test build #52883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52883/consoleFull)** for PR 11565 at commit [`6a59b42`](https://github.com/apache/spark/commit/6a59b4240eaf1c63c6ad2dcb3ce6876ef9f2d189). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-195163951 Thank you for your detailed review! Actually, the problem was first found by you. I thought my PR can save your time, but it sounds like you still waste a lot of time on the review. Anyway, really thank you! @cloud-fan --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-195152972 LGTM, thanks for working on it! will merge after tests pass. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-195149566 **[Test build #52885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52885/consoleFull)** for PR 11565 at commit [`bd35ee7`](https://github.com/apache/spark/commit/bd35ee7144a9b88281f189748fd044df7971dbf3). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/11565#discussion_r55782088 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -384,6 +381,9 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child +// Eliminate no-op Window +case w: Window if w.windowExpressions.isEmpty => w.child --- End diff -- sure. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/11565#discussion_r55781475 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -384,6 +381,9 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child +// Eliminate no-op Window +case w: Window if w.windowExpressions.isEmpty => w.child --- End diff -- move this rule near the rule that filter out useless window expressions, which makes people eaiser to understand. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-195144043 **[Test build #52883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52883/consoleFull)** for PR 11565 at commit [`6a59b42`](https://github.com/apache/spark/commit/6a59b4240eaf1c63c6ad2dcb3ce6876ef9f2d189). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/11565#discussion_r55776211 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -384,9 +378,21 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child +// Eliminate no-op Window +case w: Window if w.windowExpressions.isEmpty => w.child + // Can't prune the columns on LeafNode case p @ Project(_, l: LeafNode) => p +// Prune windowExpressions and child of Window +case p @ Project(_, w: Window) if (w.outputSet -- p.references).nonEmpty => + val newWindowExprs = w.windowExpressions.filter(p.references.contains) --- End diff -- Will do it.Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/11565#discussion_r55776014 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -384,9 +378,21 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child +// Eliminate no-op Window +case w: Window if w.windowExpressions.isEmpty => w.child + // Can't prune the columns on LeafNode case p @ Project(_, l: LeafNode) => p +// Prune windowExpressions and child of Window +case p @ Project(_, w: Window) if (w.outputSet -- p.references).nonEmpty => + val newWindowExprs = w.windowExpressions.filter(p.references.contains) --- End diff -- After rethink about it, seems we can still separate it into 2 rules. We can add a `def windowOutputSet = AttributeSet(windowExpressions.map(_.toAttribute))` to `Window` operator, and change the first case to `case p @ Project(_, w: Window) if (w.windowOutputSet -- p.references).nonEmpty =>` and filter out useless window expressions --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194985436 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52831/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194985432 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194984345 **[Test build #52831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52831/consoleFull)** for PR 11565 at commit [`fc96d84`](https://github.com/apache/spark/commit/fc96d84ac1ef6bec70e1af911c2cef0447f1514f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194941252 **[Test build #52831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52831/consoleFull)** for PR 11565 at commit [`fc96d84`](https://github.com/apache/spark/commit/fc96d84ac1ef6bec70e1af911c2cef0447f1514f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194937923 Will add more DSL for `Window` in a separate PR, which is doing predicate push down for `Window`. Thank you for your detailed review! @cloud-fan --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/11565#discussion_r55706168 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala --- @@ -258,6 +259,71 @@ class ColumnPruningSuite extends PlanTest { comparePlans(optimized1, analysis.EliminateSubqueryAliases(correctAnswer1)) } + test("Column pruning on Window with useless aggregate functions") { +val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int) + +val originalQuery = + input.groupBy('a, 'c, 'd)('a, 'c, 'd, +WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window)).select('a, 'c) + +val correctAnswer = + input.select('a, 'c, 'd).groupBy('a, 'c, 'd)('a, 'c).analyze + +val optimized = Optimize.execute(originalQuery.analyze) + +comparePlans(optimized, correctAnswer) + } + + test("Column pruning on Window with selected agg expressions") { +val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int) + +val originalQuery = + input.select('a, 'b, 'c, 'd, +WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window)).where('window > 1).select('a, 'c) + +val correctAnswer = + input.select('a, 'b, 'c) +.window(WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window) :: Nil, + 'a :: Nil, 'b.asc :: Nil) +.select('a, 'c, 'window).select('a, 'c, 'window, 'window) +.select('a, 'c, 'window).where('window > 1).select('a, 'c).analyze + +val optimized = Optimize.execute(originalQuery.analyze) + +comparePlans(optimized, correctAnswer) + } + + test("Column pruning on Window in select") { +val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int) + +val originalQuery = + input.select('a, 'b, 'c, 'd, +WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window)).select('a, 'c) + +val correctAnswer = + input.select('a, 'c).analyze --- End diff -- Yeah, will do. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/11565#discussion_r55706094 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala --- @@ -258,6 +259,71 @@ class ColumnPruningSuite extends PlanTest { comparePlans(optimized1, analysis.EliminateSubqueryAliases(correctAnswer1)) } + test("Column pruning on Window with useless aggregate functions") { +val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int) + +val originalQuery = + input.groupBy('a, 'c, 'd)('a, 'c, 'd, +WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window)).select('a, 'c) + +val correctAnswer = + input.select('a, 'c, 'd).groupBy('a, 'c, 'd)('a, 'c).analyze + +val optimized = Optimize.execute(originalQuery.analyze) + +comparePlans(optimized, correctAnswer) + } + + test("Column pruning on Window with selected agg expressions") { +val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int) + +val originalQuery = + input.select('a, 'b, 'c, 'd, +WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window)).where('window > 1).select('a, 'c) + +val correctAnswer = + input.select('a, 'b, 'c) +.window(WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window) :: Nil, + 'a :: Nil, 'b.asc :: Nil) +.select('a, 'c, 'window).select('a, 'c, 'window, 'window) +.select('a, 'c, 'window).where('window > 1).select('a, 'c).analyze --- End diff -- Will do. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194791853 LGTM exception 2 small comments. It will be better if we can add more DSL to build window expression, but I'm ok to do it in follow-ups. cc @yhuai --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/11565#discussion_r55663422 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala --- @@ -258,6 +259,71 @@ class ColumnPruningSuite extends PlanTest { comparePlans(optimized1, analysis.EliminateSubqueryAliases(correctAnswer1)) } + test("Column pruning on Window with useless aggregate functions") { +val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int) + +val originalQuery = + input.groupBy('a, 'c, 'd)('a, 'c, 'd, +WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window)).select('a, 'c) + +val correctAnswer = + input.select('a, 'c, 'd).groupBy('a, 'c, 'd)('a, 'c).analyze + +val optimized = Optimize.execute(originalQuery.analyze) + +comparePlans(optimized, correctAnswer) + } + + test("Column pruning on Window with selected agg expressions") { +val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int) + +val originalQuery = + input.select('a, 'b, 'c, 'd, +WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window)).where('window > 1).select('a, 'c) + +val correctAnswer = + input.select('a, 'b, 'c) +.window(WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window) :: Nil, + 'a :: Nil, 'b.asc :: Nil) +.select('a, 'c, 'window).select('a, 'c, 'window, 'window) +.select('a, 'c, 'window).where('window > 1).select('a, 'c).analyze + +val optimized = Optimize.execute(originalQuery.analyze) + +comparePlans(optimized, correctAnswer) + } + + test("Column pruning on Window in select") { +val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int) + +val originalQuery = + input.select('a, 'b, 'c, 'd, +WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window)).select('a, 'c) + +val correctAnswer = + input.select('a, 'c).analyze --- End diff -- nit: they can fit in one line --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/11565#discussion_r55663370 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala --- @@ -258,6 +259,71 @@ class ColumnPruningSuite extends PlanTest { comparePlans(optimized1, analysis.EliminateSubqueryAliases(correctAnswer1)) } + test("Column pruning on Window with useless aggregate functions") { +val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int) + +val originalQuery = + input.groupBy('a, 'c, 'd)('a, 'c, 'd, +WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window)).select('a, 'c) + +val correctAnswer = + input.select('a, 'c, 'd).groupBy('a, 'c, 'd)('a, 'c).analyze + +val optimized = Optimize.execute(originalQuery.analyze) + +comparePlans(optimized, correctAnswer) + } + + test("Column pruning on Window with selected agg expressions") { +val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int) + +val originalQuery = + input.select('a, 'b, 'c, 'd, +WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window)).where('window > 1).select('a, 'c) + +val correctAnswer = + input.select('a, 'b, 'c) +.window(WindowExpression( + AggregateExpression(Count('b), Complete, isDistinct = false), + WindowSpecDefinition( 'a :: Nil, +SortOrder('b, Ascending) :: Nil, +UnspecifiedFrame)).as('window) :: Nil, + 'a :: Nil, 'b.asc :: Nil) +.select('a, 'c, 'window).select('a, 'c, 'window, 'window) +.select('a, 'c, 'window).where('window > 1).select('a, 'c).analyze --- End diff -- It's weird to see 3 selects here, we can add `CollapseProject` rule into the optimizer in this test suite. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194748290 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52820/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194748285 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194747990 **[Test build #52820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52820/consoleFull)** for PR 11565 at commit [`44326f1`](https://github.com/apache/spark/commit/44326f11dab33d686885b7f038a1034c1723338e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194720152 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194720154 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52818/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194720120 **[Test build #52818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52818/consoleFull)** for PR 11565 at commit [`6cf6f44`](https://github.com/apache/spark/commit/6cf6f444697c0bc32dbc09906fe144563a7d66df). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11565#issuecomment-194718830 **[Test build #52820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52820/consoleFull)** for PR 11565 at commit [`44326f1`](https://github.com/apache/spark/commit/44326f11dab33d686885b7f038a1034c1723338e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/11565#discussion_r55643000 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -384,9 +374,25 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child +// Eliminate no-op Window +case w: Window if sameOutput(w.child.output, w.output) => w.child + +// Convert Aggregate to Project if no aggregate function exists +case a: Aggregate if !containsAggregates(a.expressions) => + Project(a.aggregateExpressions, a.child) --- End diff -- Anyway, I think this is a rare case. Will not do it in this PR --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/11565#discussion_r55642840 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -384,9 +374,25 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child +// Eliminate no-op Window +case w: Window if sameOutput(w.child.output, w.output) => w.child + +// Convert Aggregate to Project if no aggregate function exists +case a: Aggregate if !containsAggregates(a.expressions) => + Project(a.aggregateExpressions, a.child) --- End diff -- It only makes sense when the grouping and aggregateExpressions are exactly identical. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org