[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user hvanhovell closed the pull request at: https://github.com/apache/spark/pull/8298 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-134620114 @adrian-wang the improvement is absolutely tiny, about 2-3% if you do a lot of ```min```'s or ```max```'es. This PR was a response to a misdiagnosed performance regression; the real cause was the use of a map in key-less aggregation. The PR adds some value, so we could add it to 1.6. @yhuai / @adrian-wang if you feel differently about this, I'll withdraw the PR.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-134054452 I doubt that your decision tree is better. But GreaterThan/LessThan should be a little better than Least/Greatest, I think.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/8298#discussion_r37721828
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -236,15 +234,13 @@ case class Min(child: Expression) extends AlgebraicAggregate {
   )
   override val updateExpressions = Seq(
-/* min = */ If(IsNull(child), min, If(IsNull(min), child, Least(Seq(min, child
+/* min = */ If(IsNull(min), child, If(GreaterThan(min, child), child, min))
--- End diff --
Nit: redundant space.
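The diff above drops the explicit `IsNull(child)` branch and swaps `Least` for `GreaterThan`. A minimal sketch of why the two update expressions should agree, assuming the usual SQL semantics (`Least` skips NULL inputs, `GreaterThan` yields NULL when either side is NULL, and `If` takes its else-branch on a NULL predicate). The helpers below (`least`, `greaterThan`, `sqlIf`, `isNull`, and the `MinUpdateSketch` object) are hypothetical stand-ins written for illustration, not the Catalyst classes themselves; NULL is modelled as `None`.

```scala
// Hypothetical model of the two Min update expressions; NULL is None.
// None of these helpers are Catalyst APIs, they only mimic the assumed semantics.
object MinUpdateSketch {
  type V = Option[Int]

  // Assumed Least semantics: skip NULLs, return NULL only if every input is NULL.
  def least(xs: Seq[V]): V = {
    val present = xs.flatten
    if (present.isEmpty) None else Some(present.min)
  }

  // Assumed GreaterThan semantics: NULL if either operand is NULL.
  def greaterThan(a: V, b: V): Option[Boolean] =
    for (x <- a; y <- b) yield x > y

  // IsNull never returns NULL itself.
  def isNull(v: V): Option[Boolean] = Some(v.isEmpty)

  // Assumed If semantics: the else-branch is taken for both false and NULL.
  def sqlIf(pred: Option[Boolean], thenV: => V, elseV: => V): V =
    if (pred.contains(true)) thenV else elseV

  // Update expression before the patch (three branches plus Least).
  def oldUpdate(min: V, child: V): V =
    sqlIf(isNull(child), min,
      sqlIf(isNull(min), child, least(Seq(min, child))))

  // Update expression after the patch (one IsNull check plus GreaterThan).
  def newUpdate(min: V, child: V): V =
    sqlIf(isNull(min), child,
      sqlIf(greaterThan(min, child), child, min))

  def main(args: Array[String]): Unit = {
    val samples: Seq[V] = Seq(None, Some(1), Some(2), Some(-5))
    // Compare both expressions on every (min, child) pair, NULLs included.
    for (m <- samples; c <- samples)
      assert(oldUpdate(m, c) == newUpdate(m, c), s"mismatch for min=$m child=$c")

    // Fold a column containing a NULL through the new expression.
    val input: Seq[V] = Seq(Some(3), None, Some(1), Some(2))
    println(input.foldLeft(None: V)(newUpdate)) // prints Some(1)
  }
}
```

Under these assumptions a NULL `child` makes `GreaterThan` evaluate to NULL, so `If` falls through to `min`, which is exactly what the removed `IsNull(child)` branch did.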
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132792939 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41273/ Test PASSed.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132792936 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132792779 [Test build #41273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41273/console) for PR 8298 at commit [`3423912`](https://github.com/apache/spark/commit/3423912c7bf5758d1b4bc5c3add4071c24f01ef4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132752492 [Test build #41273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41273/consoleFull) for PR 8298 at commit [`3423912`](https://github.com/apache/spark/commit/3423912c7bf5758d1b4bc5c3add4071c24f01ef4).
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132751887 Merged build started.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132751863 Merged build triggered.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132453945 [Test build #41203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41203/console) for PR 8298 at commit [`2fed4dc`](https://github.com/apache/spark/commit/2fed4dcdddbb22d676a78795a3778c6f02a229ab). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132454008 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132454009 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41203/ Test PASSed.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132437956 [Test build #41203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41203/consoleFull) for PR 8298 at commit [`2fed4dc`](https://github.com/apache/spark/commit/2fed4dcdddbb22d676a78795a3778c6f02a229ab).
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132437659 Merged build triggered.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132437669 Merged build started.
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132437613 ok to test
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-132426939 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/8298 [SPARK-10100] [SQL] Perfomance improvements to new MIN/MAX aggregate functions. The new MIN/MAX suffer from a performance regression. This PR aims to fix this by simplifying the evaluation of the MIN/MAX functions. See the JIRA [ticket](https://issues.apache.org/jira/browse/SPARK-10100) for more information. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hvanhovell/spark SPARK-10100 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8298.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8298 commit 2fed4dcdddbb22d676a78795a3778c6f02a229ab Author: Herman van Hovell Date: 2015-08-19T02:36:32Z Performance tweaks to the Min/Max functions: removed a branch in their evaluation.