[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user clockfly commented on the issue:

    https://github.com/apache/spark/pull/14446

@davies This PR was created after analyzing the performance impact of #12352, which added the row-level metrics and caused a 15% performance regression. I can reproduce that regression consistently by comparing the performance of d4b94ea and 6f88006.

The problem is that I cannot reproduce the same regression consistently on trunk: the improvement from the fix on trunk varies a lot (sometimes 5%, sometimes 20%, sometimes not noticeable).

What I observed is that when the same benchmark code is run 100 times in the same spark-shell session, the time per run does not converge. For example, if we run the code below 100 times,

```
spark.read.parquet("/tmp/data4").filter($"nc.id" < 100).collect()
```

I observed:

1. The first run may take > 9000 ms.
2. The next few runs are much faster, around 4700 ms.
3. After that, performance suddenly gets worse, at around 8500 ms per run.

I guess this has something to do with the Java JIT and our codegen logic (because of codegen, we create a new class for each run in spark-shell).

As I cannot verify this improvement consistently, I am going to close this PR.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
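The repeated-run measurement described in clockfly's comment can be sketched roughly as follows. This is only an illustration of the timing loop, not the benchmark code used for the PR: `timeRuns` and `summarize` are hypothetical helper names, the demo workload is a stand-in, and the snippet is meant to be pasted into spark-shell or the Scala REPL.

```scala
// Hypothetical helper (not part of Spark): runs `body` `runs` times and
// returns the wall-clock time of each run in milliseconds, in run order.
def timeRuns[A](runs: Int)(body: => A): Vector[Long] =
  Vector.fill(runs) {
    val start = System.nanoTime()
    body // by-name parameter: evaluated here on every run, result discarded
    (System.nanoTime() - start) / 1000000L
  }

// Hypothetical helper: summarises per-run timings so drift between runs
// (rather than just an average) is visible.
def summarize(timesMs: Seq[Long]): String = {
  val sorted = timesMs.sorted
  val median = sorted(sorted.length / 2)
  f"min=${sorted.head}%d ms, median=$median%d ms, max=${sorted.last}%d ms"
}

// Stand-in workload so the snippet runs without a Spark session.
val demo = timeRuns(5) { (1 to 100000).map(_ * 2L).sum }
println(summarize(demo))

// In spark-shell, assuming the dataset from the comment exists, the
// query from the comment could be measured the same way:
// val times = timeRuns(100) {
//   spark.read.parquet("/tmp/data4").filter($"nc.id" < 100).collect()
// }
// println(summarize(times))
```

Keeping the per-run vector, rather than only a mean, makes the pattern reported in the comment (slow first run, a fast stretch, then a slowdown) directly visible in the output.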
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14446 Merged build finished. Test PASSed.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14446 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63154/ Test PASSed.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14446 **[Test build #63154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63154/consoleFull)** for PR 14446 at commit [`24988c8`](https://github.com/apache/spark/commit/24988c8612d2dac6a17471e4541ece815f7a6d74). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14446 **[Test build #63154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63154/consoleFull)** for PR 14446 at commit [`24988c8`](https://github.com/apache/spark/commit/24988c8612d2dac6a17471e4541ece815f7a6d74).
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user davies commented on the issue: https://github.com/apache/spark/pull/14446 The changes look good to me. Could you post the benchmark numbers in the PR description?
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14446 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63101/ Test PASSed.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14446 Merged build finished. Test PASSed.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14446 **[Test build #63101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63101/consoleFull)** for PR 14446 at commit [`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14446 **[Test build #63101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63101/consoleFull)** for PR 14446 at commit [`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e).
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/14446 retest this please.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14446 Merged build finished. Test FAILed.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14446 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63097/ Test FAILed.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14446 **[Test build #63097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63097/consoleFull)** for PR 14446 at commit [`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14446 **[Test build #63097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63097/consoleFull)** for PR 14446 at commit [`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e).