[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258 It seems to me that adding hash map metrics to the join operator can be done in a later PR, so this change stays small enough to review. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18258 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77876/ Test PASSed.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18258 Merged build finished. Test PASSed.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18258 **[Test build #77876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77876/testReport)** for PR 18258 at commit [`ee3d88f`](https://github.com/apache/spark/commit/ee3d88f56e9032c0d40bc1c810cbece25a86807b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18258 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77872/ Test PASSed.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18258 Merged build finished. Test PASSed.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18258 **[Test build #77872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77872/testReport)** for PR 18258 at commit [`55cd6ad`](https://github.com/apache/spark/commit/55cd6ad71a6beda397d876426a6598bcfadc3470).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18258 **[Test build #77876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77876/testReport)** for PR 18258 at commit [`ee3d88f`](https://github.com/apache/spark/commit/ee3d88f56e9032c0d40bc1c810cbece25a86807b).
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18258 **[Test build #77872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77872/testReport)** for PR 18258 at commit [`55cd6ad`](https://github.com/apache/spark/commit/55cd6ad71a6beda397d876426a6598bcfadc3470).
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258 Ok. I'll remove the flag. Thanks.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18258 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77866/ Test PASSed.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18258 Merged build finished. Test PASSed.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18258 **[Test build #77866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77866/testReport)** for PR 18258 at commit [`e4cfe1c`](https://github.com/apache/spark/commit/e4cfe1ca358aee4842395904a1d29da4b9d534e4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 If there is no regression, I'd remove the flag.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Thanks!
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258

Sure. Three times for each.

Track = F:

Aggregate w keys:                     Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
codegen = F, track = F                    12657 / 12700         6.6         150.9       1.0X
codegen = T hashmap = F, track = F          6779 / 7582        12.4          80.8       1.9X
codegen = T hashmap = T, track = F          1505 / 1619        55.7          17.9       8.4X

Aggregate w keys:                     Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
codegen = F, track = F                    10085 / 10597         8.3         120.2       1.0X
codegen = T hashmap = F, track = F          5915 / 6069        14.2          70.5       1.7X
codegen = T hashmap = T, track = F          1610 / 1796        52.1          19.2       6.3X

Aggregate w keys:                     Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
codegen = F, track = F                    10275 / 10584         8.2         122.5       1.0X
codegen = T hashmap = F, track = F          6140 / 6557        13.7          73.2       1.7X
codegen = T hashmap = T, track = F          1301 / 1565        64.5          15.5       7.9X

Track = T:

Aggregate w keys:                     Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
codegen = F, track = T                    10723 / 10865         7.8         127.8       1.0X
codegen = T hashmap = F, track = T          6246 / 6432        13.4          74.5       1.7X
codegen = T hashmap = T, track = T          1465 / 1571        57.3          17.5       7.3X

Aggregate w keys:                     Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
codegen = F, track = T                     9964 / 10348         8.4         118.8       1.0X
codegen = T hashmap = F, track = T          6225 / 6375        13.5          74.2       1.6X
codegen = T hashmap = T, track = T          1361 / 1485        61.6          16.2       7.3X

Aggregate w keys:                     Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
codegen = F, track = T                    10125 / 10674         8.3         120.7       1.0X
codegen = T hashmap = F, track = T          6865 / 6980        12.2          81.8       1.5X
codegen = T hashmap = T, track = T          1491 / 1579        56.3          17.8       6.8X
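One way to read the six runs above is to average the `Per Row(ns)` column for the `codegen = T hashmap = T` case with and without tracking, and compare the gap to the run-to-run spread. The snippet below is only a sketch of that arithmetic on the numbers copied from the tables; three samples per setting are far too few for a real significance test, and the helper name `summarize` is made up for illustration.

```python
from statistics import mean, stdev

# Per Row (ns) for "codegen = T hashmap = T", copied from the three runs
# reported for each tracking setting.
per_row_ns = {
    "track = F": [17.9, 19.2, 15.5],
    "track = T": [17.5, 16.2, 17.8],
}


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of timings."""
    return mean(samples), stdev(samples)


summary = {label: summarize(vals) for label, vals in per_row_ns.items()}

for label, (avg, sd) in summary.items():
    print(f"{label}: mean = {avg:.2f} ns/row, stdev = {sd:.2f} ns/row")

# Positive means tracking was slower on average, negative means faster.
gap = summary["track = T"][0] - summary["track = F"][0]
print(f"difference of means (T - F): {gap:+.2f} ns/row")
```

On these numbers the tracked runs come out marginally faster on average, and the difference of means is smaller than the spread within either setting, which is consistent with the gap being noise rather than a regression.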
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Can you run it a few more times to tell? Right now it's a difference of almost 7%.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258 Is it significant? It seems to me that it's within the variance between runs.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 16.8 vs 15.8?
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258

I just ran the existing `AggregateBenchmark` with the new tracking config:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz

Aggregate w keys:                     Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
codegen = F, track = F                    10655 / 11043         7.9         127.0       1.0X
codegen = T hashmap = F, track = F          6923 / 7133        12.1          82.5       1.5X
codegen = T hashmap = T, track = F          1325 / 1511        63.3          15.8       8.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz

Aggregate w keys:                     Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
codegen = F, track = T                    10809 / 11007         7.8         128.9       1.0X
codegen = T hashmap = F, track = T          6581 / 6629        12.7          78.4       1.6X
codegen = T hashmap = T, track = T          1411 / 1552        59.4          16.8       7.7X

Looks like no obvious perf degradation.
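For reference, the relative slowdown of the `codegen = T hashmap = T` row between the two runs in this message can be worked out directly from the reported figures. This is only a sketch of the arithmetic on a single pair of runs; `relative_change` is an illustrative helper, not part of the benchmark.

```python
def relative_change(baseline, candidate):
    """Fractional change of candidate relative to baseline."""
    return (candidate - baseline) / baseline


# (track = F, track = T) values for "codegen = T hashmap = T",
# copied from the two benchmark tables in this message.
hashmap_rows = {
    "Best Time (ms)": (1325, 1411),
    "Avg Time (ms)": (1511, 1552),
    "Per Row (ns)": (15.8, 16.8),
}

for metric, (untracked, tracked) in hashmap_rows.items():
    pct = relative_change(untracked, tracked) * 100
    print(f"{metric}: {untracked} -> {tracked} ({pct:+.1f}%)")
```

The best-time and per-row figures differ by a little over 6%, while the average time differs by under 3%, so the size of the apparent slowdown depends on which column is compared.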
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258 Sure. Will update later.
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Can you test the perf degradation?
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18258 The `enablePerfMetrics` parameter of `UnsafeFixedWidthAggregationMap` has this comment: `@param enablePerfMetrics if true, performance metrics will be recorded (has minor perf impact)`. It's true that those metrics are simple counters.
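To make the "simple counters" point concrete, here is a toy open-addressing hash map whose lookup path bumps two counters only when a flag is set. It is a hypothetical illustration, not Spark's implementation: the class `ProbeCountingMap`, its fields, and the `enable_perf_metrics` argument are invented for this sketch, and the map does no resizing.

```python
class ProbeCountingMap:
    """Toy linear-probing hash map with optional probe counters (hypothetical)."""

    def __init__(self, capacity=1 << 16, enable_perf_metrics=False):
        # capacity must be a power of two so that `& mask` acts as modulo,
        # and must stay larger than the number of distinct keys (no resizing).
        self._mask = capacity - 1
        self._keys = [None] * capacity
        self._values = [None] * capacity
        self._enable_perf_metrics = enable_perf_metrics
        self.num_lookups = 0
        self.num_probes = 0

    def _slot(self, key):
        """Find the slot holding `key`, or the empty slot where it belongs."""
        pos = hash(key) & self._mask
        if self._enable_perf_metrics:
            self.num_lookups += 1
            self.num_probes += 1  # the first slot inspected counts as a probe
        while self._keys[pos] is not None and self._keys[pos] != key:
            pos = (pos + 1) & self._mask
            if self._enable_perf_metrics:
                self.num_probes += 1  # one more slot inspected
        return pos

    def put(self, key, value):
        pos = self._slot(key)
        self._keys[pos] = key
        self._values[pos] = value

    def get(self, key):
        return self._values[self._slot(key)]

    def avg_probes_per_lookup(self):
        return self.num_probes / self.num_lookups if self.num_lookups else 0.0


# Keys 1, 9 and 17 all land on slot 1 in a table of 8, forcing extra probes.
m = ProbeCountingMap(capacity=8, enable_perf_metrics=True)
for k in (1, 9, 17):
    m.put(k, k * 10)
print(m.num_lookups, m.num_probes, m.avg_probes_per_lookup())
```

The extra work per lookup is a branch and one or two integer increments, which is why the cost is expected to be small; whether it is measurable in practice is what the benchmark runs in this thread are meant to check.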
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Why would the tracking have perf impact? It's just a simple counter increase, isn't it?
[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18258 **[Test build #77866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77866/testReport)** for PR 18258 at commit [`e4cfe1c`](https://github.com/apache/spark/commit/e4cfe1ca358aee4842395904a1d29da4b9d534e4).