[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
It seems to me that adding the hash map metrics to the join operator can be
done in a later PR, so this change stays small and easy to review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18258
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77876/
Test PASSed.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18258
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18258
  
**[Test build #77876 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77876/testReport)**
 for PR 18258 at commit 
[`ee3d88f`](https://github.com/apache/spark/commit/ee3d88f56e9032c0d40bc1c810cbece25a86807b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18258
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77872/
Test PASSed.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18258
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18258
  
**[Test build #77872 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77872/testReport)**
 for PR 18258 at commit 
[`55cd6ad`](https://github.com/apache/spark/commit/55cd6ad71a6beda397d876426a6598bcfadc3470).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18258
  
**[Test build #77876 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77876/testReport)**
 for PR 18258 at commit 
[`ee3d88f`](https://github.com/apache/spark/commit/ee3d88f56e9032c0d40bc1c810cbece25a86807b).



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18258
  
**[Test build #77872 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77872/testReport)**
 for PR 18258 at commit 
[`55cd6ad`](https://github.com/apache/spark/commit/55cd6ad71a6beda397d876426a6598bcfadc3470).



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
Ok. I'll remove the flag. Thanks.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18258
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77866/
Test PASSed.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18258
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18258
  
**[Test build #77866 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77866/testReport)**
 for PR 18258 at commit 
[`e4cfe1c`](https://github.com/apache/spark/commit/e4cfe1ca358aee4842395904a1d29da4b9d534e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
If there is no regression, I'd remove the flag.




[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
Thanks!




[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
Sure. Three times for each.

Track = F:

Aggregate w keys:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------
codegen = F, track = F                    12657 / 12700          6.6         150.9       1.0X
codegen = T hashmap = F, track = F          6779 / 7582         12.4          80.8       1.9X
codegen = T hashmap = T, track = F          1505 / 1619         55.7          17.9       8.4X

Aggregate w keys:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------
codegen = F, track = F                    10085 / 10597          8.3         120.2       1.0X
codegen = T hashmap = F, track = F          5915 / 6069         14.2          70.5       1.7X
codegen = T hashmap = T, track = F          1610 / 1796         52.1          19.2       6.3X

Aggregate w keys:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------
codegen = F, track = F                    10275 / 10584          8.2         122.5       1.0X
codegen = T hashmap = F, track = F          6140 / 6557         13.7          73.2       1.7X
codegen = T hashmap = T, track = F          1301 / 1565         64.5          15.5       7.9X

Track = T:

Aggregate w keys:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------
codegen = F, track = T                    10723 / 10865          7.8         127.8       1.0X
codegen = T hashmap = F, track = T          6246 / 6432         13.4          74.5       1.7X
codegen = T hashmap = T, track = T          1465 / 1571         57.3          17.5       7.3X

Aggregate w keys:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------
codegen = F, track = T                     9964 / 10348          8.4         118.8       1.0X
codegen = T hashmap = F, track = T          6225 / 6375         13.5          74.2       1.6X
codegen = T hashmap = T, track = T          1361 / 1485         61.6          16.2       7.3X

Aggregate w keys:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------
codegen = F, track = T                    10125 / 10674          8.3         120.7       1.0X
codegen = T hashmap = F, track = T          6865 / 6980         12.2          81.8       1.5X
codegen = T hashmap = T, track = T          1491 / 1579         56.3          17.8       6.8X
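
One way to read the six runs above is to average the Per Row(ns) figures for the fastest configuration (`codegen = T hashmap = T`) in each setting. A small Python sketch, with the values copied from the tables above (the variable names are only illustrative):

```python
from statistics import mean

# Per Row(ns) for the "codegen = T hashmap = T" row, copied from the six runs above.
track_off = [17.9, 19.2, 15.5]  # track = F
track_on = [17.5, 16.2, 17.8]   # track = T

mean_off = mean(track_off)
mean_on = mean(track_on)
# Run-to-run spread within a single setting, as a rough noise estimate.
spread_off = max(track_off) - min(track_off)

print(f"track = F: mean {mean_off:.2f} ns/row, spread {spread_off:.1f} ns")
print(f"track = T: mean {mean_on:.2f} ns/row")
print(f"difference of means: {mean_on - mean_off:+.2f} ns/row")
```

With only three runs per setting this is a rough check, but the difference between the means is smaller than the spread within a single setting, which is consistent with tracking having no measurable cost in this benchmark.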





[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
Can you run it a few more times to tell? Right now the difference is almost
7%.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
Is it significant? It seems to me that it's within the variance between runs.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
16.8 vs 15.8?



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
I just ran the existing `AggregateBenchmark` with the new tracking config:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------
codegen = F, track = F                    10655 / 11043          7.9         127.0       1.0X
codegen = T hashmap = F, track = F          6923 / 7133         12.1          82.5       1.5X
codegen = T hashmap = T, track = F          1325 / 1511         63.3          15.8       8.0X


Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------
codegen = F, track = T                    10809 / 11007          7.8         128.9       1.0X
codegen = T hashmap = F, track = T          6581 / 6629         12.7          78.4       1.6X
codegen = T hashmap = T, track = T          1411 / 1552         59.4          16.8       7.7X

Looks like no obvious perf degradation.
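
For reference, the gap between the two `codegen = T hashmap = T` rows above can be worked out directly from the reported numbers. A minimal Python sketch (the helper name `relative_increase` is made up for illustration):

```python
def relative_increase(baseline: float, value: float) -> float:
    """Return how much larger `value` is than `baseline`, as a fraction."""
    return (value - baseline) / baseline

# Numbers copied from the two runs above (codegen = T hashmap = T rows).
per_row_ns = {"track = F": 15.8, "track = T": 16.8}
best_ms = {"track = F": 1325, "track = T": 1411}

per_row_gap = relative_increase(per_row_ns["track = F"], per_row_ns["track = T"])
best_gap = relative_increase(best_ms["track = F"], best_ms["track = T"])

print(f"per-row time: {per_row_gap:.1%} slower with tracking")
print(f"best time:    {best_gap:.1%} slower with tracking")
```

Both figures come out a little above 6% for this single pair of runs, which on its own is too small a sample to separate from run-to-run noise.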





[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
Sure. Will update later.



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
Can you test the perf degradation?




[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18258
  
The `enablePerfMetrics` parameter of `UnsafeFixedWidthAggregationMap` has 
this comment:

* @param enablePerfMetrics if true, performance metrics will be recorded (has minor perf impact)

It's true that those metrics are simple counters.
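
To illustrate why a counter-based metric is expected to be cheap, here is a minimal Python sketch of a linear-probing map with an `enable_perf_metrics` switch. It is a toy model, not Spark's `UnsafeFixedWidthAggregationMap`: the class name, the choice of counters (lookups and probes), and the method names are all assumptions made for illustration.

```python
class ProbeCountingMap:
    """Toy open-addressing map with linear probing (hypothetical, not Spark code)."""

    def __init__(self, capacity: int = 8, enable_perf_metrics: bool = False):
        self._slots = [None] * capacity  # each slot is None or a (key, value) pair
        self._size = 0
        self._enable_perf_metrics = enable_perf_metrics
        self.num_lookups = 0  # lookups performed while metrics are enabled
        self.num_probes = 0   # slots inspected across all those lookups

    def _find_slot(self, key) -> int:
        if self._enable_perf_metrics:
            self.num_lookups += 1
        idx = hash(key) % len(self._slots)
        while True:
            if self._enable_perf_metrics:
                self.num_probes += 1  # the only extra work: one integer increment
            entry = self._slots[idx]
            if entry is None or entry[0] == key:
                return idx
            idx = (idx + 1) % len(self._slots)

    def put(self, key, value) -> None:
        idx = self._find_slot(key)
        if self._slots[idx] is None:
            # Keep one slot free so probing always terminates; a real map would grow.
            if self._size + 1 >= len(self._slots):
                raise OverflowError("toy map is full")
            self._size += 1
        self._slots[idx] = (key, value)

    def get(self, key, default=None):
        entry = self._slots[self._find_slot(key)]
        return default if entry is None else entry[1]

    def average_probes_per_lookup(self) -> float:
        return self.num_probes / self.num_lookups if self.num_lookups else 0.0


# Small ints hash to themselves in CPython, so 0, 8 and 16 all start at slot 0.
m = ProbeCountingMap(capacity=8, enable_perf_metrics=True)
for k in (0, 8, 16):
    m.put(k, str(k))
print(m.num_lookups, m.num_probes, m.average_probes_per_lookup())
```

In this sketch the per-probe cost of tracking is a flag check plus an integer increment, which is the kind of overhead the benchmark is meant to confirm or rule out.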



[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18258
  
Why would the tracking have a perf impact? It's just a simple counter
increment, isn't it?




[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18258
  
**[Test build #77866 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77866/testReport)**
 for PR 18258 at commit 
[`e4cfe1c`](https://github.com/apache/spark/commit/e4cfe1ca358aee4842395904a1d29da4b9d534e4).

