[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-03 Thread clockfly
Github user clockfly commented on the issue:

https://github.com/apache/spark/pull/14446
  
@davies 

This PR was created after analyzing the performance impact of #12352, which 
added the row-level metrics and caused a 15% performance regression. I can 
reproduce that regression consistently by comparing the performance of 
d4b94ea and 6f88006. 

The problem is that I cannot reproduce the same regression consistently on 
trunk. The performance improvement from the fix on trunk varies a lot 
(sometimes 5%, sometimes 20%, sometimes not noticeable). What I observed is 
that when running the same benchmark code 100 times in the same spark-shell 
session, the time taken by each run does not converge.

For example, if we run the code below 100 times:
```
spark.read.parquet("/tmp/data4").filter($"nc.id" < 100).collect()
```

I observed:
1. The first run may take more than 9000 ms.
2. The next few runs are much faster, around 4700 ms.
3. After that, performance suddenly degrades, and each run may take around 
8500 ms.
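
To see whether run times converge, it helps to keep every sample instead of a single average. Below is a minimal sketch of such a timing loop. `RepeatTimer`, `timeRuns`, and `summarize` are hypothetical names used only for illustration; they are not part of Spark and not the harness used for the numbers above.

```scala
// Hypothetical helper (not part of Spark): time a block repeatedly and keep every sample.
object RepeatTimer {

  /** Runs `body` `runs` times and returns the wall-clock time of each run in milliseconds. */
  def timeRuns[A](runs: Int)(body: => A): Vector[Double] =
    Vector.fill(runs) {
      val start = System.nanoTime()
      body // by-name parameter: re-evaluated on every run
      (System.nanoTime() - start) / 1e6
    }

  /** Returns (min, middle element, max); for even counts the upper middle element is used. */
  def summarize(samplesMs: Seq[Double]): (Double, Double, Double) = {
    require(samplesMs.nonEmpty, "need at least one sample")
    val sorted = samplesMs.sorted
    (sorted.head, sorted(sorted.size / 2), sorted.last)
  }
}
```

In spark-shell this could be used as `RepeatTimer.timeRuns(100) { spark.read.parquet("/tmp/data4").filter($"nc.id" < 100).collect() }`. Printing the returned samples in order shows the jumps between runs that a single mean would hide.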

My guess is that this has something to do with the Java JIT and our codegen 
logic (because of codegen, we create a new class for each run in 
spark-shell).

As I cannot verify this improvement consistently, I am going to close this 
PR. 
 





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14446
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14446
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63154/
Test PASSed.





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14446
  
**[Test build #63154 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63154/consoleFull)**
 for PR 14446 at commit 
[`24988c8`](https://github.com/apache/spark/commit/24988c8612d2dac6a17471e4541ece815f7a6d74).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14446
  
**[Test build #63154 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63154/consoleFull)**
 for PR 14446 at commit 
[`24988c8`](https://github.com/apache/spark/commit/24988c8612d2dac6a17471e4541ece815f7a6d74).





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-02 Thread davies
Github user davies commented on the issue:

https://github.com/apache/spark/pull/14446
  
The changes look good to me. Could you post the benchmark numbers in the PR 
description?





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14446
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63101/
Test PASSed.





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14446
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14446
  
**[Test build #63101 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63101/consoleFull)**
 for PR 14446 at commit 
[`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14446
  
**[Test build #63101 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63101/consoleFull)**
 for PR 14446 at commit 
[`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e).





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-01 Thread clockfly
Github user clockfly commented on the issue:

https://github.com/apache/spark/pull/14446
  
 retest this please.





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14446
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14446
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63097/
Test FAILed.





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14446
  
**[Test build #63097 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63097/consoleFull)**
 for PR 14446 at commit 
[`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14446: [SPARK-16841][SQL] Improves the row level metrics perfor...

2016-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14446
  
**[Test build #63097 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63097/consoleFull)**
 for PR 14446 at commit 
[`1054b74`](https://github.com/apache/spark/commit/1054b74f18193378942b7fde26df36e06bff765e).

