[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-13 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-183791219
  
@srowen you are right.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-06 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180793931
  
@adrian-wang but the `@transient` was applied to the whole reference to the 
`Map`. This doesn't make sense. The way to avoid a big cache is with weak keys, 
which has nothing to do with transient or serialization; this is not an 
instance field anyway.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-179741773
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50746/
Test FAILed.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-179741404
  
**[Test build #50746 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50746/consoleFull)**
 for PR 11071 at commit 
[`0ab90ed`](https://github.com/apache/spark/commit/0ab90ed1253b5b11a9d24b8fbd1e15b62baf79e9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-179741770
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11071#discussion_r51851087
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -55,10 +56,19 @@ object DateTimeUtils {
   // this is year -17999, calculation: 50 * daysIn400Year
   final val YearZero = -17999
   final val toYearZero = to2001 + 7304850
-  final val TimeZoneGMT = TimeZone.getTimeZone("GMT")
 
   @transient lazy val defaultTimeZone = TimeZone.getDefault
 
+  // Reuse the TimeZone object as it is expensive to create in each method call.
+  final val timeZones = new ConcurrentHashMap[String, TimeZone]
--- End diff --

This map could be quite big, because the string varies. Actually 
`ZoneInfoFile` does provide a cache for different `ID`s. Let's find out whether 
the boost you mentioned comes from reusing `TimeZone` or `Calendar` instances.
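
One way to answer that question is to time the two allocations separately. The following is only a rough sketch, not a rigorous benchmark: the object name `AllocationTimingSketch`, the helper `timeNanos`, and the iteration count are all hypothetical, and JIT warm-up is ignored.

```scala
import java.util.{Calendar, TimeZone}

// Hypothetical helper for illustration; not part of the pull request.
object AllocationTimingSketch {
  // Evaluates `body` n times and returns the elapsed wall-clock time in nanoseconds.
  def timeNanos(n: Int)(body: => Any): Long = {
    val start = System.nanoTime()
    var i = 0
    while (i < n) {
      body
      i += 1
    }
    System.nanoTime() - start
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000 // assumed iteration count
    val gmt = TimeZone.getTimeZone("GMT")
    // Cost of looking up a TimeZone on every call.
    val tzNanos = timeNanos(n)(TimeZone.getTimeZone("GMT"))
    // Cost of creating a Calendar on every call, with the TimeZone already in hand.
    val calNanos = timeNanos(n)(Calendar.getInstance(gmt))
    println(s"TimeZone.getTimeZone: ${tzNanos / 1e6} ms")
    println(s"Calendar.getInstance: ${calNanos / 1e6} ms")
  }
}
```

Running the same loops from several threads at once would also show how much the `synchronized` lookup costs under contention.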








[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11071#discussion_r51852765
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -55,10 +56,19 @@ object DateTimeUtils {
   // this is year -17999, calculation: 50 * daysIn400Year
   final val YearZero = -17999
   final val toYearZero = to2001 + 7304850
-  final val TimeZoneGMT = TimeZone.getTimeZone("GMT")
 
   @transient lazy val defaultTimeZone = TimeZone.getDefault
 
+  // Reuse the TimeZone object as it is expensive to create in each method call.
+  final val timeZones = new ConcurrentHashMap[String, TimeZone]
--- End diff --

By using this map we can skip a lot of calls to `getTimeZone`, which is a 
synchronized method, so `ConcurrentHashMap` can help improve performance; that's 
true. Do we need to add `transient`?





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11071#discussion_r51853631
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -55,10 +56,19 @@ object DateTimeUtils {
   // this is year -17999, calculation: 50 * daysIn400Year
   final val YearZero = -17999
   final val toYearZero = to2001 + 7304850
-  final val TimeZoneGMT = TimeZone.getTimeZone("GMT")
 
   @transient lazy val defaultTimeZone = TimeZone.getDefault
 
+  // Reuse the TimeZone object as it is expensive to create in each method call.
+  final val timeZones = new ConcurrentHashMap[String, TimeZone]
--- End diff --

Actually, we only need to change all `getTimeZone(String ID)` calls to 
`getTimeZone(String ID, boolean fallback)` (using `true` as the fallback) to 
work around the `synchronized` modifier here.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180223605
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50804/
Test FAILed.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180223604
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread carsonwang
Github user carsonwang closed the pull request at:

https://github.com/apache/spark/pull/11071





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180172844
  
**[Test build #50794 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50794/consoleFull)**
 for PR 11071 at commit 
[`72b31c8`](https://github.com/apache/spark/commit/72b31c83732a98720488f6d07bfd900c333e2306).





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180172808
  
retest this please





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11071#discussion_r51972422
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -55,10 +56,19 @@ object DateTimeUtils {
   // this is year -17999, calculation: 50 * daysIn400Year
   final val YearZero = -17999
   final val toYearZero = to2001 + 7304850
-  final val TimeZoneGMT = TimeZone.getTimeZone("GMT")
 
   @transient lazy val defaultTimeZone = TimeZone.getDefault
 
+  // Reuse the TimeZone object as it is expensive to create in each method call.
+  final val timeZones = new ConcurrentHashMap[String, TimeZone]
--- End diff --

Added `transient`. The total number of available timezone IDs should be limited.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180175835
  
**[Test build #50797 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50797/consoleFull)**
 for PR 11071 at commit 
[`72b31c8`](https://github.com/apache/spark/commit/72b31c83732a98720488f6d07bfd900c333e2306).





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180199036
  
**[Test build #50797 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50797/consoleFull)**
 for PR 11071 at commit 
[`72b31c8`](https://github.com/apache/spark/commit/72b31c83732a98720488f6d07bfd900c333e2306).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180200750
  
`transient` is wrong for a static member. What are you trying to do, avoid a 
large cache? There are ways to do this, but it's not this. But how many timezones 
could there be? And are these not already cached?





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180207538
  
I have a subquery like this: `SELECT a, b, c FROM table UV WHERE 
(datediff(UV.visitDate, '1997-01-01')>=0 AND datediff(UV.visitDate, 
'2015-01-01')<=0)`
When profiling this stage with Spark 1.6, I noticed a lot of time was consumed 
by `DateTimeUtils.stringToDate`. In particular, `TimeZone.getTimeZone` and 
`Calendar.getInstance` are extremely slow. The table stores `visitDate` as 
`String` type and has 3 billion records. This means it creates 3 billion 
`Calendar` and `TimeZone` objects.

`TimeZone.getTimeZone` is a synchronized method and will block other 
threads calling the same method. #10994 fixed this for 
`DateTimeUtils.stringToDate`, but `DateTimeUtils.stringToTimestamp` has the 
same issue, so I tried to cache the `TimeZone` objects in a map. The total 
number of available `TimeZone` objects should be limited.

By reusing a `Calendar` object instead of creating one on each call to the 
method, I see a further performance improvement. Creating 20 million `Calendar` 
objects takes more than 20 seconds on my machine, so we will benefit from 
reusing it.
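
The two reuse ideas described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in this pull request: the names `TimeZoneCacheSketch`, `cachedTimeZone`, and `gmtCalendar` are hypothetical, and the per-thread `Calendar` is just one possible way to reuse a mutable object that is not thread-safe.

```scala
import java.util.{Calendar, TimeZone}
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch; names and structure are assumptions, not the PR's code.
object TimeZoneCacheSketch {
  // Cache keyed by the zone ID string, so repeated lookups avoid the
  // synchronized TimeZone.getTimeZone call.
  private val timeZones = new ConcurrentHashMap[String, TimeZone]()

  // Returns a shared TimeZone for the ID. Callers must treat it as read-only,
  // because the same instance is handed to every caller.
  def cachedTimeZone(id: String): TimeZone = {
    val cached = timeZones.get(id)
    if (cached != null) {
      cached
    } else {
      val created = TimeZone.getTimeZone(id)
      // If another thread inserted first, use its instance.
      val raced = timeZones.putIfAbsent(id, created)
      if (raced != null) raced else created
    }
  }

  // Calendar is mutable and not thread-safe, so keep one instance per thread.
  private val threadLocalCalendar = new ThreadLocal[Calendar] {
    override def initialValue(): Calendar =
      Calendar.getInstance(cachedTimeZone("GMT"))
  }

  // Returns this thread's Calendar with all fields cleared, ready for reuse.
  def gmtCalendar(): Calendar = {
    val calendar = threadLocalCalendar.get()
    calendar.clear()
    calendar
  }
}
```

The map grows by one entry per distinct ID string it is asked for, so its size is bounded only if the set of ID strings seen in the data is bounded.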






[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180192861
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180192869
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50794/
Test PASSed.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180191627
  
**[Test build #50794 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50794/consoleFull)**
 for PR 11071 at commit 
[`72b31c8`](https://github.com/apache/spark/commit/72b31c83732a98720488f6d07bfd900c333e2306).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180199780
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50797/
Test PASSed.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180199774
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180201265
  
The map key could be like "UTC+01:00", "America/Los_Angeles", "PST", etc. 
They are already cached in `getTimeZone`, but the method itself is 
synchronized.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/11071

[SPARK-13185][SQL] Improve the performance of DateTimeUtils by reusing 
TimeZone and Calendar objects

It is expensive to create Java TimeZone and Calendar objects in each method 
of DateTimeUtils. We can reuse the objects to improve performance. In one 
of my SQL queries, which calls stringToDate many times, the duration of the 
stage improved from 1.6 minutes to 1.2 minutes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark DateTimeUtilsFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11071.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11071


commit cb9b157525c05c1a3d33b8c820595cf020a21b43
Author: Carson Wang 
Date:   2016-02-04T08:12:59Z

Reuse TimeZone and Calendar objects in DateTimeUtils







[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-179711566
  
**[Test build #50744 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50744/consoleFull)**
 for PR 11071 at commit 
[`cb9b157`](https://github.com/apache/spark/commit/cb9b157525c05c1a3d33b8c820595cf020a21b43).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-179711573
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-179711327
  
**[Test build #50744 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50744/consoleFull)**
 for PR 11071 at commit 
[`cb9b157`](https://github.com/apache/spark/commit/cb9b157525c05c1a3d33b8c820595cf020a21b43).





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-179711574
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50744/
Test FAILed.





[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-179713645
  
**[Test build #50746 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50746/consoleFull)**
 for PR 11071 at commit 
[`0ab90ed`](https://github.com/apache/spark/commit/0ab90ed1253b5b11a9d24b8fbd1e15b62baf79e9).

