spark git commit: [SPARK-16078][SQL] from_utc_timestamp/to_utc_timestamp should not depends on local timezone

2016-06-22 Thread hvanhovell
Repository: spark
Updated Branches:
  refs/heads/master 43b04b7ec -> 20d411bc5


[SPARK-16078][SQL] from_utc_timestamp/to_utc_timestamp should not depends on 
local timezone

## What changes were proposed in this pull request?

Currently, we use the local timezone to parse or format a timestamp
(TimestampType), then store it as a Long holding the microseconds since the
epoch in UTC.

from_utc_timestamp() and to_utc_timestamp() did not take the local timezone
into account, so they could return different results under different local
timezones.
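
As a rough illustration of that dependence, the sketch below replays the
expression removed in the diff (`ts + tz.getOffset(ts / 1000) * 1000L`) under
two simulated local timezones. The names `OldOffsetDemo`, `oldFromUtc` and
`run` are hypothetical, and the local timezone is simulated by setting it on
the formatter rather than by changing the JVM default, so this is only an
approximation of how Spark parsed and formatted timestamps at the time.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object OldOffsetDemo {
  // Formatter bound to a given zone; stands in for "the local timezone".
  private def fmt(zone: TimeZone): SimpleDateFormat = {
    val f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    f.setTimeZone(zone)
    f
  }

  // Mirrors the removed expression; `micros` is microseconds since the epoch.
  def oldFromUtc(micros: Long, tz: TimeZone): Long =
    micros + tz.getOffset(micros / 1000) * 1000L

  // Parse a human-readable time in `local`, apply the old formula,
  // then format the result in `local` again.
  def run(human: String, local: TimeZone, target: TimeZone): String = {
    val micros = fmt(local).parse(human).getTime * 1000L
    fmt(local).format(new Date(oldFromUtc(micros, target) / 1000L))
  }
}
```

With the input `2016-03-13 09:59:00` and target `America/Los_Angeles`, a local
timezone of `UTC` yields `2016-03-13 01:59:00`, while a local timezone of
`Pacific/Honolulu` yields `2016-03-13 02:59:00`: the offset is looked up at a
different absolute instant, and the two instants fall on opposite sides of the
DST switch.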

This PR does the conversion based on human time (in the local timezone), so it
should return the same result in any timezone. But because the mapping from
absolute timestamps to human time is not exactly one-to-one, it will still
return wrong results in some timezones (and at the beginning or end of DST).
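
The idea of converting on human time can be sketched as follows. This is not
the `convertTz` added by the patch (which works with `java.util.TimeZone`); it
is a minimal illustration using `java.time`, and the names `HumanTimeConvert`,
`convert` and `convertHuman` are hypothetical.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

object HumanTimeConvert {
  // `micros` encodes a human-readable time that was parsed in `local`.
  // Returns micros that, formatted in `local`, show that time moved
  // from zone `from` to zone `to`.
  def convert(micros: Long, local: ZoneId, from: ZoneId, to: ZoneId): Long = {
    val instant = Instant.ofEpochSecond(
      Math.floorDiv(micros, 1000000L), Math.floorMod(micros, 1000000L) * 1000L)
    // Recover the human time the user originally wrote.
    val wall = LocalDateTime.ofInstant(instant, local)
    // Reinterpret it in `from`, then read the clock in `to`.
    val converted = wall.atZone(from).withZoneSameInstant(to).toLocalDateTime
    // Re-encode in `local`, since that is how timestamps are formatted.
    val out = converted.atZone(local).toInstant
    out.getEpochSecond * 1000000L + out.getNano / 1000L
  }

  // Convenience wrapper that goes from human time to human time.
  def convertHuman(wall: LocalDateTime, local: ZoneId,
                   from: ZoneId, to: ZoneId): LocalDateTime = {
    val in = wall.atZone(local).toInstant
    val micros = in.getEpochSecond * 1000000L + in.getNano / 1000L
    val res = convert(micros, local, from, to)
    LocalDateTime.ofInstant(
      Instant.ofEpochSecond(Math.floorDiv(res, 1000000L),
        Math.floorMod(res, 1000000L) * 1000L), local)
  }
}
```

In this sketch, converting `2016-03-13T09:59:00` from `UTC` to
`America/Los_Angeles` gives `2016-03-13T01:59:00` whether the local zone is
`UTC` or `Pacific/Honolulu`. The remaining inaccuracy shows up when the result
is a human time that does not exist in the local zone: with
`America/Los_Angeles` as the local zone, converting `2016-03-13T12:30:00` from
`UTC` to `Pacific/Honolulu` should give 02:30, but that hour is skipped by DST
locally, so the sketch returns 03:30 instead.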

This PR is a best-effort fix. In the long term, we should make TimestampType
timezone-aware to fix this completely.

## How was this patch tested?

Tested these functions in all timezones.

Author: Davies Liu 

Closes #13784 from davies/convert_tz.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/20d411bc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/20d411bc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/20d411bc

Branch: refs/heads/master
Commit: 20d411bc5d05dd099f6d5234a24e10a519a39bdf
Parents: 43b04b7
Author: Davies Liu 
Authored: Wed Jun 22 13:40:24 2016 -0700
Committer: Herman van Hovell 
Committed: Wed Jun 22 13:40:24 2016 -0700

--
 .../expressions/datetimeExpressions.scala   | 10 +--
 .../spark/sql/catalyst/util/DateTimeUtils.scala | 34 --
 .../sql/catalyst/util/DateTimeUtilsSuite.scala  | 65 
 3 files changed, 73 insertions(+), 36 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/20d411bc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 773431d..04c17bd 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -730,16 +730,17 @@ case class FromUTCTimestamp(left: Expression, right: Expression)
            """.stripMargin)
       } else {
         val tzTerm = ctx.freshName("tz")
+        val utcTerm = ctx.freshName("utc")
         val tzClass = classOf[TimeZone].getName
         ctx.addMutableState(tzClass, tzTerm, s"""$tzTerm = $tzClass.getTimeZone("$tz");""")
+        ctx.addMutableState(tzClass, utcTerm, s"""$utcTerm = $tzClass.getTimeZone("UTC");""")
         val eval = left.genCode(ctx)
         ev.copy(code = s"""
            |${eval.code}
            |boolean ${ev.isNull} = ${eval.isNull};
            |long ${ev.value} = 0;
            |if (!${ev.isNull}) {
-           |  ${ev.value} = ${eval.value} +
-           |   ${tzTerm}.getOffset(${eval.value} / 1000) * 1000L;
+           |  ${ev.value} = $dtu.convertTz(${eval.value}, $utcTerm, $tzTerm);
            |}
          """.stripMargin)
       }
@@ -869,16 +870,17 @@ case class ToUTCTimestamp(left: Expression, right: Expression)
            """.stripMargin)
       } else {
         val tzTerm = ctx.freshName("tz")
+        val utcTerm = ctx.freshName("utc")
         val tzClass = classOf[TimeZone].getName
         ctx.addMutableState(tzClass, tzTerm, s"""$tzTerm = $tzClass.getTimeZone("$tz");""")
+        ctx.addMutableState(tzClass, utcTerm, s"""$utcTerm = $tzClass.getTimeZone("UTC");""")
         val eval = left.genCode(ctx)
         ev.copy(code = s"""
            |${eval.code}
            |boolean ${ev.isNull} = ${eval.isNull};
            |long ${ev.value} = 0;
            |if (!${ev.isNull}) {
-           |  ${ev.value} = ${eval.value} -
-           |   ${tzTerm}.getOffset(${eval.value} / 1000) * 1000L;
+           |  ${ev.value} = $dtu.convertTz(${eval.value}, $tzTerm, $utcTerm);
            |}
          """.stripMargin)
       }

http://git-wip-us.apache.org/repos/asf/spark/blob/20d411bc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
index 56bf9a7..df480a1 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
+++ 
b/sql/ca

spark git commit: [SPARK-16078][SQL] from_utc_timestamp/to_utc_timestamp should not depends on local timezone

2016-06-22 Thread hvanhovell
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 299f427b7 -> 282a3cd02


[SPARK-16078][SQL] from_utc_timestamp/to_utc_timestamp should not depends on 
local timezone

## What changes were proposed in this pull request?

Currently, we use the local timezone to parse or format a timestamp
(TimestampType), then store it as a Long holding the microseconds since the
epoch in UTC.

from_utc_timestamp() and to_utc_timestamp() did not take the local timezone
into account, so they could return different results under different local
timezones.

This PR does the conversion based on human time (in the local timezone), so it
should return the same result in any timezone. But because the mapping from
absolute timestamps to human time is not exactly one-to-one, it will still
return wrong results in some timezones (and at the beginning or end of DST).

This PR is a best-effort fix. In the long term, we should make TimestampType
timezone-aware to fix this completely.

## How was this patch tested?

Tested these functions in all timezones.

Author: Davies Liu 

Closes #13784 from davies/convert_tz.

(cherry picked from commit 20d411bc5d05dd099f6d5234a24e10a519a39bdf)
Signed-off-by: Herman van Hovell 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/282a3cd0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/282a3cd0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/282a3cd0

Branch: refs/heads/branch-2.0
Commit: 282a3cd02389464d6adbf02921281c963da29b00
Parents: 299f427
Author: Davies Liu 
Authored: Wed Jun 22 13:40:24 2016 -0700
Committer: Herman van Hovell 
Committed: Wed Jun 22 13:41:33 2016 -0700

--
 .../expressions/datetimeExpressions.scala   | 10 +--
 .../spark/sql/catalyst/util/DateTimeUtils.scala | 34 --
 .../sql/catalyst/util/DateTimeUtilsSuite.scala  | 65 
 3 files changed, 73 insertions(+), 36 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/282a3cd0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 773431d..04c17bd 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -730,16 +730,17 @@ case class FromUTCTimestamp(left: Expression, right: Expression)
            """.stripMargin)
       } else {
         val tzTerm = ctx.freshName("tz")
+        val utcTerm = ctx.freshName("utc")
         val tzClass = classOf[TimeZone].getName
         ctx.addMutableState(tzClass, tzTerm, s"""$tzTerm = $tzClass.getTimeZone("$tz");""")
+        ctx.addMutableState(tzClass, utcTerm, s"""$utcTerm = $tzClass.getTimeZone("UTC");""")
         val eval = left.genCode(ctx)
         ev.copy(code = s"""
            |${eval.code}
            |boolean ${ev.isNull} = ${eval.isNull};
            |long ${ev.value} = 0;
            |if (!${ev.isNull}) {
-           |  ${ev.value} = ${eval.value} +
-           |   ${tzTerm}.getOffset(${eval.value} / 1000) * 1000L;
+           |  ${ev.value} = $dtu.convertTz(${eval.value}, $utcTerm, $tzTerm);
            |}
          """.stripMargin)
       }
@@ -869,16 +870,17 @@ case class ToUTCTimestamp(left: Expression, right: Expression)
            """.stripMargin)
       } else {
         val tzTerm = ctx.freshName("tz")
+        val utcTerm = ctx.freshName("utc")
         val tzClass = classOf[TimeZone].getName
         ctx.addMutableState(tzClass, tzTerm, s"""$tzTerm = $tzClass.getTimeZone("$tz");""")
+        ctx.addMutableState(tzClass, utcTerm, s"""$utcTerm = $tzClass.getTimeZone("UTC");""")
         val eval = left.genCode(ctx)
         ev.copy(code = s"""
            |${eval.code}
            |boolean ${ev.isNull} = ${eval.isNull};
            |long ${ev.value} = 0;
            |if (!${ev.isNull}) {
-           |  ${ev.value} = ${eval.value} -
-           |   ${tzTerm}.getOffset(${eval.value} / 1000) * 1000L;
+           |  ${ev.value} = $dtu.convertTz(${eval.value}, $tzTerm, $utcTerm);
            |}
          """.stripMargin)
       }

http://git-wip-us.apache.org/repos/asf/spark/blob/282a3cd0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
index 56bf9a7..df480a