Repository: spark
Updated Branches:
  refs/heads/branch-2.4 0cf4c5bbe -> 0b4e58187


[SPARK-23715][SQL][DOC] improve document for from/to_utc_timestamp

## What changes were proposed in this pull request?

We have an agreement that the behavior of `from/to_utc_timestamp` is correct,
although the function itself doesn't make much sense in Spark:
https://issues.apache.org/jira/browse/SPARK-23715

This PR improves the document.

## How was this patch tested?

N/A

Closes #22543 from cloud-fan/doc.

Authored-by: Wenchen Fan <wenc...@databricks.com>
Signed-off-by: Wenchen Fan <wenc...@databricks.com>
(cherry picked from commit ff876137faba1802b66ecd483ba15f6ccd83ffc5)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0b4e5818
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0b4e5818
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0b4e5818

Branch: refs/heads/branch-2.4
Commit: 0b4e58187b787cc7a6d57a2a9d467934ece24252
Parents: 0cf4c5b
Author: Wenchen Fan <wenc...@databricks.com>
Authored: Thu Sep 27 15:02:20 2018 +0800
Committer: Wenchen Fan <wenc...@databricks.com>
Committed: Thu Sep 27 15:02:52 2018 +0800

----------------------------------------------------------------------
 R/pkg/R/functions.R                             | 26 +++++++++++++----
 python/pyspark/sql/functions.py                 | 30 ++++++++++++++++----
 .../expressions/datetimeExpressions.scala       | 30 ++++++++++++++++----
 3 files changed, 68 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/0b4e5818/R/pkg/R/functions.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index 572dee5..63bd427 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -2203,9 +2203,16 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
           })
 
 #' @details
-#' \code{from_utc_timestamp}: Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a
-#' time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1'
-#' would yield '2017-07-14 03:40:00.0'.
+#' \code{from_utc_timestamp}: This is a common function for databases supporting TIMESTAMP WITHOUT
+#' TIMEZONE. It takes a timezone-agnostic timestamp, interprets it as a timestamp in UTC, and
+#' renders that timestamp as a timestamp in the given time zone.
+#' However, a timestamp in Spark represents the number of microseconds from the Unix epoch, which
+#' is not timezone-agnostic. So in Spark this function just shifts the timestamp value from the
+#' UTC time zone to the given time zone.
+#' This function may return a confusing result if the input is a string with a timezone, e.g.
+#' \code{2018-03-13T06:18:23+00:00}. The reason is that Spark first casts the string to a
+#' timestamp according to the timezone in the string, and finally displays the result by
+#' converting the timestamp to a string according to the session local timezone.
 #'
 #' @rdname column_datetime_diff_functions
 #'
@@ -2261,9 +2268,16 @@ setMethod("next_day", signature(y = "Column", x = "character"),
           })
 
 #' @details
-#' \code{to_utc_timestamp}: Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a
-#' time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1'
-#' would yield '2017-07-14 01:40:00.0'.
+#' \code{to_utc_timestamp}: This is a common function for databases supporting TIMESTAMP WITHOUT
+#' TIMEZONE. It takes a timezone-agnostic timestamp, interprets it as a timestamp in the given
+#' time zone, and renders that timestamp as a timestamp in UTC.
+#' However, a timestamp in Spark represents the number of microseconds from the Unix epoch, which
+#' is not timezone-agnostic. So in Spark this function just shifts the timestamp value from the
+#' given time zone to UTC.
+#' This function may return a confusing result if the input is a string with a timezone, e.g.
+#' \code{2018-03-13T06:18:23+00:00}. The reason is that Spark first casts the string to a
+#' timestamp according to the timezone in the string, and finally displays the result by
+#' converting the timestamp to a string according to the session local timezone.
 #'
 #' @rdname column_datetime_diff_functions
 #' @aliases to_utc_timestamp to_utc_timestamp,Column,character-method

http://git-wip-us.apache.org/repos/asf/spark/blob/0b4e5818/python/pyspark/sql/functions.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 6da5237..8c54179 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -1283,9 +1283,18 @@ def unix_timestamp(timestamp=None, format='yyyy-MM-dd HH:mm:ss'):
 @since(1.5)
 def from_utc_timestamp(timestamp, tz):
     """
-    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders
-    that time as a timestamp in the given time zone. For example, 'GMT+1' would yield
-    '2017-07-14 03:40:00.0'.
+    This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. It takes a
+    timezone-agnostic timestamp, interprets it as a timestamp in UTC, and renders that timestamp
+    as a timestamp in the given time zone.
+
+    However, a timestamp in Spark represents the number of microseconds from the Unix epoch,
+    which is not timezone-agnostic. So in Spark this function just shifts the timestamp value
+    from the UTC time zone to the given time zone.
+
+    This function may return a confusing result if the input is a string with a timezone, e.g.
+    '2018-03-13T06:18:23+00:00'. The reason is that Spark first casts the string to a timestamp
+    according to the timezone in the string, and finally displays the result by converting the
+    timestamp to a string according to the session local timezone.
 
     :param timestamp: the column that contains timestamps
     :param tz: a string that has the ID of timezone, e.g. "GMT", "America/Los_Angeles", etc
@@ -1308,9 +1317,18 @@ def from_utc_timestamp(timestamp, tz):
 @since(1.5)
 def to_utc_timestamp(timestamp, tz):
     """
-    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time
-    zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield
-    '2017-07-14 01:40:00.0'.
+    This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. It takes a
+    timezone-agnostic timestamp, interprets it as a timestamp in the given time zone, and renders
+    that timestamp as a timestamp in UTC.
+
+    However, a timestamp in Spark represents the number of microseconds from the Unix epoch,
+    which is not timezone-agnostic. So in Spark this function just shifts the timestamp value
+    from the given time zone to UTC.
+
+    This function may return a confusing result if the input is a string with a timezone, e.g.
+    '2018-03-13T06:18:23+00:00'. The reason is that Spark first casts the string to a timestamp
+    according to the timezone in the string, and finally displays the result by converting the
+    timestamp to a string according to the session local timezone.
 
     :param timestamp: the column that contains timestamps
     :param tz: a string that has the ID of timezone, e.g. "GMT", "America/Los_Angeles", etc
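The shifting behavior these docstrings describe can be sketched in plain Python (this is not Spark or PySpark code; a fixed hour offset stands in for a real zone ID, whose offset Spark resolves per instant):

```python
from datetime import datetime, timezone

MICROS_PER_HOUR = 3600 * 1_000_000

def epoch_micros(s: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS' as a UTC wall clock into epoch microseconds."""
    dt = datetime.strptime(s, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000

def render(us: int) -> str:
    """Render epoch microseconds back as a UTC wall-clock string."""
    return datetime.fromtimestamp(us // 1_000_000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

def from_utc_timestamp(us: int, offset_hours: int) -> int:
    # interpret the stored value as UTC and shift it toward the given zone
    return us + offset_hours * MICROS_PER_HOUR

def to_utc_timestamp(us: int, offset_hours: int) -> int:
    # interpret the stored value as zone-local and shift it back to UTC
    return us - offset_hours * MICROS_PER_HOUR

ts = epoch_micros("2017-07-14 02:40:00")
print(render(from_utc_timestamp(ts, 1)))  # 2017-07-14 03:40:00 for 'GMT+1'
print(render(to_utc_timestamp(ts, 1)))    # 2017-07-14 01:40:00 for 'GMT+1'
```

This reproduces the 'GMT+1' examples from the docstrings being replaced: the epoch value itself is moved, which is why the result only looks like a timezone conversion when the input really was timezone-agnostic.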

http://git-wip-us.apache.org/repos/asf/spark/blob/0b4e5818/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index eb78e39..45e17ae 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -1018,9 +1018,18 @@ case class TimeAdd(start: Expression, interval: Expression, timeZoneId: Option[S
 }
 
 /**
- * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders
- * that time as a timestamp in the given time zone. For example, 'GMT+1' would yield
- * '2017-07-14 03:40:00.0'.
+ * This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. It takes a
+ * timezone-agnostic timestamp, interprets it as a timestamp in UTC, and renders that timestamp
+ * as a timestamp in the given time zone.
+ *
+ * However, a timestamp in Spark represents the number of microseconds from the Unix epoch,
+ * which is not timezone-agnostic. So in Spark this function just shifts the timestamp value
+ * from the UTC time zone to the given time zone.
+ *
+ * This function may return a confusing result if the input is a string with a timezone, e.g.
+ * '2018-03-13T06:18:23+00:00'. The reason is that Spark first casts the string to a timestamp
+ * according to the timezone in the string, and finally displays the result by converting the
+ * timestamp to a string according to the session local timezone.
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
@@ -1215,9 +1224,18 @@ case class MonthsBetween(
 }
 
 /**
- * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone,
- * and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield
- * '2017-07-14 01:40:00.0'.
+ * This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. It takes a
+ * timezone-agnostic timestamp, interprets it as a timestamp in the given time zone, and renders
+ * that timestamp as a timestamp in UTC.
+ *
+ * However, a timestamp in Spark represents the number of microseconds from the Unix epoch,
+ * which is not timezone-agnostic. So in Spark this function just shifts the timestamp value
+ * from the given time zone to UTC.
+ *
+ * This function may return a confusing result if the input is a string with a timezone, e.g.
+ * '2018-03-13T06:18:23+00:00'. The reason is that Spark first casts the string to a timestamp
+ * according to the timezone in the string, and finally displays the result by converting the
+ * timestamp to a string according to the session local timezone.
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
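The "confusing string input" case the new doc warns about can also be sketched in plain Python (again not Spark code; the GMT+8 session zone is a hypothetical stand-in for the session local timezone):

```python
from datetime import datetime, timedelta, timezone

# A string carrying its own offset is cast using that offset, and the session
# local time zone (hypothetically GMT+8 here) is applied again at display time,
# so the printed wall clock differs from the one written in the string.
session_zone = timezone(timedelta(hours=8))

s = "2018-03-13T06:18:23+00:00"
cast = datetime.fromisoformat(s)       # the cast honors the +00:00 in the string
shown = cast.astimezone(session_zone)  # display converts to the session zone
print(shown.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-03-13 14:18:23
```

The two conversions compose, which is why feeding such a string through `from/to_utc_timestamp` on top of this can look surprising even though each step is well defined.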

