HyukjinKwon commented on code in PR #48619:
URL: https://github.com/apache/arrow/pull/48619#discussion_r2685520381


##########
docs/source/python/timestamps.rst:
##########
@@ -60,23 +60,29 @@ Spark to Pandas (through Apache Arrow)
 The following cases assume the Spark configuration
 ``spark.sql.execution.arrow.enabled`` is set to ``"true"``.
 
-::
+.. code-block:: python
 
+    >>> import pandas as pd
+    >>> from datetime import datetime, timedelta, timezone
     >>> pdf = pd.DataFrame({'naive': [datetime(2019, 1, 1, 0)],
-    ...                     'aware': [Timestamp(year=2019, month=1, day=1,
-    ...                               nanosecond=500, tz=timezone(timedelta(hours=-8)))]})
+    ...                     'aware': [pd.Timestamp(year=2019, month=1, day=1,
+    ...                               nanosecond=500,
+    ...                               tz=timezone(timedelta(hours=-8)))]})
     >>> pdf
            naive                               aware
-           0 2019-01-01 2019-01-01 00:00:00.000000500-08:00
+    0 2019-01-01 2019-01-01 00:00:00.000000500-08:00
 
-    >>> spark.conf.set("spark.sql.session.timeZone", "UTC")
-    >>> utc_df = sqlContext.createDataFrame(pdf)
-    >>> utf_df.show()
+    >>> from pyspark.sql import SparkSession  # doctest: +SKIP
+    >>> spark = SparkSession.builder.appName("MyApp").getOrCreate()  # doctest: +SKIP
+    >>> spark.conf.set("spark.sql.session.timeZone", "UTC")  # doctest: +SKIP
+    >>> utc_df = spark.createDataFrame(pdf)  # doctest: +SKIP
+    >>> utc_df.show()  # doctest: +SKIP
     +-------------------+-------------------+
     |              naive|              aware|
     +-------------------+-------------------+
-    |2019-01-01 00:00:00|2019-01-01 08:00:00|
+    |2018-12-31 23:00:00|2019-01-01 08:00:00|

Review Comment:
   I think `2019-01-01 00:00:00` became `2018-12-31 23:00:00` here because you (or the CI?) ran this somewhere in GMT+1. `datetime(2019, 1, 1, 0)` is naive, so Spark assumes it is local time (strictly, the interpretation is up to the system, but that is what Spark does). Spark took it as local time while the session timezone was set to UTC, so the displayed value moved back one hour.
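   As a minimal sketch of that shift in pure Python (assuming a hypothetical GMT+1 zone; `Europe/Paris` here is just a stand-in for wherever the output was rendered):

   ```python
   from datetime import datetime, timezone
   from zoneinfo import ZoneInfo

   # Assumption: the machine that produced the doc output sat in a GMT+1 zone.
   local = ZoneInfo("Europe/Paris")

   naive = datetime(2019, 1, 1, 0)  # naive, so Spark reads it as local wall-clock time

   # Attach the local zone, then render in the UTC session time zone,
   # which is effectively what Spark does with the config above.
   print(naive.replace(tzinfo=local).astimezone(timezone.utc))
   # 2018-12-31 23:00:00+00:00
   ```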
   
   I think we should probably just skip everything here, because the output now depends on the local timezone.
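   Concretely (a sketch, using the standard `doctest` directive), that would mean marking every line, including the pandas-only ones:

   ```python
   >>> import pandas as pd  # doctest: +SKIP
   >>> pdf = pd.DataFrame({'naive': [datetime(2019, 1, 1, 0)]})  # doctest: +SKIP
   >>> pdf  # doctest: +SKIP
   ```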


