Fokko commented on PR #6879:
URL: https://github.com/apache/iceberg/pull/6879#issuecomment-1542685254

   I want to revisit this. I had a chat with @danielcweeks on the interesting 
subject of time zones. We also discussed how Spark handles it, and I like the 
notion of the session timestamp (the same as @rdblue suggested above).
   
   In Spark:
   ```
   spark-sql> create table t1(c1 timestamp);
   Time taken: 0.572 seconds
   
   spark-sql> insert into t1 VALUES (now());
   Time taken: 4.716 seconds
   
   spark-sql> select * from t1;
   2023-05-10 19:51:48.241101
   Time taken: 2.351 seconds, Fetched 1 row(s)
   ```
   
   So the value is stored in UTC (as described in the Iceberg spec), but the 
session renders it in my current time zone (it is 19:51 on the wall clock 
here as well).
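   
   For illustration, here's a minimal PySpark sketch of how the session time 
zone drives that rendering; `spark.sql.session.timeZone` is the standard Spark 
config, but the zone name and outputs below are assumptions based on my machine:
   
   ```
   >>> # The session time zone controls how the stored UTC instant is rendered
   >>> spark.conf.get("spark.sql.session.timeZone")
   'Europe/Amsterdam'
   >>> # Switch the session to UTC and the same row renders as the stored instant
   >>> spark.conf.set("spark.sql.session.timeZone", "UTC")
   >>> spark.read.table("default.t1").show(truncate=False)
   +--------------------------+
   |c1                        |
   +--------------------------+
   |2023-05-10 17:51:48.241101|
   +--------------------------+
   ```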
   
   ```
   >>> df = spark.read.table("default.t1")
   >>> df
   DataFrame[c1: timestamp]
   >>> df.show(truncate=False)
   +--------------------------+
   |c1                        |
   +--------------------------+
   |2023-05-10 19:51:48.241101|
   +--------------------------+
   
   >>> pdf = df.toPandas()
   >>> pdf
                             c1
   0 2023-05-10 19:51:48.241101
   
   >>> pdf.c1
   0   2023-05-10 19:51:48.241101
   Name: c1, dtype: datetime64[ns]
   ```
   
   Spark does not assign any time zone to the resulting pandas column, which I 
think is wrong.
   
   With PyIceberg:
   
   ```
   0   2023-05-10 17:51:48.241101+00:00
   Name: c1, dtype: datetime64[ns, UTC]
   ```
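   
   For reference, a minimal sketch of how that series can be produced with 
PyIceberg; the catalog name and table identifier are assumptions carried over 
from the example above:
   
   ```
   >>> from pyiceberg.catalog import load_catalog
   >>> # Load the table through a configured catalog (name is an assumption)
   >>> catalog = load_catalog("default")
   >>> tbl = catalog.load_table("default.t1")
   >>> # PyIceberg keeps the column tz-aware in UTC, per the Iceberg spec
   >>> pdf = tbl.scan().to_pandas()
   >>> pdf.c1
   0   2023-05-10 17:51:48.241101+00:00
   Name: c1, dtype: datetime64[ns, UTC]
   ```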
   
   Then the question is: should we also localize this information 
(`tz_convert` the columns to `Europe/Amsterdam`, i.e. CEST, for the Dutch)?
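   
   As a quick pandas sketch of that localization; `tz_convert` takes IANA zone 
names, so `Europe/Amsterdam` stands in for CEST here:
   
   ```
   >>> # Assuming pdf holds the tz-aware PyIceberg result from above
   >>> pdf.c1.dt.tz_convert("Europe/Amsterdam")
   0   2023-05-10 19:51:48.241101+02:00
   Name: c1, dtype: datetime64[ns, Europe/Amsterdam]
   ```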
   
   

