bersprockets commented on a change in pull request #34741:
URL: https://github.com/apache/spark/pull/34741#discussion_r758851983
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
##########
@@ -531,13 +533,16 @@ object OrcUtils extends Logging {
   }

   def fromOrcNTZ(ts: Timestamp): Long = {
-    DateTimeUtils.millisToMicros(ts.getTime) +
+    val utcMicros = DateTimeUtils.millisToMicros(ts.getTime) +
       (ts.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS
+    val micros = DateTimeUtils.fromUTCTime(utcMicros, TimeZone.getDefault.getID)
+    micros
   }

   def toOrcNTZ(micros: Long): OrcTimestamp = {
-    val seconds = Math.floorDiv(micros, MICROS_PER_SECOND)
-    val nanos = (micros - seconds * MICROS_PER_SECOND) * NANOS_PER_MICROS
+    val utcMicros = DateTimeUtils.toUTCTime(micros, TimeZone.getDefault.getID)

Review comment:
   There is at least one issue with dates before 1883-11-17: railway (standard) time zones didn't exist yet, and the java.time classes that fromUTCTime/toUTCTime are built on take that history into account. I surmise there will also be an issue with pre-Gregorian values, since ORC assumes the hybrid Julian calendar when "shifting" values on read.

   Even my POC has issues when the writer is in the Pacific/Pago_Pago time zone and the reader is in some other time zone, because I also used the fromUTCTime/toUTCTime utility functions. When dealing with hybrid Julian times, ORC has no trouble shifting values between time zones that didn't exist yet, because the old time-related classes don't account for that history.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
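[Editor's note] The pre-1883 point can be reproduced outside Spark with a minimal sketch (the class name and sample dates below are invented for illustration; this is not Spark or ORC code). The java.time zone rules that fromUTCTime/toUTCTime ultimately consult apply the zone's local-mean-time (LMT) offset to instants before standard railway time was adopted, rather than the familiar standard offset:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

public class PreRailwayOffsets {
    public static void main(String[] args) {
        // java.time uses the full tzdb history for a zone, including the
        // local-mean-time era before standard time zones existed (pre-1883
        // for US zones).
        ZoneRules rules = ZoneId.of("America/Los_Angeles").getRules();

        // Before railway time: the LMT offset (-07:52:58 in current tzdata),
        // not the standard -08:00.
        System.out.println(rules.getOffset(Instant.parse("1880-01-01T00:00:00Z")));

        // After standardization: the familiar -08:00.
        System.out.println(rules.getOffset(Instant.parse("2000-01-01T00:00:00Z")));
    }
}
```

So a round trip through fromUTCTime/toUTCTime for a pre-1883 wall-clock value shifts by an odd LMT offset, while the legacy classes ORC relies on would use the standard offset, which is one way the two code paths can disagree.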
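[Editor's note] The pre-Gregorian point can likewise be shown in isolation (hedged sketch; the class name and the 1500-01-01 sample date are invented for illustration). The legacy java.util.GregorianCalendar uses the hybrid Julian/Gregorian calendar that ORC assumes (Julian before the 1582-10-15 cutover), while java.time is proleptic Gregorian, so the same wall-clock date labels instants several days apart:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class CalendarSkew {
    public static void main(String[] args) {
        // Legacy calendar: hybrid Julian/Gregorian, as ORC assumes.
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(1500, Calendar.JANUARY, 1, 0, 0, 0);
        long hybridMillis = cal.getTimeInMillis();

        // java.time: proleptic Gregorian, as fromUTCTime/toUTCTime use.
        long prolepticMillis = LocalDateTime.of(1500, 1, 1, 0, 0)
            .toInstant(ZoneOffset.UTC).toEpochMilli();

        // In 1500 the Julian calendar lags the Gregorian by 9 days, so the
        // two interpretations of "1500-01-01 00:00 UTC" differ by 9 days.
        long diffDays = (hybridMillis - prolepticMillis) / 86_400_000L;
        System.out.println(diffDays); // prints 9
    }
}
```

This is the same calendar skew that Spark's date/timestamp rebasing logic exists to correct for Parquet and ORC, and it would apply on top of any time-zone shifting done here.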