bersprockets commented on a change in pull request #34741:
URL: https://github.com/apache/spark/pull/34741#discussion_r758851983
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
##########
@@ -531,13 +533,16 @@ object OrcUtils extends Logging {
   }

   def fromOrcNTZ(ts: Timestamp): Long = {
-    DateTimeUtils.millisToMicros(ts.getTime) +
+    val utcMicros = DateTimeUtils.millisToMicros(ts.getTime) +
       (ts.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS
+    val micros = DateTimeUtils.fromUTCTime(utcMicros, TimeZone.getDefault.getID)
+    micros
   }

   def toOrcNTZ(micros: Long): OrcTimestamp = {
-    val seconds = Math.floorDiv(micros, MICROS_PER_SECOND)
-    val nanos = (micros - seconds * MICROS_PER_SECOND) * NANOS_PER_MICROS
+    val utcMicros = DateTimeUtils.toUTCTime(micros, TimeZone.getDefault.getID)

Review comment:
   There is at least one issue with dates before 1883-11-17: railway (standard) time zones didn't exist yet, and the java.time classes that fromUTCTime/toUTCTime are built on take that history into account. I surmise there will also be an issue with pre-Gregorian values, since ORC assumes the hybrid Julian calendar when "shifting" values on read.

   Even my POC has issues when the writer is in the Pacific/Pago_Pago time zone and the reader is in some other time zone, because I also used the fromUTCTime/toUTCTime utility functions. When dealing with hybrid Julian times, ORC has no trouble shifting values between time zones that didn't exist yet, because the old time-related classes don't account for that history.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
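[Editor's note] The pre-1883 point can be reproduced outside Spark with a minimal sketch (the class name and sample dates below are invented for illustration; this is not Spark or ORC code). The java.time zone rules that fromUTCTime/toUTCTime ultimately consult apply the zone's local-mean-time (LMT) offset to instants before standard railway time was adopted, rather than the familiar standard offset:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

public class PreRailwayOffsets {
    public static void main(String[] args) {
        // java.time uses the full tzdb history for a zone, including the
        // local-mean-time era before standard time zones existed (pre-1883
        // for US zones).
        ZoneRules rules = ZoneId.of("America/Los_Angeles").getRules();

        // Before railway time: the LMT offset (-07:52:58 in current tzdata),
        // not the standard -08:00.
        System.out.println(rules.getOffset(Instant.parse("1880-01-01T00:00:00Z")));

        // After standardization: the familiar -08:00.
        System.out.println(rules.getOffset(Instant.parse("2000-01-01T00:00:00Z")));
    }
}
```

So a round trip through fromUTCTime/toUTCTime for a pre-1883 wall-clock value shifts by an odd LMT offset, while the legacy classes ORC relies on would use the standard offset, which is one way the two code paths can disagree.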
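[Editor's note] The pre-Gregorian point can likewise be shown in isolation (hedged sketch; the class name and the 1500-01-01 sample date are invented for illustration). The legacy java.util.GregorianCalendar uses the hybrid Julian/Gregorian calendar that ORC assumes (Julian before the 1582-10-15 cutover), while java.time is proleptic Gregorian, so the same wall-clock date labels instants several days apart:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class CalendarSkew {
    public static void main(String[] args) {
        // Legacy calendar: hybrid Julian/Gregorian, as ORC assumes.
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(1500, Calendar.JANUARY, 1, 0, 0, 0);
        long hybridMillis = cal.getTimeInMillis();

        // java.time: proleptic Gregorian, as fromUTCTime/toUTCTime use.
        long prolepticMillis = LocalDateTime.of(1500, 1, 1, 0, 0)
            .toInstant(ZoneOffset.UTC).toEpochMilli();

        // In 1500 the Julian calendar lags the Gregorian by 9 days, so the
        // two interpretations of "1500-01-01 00:00 UTC" differ by 9 days.
        long diffDays = (hybridMillis - prolepticMillis) / 86_400_000L;
        System.out.println(diffDays); // prints 9
    }
}
```

This is the same calendar skew that Spark's date/timestamp rebasing logic exists to correct for Parquet and ORC, and it would apply on top of any time-zone shifting done here.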