Re: [PR] HIVE-29033: ORC reader should not assume that TimestampColumnVector is in UTC time zone [hive]

via GitHub Tue, 24 Jun 2025 17:44:49 -0700


vrozov commented on code in PR #5868:
URL: https://github.com/apache/hive/pull/5868#discussion_r2165213074



##########
ql/src/java/org/apache/hadoop/hive/ql/io/BatchToRowReader.java:
##########
@@ -518,7 +521,8 @@ public static TimestampWritableV2 
nextTimestamp(ColumnVector vector,
         result = (TimestampWritableV2) previous;
       }
       TimestampColumnVector tcv = (TimestampColumnVector) vector;
-      result.setInternal(tcv.time[row], tcv.nanos[row]);
+      result.set(Timestamp.ofEpochSecond(Math.floorDiv(tcv.time[row], 1000L), 
tcv.nanos[row],
+          tcv.isUTC() ? ZoneOffset.UTC : ZoneId.systemDefault()));

Review Comment:
   I still don't see why significant refactoring and propagating flag using 
`context` is a better solution. What is wrong with the iterating over columns? 
The PR approach works with existing way how `TimestampTreeReader` sets `isUTC` 
flag in `TimestampColumnVector` and also aligns with the 
https://github.com/apache/orc/pull/2300 changes. Once Hive upgrades to ORC with 
the fix, the workaround in the PR can be kept in place or removed, while 
suggested refactoring becomes obsolete.
   
   > Additionally, it would be nice to reuse the timestamp conversion function 
to avoid code duplication
   I can make that change, if that the only remaining concern with the PR.



##########
ql/src/java/org/apache/hadoop/hive/ql/io/BatchToRowReader.java:
##########
@@ -518,7 +521,8 @@ public static TimestampWritableV2 
nextTimestamp(ColumnVector vector,
         result = (TimestampWritableV2) previous;
       }
       TimestampColumnVector tcv = (TimestampColumnVector) vector;
-      result.setInternal(tcv.time[row], tcv.nanos[row]);
+      result.set(Timestamp.ofEpochSecond(Math.floorDiv(tcv.time[row], 1000L), 
tcv.nanos[row],
+          tcv.isUTC() ? ZoneOffset.UTC : ZoneId.systemDefault()));

Review Comment:
   I still don't see why significant refactoring and propagating flag using 
`context` is a better solution. What is wrong with the iterating over columns? 
The PR approach works with existing way how `TimestampTreeReader` sets `isUTC` 
flag in `TimestampColumnVector` and also aligns with the 
https://github.com/apache/orc/pull/2300 changes. Once Hive upgrades to ORC with 
the fix, the workaround in the PR can be kept in place or removed, while 
suggested refactoring becomes obsolete.
   
   > Additionally, it would be nice to reuse the timestamp conversion function 
to avoid code duplication
   
   I can make that change, if that the only remaining concern with the PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HIVE-29033: ORC reader should not assume that TimestampColumnVector is in UTC time zone [hive]

Reply via email to