aiguofer commented on code in PR #37088:
URL: https://github.com/apache/arrow/pull/37088#discussion_r1288901368


##########
java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java:
##########
@@ -188,12 +188,9 @@ public static ArrowType getArrowTypeFromJdbcType(final 
JdbcFieldInfo fieldInfo,
       case Types.TIME:
         return new ArrowType.Time(TimeUnit.MILLISECOND, 32);
       case Types.TIMESTAMP:
-        final String timezone;
-        if (calendar != null) {
-          timezone = calendar.getTimeZone().getID();
-        } else {
-          timezone = null;
-        }
+        return new ArrowType.Timestamp(TimeUnit.MILLISECOND);
+      case Types.TIMESTAMP_WITH_TIMEZONE:
+        final String timezone = calendar == null ? null : 
calendar.getTimeZone().getID();

Review Comment:
   Yeah this one is tough. As I understand it, `TIMESTAMP_WITH_TIMEZONE` means 
that each record includes its own timezone as a UTC offset (or at least, that 
seems to be the case with Snowflake). Based on the JDBC spec, 
`ResultSet.getTimestamp(int columnIndex, Calendar cal)` does the following 
`This method uses the given calendar to construct an appropriate millisecond 
value for the timestamp if the underlying database does not store timezone 
information.`. This leads me to believe that it parses out the offset from the 
underlying db and uses that to convert to UTC. If no offset is available, it 
assumes the timestamp is in w/e is passed in the `calendar` and then converts 
it to UTC. Here's an interesting SO on the topic: 
https://stackoverflow.com/a/63078938/1815486
   
   From my understanding, `ArrowType.Timestamp` does not support per record 
timezones and it does not do any conversions on its own. It seems like the 
timezone is associated with the Vector itself and all records are expected to 
be in that same timezone. This means we must convert `TIMESTAMP_WITH_TIMEZONE` 
values into one specific TZ before we add them to the vector. It's also not 
exactly clear what it means when the TZ is `null` (I'm assuming this means it's 
just a "wall clock" time).
   
   This all leaves some questions:
   - What is the TZ associated with `calendar` supposed to represent for these 
functions? 
     - The TZ to assume the underlying values are in
     - The TZ to report on the ArrowType
     - The TZ that we should convert values to
     
   There's some important connotations because it depends on whether one is 
using the arrow-jdbc library to convert values server side, where it's 
generally safe to use/assume UTC for a lot of things, or client side, where 
they might want to express values in the user's timezone. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to