aiguofer commented on code in PR #37088:
URL: https://github.com/apache/arrow/pull/37088#discussion_r1288901368
##########
java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java:
##########
@@ -188,12 +188,9 @@ public static ArrowType getArrowTypeFromJdbcType(final
JdbcFieldInfo fieldInfo,
case Types.TIME:
return new ArrowType.Time(TimeUnit.MILLISECOND, 32);
case Types.TIMESTAMP:
- final String timezone;
- if (calendar != null) {
- timezone = calendar.getTimeZone().getID();
- } else {
- timezone = null;
- }
+ return new ArrowType.Timestamp(TimeUnit.MILLISECOND);
+ case Types.TIMESTAMP_WITH_TIMEZONE:
+ final String timezone = calendar == null ? null :
calendar.getTimeZone().getID();
Review Comment:
Yeah this one is tough. As I understand it, `TIMESTAMP_WITH_TIMEZONE` means
that each record includes its own timezone as a UTC offset (or at least, that
seems to be the case with Snowflake). Based on the JDBC spec,
`ResultSet.getTimestamp(int columnIndex, Calendar cal)` does the following
`This method uses the given calendar to construct an appropriate millisecond
value for the timestamp if the underlying database does not store timezone
information.`. This leads me to believe that it parses out the offset from the
underlying db and uses that to convert to UTC. If no offset is available, it
assumes the timestamp is in w/e is passed in the `calendar` and then converts
it to UTC. Here's an interesting SO on the topic:
https://stackoverflow.com/a/63078938/1815486
From my understanding, `ArrowType.Timestamp` does not support per record
timezones and it does not do any conversions on its own. It seems like the
timezone is associated with the Vector itself and all records are expected to
be in that same timezone. This means we must convert `TIMESTAMP_WITH_TIMEZONE`
values into one specific TZ before we add them to the vector. It's also not
exactly clear what it means when the TZ is `null` (I'm assuming this means it's
just a "wall clock" time).
This all leaves some questions:
- What is the TZ associated with `calendar` supposed to represent for these
functions?
- The TZ to assume the underlying values are in
- The TZ to report on the ArrowType
- The TZ that we should convert values to
There's some important connotations because it depends on whether one is
using the arrow-jdbc library to convert values server side, where it's
generally safe to use/assume UTC for a lot of things, or client side, where
they might want to express values in the user's timezone.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]