Jackie-Jiang commented on code in PR #18400:
URL: https://github.com/apache/pinot/pull/18400#discussion_r3180035533
##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetNativeRecordExtractor.java:
##########
@@ -148,95 +164,74 @@ private Object extractValue(Group from, int fieldIndex,
Type fieldType, int inde
return BigDecimal.valueOf(longValue,
decimalLogicalTypeAnnotation.getScale());
}
if (logicalTypeAnnotation instanceof
LogicalTypeAnnotation.TimeLogicalTypeAnnotation) {
- // `TIME_MICROS` / `TIME_NANOS` → [LocalTime] preserving full
nanosecond precision.
+ // `TIME_MICROS` / `TIME_NANOS` → [LocalTime] preserving full
nanosecond precision, or raw value
+ // (in the column's declared unit) when configured.
+ if (_extractRawTimeValues) {
+ return longValue;
+ }
LogicalTypeAnnotation.TimeUnit timeUnit =
((LogicalTypeAnnotation.TimeLogicalTypeAnnotation)
logicalTypeAnnotation).getUnit();
return LocalTime.ofNanoOfDay(timeUnit ==
LogicalTypeAnnotation.TimeUnit.MICROS ? longValue * 1_000L
: longValue);
}
if (logicalTypeAnnotation instanceof
LogicalTypeAnnotation.TimestampLogicalTypeAnnotation) {
- // `TIMESTAMP_MILLIS` / `TIMESTAMP_MICROS` / `TIMESTAMP_NANOS` →
`java.sql.Timestamp` preserving
- // sub-millisecond precision via `setNanos`.
- return timestampFromLong(longValue,
+ // `TIMESTAMP_MILLIS` / `TIMESTAMP_MICROS` / `TIMESTAMP_NANOS` →
[Timestamp] preserving sub-millisecond
+ // nanos via `setNanos`. Raw mode returns the value as stored in
the column's declared unit.
+ // TODO: `isAdjustedToUTC()` is ignored — a `timestamp-*
adjustToUTC=false` field is local-clock
+ // time, not a UTC instant, but we treat all values as UTC
instants per the Pinot TIMESTAMP
+ // contract. For local-datetime data the produced `Timestamp`
will be off by the JVM TZ offset.
+ if (_extractRawTimeValues) {
+ return longValue;
+ }
+ return ParquetUtils.convertLongToTimestamp(longValue,
((LogicalTypeAnnotation.TimestampLogicalTypeAnnotation)
logicalTypeAnnotation).getUnit());
Review Comment:
Acknowledged, but out of scope for this PR. The `isAdjustedToUTC=false`
mishandling is pre-existing — the prior code (before this PR) decoded both
variants identically as UTC instants too. This PR keeps that behavior unchanged
and adds an explicit TODO at the call site (`isAdjustedToUTC() is ignored — a
'timestamp-* adjustToUTC=false' field is local-clock time, not a UTC instant,
but we treat all values as UTC instants per the Pinot TIMESTAMP contract`).
Fixing it is itself a user-visible behavior change for anyone whose data is
`adjustToUTC=false` — their TIMESTAMP column values would shift by the JVM TZ
offset on upgrade — so it deserves its own PR with a release note rather than
being folded into the contract-refinement scope here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]