Copilot commented on code in PR #18400:
URL: https://github.com/apache/pinot/pull/18400#discussion_r3175820076
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/BaseRecordExtractor.java:
##########
@@ -211,22 +194,18 @@ protected Map<Object, Object> convertRecord(Object value)
{
getClass().getSimpleName() + " does not support nested records;
override convertRecord() to enable.");
}
- /// Converts a single value per the [RecordExtractor] contract:
+ /// Universal normalizations for types that any format extractor might
produce:
/// - `Byte` / `Short` widen to `Integer` so all small ints unify behind a
single Pinot type
/// - [BigInteger] widens to [BigDecimal] (Pinot has no `BigInteger` data
type; downstream transforms handle
/// `BigDecimal` natively)
/// - other `Number` (`Integer` / `Long` / `Float` / `Double` /
`BigDecimal`) passes through
/// - `Boolean` passes through
/// - `byte[]` passes through
- /// - [Timestamp] passes through
/// - `ByteBuffer` materializes to `byte[]` (sliced so the source buffer's
position is not advanced)
///
- /// [Temporal] family (`java.time`):
- /// - [LocalDate] / [LocalTime] pass through (TZ-independent date /
time-of-day)
- /// - [Instant] / [OffsetDateTime] / [ZonedDateTime] bridge to [Timestamp]
via [Timestamp#from] — all three are
- /// unambiguously TZ-anchored, so the conversion is lossless (sub-second
nanos preserved)
- /// - [LocalDateTime] bridges to [Timestamp] interpreting the wall-clock as
UTC
- /// - other [Temporal] types fall through to `value.toString()`
+ /// Date / time types (DATE / TIME / TIMESTAMP) are NOT handled here — see
[RecordExtractor] for the contract
+ /// and the format-specific extractor (`convertSingleValue` override or
direct construction in `extractValue`)
+ /// for the native-to-contract conversion.
Review Comment:
This change silently alters the default SPI behavior for third-party
`BaseRecordExtractor` subclasses: any extractor that relied on the base
implementation to normalize `Timestamp`/`Instant`/`LocalDate`/`LocalTime` now
falls through to `toString()`. That is a breaking contract change for external
formats unless every downstream subclass is updated in lockstep.
##########
pinot-plugins/pinot-input-format/pinot-arrow/src/main/java/org/apache/pinot/plugin/inputformat/arrow/ArrowToGenericRowConverter.java:
##########
@@ -246,12 +245,10 @@ private Object
convertArrowTypeToPinotCompatible(@Nullable Object value) {
return value.toString();
}
- // Handle Arrow LocalDateTime -> java.sql.Timestamp conversion
+ // Handle Arrow LocalDateTime -> Long epoch millis (per RecordExtractor
contract: TIMESTAMP → Long).
if (value instanceof LocalDateTime) {
- // Arrow TimeStampMilliVector.getObject() returns LocalDateTime, but
Pinot expects
- // java.sql.Timestamp objects for proper timestamp handling and native
support
LocalDateTime dateTime = (LocalDateTime) value;
- return Timestamp.from(dateTime.toInstant(ZoneOffset.UTC));
+ return dateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
Review Comment:
The new `LocalDateTime -> epochMillis` conversion is only covered by a test
that checks the result is a positive `Long`, so it would not catch a wrong
epoch calculation or a timezone/offset regression. Since this file already has
Arrow decoder tests, please add an assertion for the exact expected millis
value (ideally including a non-midnight case) for the new UTC interpretation.
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/BaseRecordExtractor.java:
##########
@@ -77,14 +68,6 @@ public void init(@Nullable Set<String> fields, @Nullable
RecordExtractorConfig c
protected void initConfig(@Nullable RecordExtractorConfig config) {
}
Review Comment:
Removing this protected helper is a source-compatible breaking change for
external `pinot-spi` consumers that subclass `BaseRecordExtractor`. Even if the
method is unused inside this repository, third-party extractors can still call
`shouldExtract(...)`, so deleting it from a published SPI will break their
builds on upgrade.
##########
pinot-plugins/pinot-input-format/pinot-orc/src/main/java/org/apache/pinot/plugin/inputformat/orc/ORCRecordExtractor.java:
##########
@@ -206,13 +206,12 @@ private static Object extractSingleValue(String field,
ColumnVector columnVector
}
case TIMESTAMP:
case TIMESTAMP_INSTANT: {
- // `time[rowId]` is epoch millis truncated to second precision;
sub-second nanos (0..999_999_999)
+ // `time[rowId]` is epoch millis (truncated to second precision);
sub-second nanos (0..999_999_999)
// live in `nanos[rowId]`. `TIMESTAMP_INSTANT` (UTC instant) shares
the same vector layout as
- // `TIMESTAMP` (local). Construct a `Timestamp` and set the full nanos
field to preserve precision.
+ // `TIMESTAMP` (local). Per the contract, output is `Long` epoch
millis; sub-millisecond precision is
+ // dropped here to match Pinot's TIMESTAMP physical storage.
TimestampColumnVector tsv = (TimestampColumnVector) columnVector;
- Timestamp ts = new Timestamp(tsv.time[rowId]);
- ts.setNanos(tsv.nanos[rowId]);
- return ts;
+ return tsv.time[rowId] + tsv.nanos[rowId] / 1_000_000L;
Review Comment:
This new manual epoch-millis calculation is only exercised with positive
timestamps in `ORCRecordExtractorTest`. Since the code now combines `time` and
`nanos` directly instead of delegating to `Timestamp`, it needs at least one
pre-epoch test case as well to catch sign/rounding regressions for negative
timestamps.
##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/MapUtils.java:
##########
@@ -50,11 +49,13 @@ private MapUtils() {
}
private static final Logger LOGGER = LoggerFactory.getLogger(MapUtils.class);
- private static final ObjectMapper SORTED_MAPPER =
- new
ObjectMapper().configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true);
- private static final ObjectMapper UNSORTED_MAPPER = new ObjectMapper();
- private static final TypeReference<Map<String, Object>> MAP_TYPE_REFERENCE =
new TypeReference<>() {
- };
+
+ // Pinot's standard ObjectMapper config (JSR-310 / ISO-8601 for LocalDate /
LocalTime — see JsonUtils),
+ // plus key-ordered output for the sorted variant so byte output is a pure
function of the logical map.
+ // Writes only — read paths delegate to JsonUtils, whose DEFAULT_READER is
already JSR-310-aware.
+ private static final ObjectWriter SORTED_WRITER =
JsonUtils.newObjectMapperWithJavaTime()
+ .configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS,
true).writer();
+ private static final ObjectWriter UNSORTED_WRITER =
JsonUtils.newObjectMapperWithJavaTime().writer();
Review Comment:
This switches all `MapUtils` serialization paths to the new JSR-310-aware
mapper, but none of the existing `MapUtilsTest` cases cover
`LocalDate`/`LocalTime` values or nested maps/lists containing them. Because
this file already has round-trip tests for other value types, please add
analogous coverage here so regressions in the new mapper configuration are
caught in the code path that actually serializes `MAP` columns.
##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java:
##########
@@ -157,11 +188,6 @@ public static JsonNode stringToJsonNode(String jsonString)
return DEFAULT_READER.readTree(jsonString);
}
Review Comment:
Deleting `jsonNodeToStringMap(...)` is a backward-incompatible change in the
published `pinot-spi` API. There are no in-repo callers anymore, but external
code can still call this public helper, so removing it will break upgrades for
downstream consumers even though the replacement is only an internal call-site
change here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]