Jackie-Jiang commented on code in PR #18400:
URL: https://github.com/apache/pinot/pull/18400#discussion_r3180300573
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1590,6 +1700,9 @@ public static PinotDataType getSingleValueType(Class<?>
cls) {
if (cls == byte[].class) {
return BYTES;
}
+ if (cls == UUID.class) {
+ return UUID;
+ }
Review Comment:
Fixed — added `UUID_ARRAY`, plus the same gap was present for the other
newly-introduced contract types: `DATE_ARRAY` (LocalDate) and `TIME_ARRAY`
(LocalTime). All three now have:
- New enum entries (DATE_ARRAY, TIME_ARRAY, UUID_ARRAY) with `convert(...)`
overrides routing through the corresponding source-type's `toLocalDateArray` /
`toLocalTimeArray` / `toUuidArray`
- Three new array-conversion methods on the base (`toLocalDateArray` /
`toLocalTimeArray` / `toUuidArray`) that walk the source array and
per-element-convert via `DATE.convert` / `TIME.convert` / `UUID.convert`
- New cases in `getSingleValueType()` switch and
`getMultiValueType(Class<?>)` for `LocalDate.class` / `LocalTime.class` /
`UUID.class`
- `toInternal` overrides per array enum (Integer[] of epoch-days for
DATE_ARRAY, Long[] of millis-since-midnight for TIME_ARRAY, String[] of
canonical form for UUID_ARRAY)
`PinotDataTypeTest` got `testDateArray` / `testTimeArray` / `testUuidArray`
covering identity, STRING_ARRAY, primitive-array and BYTES_ARRAY destination
paths. `testGetMultipleValueType` now includes the new class → array-type
entries.
The specific `Object[] of UUID → BYTES_ARRAY` path the bot flagged is now
exact: `UUID_ARRAY.toBytesArray` walks per-element via `UUID.toBytes` → 16-byte
big-endian arrays.
##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetNativeRecordExtractor.java:
##########
@@ -311,11 +306,26 @@ private Map<String, Object> extractKeyValueMap(Group
group) {
Group keyValueGroup = group.getGroup(0, i);
Object key = extractValue(keyValueGroup, keyIndex);
Object value = extractValue(keyValueGroup, valueIndex);
- map.put(key.toString(), value);
+ map.put(stringifyMapKey(key), value);
}
return map;
}
+ /// Stringifies a map key per the Pinot `Map<String, Object>` contract.
`byte[]` keys (unannotated BINARY /
+ /// FIXED) base64-encode; everything else (`String`, `Integer`, `Long`,
`Float`, `Double`, `Boolean`,
+ /// `BigDecimal`, `UUID`, `LocalDate`, `LocalTime`, `Timestamp`) has a
meaningful `toString()`.
+ ///
+ /// **Why base64 (not hex)**: this matches the Pinot MAP value's JSON ser/de
path — Jackson serializes any
+ /// `byte[]` value as base64, so a base64-encoded key reads uniformly with
the rest of a serialized map.
+ /// Pinot also uses hex via `BytesUtils.toHexString` for the BYTES→STRING
column-type conversion, but the
+ /// MAP path goes through Jackson, so base64 wins on consistency here.
+ private static String stringifyMapKey(Object key) {
+ if (key instanceof byte[]) {
+ return Base64.getEncoder().encodeToString((byte[]) key);
+ }
+ return key.toString();
Review Comment:
Fixed — `stringifyMapKey` now special-cases `Timestamp` to
`Long.toString(timestamp.getTime())`, matching the JSON-serialization
convention this PR keeps for Timestamp values (`JsonUtils` preserves Jackson's
default `WRITE_DATES_AS_TIMESTAMPS = true`, so Timestamp values serialize as
numeric epoch millis). Map keys now use the same numeric-millis form, so a
Pinot Map column with Timestamp keys + Timestamp values produces consistent
JSON like `{"1700000000000": 1700000000000}` instead of mixing ISO-string keys
with numeric values, and the result is JVM-TZ-independent (was the original
concern with `Timestamp#toString`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]