Copilot commented on code in PR #18400:
URL: https://github.com/apache/pinot/pull/18400#discussion_r3180068754
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1590,6 +1700,9 @@ public static PinotDataType getSingleValueType(Class<?>
cls) {
if (cls == byte[].class) {
return BYTES;
}
+ if (cls == UUID.class) {
+ return UUID;
+ }
Review Comment:
Adding `UUID` only to `getSingleValueType(Class<?>)` leaves the multi-value
path unchanged. `DataTypeTransformerUtils` determines MV source types via
`PinotDataType.getMultiValueType(values[0].getClass())`; for `Object[]` of
`java.util.UUID` this still falls back to `OBJECT_ARRAY`, and
`BYTES_ARRAY.convert(...)` then fails because `OBJECT.toBytes(...)` is
unsupported. As a result, ingesting a multi-value UUID logical type into a
Pinot `BYTES` column now breaks even though the new single-value UUID contract
says BYTES should receive the 16-byte form.
##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetNativeRecordExtractor.java:
##########
@@ -311,11 +306,26 @@ private Map<String, Object> extractKeyValueMap(Group
group) {
Group keyValueGroup = group.getGroup(0, i);
Object key = extractValue(keyValueGroup, keyIndex);
Object value = extractValue(keyValueGroup, valueIndex);
- map.put(key.toString(), value);
+ map.put(stringifyMapKey(key), value);
}
return map;
}
+ /// Stringifies a map key per the Pinot `Map<String, Object>` contract.
`byte[]` keys (unannotated BINARY /
+ /// FIXED) base64-encode; everything else (`String`, `Integer`, `Long`,
`Float`, `Double`, `Boolean`,
+ /// `BigDecimal`, `UUID`, `LocalDate`, `LocalTime`, `Timestamp`) has a
meaningful `toString()`.
+ ///
+ /// **Why base64 (not hex)**: this matches the Pinot MAP value's JSON ser/de
path — Jackson serializes any
+ /// `byte[]` value as base64, so a base64-encoded key reads uniformly with
the rest of a serialized map.
+ /// Pinot also uses hex via `BytesUtils.toHexString` for the BYTES→STRING
column-type conversion, but the
+ /// MAP path goes through Jackson, so base64 wins on consistency here.
+ private static String stringifyMapKey(Object key) {
+ if (key instanceof byte[]) {
+ return Base64.getEncoder().encodeToString((byte[]) key);
+ }
+ return key.toString();
Review Comment:
`stringifyMapKey()` now treats `Timestamp` keys as safe to stringify with
`toString()`, but this PR itself documents in `JsonUtils` that
`Timestamp.toString()` is JVM-timezone-dependent. That means a Parquet MAP
whose key column is a timestamp logical type will ingest different string keys
on machines with different default time zones, even though the underlying
Parquet value is the same.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]