Jackie-Jiang commented on code in PR #18400:
URL: https://github.com/apache/pinot/pull/18400#discussion_r3180300573


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1590,6 +1700,9 @@ public static PinotDataType getSingleValueType(Class<?> 
cls) {
     if (cls == byte[].class) {
       return BYTES;
     }
+    if (cls == UUID.class) {
+      return UUID;
+    }

Review Comment:
   Fixed — added `UUID_ARRAY`, plus the same gap was present for the other 
newly-introduced contract types: `DATE_ARRAY` (LocalDate) and `TIME_ARRAY` 
(LocalTime). All three now have:
   
   - New enum entries (DATE_ARRAY, TIME_ARRAY, UUID_ARRAY) with `convert(...)` 
overrides routing through the corresponding source-type's `toLocalDateArray` / 
`toLocalTimeArray` / `toUuidArray`
   - Three new array-conversion methods on the base (`toLocalDateArray` / 
`toLocalTimeArray` / `toUuidArray`) that walk the source array and 
per-element-convert via `DATE.convert` / `TIME.convert` / `UUID.convert`
   - New cases in `getSingleValueType()` switch and 
`getMultiValueType(Class<?>)` for `LocalDate.class` / `LocalTime.class` / 
`UUID.class`
   - `toInternal` overrides per array enum (Integer[] of epoch-days for 
DATE_ARRAY, Long[] of millis-since-midnight for TIME_ARRAY, String[] of 
canonical form for UUID_ARRAY)
   
   `PinotDataTypeTest` got `testDateArray` / `testTimeArray` / `testUuidArray` 
covering identity, STRING_ARRAY, primitive-array and BYTES_ARRAY destination 
paths. `testGetMultipleValueType` now includes the new class → array-type 
entries.
   
   The specific `Object[] of UUID → BYTES_ARRAY` path the bot flagged is now 
exact: `UUID_ARRAY.toBytesArray` walks per-element via `UUID.toBytes` → 16-byte 
big-endian arrays.



##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetNativeRecordExtractor.java:
##########
@@ -311,11 +306,26 @@ private Map<String, Object> extractKeyValueMap(Group 
group) {
       Group keyValueGroup = group.getGroup(0, i);
       Object key = extractValue(keyValueGroup, keyIndex);
       Object value = extractValue(keyValueGroup, valueIndex);
-      map.put(key.toString(), value);
+      map.put(stringifyMapKey(key), value);
     }
     return map;
   }
 
+  /// Stringifies a map key per the Pinot `Map<String, Object>` contract. 
`byte[]` keys (unannotated BINARY /
+  /// FIXED) base64-encode; everything else (`String`, `Integer`, `Long`, 
`Float`, `Double`, `Boolean`,
+  /// `BigDecimal`, `UUID`, `LocalDate`, `LocalTime`, `Timestamp`) has a 
meaningful `toString()`.
+  ///
+  /// **Why base64 (not hex)**: this matches the Pinot MAP value's JSON ser/de 
path — Jackson serializes any
+  /// `byte[]` value as base64, so a base64-encoded key reads uniformly with 
the rest of a serialized map.
+  /// Pinot also uses hex via `BytesUtils.toHexString` for the BYTES→STRING 
column-type conversion, but the
+  /// MAP path goes through Jackson, so base64 wins on consistency here.
+  private static String stringifyMapKey(Object key) {
+    if (key instanceof byte[]) {
+      return Base64.getEncoder().encodeToString((byte[]) key);
+    }
+    return key.toString();

Review Comment:
   Fixed — `stringifyMapKey` now special-cases `Timestamp` to 
`Long.toString(timestamp.getTime())`, matching the JSON-serialization 
convention this PR keeps for Timestamp values (`JsonUtils` preserves Jackson's 
default `WRITE_DATES_AS_TIMESTAMPS = true`, so Timestamp values serialize as 
numeric epoch millis). Map keys now use the same numeric-millis form, so a 
Pinot Map column with Timestamp keys + Timestamp values produces consistent 
JSON like `{"1700000000000": 1700000000000}` instead of mixing ISO-string keys 
with numeric values, and the result is JVM-TZ-independent (was the original 
concern with `Timestamp#toString`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to