xiangfu0 commented on code in PR #18325:
URL: https://github.com/apache/pinot/pull/18325#discussion_r3142549552


##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetAvroRecordExtractor.java:
##########
@@ -63,4 +99,22 @@ Object handleDeprecatedTypes(Object value, Schema.Field 
field) {
     }
     return value;
   }
+
+  private boolean isListElementWrapper(Schema.Field field) {
+    Schema schema = getNonNullSchema(field.schema());
+    if (schema.getType() != Schema.Type.ARRAY) {
+      return false;
+    }
+    Schema elementSchema = getNonNullSchema(schema.getElementType());
+    return elementSchema.getType() == Schema.Type.RECORD && 
elementSchema.getFields().size() == 1

Review Comment:
   Confirmed real regression — wrote a quick test for `array<record<UserTag, 
fields=[element: string]>>` and watched it flatten to `["user-1"]`. Thanks for 
catching it.
   
   Added a discriminator on the inner record's name + namespace. parquet-avro 
names the synthesized wrapper after the Parquet repeated group (`list` per the 
3-level convention, `array` legacy) and either gives it no namespace (top-level 
field) or a numeric-suffix namespace like `list2`/`list3` to disambiguate when 
the same record name nests inside another record. User-authored Avro records 
carry user-chosen names and meaningful namespaces, so requiring `name ∈ {list, 
array}` AND `namespace == null/empty || namespace.matches(name + "\\d+")` 
distinguishes the two without needing extra Parquet-schema plumbing.
   
   Pushed 
[9c566f0](https://github.com/apache/pinot/pull/18325/commits/9c566f0ecc) with 
the discriminator and a regression test 
`testAvroRecordReaderPreservesUserRecordWithSingleElementField` that 
round-trips the user-authored shape and asserts the inner records survive as 
Maps. All 19 module tests still pass.
   
   Remaining edge case: a user authoring `record<list, no namespace, 
fields=[element: T]>` (i.e. naming their own record `list` without a namespace) 
would still false-positive. That seems vanishingly unlikely in practice — open 
to thoughts if you want a stricter check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to