xiangfu0 commented on code in PR #18325:
URL: https://github.com/apache/pinot/pull/18325#discussion_r3142549552
##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetAvroRecordExtractor.java:
##########
@@ -63,4 +99,22 @@ Object handleDeprecatedTypes(Object value, Schema.Field
field) {
}
return value;
}
+
+ private boolean isListElementWrapper(Schema.Field field) {
+ Schema schema = getNonNullSchema(field.schema());
+ if (schema.getType() != Schema.Type.ARRAY) {
+ return false;
+ }
+ Schema elementSchema = getNonNullSchema(schema.getElementType());
+ return elementSchema.getType() == Schema.Type.RECORD &&
elementSchema.getFields().size() == 1
Review Comment:
Confirmed real regression — wrote a quick test for `array<record<UserTag,
fields=[element: string]>>` and watched it flatten to `["user-1"]`. Thanks for
catching it.
Added a discriminator on the inner record's name + namespace. parquet-avro
names the synthesized wrapper after the Parquet repeated group (`list` per the
3-level convention, `array` legacy) and either gives it no namespace (top-level
field) or a numeric-suffix namespace like `list2`/`list3` to disambiguate when
the same record name nests inside another record. User-authored Avro records
carry user-chosen names and meaningful namespaces, so requiring `name ∈ {list,
array}` AND `namespace == null/empty || namespace.matches(name + "\\d+")`
distinguishes the two without needing extra Parquet-schema plumbing.
Pushed
[9c566f0](https://github.com/apache/pinot/pull/18325/commits/9c566f0ecc) with
the discriminator and a regression test
`testAvroRecordReaderPreservesUserRecordWithSingleElementField` that
round-trips the user-authored shape and asserts the inner records survive as
Maps. All 19 module tests still pass.
Remaining edge case: a user authoring `record<list, no namespace,
fields=[element: T]>` (i.e. naming their own record `list` without a namespace)
would still false-positive. That seems vanishingly unlikely in practice — open
to thoughts if you want a stricter check.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]