ggreco opened a new issue, #6733: URL: https://github.com/apache/arrow-rs/issues/6733
**Describe the bug**

When the schema contains a nested structure, the .parquet files generated by arrow-rs should name the list item `element`, as required by the Parquet specification: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists

The generated files use `item` instead, probably because the code was built on a legacy convention.

A similar issue was recently fixed in polars-rs: https://github.com/pola-rs/polars/pull/17803

Pyarrow lets you use `item` instead of `element` (the default) to support legacy files, but IMHO arrow-rs should not generate legacy Parquet files!

The code in arrow-rs that implements this is: https://github.com/apache/arrow-rs/blob/master/arrow-schema/src/field.rs#L147

IMHO the fix only involves a single-line change. I can create a PR, but I want to be sure that I'm not misreading the spec and that there is no reason for hardcoding `item`, since it seems too simple...

**To Reproduce**

Generate a nested Parquet file, or use the one attached to this issue, and verify (with a hex editor, with `parquet-schema` from this repo, or with a GUI tool that shows the Parquet schema, such as "parquet floor") that the type name associated with the list item is always `item` instead of `element`.
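The manual check described in the reproduction step could also be scripted. The following is only a minimal sketch using the Rust standard library: it scans a textual schema dump (for example, text in the style printed by `parquet-schema`) and collects the names of list element fields that are not called `element`. The function name `non_compliant_list_elements` is hypothetical and not part of arrow-rs or the parquet crate. The parsing assumes that every list is written as `REPEATED group list { <repetition> <type> <name> ...`, with tokens separated by whitespace.

```rust
/// Returns the names of list element fields that are not called `element`.
///
/// Hypothetical helper: it assumes a whitespace-separated schema dump in
/// which every list has the form
/// `REPEATED group list { <repetition> <type> <name> ...`.
pub fn non_compliant_list_elements(schema_dump: &str) -> Vec<String> {
    let tokens: Vec<&str> = schema_dump.split_whitespace().collect();
    let mut offenders = Vec::new();

    // Slide a 7-token window over the dump, looking for
    // `REPEATED group list {` followed by the element's
    // repetition, type and name.
    for window in tokens.windows(7) {
        let opens_list = window[0].eq_ignore_ascii_case("repeated")
            && window[1].eq_ignore_ascii_case("group")
            && window[2] == "list"
            && window[3] == "{";
        if !opens_list {
            continue;
        }

        // window[4] = repetition, window[5] = type, window[6] = name.
        // Primitive elements end with `;`, so strip it before comparing.
        let name = window[6].trim_end_matches(';');
        if name != "element" {
            offenders.push(name.to_string());
        }
    }

    offenders
}
```

Called on a dump containing `REPEATED group list { OPTIONAL group item { ...`, the function returns a vector with `item` in it. A dump that follows the naming in the specification yields an empty vector. Real schema dumps can contain constructs this simple tokenizer does not handle, so treat it as a quick sanity check, not a validator.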
For [example_parquet.zip](https://github.com/user-attachments/files/17776513/example_parquet.zip), the file attached to this ticket, the schema is reported by `arrow-schema` as:

```
{
  REQUIRED BYTE_ARRAY school (STRING);
  REQUIRED group students (LIST) {
    REPEATED group list {
      OPTIONAL group item {
        REQUIRED BYTE_ARRAY name (STRING);
        REQUIRED INT32 age;
      }
    }
  }
  REQUIRED group teachers (LIST) {
    REPEATED group list {
      OPTIONAL group item {
        REQUIRED BYTE_ARRAY name (STRING);
        REQUIRED INT32 age;
      }
    }
  }
}
```

The expected output was:

```
{
  REQUIRED BYTE_ARRAY school (STRING);
  REQUIRED group students (LIST) {
    REPEATED group list {
      OPTIONAL group element {
        REQUIRED BYTE_ARRAY name (STRING);
        REQUIRED INT32 age;
      }
    }
  }
  REQUIRED group teachers (LIST) {
    REPEATED group list {
      OPTIONAL group element {
        REQUIRED BYTE_ARRAY name (STRING);
        REQUIRED INT32 age;
      }
    }
  }
}
```

I can get `parquet-schema` to output `element` instead of `item` when the Parquet file is generated from Python or .NET. In a hex editor you will see `students.list.item.name` instead of the expected `students.list.element.name`.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
