tustvold commented on issue #6733: URL: https://github.com/apache/arrow-rs/issues/6733#issuecomment-2479487010
So I don't think this is a bug per se: the parquet writer converts the arrow schema faithfully into parquet, preserving the field name of the list elements. The problem arises because the default within the arrow ecosystem is to call this field "item", not "element".

```
>>> import pyarrow as pa
>>> pa.list_(pa.string())
ListType(list<item: string>)
>>> pa.list_(pa.string()).field(0).name
'item'
```

This matters because the parquet schema is authoritative: when reading back a parquet file whose list element field is named "element", the arrow schema should reflect this. If we coerced the name to "item", the schema would not roundtrip as people might expect.

I think the way to handle this is probably #1938, where we add an option to coerce arrow types to be more compatible with parquet's type system, with the understanding that things may not always roundtrip completely faithfully.
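To make the trade-off concrete, here is a minimal toy model in plain Python. The `ListField` type and the `arrow_to_parquet` / `parquet_to_arrow` functions are hypothetical stand-ins, not the arrow-rs or pyarrow API, and the `coerce` flag only approximates the kind of option proposed in #1938. It shows that a faithful writer roundtrips the "item" name exactly, while a coercing writer produces "element" but no longer roundtrips the original arrow schema.

```python
from dataclasses import dataclass, replace


# Hypothetical toy schema model; NOT the arrow-rs or pyarrow API.
@dataclass(frozen=True)
class ListField:
    name: str          # name of the list column itself
    element_name: str  # name of the child (element) field
    element_type: str  # e.g. "string"


def arrow_to_parquet(field: ListField, coerce: bool = False) -> ListField:
    """Hypothetical writer: keeps the child name as-is (faithful), or
    renames it to "element" when the hypothetical coerce option is set."""
    if coerce:
        return replace(field, element_name="element")
    return field


def parquet_to_arrow(field: ListField) -> ListField:
    """Hypothetical reader: the parquet schema is authoritative, so the
    child name is taken verbatim from the file."""
    return field


# The arrow ecosystem default names the child field "item".
arrow_default = ListField("tags", "item", "string")

# Faithful write: the schema roundtrips exactly, but the file says "item".
faithful = parquet_to_arrow(arrow_to_parquet(arrow_default))
print(faithful == arrow_default)  # True

# Coerced write: the file says "element", but the schema no longer roundtrips.
coerced = parquet_to_arrow(arrow_to_parquet(arrow_default, coerce=True))
print(coerced.element_name)       # element
print(coerced == arrow_default)   # False
```

Keeping coercion opt-in leaves the default behaviour lossless, while users who need the "element" name can accept the roundtrip mismatch explicitly.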
