Kuinox opened a new pull request, #48255:
URL: https://github.com/apache/arrow/pull/48255
### Rationale for this change
pq.read_schema drops extension types (a UUID column comes back as
fixed_size_binary[16]), while ParquetFile.schema_arrow and read_table preserve
them. A schema inspected through read_schema should report the same types as
the table that read_table returns.
### What changes are included in this PR?
- Plumb arrow_extensions_enabled into read_schema and return schema_arrow
when enabled so extension types are preserved.
- Add a regression test ensuring UUID extension types are retained by
read_schema and fall back to fixed_size_binary[16] when extensions are
disabled.
### Are these changes tested?
- Yes: added unit test test_read_schema_uuid_extension_type
### Are there any user-facing changes?
- Behavior improvement: read_schema now preserves extension types (e.g.,
UUID) when extensions are enabled; no API break
Notes:
- I don't know whether returning column types as extension<arrow.uuid>
instead of fixed_size_binary[16] is considered a breaking change.
- The patch in this PR was AI-generated, but I reviewed it personally; the
scope is small, and it looks fine to me.