liamphmurphy opened a new issue, #43893: URL: https://github.com/apache/arrow/issues/43893
### Describe the bug, including details regarding any error messages, version, and platform.

Following a schema merge operation involving nested columns, PyArrow fails to load the data with the following error:

`pyarrow.lib.ArrowTypeError: struct fields don't match or are in the wrong order: Input fields: struct<c: int64> output fields: struct<c: int64, d: int64>`

I have confirmed this does not happen with a schema merge that DOES NOT involve any nested columns. I believe this is a PyArrow-specific problem, as Spark does not have this problem.

Below is an example of how this can be reproduced:

```
import pyarrow as pa
import polars as pl
from deltalake import write_deltalake

# Create a PyArrow table with a struct column 'b' containing the nested field 'c'
df = pa.table({
    "a": [1, 2, 3],
    "b": [{"c": 1}, {"c": 2}, {"c": 3}]
})

# Create the matching PyArrow schema, with the nested field 'c' inside 'b'
schema = pa.schema([
    pa.field("a", pa.int64()),
    pa.field("b", pa.struct([
        pa.field("c", pa.int64())
    ]))
])

local_path = "./tables/merge_delta_table"

# Write the table to Delta Lake
write_deltalake(local_path, data=df, engine="rust", schema=schema, mode="append")

# Create a new table with a different schema, adding the nested field 'd' to 'b'
df2 = pa.table({
    "a": [4, 5, 6],
    "b": [{"d": 2, "c": 1}, {"c": 2}, {"c": 3}]
})

schema2 = pa.schema([
    pa.field("a", pa.int64()),
    pa.field("b", pa.struct([
        pa.field("d", pa.int64()),
        pa.field("c", pa.int64())
    ]))
])

# Write the new table to the same Delta Lake table, merging the schemas
write_deltalake(local_path, data=df2, schema=schema2, engine="rust", mode="append", schema_mode="merge")

# Now read the Delta Lake table using Polars
df = pl.read_delta(local_path)
print(df)
```

### Component(s)

Parquet, Python