sergun commented on issue #37898:
URL: https://github.com/apache/arrow/issues/37898#issuecomment-1925169618
> @PoojaRavi1105
>
> 1. Currently parquet dataset doesn't support iceberg style schema evolution using `unify_schema`
> 2. But when you set the schema in dataset explicitly yourself, it's able to be read.

Regarding point 2: this doesn't work for me when columns are added or removed (pyarrow==15.0.0). For example, `1.parquet` has the schema:
```
pa.schema([
    ('id', pa.int64()),
    ('x', pa.int64()),
    ('a', pa.struct([
        ('y', pa.int64()),
    ])),
])
```
`2.parquet` has the schema:
```
pa.schema([
    ('id', pa.int64()),
    ('y', pa.int64()),
    ('a', pa.struct([
        ('x', pa.int64()),
    ])),
])
```
When I read them with a manually merged schema:
```
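import pyarrow as pa
import pyarrow.dataset as ds

# Manually merged schema: the union of the columns and struct fields of both files.
merged_schema = pa.schema([
    ('id', pa.int64()),
    ('x', pa.int64()),
    ('y', pa.int64()),
    ('a', pa.struct([
        ('x', pa.int64()),
        ('y', pa.int64()),
    ])),
])
dataset = ds.dataset(["1.parquet", "2.parquet"], schema=merged_schema)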
```
I get:

`pyarrow.lib.ArrowTypeError: struct fields don't match or are in the wrong order: Input fields: struct<x: int64> output fields: struct<x: int64, y: int64>`

@AlenkaF, are there any plans to support this on the roadmap?