Re: [I] [Python] Using unify_schema() during schema evolution fails [arrow]

via GitHub Fri, 02 Feb 2024 22:42:11 -0800


sergun commented on issue #37898:
URL: https://github.com/apache/arrow/issues/37898#issuecomment-1925169618


   > @PoojaRavi1105
   > 
   > 1. Currently parquet dataset doesn't support iceberg style schema 
evolution using `unify_schema`
   > 2. But when you set the schema in dataset explicitly yourself, it's able 
to be read.
   
   Regarding point 2:
   
   This doesn't work for me when columns are added or removed 
(pyarrow==15.0.0). For example:
   
   `1.parquet` has the schema:
   ```
           pa.schema([
               ('id', pa.int64()),
               ('x', pa.int64()),
               ('a', pa.struct([
                   ('y', pa.int64()),
               ])),
           ])
   
   ```
   `2.parquet` has the schema:
   ```
           pa.schema([
               ('id', pa.int64()),
               ('y', pa.int64()),
               ('a', pa.struct([
                   ('x', pa.int64()),
               ])),
           ])
   
   ```
   
   When I read them with a manually merged schema:
   
   ```
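       import pyarrow as pa
       import pyarrow.dataset as ds
   
       # Merged schema: union of the top-level columns and of the fields of struct 'a'
       dataset = ds.dataset(["1.parquet", "2.parquet"], schema=
           pa.schema([
               ('id', pa.int64()),
               ('x', pa.int64()),
               ('y', pa.int64()),
               ('a', pa.struct([
                   ('x', pa.int64()),
                   ('y', pa.int64()),
               ])),
           ])
       )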
   ```
   I get:
   
   `pyarrow.lib.ArrowTypeError: struct fields don't match or are in the wrong order: Input fields: struct<x: int64> output fields: struct<x: int64, y: int64>`
   
   @AlenkaF, are there any plans to support this on the roadmap?


