[ https://issues.apache.org/jira/browse/ARROW-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grayden Shand updated ARROW-17388:
----------------------------------
    Priority: Major  (was: Minor)

> Prevent corrupting files with Multiple matches for FieldRef.Name
> ----------------------------------------------------------------
>
>                 Key: ARROW-17388
>                 URL: https://issues.apache.org/jira/browse/ARROW-17388
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>         Environment: MacOS, Python 3.10.3
>            Reporter: Grayden Shand
>            Priority: Major
>
> *Version*: pyarrow 9.0.0
>
> *Description*
> Users can add a column with the same name as an existing column to a
> table via `pyarrow.Table.add_column()`.
>
> Additionally, that table can be written to a parquet file with
> `pyarrow.parquet.write_table()`.
>
> However, the written file cannot be read back with
> `pyarrow.parquet.read_table()`, because it contains multiple columns with
> the same name.
>
> Flagging this as a bug because I believe anything that is successfully
> written by `write_table()` should be readable by `read_table()`.
>
> *Minimum reproducible example*
> ```
> >>> import pyarrow.parquet as pq
> >>> import pyarrow as pa
> >>> t = pa.Table.from_pydict({'a': [1,2,3]})
> >>> pq.write_table(t.add_column(0, 'a', pa.array([1.1,2.2,3.3])), 'test.parquet')
> >>> pq.read_table('test.parquet')
> pyarrow.lib.ArrowInvalid: Multiple matches for FieldRef.Name(a) in a: double
> a: int64
> __fragment_index: int32
> __batch_index: int32
> __last_in_fragment: bool
> __filename: string
> ```

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
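
Until the behavior described in the issue changes, a caller could guard against it on the writing side. The following is only a minimal sketch of such a check, not part of pyarrow and not the project's fix: the helper names `duplicate_column_names` and `check_unique_columns` are hypothetical, and the check works on a plain list of names such as the one returned by `Table.column_names`.

```
from collections import Counter


def duplicate_column_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, count in counts.items() if count > 1]


def check_unique_columns(names):
    """Raise ValueError if any column name is repeated (hypothetical guard)."""
    dupes = duplicate_column_names(names)
    if dupes:
        raise ValueError(f"duplicate column names: {dupes}")


# Intended use, assuming `table` is a pyarrow.Table and `pq` is pyarrow.parquet:
#     check_unique_columns(table.column_names)
#     pq.write_table(table, 'test.parquet')
```

Calling the guard before `pq.write_table()` would reject the table from the example above at write time, rather than producing a file that `pq.read_table()` cannot load.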