[ https://issues.apache.org/jira/browse/ARROW-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-8641:
-----------------------------------------
    Description:
A quite annoying regression (original report from https://github.com/pandas-dev/pandas/issues/33878), is that when specifying {{columns}} to read, this now fails if the order of the columns is not exactly the same as in the file:

{code:python}
In [27]: table = pa.table([[1, 2, 3], [4, 5, 6], [7, 8, 9]], names=['a', 'b', 'c'])

In [29]: from pyarrow import feather

In [30]: feather.write_feather(table, "test.feather")

# this works fine
In [32]: feather.read_table("test.feather", columns=['a', 'b'])
Out[32]:
pyarrow.Table
a: int64
b: int64

In [33]: feather.read_table("test.feather", columns=['b', 'a'])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-33-e01caeabb389> in <module>
----> 1 feather.read_table("test.feather", columns=['b', 'a'])

~/scipy/repos/arrow/python/pyarrow/feather.py in read_table(source, columns, memory_map)
    237         return reader.read_indices(columns)
    238     elif all(map(lambda t: t == str, column_types)):
--> 239         return reader.read_names(columns)
    240
    241     column_type_names = [t.__name__ for t in column_types]

~/scipy/repos/arrow/python/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.read_names()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Schema at index 0 was different:
b: int64
a: int64
vs
a: int64
b: int64
{code}

  was:
A quite annoying regression (original report from https://github.com/pandas-dev/pandas/issues/33878), is that when specifying {{columns}} to read, this now fails if the order of the columns is not exactly the same as in the file:

{code: python}
In [27]: table = pa.table([[1, 2, 3], [4, 5, 6], [7, 8, 9]], names=['a', 'b', 'c'])

In [29]: from pyarrow import feather

In [30]: feather.write_feather(table, "test.feather")

# this works fine
In [32]: feather.read_table("test.feather", columns=['a', 'b'])
Out[32]:
pyarrow.Table
a: int64
b: int64

In [33]: feather.read_table("test.feather", columns=['b', 'a'])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-33-e01caeabb389> in <module>
----> 1 feather.read_table("test.feather", columns=['b', 'a'])

~/scipy/repos/arrow/python/pyarrow/feather.py in read_table(source, columns, memory_map)
    237         return reader.read_indices(columns)
    238     elif all(map(lambda t: t == str, column_types)):
--> 239         return reader.read_names(columns)
    240
    241     column_type_names = [t.__name__ for t in column_types]

~/scipy/repos/arrow/python/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.read_names()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Schema at index 0 was different:
b: int64
a: int64
vs
a: int64
b: int64
{code}


> [Python] Regression in feather: no longer supports permutation in column selection
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-8641
>                 URL: https://issues.apache.org/jira/browse/ARROW-8641
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>             Fix For: 1.0.0
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
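
Since the report shows that the read only fails when the requested order differs from the file order, one possible workaround on affected versions is to request the columns in file order and reorder the result afterwards. The following is only a sketch under that assumption, not the fix tracked by this issue: {{in_file_order}}, {{read_table_any_order}} and the caller-supplied {{file_columns}} argument are hypothetical names, not part of pyarrow.

```python
def in_file_order(requested, file_columns):
    """Return the requested column names sorted into the file's column order."""
    position = {name: i for i, name in enumerate(file_columns)}
    missing = [name for name in requested if name not in position]
    if missing:
        raise KeyError(f"columns not in file: {missing}")
    return sorted(requested, key=position.__getitem__)


def read_table_any_order(path, columns, file_columns):
    """Hypothetical wrapper: read a column subset, then permute it as requested.

    `file_columns` is the column order of the file, assumed to be known
    by the caller (e.g. ['a', 'b', 'c'] for the file in the report).
    """
    # Imported lazily so the pure-Python helper above works without pyarrow.
    import pyarrow as pa
    from pyarrow import feather

    # Ask for the subset in file order, which the report shows still works.
    subset = feather.read_table(path, columns=in_file_order(columns, file_columns))
    # Rebuild the table with the columns in the order the caller asked for.
    return pa.table([subset.column(name) for name in columns], names=list(columns))
```

With the file from the report, {{read_table_any_order("test.feather", ['b', 'a'], ['a', 'b', 'c'])}} would read {{['a', 'b']}} and hand the columns back as {{b}}, {{a}}.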