[jira] [Updated] (ARROW-8641) [Python] Regression in feather: no longer supports permutation in column selection

Joris Van den Bossche (Jira) Wed, 06 May 2020 01:04:43 -0700


     [ https://issues.apache.org/jira/browse/ARROW-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Joris Van den Bossche updated ARROW-8641:
-----------------------------------------
    Description: 
A rather annoying regression (originally reported in 
https://github.com/pandas-dev/pandas/issues/33878): when {{columns}} is 
specified, reading now fails unless the requested columns are in exactly the 
same order as in the file:

{code:python}
In [27]: table = pa.table([[1, 2, 3], [4, 5, 6], [7, 8, 9]], names=['a', 'b', 'c'])

In [29]: from pyarrow import feather

In [30]: feather.write_feather(table, "test.feather")

# this works fine
In [32]: feather.read_table("test.feather", columns=['a', 'b'])
Out[32]: 
pyarrow.Table
a: int64
b: int64

In [33]: feather.read_table("test.feather", columns=['b', 'a'])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-33-e01caeabb389> in <module>
----> 1 feather.read_table("test.feather", columns=['b', 'a'])

~/scipy/repos/arrow/python/pyarrow/feather.py in read_table(source, columns, memory_map)
    237         return reader.read_indices(columns)
    238     elif all(map(lambda t: t == str, column_types)):
--> 239         return reader.read_names(columns)
    240 
    241     column_type_names = [t.__name__ for t in column_types]

~/scipy/repos/arrow/python/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.read_names()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Schema at index 0 was different: 
b: int64
a: int64
vs
a: int64
b: int64
{code}
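One plausible way for a reader to accept a permuted selection (a sketch under assumptions, not necessarily how ARROW-8641 was or will be fixed) is to read the requested columns in file order, which the reader currently handles, and then reorder them. The helpers {{plan_column_read}} and {{reorder}} below are hypothetical and pure Python; they do not call pyarrow and only compute the file-order indices plus the permutation that restores the caller's order.

```python
from typing import List, Sequence, Tuple


def plan_column_read(
    file_columns: Sequence[str], requested: Sequence[str]
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split a column selection into a file-order read
    plus a permutation that restores the caller's requested order.

    Returns (read_indices, permutation):
      - read_indices: file positions of the requested columns, ascending,
        i.e. the order in which they are stored in the file.
      - permutation: permutation[i] is the position, within the file-order
        result, of the i-th requested column.
    """
    position = {name: i for i, name in enumerate(file_columns)}
    missing = [name for name in requested if name not in position]
    if missing:
        raise KeyError(f"columns not found in file: {missing}")

    # File positions in the order the caller asked for them.
    wanted = [position[name] for name in requested]
    # Read each distinct column once, in file order.
    read_indices = sorted(set(wanted))
    # Map each file position to its slot in the file-order result.
    slot = {file_idx: j for j, file_idx in enumerate(read_indices)}
    permutation = [slot[file_idx] for file_idx in wanted]
    return read_indices, permutation


def reorder(columns_in_file_order: Sequence, permutation: Sequence[int]) -> list:
    """Apply the permutation to columns that were read in file order."""
    return [columns_in_file_order[j] for j in permutation]
```

For the example above, {{plan_column_read(['a', 'b', 'c'], ['b', 'a'])}} gives read indices {{[0, 1]}} and permutation {{[1, 0]}}: reading a and b in file order and applying the permutation yields b, a. As a user-side workaround (again an assumption about how one might wire it up), the columns can be passed to {{feather.read_table}} in file order and the result rebuilt in the requested order with {{pa.Table.from_arrays}}.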

  was:
Identical to the above, except that the code block was opened with 
"{code: python}" (with a space) instead of "{code:python}".


> [Python] Regression in feather: no longer supports permutation in column selection
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-8641
>                 URL: https://issues.apache.org/jira/browse/ARROW-8641
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>             Fix For: 1.0.0
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
