zblz opened a new issue, #39194:
URL: https://github.com/apache/arrow/issues/39194
### Describe the bug, including details regarding any error messages, version, and platform.
When converting a Table to pandas, adding the `zero_copy_only=True` argument makes the conversion always fail with `ArrowInvalid`.
```python
In [17]: table = pa.table({'a': [1.0, 2.0, 3.0], 'b': ['x','y','z']})
In [19]: table.schema
Out[19]:
a: double
b: string
In [20]: table.to_pandas(zero_copy_only=True)
...
ArrowInvalid: Cannot do zero copy conversion into multi-column DataFrame block
```
Note that this is a similar bug report to #38644, but there the interaction with `types_mapper` was considered the cause of the exception, whereas I found it happens even without setting any type mapping.
The keyword works fine for ChunkedArray conversions:
```python
In [23]: table['a'].to_pandas(zero_copy_only=True)
Out[23]:
0 1.0
1 2.0
2 3.0
Name: a, dtype: float64
```
even when adding type mapping to pandas Arrow types:
```python
In [22]: table['a'].to_pandas(zero_copy_only=True, types_mapper=pd.ArrowDtype)
Out[22]:
0 1.0
1 2.0
2 3.0
Name: a, dtype: double[pyarrow]
```
and fails on non-zero copy operations, like string to categorical conversion:
```python
In [28]: table['b'].to_pandas(zero_copy_only=True, strings_to_categorical=True)
ArrowInvalid: Need to dictionary encode a column, but only zero-copy conversions allowed
```
Is this expected behaviour? That is, should the argument always flag `Table -> DataFrame` conversions as not being zero copy? If so, it might make sense to remove the argument from `pa.Table.to_pandas` altogether, as it would always raise an exception.
### Component(s)
Python