zblz opened a new issue, #39194:
URL: https://github.com/apache/arrow/issues/39194
### Describe the bug, including details regarding any error messages, version, and platform.
When converting a Table to pandas, adding the `zero_copy_only=True` argument makes the conversion always fail with `ArrowInvalid`.
```python
In [17]: table = pa.table({'a': [1.0, 2.0, 3.0], 'b': ['x','y','z']})
In [19]: table.schema
Out[19]:
a: double
b: string
In [20]: table.to_pandas(zero_copy_only=True)
...
ArrowInvalid: Cannot do zero copy conversion into multi-column DataFrame block
```
Note that this is a similar bug report to #38644, but there the interaction with `types_mapper` was considered the cause of the exception, whereas I found it happens even without setting any type mapping.
The keyword works fine for ChunkedArray conversions:
```python
In [23]: table['a'].to_pandas(zero_copy_only=True)
Out[23]:
0 1.0
1 2.0
2 3.0
Name: a, dtype: float64
```
even when adding type mapping to pandas Arrow types:
```python
In [22]: table['a'].to_pandas(zero_copy_only=True, types_mapper=pd.ArrowDtype)
Out[22]:
0 1.0
1 2.0
2 3.0
Name: a, dtype: double[pyarrow]
```
and fails on non-zero copy operations, like string to categorical conversion:
```python
In [28]: table['b'].to_pandas(zero_copy_only=True, strings_to_categorical=True)
ArrowInvalid: Need to dictionary encode a column, but only zero-copy conversions allowed
```
Is this expected behaviour? That is, should the argument always flag `Table -> DataFrame` conversions as not being zero copy? If so, it might make sense to remove the argument from `pa.Table.to_pandas` altogether, as it would always raise an exception.
### Component(s)
Python