agkphysics commented on issue #39914:
URL: https://github.com/apache/arrow/issues/39914#issuecomment-2809153621
I think this still fails when specifying a subset of columns to load that
doesn't include the list column:
```python
import pandas as pd
import pyarrow as pa
a = pd.Series(
    pa.array([[1, 2, 3]]),
    dtype=pd.ArrowDtype(pa.list_(pa.int64())),
)
b = pd.Series(pa.array([1]), dtype=pd.ArrowDtype(pa.int64()))
df = pd.DataFrame({"a": a, "b": b})
df.to_parquet("test.parquet", index=False)
pd.read_parquet("test.parquet", dtype_backend="pyarrow")  # Works
pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["a"])  # Works
pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["b"])  # Fails
```
Fails with
```
Traceback (most recent call last):
  File "/.../test.py", line 10, in <module>
    pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["b"])  # Fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 667, in read_parquet
    return impl.read(
           ^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 281, in read
    result = pa_table.to_pandas(**to_pandas_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 889, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 5132, in pyarrow.lib.Table._to_pandas
  File "/.../.venv/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 796, in table_to_dataframe
    ext_columns_dtypes = _get_extension_dtypes(
                         ^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 899, in _get_extension_dtypes
    pandas_dtype = _pandas_api.pandas_dtype(dtype)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/pandas-shim.pxi", line 150, in pyarrow.lib._PandasAPIShim.pandas_dtype
  File "pyarrow/pandas-shim.pxi", line 153, in pyarrow.lib._PandasAPIShim.pandas_dtype
  File "/.../.venv/lib/python3.12/site-packages/pandas/core/dtypes/common.py", line 1645, in pandas_dtype
    npdtype = np.dtype(dtype)
              ^^^^^^^^^^^^^^^
TypeError: data type 'list<item: int64>[pyarrow]' not understood
```