agkphysics commented on issue #39914: URL: https://github.com/apache/arrow/issues/39914#issuecomment-2809153621
I think this still fails when specifying a subset of columns to load that doesn't include the list column:

```python
import pandas as pd
import pyarrow as pa

a = pd.Series(pa.array([[1, 2, 3]]), dtype=pd.ArrowDtype(pa.list_(pa.int64())))
b = pd.Series(pa.array([1]), dtype=pd.ArrowDtype(pa.int64()))
df = pd.DataFrame({"a": a, "b": b})
df.to_parquet("test.parquet", index=False)
pd.read_parquet("test.parquet", dtype_backend="pyarrow")  # Works
pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["a"])  # Works
pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["b"])  # Fails
```

Fails with

```
Traceback (most recent call last):
  File "/.../test.py", line 10, in <module>
    pd.read_parquet("test.parquet", dtype_backend="pyarrow", columns=["b"])  # Fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 667, in read_parquet
    return impl.read(
           ^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pandas/io/parquet.py", line 281, in read
    result = pa_table.to_pandas(**to_pandas_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 889, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 5132, in pyarrow.lib.Table._to_pandas
  File "/.../.venv/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 796, in table_to_dataframe
    ext_columns_dtypes = _get_extension_dtypes(
                         ^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 899, in _get_extension_dtypes
    pandas_dtype = _pandas_api.pandas_dtype(dtype)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/pandas-shim.pxi", line 150, in pyarrow.lib._PandasAPIShim.pandas_dtype
  File "pyarrow/pandas-shim.pxi", line 153, in pyarrow.lib._PandasAPIShim.pandas_dtype
  File "/.../.venv/lib/python3.12/site-packages/pandas/core/dtypes/common.py", line 1645, in pandas_dtype
    npdtype = np.dtype(dtype)
              ^^^^^^^^^^^^^^^
TypeError: data type 'list<item: int64>[pyarrow]' not understood
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org