Ian Rose created ARROW-16838:
--------------------------------

             Summary: Schema inference for pandas extension dtypes fails on indexes
                 Key: ARROW-16838
                 URL: https://issues.apache.org/jira/browse/ARROW-16838
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 8.0.0
            Reporter: Ian Rose

Hi!

pa.Schema.from_pandas called on a dataframe whose index has a pandas extension dtype (e.g., string[python]) raises an error:
{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["A", "B"], dtype="string"))
pa.Schema.from_pandas(df)
{code}
produces
{code:python}
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_1827952/3691394220.py in <module>
      1 import pyarrow as pa
      2 df = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["A", "B"], dtype="string"))
----> 3 pa.Schema.from_pandas(df)

~/miniconda3/envs/dask/lib/python3.8/site-packages/pyarrow/types.pxi in pyarrow.lib.Schema.from_pandas()

~/miniconda3/envs/dask/lib/python3.8/site-packages/pyarrow/pandas_compat.py in dataframe_to_types(df, preserve_index, columns)
    527                 type_ = pa.array(c, from_pandas=True).type
    528             elif _pandas_api.is_extension_array_dtype(values):
--> 529                 type_ = pa.array(c.head(0), from_pandas=True).type
    530             else:
    531                 values, type_ = get_datetimetz_type(values, c.dtype, None)

AttributeError: 'Index' object has no attribute 'head'
{code}
The failure comes from line 529 of pandas_compat.py, which calls `head` on the index, and a pandas Index has no such method. If I remove the `head` call, or convert the index to a Series manually, the error goes away.

Reported downstream in https://github.com/dask/dask/issues/9186

Related issue from a couple of years ago: https://issues.apache.org/jira/browse/ARROW-8159