[ https://issues.apache.org/jira/browse/ARROW-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche resolved ARROW-8498.
------------------------------------------
    Fix Version/s: 0.17.0
         Assignee: Uwe Korn
       Resolution: Fixed

> [Python] Schema.from_pandas fails on extension type, while Table.from_pandas works
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-8498
>                 URL: https://issues.apache.org/jira/browse/ARROW-8498
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>            Reporter: Thomas Buhrmann
>            Assignee: Uwe Korn
>            Priority: Major
>             Fix For: 0.17.0
>
>
> While Table.from_pandas() seems to work as expected with extension types,
> Schema.from_pandas() raises an ArrowTypeError:
> {code:python}
> import pandas as pd
> import pyarrow as pa
>
> df = pd.DataFrame({
>     "x": pd.Series([1, 2, None], dtype="Int8"),
>     "y": pd.Series(["a", "b", None], dtype="category"),
>     "z": pd.Series(["ab", "bc", None], dtype="string"),
> })
> print(pa.Table.from_pandas(df).schema)
> print(pa.Schema.from_pandas(df))
> {code}
>
> Results in:
> {noformat}
> x: int8
> y: dictionary<values=string, indices=int8, ordered=0>
> z: string
> metadata
> --------
> {b'pandas': b'{"index_columns": [{"kind": "range", "name": null, "start": 0, "'
>             b'stop": 3, "step": 1}], "column_indexes": [{"name": null, "field_'
>             b'name": null, "pandas_type": "unicode", "numpy_type": "object", "'
>             b'metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "x", "f'
>             b'ield_name": "x", "pandas_type": "int8", "numpy_type": "Int8", "m'
>             b'etadata": null}, {"name": "y", "field_name": "y", "pandas_type":'
>             b' "categorical", "numpy_type": "int8", "metadata": {"num_categori'
>             b'es": 2, "ordered": false}}, {"name": "z", "field_name": "z", "pa'
>             b'ndas_type": "unicode", "numpy_type": "string", "metadata": null}'
>             b'], "creator": {"library": "pyarrow", "version": "0.16.0"}, "pand'
>             b'as_version": "1.0.3"}'}
> ---------------------------------------------------------------------------
> ArrowTypeError                            Traceback (most recent call last)
> ...
> ArrowTypeError: Did not pass numpy.dtype object
> {noformat}
> I'd imagine Table.from_pandas(df).schema and Schema.from_pandas(df) should
> result in the exact same object?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
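
A note for anyone reading the output above: the b'pandas' metadata value is a JSON document, so the per-column type information that Table.from_pandas() recorded can be inspected with the standard library alone. The following is a minimal sketch, not part of the original report; it uses the bytes exactly as printed above, and the names pandas_metadata and column_types are made up for illustration.

```python
import json

# The b'pandas' metadata value copied from the output above. Adjacent bytes
# literals are concatenated by Python into a single bytes object.
pandas_metadata = (
    b'{"index_columns": [{"kind": "range", "name": null, "start": 0, "'
    b'stop": 3, "step": 1}], "column_indexes": [{"name": null, "field_'
    b'name": null, "pandas_type": "unicode", "numpy_type": "object", "'
    b'metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "x", "f'
    b'ield_name": "x", "pandas_type": "int8", "numpy_type": "Int8", "m'
    b'etadata": null}, {"name": "y", "field_name": "y", "pandas_type":'
    b' "categorical", "numpy_type": "int8", "metadata": {"num_categori'
    b'es": 2, "ordered": false}}, {"name": "z", "field_name": "z", "pa'
    b'ndas_type": "unicode", "numpy_type": "string", "metadata": null}'
    b'], "creator": {"library": "pyarrow", "version": "0.16.0"}, "pand'
    b'as_version": "1.0.3"}'
)

# json.loads accepts UTF-8 encoded bytes directly.
meta = json.loads(pandas_metadata)


def column_types(meta):
    """Map each column name to its (pandas_type, numpy_type) pair.

    Hypothetical helper for illustration only; it is not a pyarrow API.
    """
    return {
        col["name"]: (col["pandas_type"], col["numpy_type"])
        for col in meta["columns"]
    }


for name, (pandas_type, numpy_type) in column_types(meta).items():
    print(f"{name}: pandas_type={pandas_type}, numpy_type={numpy_type}")
```

In this metadata the "numpy_type" entries for x and z are "Int8" and "string", which are the names of pandas extension dtypes rather than NumPy dtypes. That is consistent with the "Did not pass numpy.dtype object" message, though this sketch says nothing about where inside pyarrow the failure actually occurred.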