[ https://issues.apache.org/jira/browse/ARROW-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17660675#comment-17660675 ]
Rok Mihevc commented on ARROW-3652:
-----------------------------------

This issue has been migrated to [issue #19959|https://github.com/apache/arrow/issues/19959] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] CategoricalIndex is lost after reading back
> ----------------------------------------------------
>
>                 Key: ARROW-3652
>                 URL: https://issues.apache.org/jira/browse/ARROW-3652
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>            Reporter: Armin Berres
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: parquet, pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a {{CategoricalIndex}} is written to Parquet and read back, the resulting index is no longer categorical.
> {code}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> df = pd.DataFrame([['a', 'b'], ['c', 'd']], columns=['c1', 'c2'])
> df['c1'] = df['c1'].astype('category')
> df = df.set_index(['c1'])
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'test.parquet')
> ref_df = pq.read_pandas('test.parquet').to_pandas()
> print(df.index)
> # CategoricalIndex(['a', 'c'], categories=['a', 'c'], ordered=False, name='c1', dtype='category')
> print(ref_df.index)
> # Index(['a', 'c'], dtype='object', name='c1')
> {code}
> The stored pandas metadata still describes the index correctly as categorical:
> {code:java}
> {"name": "c1", "field_name": "c1", "pandas_type": "categorical", "numpy_type": "int8", "metadata": {"num_categories": 2, "ordered": false}}
> {code}
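
To illustrate the point that the metadata still carries the categorical information, here is a minimal sketch that parses a pandas metadata document and lists the fields marked as categorical. It uses only the standard library. The entry for {{c1}} is taken from the report; wrapping it in a top-level {{columns}} list and the helper name {{categorical_fields}} are assumptions made for this example, not part of the Arrow API.

```python
import json

# Index column entry as quoted in the report. The enclosing "columns" list
# is an assumption for illustration; the report shows only this one entry.
raw = (
    b'{"columns": [{"name": "c1", "field_name": "c1", '
    b'"pandas_type": "categorical", "numpy_type": "int8", '
    b'"metadata": {"num_categories": 2, "ordered": false}}]}'
)


def categorical_fields(pandas_metadata_bytes):
    """Return {field_name: metadata} for entries whose pandas_type is categorical.

    Hypothetical helper, not an Arrow or pandas function.
    """
    meta = json.loads(pandas_metadata_bytes.decode("utf-8"))
    return {
        col["field_name"]: col["metadata"]
        for col in meta.get("columns", [])
        if col.get("pandas_type") == "categorical"
    }


found = categorical_fields(raw)
print(found)
```

A reader that consulted this information when rebuilding the index could restore a {{CategoricalIndex}} instead of falling back to an object-dtype index; whether the actual fix works this way is not stated in the report.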