Galuh Sahid created ARROW-6302:
----------------------------------

Summary: [Python] parquet categorical support doesn't preserve order
Key: ARROW-6302
URL: https://issues.apache.org/jira/browse/ARROW-6302
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.0
Reporter: Galuh Sahid

In pandas, I tried round-tripping a DataFrame to parquet with {{to_parquet}} and {{read_parquet}}. The categorical dtype is preserved, but the ordering of the categories is not: a column created with {{ordered=True}} comes back as an unordered categorical.

{code:python}
import pandas as pd
from pandas.io.parquet import read_parquet, to_parquet

df = pd.DataFrame()
df["a"] = pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=True)

df.to_parquet(<path>)
actual = read_parquet(<path>)

# Original column: ordered categorical
df["a"]
0    NaN
1      b
2      c
3    NaN
Name: a, dtype: category
Categories (3, object): [b < c < d]

# After the round trip: same categories, but no longer ordered
actual["a"]
0    NaN
1      b
2      c
3    NaN
Name: a, dtype: category
Categories (3, object): [b, c, d]
{code}
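
Until the flag survives the round trip, one possible workaround on the pandas side is to detect the affected columns and cast them back to the original dtype. The sketch below is only an illustration: {{lost_order_columns}} and {{restore_order}} are hypothetical helper names, not part of pandas or pyarrow, and {{actual}} is simulated with {{cat.as_unordered()}} instead of being read from a parquet file, so the snippet runs without a parquet engine.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def lost_order_columns(original, roundtripped):
    """Hypothetical helper: names of columns that are ordered categoricals
    in `original` but unordered categoricals in `roundtripped`."""
    lost = []
    for name in original.columns:
        before = original[name].dtype
        after = roundtripped[name].dtype
        if (isinstance(before, CategoricalDtype) and before.ordered
                and isinstance(after, CategoricalDtype) and not after.ordered):
            lost.append(name)
    return lost


def restore_order(original, roundtripped):
    """Hypothetical helper: cast the affected columns back to the dtype
    they had in `original` (same categories, ordered=True)."""
    fixed = roundtripped.copy()
    for name in lost_order_columns(original, roundtripped):
        fixed[name] = fixed[name].astype(original[name].dtype)
    return fixed


df = pd.DataFrame()
df["a"] = pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=True)

# Stand-in for the frame read back from parquet (assumption: the reported
# behaviour, i.e. same categories with the ordered flag dropped).
actual = pd.DataFrame({"a": df["a"].cat.as_unordered()})

print(lost_order_columns(df, actual))  # ['a']

restored = restore_order(df, actual)
print(restored["a"].cat.ordered)       # True
```

This relies on the original frame (or at least its dtypes) still being available when the file is read back, so it only works around the symptom and does not address the missing metadata in the file itself.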