Thomas Buhrmann created ARROW-3801:
--------------------------------------

             Summary: Pandas-Arrow roundtrip makes pd categorical index not writeable
                 Key: ARROW-3801
                 URL: https://issues.apache.org/jira/browse/ARROW-3801
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.10.0
            Reporter: Thomas Buhrmann
Serializing and deserializing a pandas Series with categorical dtype (that is, round-tripping it through pa.Table.from_pandas and Table.to_pandas) makes the categories index non-writeable. This in turn trips up pandas in operations that need to write to that buffer, e.g. reordering the categories, which raises "ValueError: buffer source array is read-only":

{code:python}
import pandas as pd
import pyarrow as pa

df = pd.Series([1,2,3], dtype='category', name="c1").to_frame()
print("DType before:", repr(df.c1.dtype))
print("Writeable:", df.c1.cat.categories.values.flags.writeable)
ro = df.c1.cat.reorder_categories([3,2,1])
print("DType reordered:", repr(ro.dtype), "\n")

tbl = pa.Table.from_pandas(df)
df2 = tbl.to_pandas()
print("DType after:", repr(df2.c1.dtype))
print("Writeable:", df2.c1.cat.categories.values.flags.writeable)
ro = df2.c1.cat.reorder_categories([3,2,1])
print("DType reordered:", repr(ro.dtype), "\n")
{code}

Outputs:

{noformat}
DType before: CategoricalDtype(categories=[1, 2, 3], ordered=False)
Writeable: True
DType reordered: CategoricalDtype(categories=[3, 2, 1], ordered=False)

DType after: CategoricalDtype(categories=[1, 2, 3], ordered=False)
Writeable: False
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-365-85b439586c1a> in <module>
     12 print("DType after:", repr(df2.c1.dtype))
     13 print("Writeable:", df2.c1.cat.categories.values.flags.writeable)
---> 14 ro = df2.c1.cat.reorder_categories([3,2,1])
     15 print("DType reordered:", repr(ro.dtype), "\n")
{noformat}
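The report does not say where the read-only flag comes from. One plausible explanation, which is an assumption here and not confirmed by the report, is that to_pandas hands back the categories as a zero-copy NumPy view over memory that NumPy does not own, and NumPy marks such views read-only. The sketch below reproduces that failure mode with NumPy alone: a Python bytes object is a hypothetical stand-in for the Arrow-owned buffer, and try_reverse_in_place is a hypothetical helper standing in for any code path that must write to the array. It also shows the usual workaround of taking an explicit copy.

```python
import numpy as np

# Hypothetical stand-in for an immutable buffer owned by another library
# (assumption: Arrow memory behaves like this from NumPy's point of view).
immutable_buffer = np.array([1, 2, 3], dtype=np.int64).tobytes()

# np.frombuffer creates a zero-copy view; since bytes is immutable,
# NumPy marks the resulting array as read-only.
categories_view = np.frombuffer(immutable_buffer, dtype=np.int64)
print("Writeable (view):", categories_view.flags.writeable)


def try_reverse_in_place(arr: np.ndarray) -> str:
    """Hypothetical helper: attempt an in-place write and report the outcome."""
    try:
        # Compute the reversed values first (reading is allowed), then write back.
        arr[:] = arr[::-1].copy()
        return "ok"
    except ValueError as exc:
        return f"ValueError: {exc}"


# Writing into the read-only view fails.
print(try_reverse_in_place(categories_view))

# Workaround: an explicit copy owns its memory and is writeable.
categories_copy = categories_view.copy()
print("Writeable (copy):", categories_copy.flags.writeable)
print(try_reverse_in_place(categories_copy), categories_copy.tolist())
```

Copying gives up the zero-copy benefit in exchange for a writeable buffer, so it is a user-side workaround rather than a statement of how Arrow should resolve the issue.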