Thomas Buhrmann created ARROW-3801:
--------------------------------------

             Summary: Pandas-Arrow roundtrip makes pd categorical index not writeable
                 Key: ARROW-3801
                 URL: https://issues.apache.org/jira/browse/ARROW-3801
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.10.0
            Reporter: Thomas Buhrmann


Serializing a pandas Series with categorical dtype to Arrow and deserializing 
it back makes the array backing its categories index non-writeable. This in 
turn trips up pandas when, e.g., reordering the categories, raising 
"ValueError: buffer source array is read-only":
{code}
import pandas as pd
import pyarrow as pa

df = pd.Series([1,2,3], dtype='category', name="c1").to_frame()
print("DType before:", repr(df.c1.dtype))
print("Writeable:", df.c1.cat.categories.values.flags.writeable)
ro = df.c1.cat.reorder_categories([3,2,1])
print("DType reordered:", repr(ro.dtype), "\n")

tbl = pa.Table.from_pandas(df)
df2 = tbl.to_pandas()
print("DType after:", repr(df2.c1.dtype))
print("Writeable:", df2.c1.cat.categories.values.flags.writeable)
ro = df2.c1.cat.reorder_categories([3,2,1])
print("DType reordered:", repr(ro.dtype), "\n")
{code}
 

Outputs:

{code:java}
DType before: CategoricalDtype(categories=[1, 2, 3], ordered=False)
Writeable: True
DType reordered: CategoricalDtype(categories=[3, 2, 1], ordered=False)
DType after: CategoricalDtype(categories=[1, 2, 3], ordered=False)
Writeable: False
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-365-85b439586c1a> in <module>
 12 print("DType after:", repr(df2.c1.dtype))
 13 print("Writeable:", df2.c1.cat.categories.values.flags.writeable)
---> 14 ro = df2.c1.cat.reorder_categories([3,2,1])
 15 print("DType reordered:", repr(ro.dtype), "\n")
{code}
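
A possible workaround, sketched here under assumptions and not tested against the affected versions, is to rebuild the categorical from freshly copied arrays so that pandas no longer sees the read-only buffers. The helper name `copy_categories` is hypothetical and is not part of pandas or pyarrow; the sketch assumes that copying both the categories and the codes is enough to avoid the error.

```python
import numpy as np
import pandas as pd


def copy_categories(s):
    """Return a categorical Series equal to `s`, rebuilt from copied arrays.

    Hypothetical helper for illustration; not part of pandas or pyarrow.
    """
    # np.array(..., copy=True) allocates new arrays that the caller owns,
    # independent of whatever buffers back the original Series.
    categories = np.array(s.cat.categories, copy=True)
    codes = np.array(s.cat.codes, copy=True)
    rebuilt = pd.Categorical.from_codes(
        codes, categories=categories, ordered=s.cat.ordered
    )
    # Preserve the original row index and column name.
    return pd.Series(rebuilt, index=s.index, name=s.name)


# Usage on a plain categorical Series (no Arrow roundtrip involved here).
s = pd.Series([1, 2, 3], dtype="category", name="c1")
fixed = copy_categories(s)
reordered = fixed.cat.reorder_categories([3, 2, 1])
print("DType reordered:", repr(reordered.dtype))
```

For the frame in the report, this would presumably be applied as `df2["c1"] = copy_categories(df2.c1)` before calling `reorder_categories`. The copy costs extra memory, so it is a stopgap rather than a fix for the roundtrip itself.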
 
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
