Jeff Reback created ARROW-1286:
----------------------------------
Summary: PYTHON: support Categorical serialization to/from parquet
Key: ARROW-1286
URL: https://issues.apache.org/jira/browse/ARROW-1286
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Jeff Reback
Related to https://issues.apache.org/jira/browse/ARROW-439
Writing a table with a pandas Categorical column to Parquet is not implemented: pyarrow.parquet.write_table raises ArrowNotImplementedError. Minimal example with pandas 0.20.3 and pyarrow 0.5.0:
{code}
In [1]: df = pd.DataFrame({'a': pd.Categorical(list('abc'))})
In [2]: df.dtypes
Out[2]:
a category
dtype: object
In [4]: import pyarrow
In [5]: import pyarrow.parquet
In [6]: table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
...: pyarrow.parquet.write_table(
...: table, 'foo.pq')
...:
...:
---------------------------------------------------------------------------
ArrowNotImplementedError Traceback (most recent call last)
<ipython-input-6-4512e9a2e15e> in <module>()
1 table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
2 pyarrow.parquet.write_table(
----> 3 table, 'foo.pq')
4
/Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, **kwargs)
770 version=version,
771 use_deprecated_int96_timestamps=use_deprecated_int96_timestamps)
--> 772 writer = ParquetWriter(where, table.schema, **options)
773 writer.write_table(table, row_group_size=row_group_size)
774 writer.close()
_parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()
error.pxi in pyarrow.lib.check_status()
ArrowNotImplementedError: NotImplemented: unhandled type
{code}
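For context, a pandas Categorical stores integer codes plus a list of categories, so a round trip to/from Parquet would presumably need to preserve both pieces. The sketch below is plain Python, not pandas or pyarrow code, and says nothing about how pyarrow would implement this. `encode_categorical` and `decode_categorical` are hypothetical names used only for illustration, and the first-seen category order is an assumption of the sketch (missing values get code -1, as pandas does).

```python
# Illustration only: a plain-Python stand-in for a categorical column.
# encode_categorical / decode_categorical are hypothetical helpers,
# not functions from pandas or pyarrow.

def encode_categorical(values):
    """Split values into (categories, codes).

    Categories are kept in first-seen order (an assumption of this sketch).
    Missing values (None) are given the code -1.
    """
    categories = []
    index = {}  # category value -> integer code
    codes = []
    for v in values:
        if v is None:
            codes.append(-1)
            continue
        if v not in index:
            index[v] = len(categories)
            categories.append(v)
        codes.append(index[v])
    return categories, codes


def decode_categorical(categories, codes):
    """Rebuild the original values from (categories, codes)."""
    return [categories[c] if c >= 0 else None for c in codes]


values = list('abc')
categories, codes = encode_categorical(values)
# A lossless round trip must give back the original values.
assert decode_categorical(categories, codes) == values
```

A writer that stored only the decoded values would lose the dtype on read; storing both the codes and the categories is what would let a reader reconstruct the Categorical.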