Tom Augspurger created ARROW-1593:
-------------------------------------
Summary: [PYTHON] serialize_pandas should pass through the
preserve_index keyword
Key: ARROW-1593
URL: https://issues.apache.org/jira/browse/ARROW-1593
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 0.7.0
Reporter: Tom Augspurger
Assignee: Tom Augspurger
Priority: Minor
Fix For: 0.8.0
I'm benchmarking Arrow as the serialization format for dataframes in
dask.distributed.
Overall things look good compared to the current implementation (using pickle).
The biggest difference is that pickle can use pandas' RangeIndex to avoid
serializing the entire Index of values when possible.
I suspect that a "range type" isn't in scope for Arrow, but in the meantime
applications using Arrow could detect the `RangeIndex` and call
{{pyarrow.serialize_pandas(df, preserve_index=False)}}, which requires
{{serialize_pandas}} to pass the {{preserve_index}} keyword through.
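
A minimal sketch of what that detection could look like on the application side. It assumes a pyarrow version in which {{serialize_pandas}} accepts {{preserve_index}} (the change this issue asks for). The helper names {{index_is_default_range}} and {{serialize_frame}} are hypothetical, and the extra checks on start, step and name are my assumption about when dropping the index is lossless, not something stated above.

```python
import pandas as pd


def index_is_default_range(df):
    """Return True if df.index is an unnamed RangeIndex starting at 0 with step 1.

    Assumption: only this default index can be dropped without losing
    information, since a reader would recreate exactly this index.
    """
    idx = df.index
    return (
        isinstance(idx, pd.RangeIndex)
        and idx.start == 0
        and idx.step == 1
        and idx.name is None
    )


def serialize_frame(df):
    """Hypothetical wrapper: skip the index when it is a default RangeIndex."""
    # Imported lazily so the detection helper works without pyarrow installed.
    import pyarrow as pa

    # Keep the index only when it carries information beyond 0..n-1.
    preserve = not index_is_default_range(df)
    return pa.serialize_pandas(df, preserve_index=preserve)
```

With this, a frame such as {{pd.DataFrame({"a": [1, 2, 3]})}} would be serialized without its index, while a frame with a custom or offset index would still carry it.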