Tom Augspurger created ARROW-1593:
-------------------------------------

             Summary: [PYTHON] serialize_pandas should pass through the preserve_index keyword
                 Key: ARROW-1593
                 URL: https://issues.apache.org/jira/browse/ARROW-1593
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 0.7.0
            Reporter: Tom Augspurger
            Assignee: Tom Augspurger
            Priority: Minor
             Fix For: 0.8.0

I'm benchmarking Arrow serialization as a way for dask.distributed to serialize dataframes. Overall, things look good compared to the current implementation (using pickle). The biggest difference is that pickle can use pandas' RangeIndex to avoid serializing the entire index of values when possible.

I suspect that a "range type" isn't in scope for Arrow, but in the meantime applications using Arrow could detect the `RangeIndex` and call {{pyarrow.serialize_pandas(df, preserve_index=False)}}, which requires serialize_pandas to pass the preserve_index keyword through.