This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new e12d52f  ARROW-5138: [Python] Add documentation about pandas preserve_index option
e12d52f is described below

commit e12d52fbede8e22c83d167b91b958360ac662747
Author: Wes McKinney <wesm+...@apache.org>
AuthorDate: Thu Jun 27 15:55:09 2019 -0500

    ARROW-5138: [Python] Add documentation about pandas preserve_index option

    The underlying issue reported in ARROW-5138 can now be addressed by passing `preserve_index=True` when using `Table.from_pandas`

    Author: Wes McKinney <wesm+...@apache.org>

    Closes #4728 from wesm/ARROW-5138 and squashes the following commits:

    27451dbd4 <Wes McKinney> Add documentation about pandas preserve_index option
---
 docs/source/python/pandas.rst | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/docs/source/python/pandas.rst b/docs/source/python/pandas.rst
index aafbf57..83d997c 100644
--- a/docs/source/python/pandas.rst
+++ b/docs/source/python/pandas.rst
@@ -62,6 +62,10 @@ Conversion from a Table to a DataFrame is done by calling
     # Infer Arrow schema from pandas
     schema = pa.Schema.from_pandas(df)
 
+By default ``pyarrow`` tries to preserve and restore the ``.index``
+data as accurately as possible. See the section below for more about
+this, and how to disable this logic.
+
 Series
 ------
 
@@ -71,6 +75,29 @@ convert a pandas Series to an Arrow Array using :meth:`pyarrow.Array.from_pandas
 As Arrow Arrays are always nullable, you can supply an optional mask using
 the ``mask`` parameter to mark all null-entries.
 
+Handling pandas Indexes
+-----------------------
+
+Methods like :meth:`pyarrow.Table.from_pandas` have a
+``preserve_index`` option which defines how to preserve (store) or not
+to preserve (to not store) the data in the ``index`` member of the
+corresponding pandas object. This data is tracked using schema-level
+metadata in the internal ``arrow::Schema`` object.
+
+The default of ``preserve_index`` is ``None``, which behaves as
+follows:
+
+* ``RangeIndex`` is stored as metadata-only, not requiring any extra
+  storage.
+* Other index types are stored as one or more physical data columns in
+  the resulting :class:`Table`
+
+To not store the index at all pass ``preserve_index=False``. Since
+storing a ``RangeIndex`` can cause issues in some limited scenarios
+(such as storing multiple DataFrame objects in a Parquet file), to
+force all index data to be serialized in the resulting table, pass
+``preserve_index=True``.
+
 Type differences
 ----------------