[jira] [Commented] (ARROW-3651) [Python] Datetimes from non-DateTimeIndex cannot be deserialized

Armin Berres (JIRA) Tue, 30 Oct 2018 02:44:20 -0700


    [ https://issues.apache.org/jira/browse/ARROW-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668434#comment-16668434 ]


Armin Berres commented on ARROW-3651:
-------------------------------------

Not sure, but maybe pandas should behave differently here as well and create 
a {{DatetimeIndex}} for the columns, since the complete index consists of 
{{Timestamp}} objects?

Adding {{df.columns = pd.to_datetime(df.columns)}} to the code from the issue 
description, before the call to {{pa.Table.from_pandas}}, mitigates the 
problem.
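
To make the mitigation concrete, here is a minimal sketch that uses only pandas. It rebuilds the frame from the issue description, inspects the type of the columns index, and then applies the conversion. The variable names {{columns_before}} and {{columns_after}} are introduced here for illustration only and are not part of the original report.

```python
import pandas as pd

# Build the frame from the issue description.
df = pd.DataFrame(
    1,
    index=pd.MultiIndex.from_arrays([[1, 2], [3, 4]]),
    columns=[pd.to_datetime("2018/01/01")],
)
df = df.reset_index().set_index(["level_0", "level_1"])

# After reset_index/set_index the columns are a generic Index that
# holds Timestamp objects instead of a DatetimeIndex.
columns_before = (type(df.columns).__name__, str(df.columns.dtype))

# Mitigation from this comment: convert the columns explicitly.
df.columns = pd.to_datetime(df.columns)
columns_after = type(df.columns).__name__

print(columns_before, columns_after)
```

The converted frame can then be passed to {{pa.Table.from_pandas}} as in the issue description. This only sidesteps the problem on the pandas side and does not change how pyarrow handles the {{"datetime"}} pandas type.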

 

> [Python] Datetimes from non-DateTimeIndex cannot be deserialized
> ----------------------------------------------------------------
>
>                 Key: ARROW-3651
>                 URL: https://issues.apache.org/jira/browse/ARROW-3651
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>            Reporter: Armin Berres
>            Priority: Major
>
> Given an index that contains datetimes but is not a DatetimeIndex, writing 
> the file works but reading it back fails.
> {code:python}
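> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> df = pd.DataFrame(1, index=pd.MultiIndex.from_arrays([[1, 2], [3, 4]]),
>                   columns=[pd.to_datetime("2018/01/01")])
> # after reset_index/set_index the columns index is a plain object Index
> # of Timestamps and no longer a DatetimeIndex
> df = df.reset_index().set_index(['level_0', 'level_1'])
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'test.parquet')
> # reading back raises KeyError: 'datetime'
> pq.read_pandas('test.parquet').to_pandas()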
> {code}
> results in 
> {code}
> KeyError                                  Traceback (most recent call last)
> ~/venv/mpptool/lib/python3.7/site-packages/pyarrow/pandas_compat.py in _pandas_type_to_numpy_type(pandas_type)
>     676     try:
> --> 677         return _pandas_logical_type_map[pandas_type]
>     678     except KeyError:
> KeyError: 'datetime'
> {code}
> The created schema:
> {code}
> 2018-01-01 00:00:00: int64
> level_0: int64
> level_1: int64
> metadata
> --------
> {b'pandas': b'{"index_columns": ["level_0", "level_1"], "column_indexes": [{"n'
>             b'ame": null, "field_name": null, "pandas_type": "datetime", "nump'
>             b'y_type": "object", "metadata": null}], "columns": [{"name": "201'
>             b'8-01-01 00:00:00", "field_name": "2018-01-01 00:00:00", "pandas_'
>             b'type": "int64", "numpy_type": "int64", "metadata": null}, {"name'
>             b'": "level_0", "field_name": "level_0", "pandas_type": "int64", "'
>             b'numpy_type": "int64", "metadata": null}, {"name": "level_1", "fi'
>             b'eld_name": "level_1", "pandas_type": "int64", "numpy_type": "int'
>             b'64", "metadata": null}], "pandas_version": "0.23.4"}'}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
