[ 
https://issues.apache.org/jira/browse/ARROW-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3650:
-----------------------------------

    Assignee: Joris Van den Bossche

> [Python] Mixed column indexes are read back as strings 
> -------------------------------------------------------
>
>                 Key: ARROW-3650
>                 URL: https://issues.apache.org/jira/browse/ARROW-3650
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>            Reporter: Armin Berres
>            Assignee: Joris Van den Bossche
>            Priority: Major
>              Labels: parquet, pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Consider the following example: 
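> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> # Column index mixes a string label and a datetime label
> df = pd.DataFrame(1, index=[pd.to_datetime('2018/01/01')],
>                   columns=['a string', pd.to_datetime('2018/01/02')])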
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'test.parquet')
> ref_df = pq.read_pandas('test.parquet').to_pandas()
> print(df.columns)
> # Index(['a string', 2018-01-02 00:00:00], dtype='object')
> print(ref_df.columns)
> # Index(['a string', '2018-01-02 00:00:00'], dtype='object')
> {code}
> The serialized data frame has a column index containing a string and a
> datetime label (this arose when resetting the index of a formerly
> datetime-only column).
> When the file is read back, the datetime label is converted into a string.
> When looking at the schema I find
> {{"pandas_type": "mixed", "numpy_type": "object"}} before serializing and
> {{"pandas_type": "unicode", "numpy_type": "object"}} after reading back.
> So the schema was aware of the mixed type but did not store the actual types.
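
A minimal sketch of how the column index types quoted above could be pulled out of the pandas metadata. It assumes the input is the JSON document that pyarrow stores under the b'pandas' key of table.schema.metadata; the helper name column_index_types and the trimmed-down example payload are hypothetical and only for illustration.

```python
import json


def column_index_types(pandas_metadata):
    """Return (pandas_type, numpy_type) for each column index level."""
    # The metadata value is stored as bytes; decode before parsing.
    if isinstance(pandas_metadata, bytes):
        pandas_metadata = pandas_metadata.decode("utf-8")
    meta = json.loads(pandas_metadata)
    return [
        (level["pandas_type"], level["numpy_type"])
        for level in meta.get("column_indexes", [])
    ]


# Hypothetical, trimmed-down payload resembling the excerpt in the report.
example = (
    b'{"column_indexes": [{"name": null, '
    b'"pandas_type": "mixed", "numpy_type": "object"}]}'
)
print(column_index_types(example))
```

Running the helper on the metadata before writing and again after reading back should make the "mixed" versus "unicode" difference visible without scanning the raw bytes.
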
> The same happens with other types, such as numbers. One can produce
> interesting situations:
> {{pd.DataFrame(1, index=[pd.to_datetime('2018/01/01')], columns=['1', 1])}}
> can be written but fails to be read back, because the column index is no
> longer unique: '1' shows up twice.
> If this is not a bug but expected behavior, maybe the user should be warned
> that information is lost? For example, with a {{NotImplementedError}}
> exception.
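
As a rough illustration of the kind of check suggested above, the sketch below inspects column labels before writing. It is not part of pyarrow; the function name check_column_labels is hypothetical, and it assumes that non-string labels end up stored as their str() form, as the report describes.

```python
import warnings
from collections import Counter
from datetime import datetime


def check_column_labels(labels):
    """Warn about non-string labels; raise if str() makes labels collide."""
    labels = list(labels)
    non_string = [label for label in labels if not isinstance(label, str)]
    if non_string:
        warnings.warn(
            f"{len(non_string)} non-string column label(s) would be "
            "read back as strings",
            UserWarning,
        )
    # Labels such as '1' and 1 become identical once converted to strings.
    counts = Counter(str(label) for label in labels)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"column labels collide after str(): {duplicates}")
    return non_string


# Mixed string/datetime labels: a warning is issued, nothing is raised.
check_column_labels(['a string', datetime(2018, 1, 2)])
```

With a DataFrame at hand, the call would be check_column_labels(df.columns) before pa.Table.from_pandas(df); the ['1', 1] case from the report raises ValueError instead of producing a file that cannot be read back.
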



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
