[ https://issues.apache.org/jira/browse/ARROW-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney reassigned ARROW-3650:
-----------------------------------

    Assignee: Joris Van den Bossche

> [Python] Mixed column indexes are read back as strings
> -------------------------------------------------------
>
>                 Key: ARROW-3650
>                 URL: https://issues.apache.org/jira/browse/ARROW-3650
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>            Reporter: Armin Berres
>            Assignee: Joris Van den Bossche
>            Priority: Major
>              Labels: parquet, pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Consider the following example:
> {code:java}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> # Column labels mix a string and a datetime
> df = pd.DataFrame(1, index=[pd.to_datetime('2018/01/01')],
>                   columns=['a string', pd.to_datetime('2018/01/02')])
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'test.parquet')
> ref_df = pq.read_pandas('test.parquet').to_pandas()
> print(df.columns)
> # Index(['a string', 2018-01-02 00:00:00], dtype='object')
> print(ref_df.columns)
> # Index(['a string', '2018-01-02 00:00:00'], dtype='object')
> {code}
> The serialized data frame has a column index with a string and a datetime
> field (this happened when resetting the index of a formerly datetime-only
> column). When reading the data frame back, the datetime is converted into a
> string.
> When looking at the schema I find
> {{"pandas_type": "mixed", "numpy_type": "object"}} before serializing and
> {{"pandas_type": "unicode", "numpy_type": "object"}} after reading back. So
> the schema was aware of the mixed type but did not store the actual types.
> The same happens with other types, such as numbers. One can produce
> interesting situations:
> {{pd.DataFrame(1, index=[pd.to_datetime('2018/01/01')], columns=['1', 1])}}
> can be written but fails to be read back, because the column index is no
> longer unique: '1' shows up twice.
> If this is not a bug but expected behavior, maybe the user should somehow be
> warned that information is lost? For example, with a {{NotImplemented}}
> exception.
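
The failure mode described in the report can be sketched without pandas or pyarrow. This is only an illustration, under the assumption that reading back behaves like calling str() on each non-string column label; the helper names colliding_labels and has_mixed_label_types are hypothetical and are not part of any pyarrow or pandas API.

```python
from collections import Counter
from datetime import datetime


def colliding_labels(labels):
    """Return the string forms that more than one label maps to.

    Assumption: a round trip turns each label into str(label).
    """
    counts = Counter(str(label) for label in labels)
    return sorted(text for text, n in counts.items() if n > 1)


def has_mixed_label_types(labels):
    """Return True if the labels are not all of one Python type."""
    return len({type(label) for label in labels}) > 1


# Labels from the report: a string next to a datetime is "mixed",
# but the two do not collide once stringified.
mixed = ['a string', datetime(2018, 1, 2)]
print(has_mixed_label_types(mixed))  # True
print(colliding_labels(mixed))       # []

# The second case from the report: '1' and 1 collapse to the same string,
# so the labels would no longer be unique after the round trip.
print(colliding_labels(['1', 1]))    # ['1']
```

A check along these lines, run before writing, is one way a caller could notice that label types would be lost; whether a warning or an exception is the better response is the open question in the report.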
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)