[ https://issues.apache.org/jira/browse/ARROW-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche resolved ARROW-13756.
-------------------------------------------
    Fix Version/s: 7.0.0
       Resolution: Fixed

Issue resolved by pull request 11619
[https://github.com/apache/arrow/pull/11619]

> [Python] Error in pandas conversion for datetimetz column index
> ---------------------------------------------------------------
>
>                 Key: ARROW-13756
>                 URL: https://issues.apache.org/jira/browse/ARROW-13756
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 5.0.0
>         Environment: Ubuntu 21.04
>            Reporter: Andreas Wolf
>            Assignee: Alenka Frim
>            Priority: Major
>              Labels: pandas, pull-request-available
>             Fix For: 7.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The following code fails with:
> {code:java}
>   File "[...]/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1052, in _pandas_type_to_numpy_type
>     return np.dtype(pandas_type)
> TypeError: data type 'datetimetz' not understood{code}
> Sample:
> {code:java}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> def run():
>     filename = "test.parquet"
>     # Transposing (.T) turns the timezone-aware DatetimeIndex into the column index.
>     df = pd.DataFrame(
>         data=range(31),
>         columns=list("A"),
>         index=pd.date_range("2021-01-01", "2021-01-31", freq="D", tz="CET"),
>     ).T
>     table = pa.Table.from_pandas(df)
>     pq.write_to_dataset(table, root_path=filename)
>     result = pq.read_table(filename).to_pandas()
>     return result
>
> if __name__ == "__main__":
>     run()
> {code}
> The code tries to store a DataFrame whose column labels are timezone-aware
> datetimes.
> _Observations_:
> If I remove the *.T* at the end of the DataFrame construction, so that the
> datetime index labels the rows, the code works (but that is not what I want).
> If I remove the timezone information *tz="CET"*, the code works.
> I assume this bug is related to [Error in pandas conversion for datetimetz
> row index|https://issues.apache.org/jira/browse/ARROW-1958].

--
This message was sent by Atlassian Jira
(v8.20.1#820001)