Diego Argueta created ARROW-3703:
------------------------------------
Summary: [Python] DataFrame.to_parquet crashes if datetime column
has time zones
Key: ARROW-3703
URL: https://issues.apache.org/jira/browse/ARROW-3703
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.11.1
Environment: pandas 0.23.4
pyarrow 0.11.1
Python 3.5 - 3.7
MacOS High Sierra (10.13.6)
Reporter: Diego Argueta
On CPython 3.5.6, 3.6.6, and 3.7.0, a Pandas DataFrame containing
{{datetime.datetime}} objects normally serializes to Parquet just fine, but
{{DataFrame.to_parquet}} raises an {{AttributeError}} if the datetimes use the
standard library's built-in {{datetime.timezone}} objects as their {{tzinfo}}.
To reproduce:
{code:python}
import datetime as dt
import pandas as pd
df = pd.DataFrame({'foo': [dt.datetime(2018, 1, 1, 1, 23, 45,
                                       tzinfo=dt.timezone.utc)]})
df.to_parquet('data.parq')
{code}
The following exception results:
{noformat}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/core/frame.py", line 1945, in to_parquet
    compression=compression, **kwargs)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 257, in to_parquet
    return impl.write(df, path, compression=compression, **kwargs)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 118, in write
    table = self.api.Table.from_pandas(df)
  File "pyarrow/table.pxi", line 1217, in pyarrow.lib.Table.from_pandas
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 381, in dataframe_to_arrays
    convert_types)]
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 380, in <listcomp>
    for c, t in zip(columns_to_convert,
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 370, in convert_column
    return pa.array(col, type=ty, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 167, in pyarrow.lib.array
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 409, in get_datetimetz_type
    type_ = pa.timestamp(unit, tz)
  File "pyarrow/types.pxi", line 1038, in pyarrow.lib.timestamp
  File "pyarrow/types.pxi", line 955, in pyarrow.lib.tzinfo_to_string
AttributeError: 'datetime.timezone' object has no attribute 'zone'
'datetime.timezone' object has no attribute 'zone'
{noformat}
This doesn't happen if you use {{pytz.UTC}} as the timezone object.
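The traceback ends in {{tzinfo_to_string}} reading a {{zone}} attribute, which pytz time zones have and {{datetime.timezone}} does not. Below is a minimal sketch of one way such a lookup could fall back for standard-library time zones. It is not pyarrow's actual code: {{tz_to_string}} and {{FakePytzZone}} are hypothetical names used only for illustration, and the {{+HH:MM}} offset string format is an assumption.

```python
import datetime as dt


class FakePytzZone(dt.tzinfo):
    """Hypothetical stand-in for a pytz time zone: it only carries `zone`."""

    def __init__(self, zone):
        self.zone = zone


def tz_to_string(tz):
    """Hypothetical helper: return a string name for a tzinfo object."""
    # pytz-style time zones expose their name as a `zone` attribute.
    zone = getattr(tz, "zone", None)
    if zone is not None:
        return zone
    # datetime.timezone has no `zone`; derive a name from its fixed offset.
    if isinstance(tz, dt.timezone):
        offset = tz.utcoffset(None)
        if offset == dt.timedelta(0):
            return "UTC"
        total_minutes = int(offset.total_seconds() // 60)
        sign = "+" if total_minutes >= 0 else "-"
        hours, minutes = divmod(abs(total_minutes), 60)
        return "{}{:02d}:{:02d}".format(sign, hours, minutes)
    raise TypeError("unsupported tzinfo: {!r}".format(tz))


# The attribute the traceback complains about is indeed missing.
print(hasattr(dt.timezone.utc, "zone"))
print(tz_to_string(dt.timezone.utc))
print(tz_to_string(dt.timezone(dt.timedelta(hours=5))))
print(tz_to_string(FakePytzZone("Europe/Berlin")))
```

The {{getattr}} check keeps the existing behaviour for any object that carries a {{zone}} name, and only fixed-offset {{datetime.timezone}} instances take the new branch.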