[ https://issues.apache.org/jira/browse/ARROW-8066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-8066:
--------------------------------
    Summary: [Python] Specify behavior for converting tz-aware datetime.datetime objects  (was: PyArrow discards timezones)

> [Python] Specify behavior for converting tz-aware datetime.datetime objects
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-8066
>                 URL: https://issues.apache.org/jira/browse/ARROW-8066
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>            Reporter: Markovtsev Vadim
>            Priority: Major
>
> The original description is at
> [https://github.com/pandas-dev/pandas/issues/32587]
> h3. Code Sample, a copy-pastable example if possible
> {code:python}
> import pandas as pd
> from datetime import datetime, timezone
> df = pd.DataFrame.from_records([
>     (1, datetime.now().replace(tzinfo=timezone.utc)),
>     (2, datetime.now().replace(tzinfo=timezone.min))],
>     columns=["1", "2"])
> print(df["2"])
> print()
> df.to_feather("/tmp/1")
> df2 = pd.read_feather("/tmp/1")
> print(df2["2"])
> {code}
> This code will output:
> {noformat}
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> 0   2020-03-10 18:13:49.405598
> 1   2020-03-10 18:13:49.405626
> Name: 2, dtype: datetime64[ns]
> {noformat}
> h3. Problem description
> The round-trip dtype changed from the correct `object` to incorrect
> `datetime64`. Thus the timezones were discarded in Arrow and the timestamps
> became invalid.
> h3. Expected Output
> (identical)
> {noformat}
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> {noformat}
> h3. Output of ``pd.show_versions()``
> {noformat}
> INSTALLED VERSIONS
> ------------------
> commit           : None
> python           : 3.7.5.final.0
> python-bits      : 64
> OS               : Linux
> OS-release       : 5.3.0-40-generic
> machine          : x86_64
> processor        : x86_64
> byteorder        : little
> LC_ALL           : None
> LANG             : en_US.UTF-8
> LOCALE           : en_US.UTF-8
> pandas           : 1.0.1
> numpy            : 1.17.4
> pytz             : 2019.2
> dateutil         : 2.7.3
> pip              : 19.3.1
> setuptools       : 42.0.1
> Cython           : 0.29.14
> pytest           : 5.3.1
> hypothesis       : None
> sphinx           : None
> blosc            : None
> feather          : None
> xlsxwriter       : None
> lxml.etree       : 4.5.0
> html5lib         : None
> pymysql          : None
> psycopg2         : 2.8.4 (dt dec pq3 ext lo64)
> jinja2           : 2.10.3
> IPython          : 7.10.0
> pandas_datareader: None
> bs4              : 4.8.1
> bottleneck       : None
> fastparquet      : None
> gcsfs            : None
> lxml.etree       : 4.5.0
> matplotlib       : 3.1.2
> numexpr          : None
> odfpy            : None
> openpyxl         : None
> pandas_gbq       : None
> pyarrow          : 0.16.0
> pytables         : None
> pytest           : 5.3.1
> pyxlsb           : None
> s3fs             : None
> scipy            : 1.2.1
> sqlalchemy       : 1.3.12
> tables           : None
> tabulate         : None
> xarray           : None
> xlrd             : None
> xlwt             : None
> xlsxwriter       : None
> numba            : None
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
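
To make the "timestamps became invalid" point from the problem description concrete, here is a minimal standard-library sketch. It does not call pandas or pyarrow. It assumes, based on the output quoted in the report, that the round trip kept the wall-clock fields and dropped the offset, and that a reader would then treat the naive value as UTC. The names `aware`, `stripped`, `as_utc` and `drift` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Fixed value matching the second row of the report (offset -23:59).
aware = datetime(2020, 3, 10, 18, 13, 49, 405626, tzinfo=timezone.min)

# Assumed effect of the round trip: same wall-clock fields, offset dropped.
stripped = aware.replace(tzinfo=None)

# The instant the original value actually denotes, expressed in UTC.
as_utc = aware.astimezone(timezone.utc)

# If the stripped value is read as UTC, it names a different instant.
drift = as_utc - stripped.replace(tzinfo=timezone.utc)

print(stripped)  # 2020-03-10 18:13:49.405626
print(as_utc)    # 2020-03-11 18:12:49.405626+00:00
print(drift)     # 23:59:00
```

Under these assumptions the second row ends up 23 hours 59 minutes away from the instant it originally represented, while the first row (offset +00:00) is unaffected. This is why a column with mixed offsets cannot simply have its tzinfo dropped.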