[ https://issues.apache.org/jira/browse/ARROW-8066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markovtsev Vadim updated ARROW-8066:
------------------------------------
    Description:
The original description is at [https://github.com/pandas-dev/pandas/issues/32587]

#### Code Sample, a copy-pastable example if possible

{code:python}
import pandas as pd
from datetime import datetime, timezone

df = pd.DataFrame.from_records([
    (1, datetime.now().replace(tzinfo=timezone.utc)),
    (2, datetime.now().replace(tzinfo=timezone.min))],
    columns=["1", "2"])
print(df["2"])
print()
df.to_feather("/tmp/1")
df2 = pd.read_feather("/tmp/1")
print(df2["2"])
{code}

This code will output:

{noformat}
0    2020-03-10 18:13:49.405598+00:00
1    2020-03-10 18:13:49.405626-23:59
Name: 2, dtype: object

0   2020-03-10 18:13:49.405598
1   2020-03-10 18:13:49.405626
Name: 2, dtype: datetime64[ns]
{noformat}

#### Problem description

The round-trip dtype changed from the correct `object` to incorrect `datetime64`. Thus the timezones were discarded in Arrow and the timestamps became invalid.

#### Expected Output

(identical)

{noformat}
0    2020-03-10 18:13:49.405598+00:00
1    2020-03-10 18:13:49.405626-23:59
Name: 2, dtype: object

0    2020-03-10 18:13:49.405598+00:00
1    2020-03-10 18:13:49.405626-23:59
Name: 2, dtype: object
{noformat}

#### Output of ``pd.show_versions()``

{noformat}
INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.3.0-40-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.1
numpy            : 1.17.4
pytz             : 2019.2
dateutil         : 2.7.3
pip              : 19.3.1
setuptools       : 42.0.1
Cython           : 0.29.14
pytest           : 5.3.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.5.0
html5lib         : None
pymysql          : None
psycopg2         : 2.8.4 (dt dec pq3 ext lo64)
jinja2           : 2.10.3
IPython          : 7.10.0
pandas_datareader: None
bs4              : 4.8.1
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.1.2
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.16.0
pytables         : None
pytest           : 5.3.1
pyxlsb           : None
s3fs             : None
scipy            : 1.2.1
sqlalchemy       : 1.3.12
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None
{noformat}

  was:
The original description is at [https://github.com/pandas-dev/pandas/issues/32587]

> PyArrow discards timezones
> --------------------------
>
>                 Key: ARROW-8066
>                 URL: https://issues.apache.org/jira/browse/ARROW-8066
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>            Reporter: Markovtsev Vadim
>            Priority: Major
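The problem description above can be illustrated without pandas or PyArrow. The following is a minimal sketch using only the standard library; it is not the PyArrow code path and not a proposed fix. It shows that stripping `tzinfo` from values with different offsets collapses distinct instants into equal naive timestamps, and that storing a UTC instant together with the original offset is one possible way to round-trip such values. The helper names `encode` and `decode` are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta, timezone


def encode(dt):
    """Split an aware datetime into (UTC instant, UTC offset in seconds)."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc), int(offset.total_seconds())


def decode(utc_dt, offset_seconds):
    """Rebuild the original aware datetime from its two stored parts."""
    return utc_dt.astimezone(timezone(timedelta(seconds=offset_seconds)))


# Same wall-clock time, two different offsets, as in the report
# (timezone.min is UTC-23:59).
wall = datetime(2020, 3, 10, 18, 13, 49, 405626)
original = [
    wall.replace(tzinfo=timezone.utc),
    wall.replace(tzinfo=timezone.min),
]

# Dropping tzinfo makes the two distinct instants indistinguishable.
naive = [dt.replace(tzinfo=None) for dt in original]

# Keeping the offset next to the UTC instant preserves both pieces of information.
encoded = [encode(dt) for dt in original]
restored = [decode(utc_dt, seconds) for utc_dt, seconds in encoded]

for before, after in zip(original, restored):
    print(before.isoformat(), "->", after.isoformat())
```

In a DataFrame, the same idea would amount to writing two columns (a UTC timestamp and an integer offset) instead of one `object` column of mixed-offset datetimes. Whether that is acceptable depends on the application.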