[ https://issues.apache.org/jira/browse/ARROW-8066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-8066:
--------------------------------
    Summary: [Python] Specify behavior for converting tz-aware datetime.datetime objects (was: PyArrow discards timezones)

> [Python] Specify behavior for converting tz-aware datetime.datetime objects
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-8066
>                 URL: https://issues.apache.org/jira/browse/ARROW-8066
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>            Reporter: Markovtsev Vadim
>            Priority: Major
>
> The original description is at 
> [https://github.com/pandas-dev/pandas/issues/32587]
> h3. Code Sample, a copy-pastable example if possible
> {code:python}
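> import pandas as pd
> from datetime import datetime, timezone
>
> # Two tz-aware datetimes with different fixed offsets (UTC and UTC-23:59).
> df = pd.DataFrame.from_records([
>     (1, datetime.now().replace(tzinfo=timezone.utc)),
>     (2, datetime.now().replace(tzinfo=timezone.min))],
>     columns=["1", "2"])
> print(df["2"])  # dtype: object, offsets intact
> print()
> # Round-trip through Feather, which pandas reads and writes via pyarrow.
> df.to_feather("/tmp/1")
> df2 = pd.read_feather("/tmp/1")
> print(df2["2"])  # dtype: datetime64[ns], offsets gone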
> {code}
> This code will output:
> {noformat}
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> 0   2020-03-10 18:13:49.405598
> 1   2020-03-10 18:13:49.405626
> Name: 2, dtype: datetime64[ns]
> {noformat}
> h3. Problem description
> The dtype changes across the round trip from the correct `object` to the
> incorrect `datetime64[ns]`. Arrow discards the timezone offsets, so the
> restored timestamps are naive and no longer denote the same instants.
> h3. Expected Output
> (both printouts identical)
> {noformat}
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> {noformat}
> h3. Output of ``pd.show_versions()``
> {noformat}
> INSTALLED VERSIONS
> ------------------
> commit           : None
> python           : 3.7.5.final.0
> python-bits      : 64
> OS               : Linux
> OS-release       : 5.3.0-40-generic
> machine          : x86_64
> processor        : x86_64
> byteorder        : little
> LC_ALL           : None
> LANG             : en_US.UTF-8
> LOCALE           : en_US.UTF-8
> pandas           : 1.0.1
> numpy            : 1.17.4
> pytz             : 2019.2
> dateutil         : 2.7.3
> pip              : 19.3.1
> setuptools       : 42.0.1
> Cython           : 0.29.14
> pytest           : 5.3.1
> hypothesis       : None
> sphinx           : None
> blosc            : None
> feather          : None
> xlsxwriter       : None
> lxml.etree       : 4.5.0
> html5lib         : None
> pymysql          : None
> psycopg2         : 2.8.4 (dt dec pq3 ext lo64)
> jinja2           : 2.10.3
> IPython          : 7.10.0
> pandas_datareader: None
> bs4              : 4.8.1
> bottleneck       : None
> fastparquet      : None
> gcsfs            : None
> lxml.etree       : 4.5.0
> matplotlib       : 3.1.2
> numexpr          : None
> odfpy            : None
> openpyxl         : None
> pandas_gbq       : None
> pyarrow          : 0.16.0
> pytables         : None
> pytest           : 5.3.1
> pyxlsb           : None
> s3fs             : None
> scipy            : 1.2.1
> sqlalchemy       : 1.3.12
> tables           : None
> tabulate         : None
> xarray           : None
> xlrd             : None
> xlwt             : None
> xlsxwriter       : None
> numba            : None
> {noformat}
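
To make the problem description above concrete, here is a minimal sketch using only the standard library. It assumes the conversion simply drops `tzinfo` and keeps the wall-clock fields, which is what the output in the report suggests; it does not reproduce PyArrow's actual code path. The variable names are illustrative, not taken from the issue.

```python
from datetime import datetime, timedelta, timezone

# One fixed wall-clock reading, tagged with the two offsets used in the report.
wall = datetime(2020, 3, 10, 18, 13, 49, 405626)
aware_utc = wall.replace(tzinfo=timezone.utc)
aware_min = wall.replace(tzinfo=timezone.min)  # timezone.min is UTC-23:59

# Same wall-clock fields, different instants: they are 23h59m apart.
gap = aware_min - aware_utc
print(gap)  # 23:59:00

# Lossy (assumed to mirror the reported behavior): dropping tzinfo makes
# the two distinct instants collapse into the same naive timestamp.
naive_utc = aware_utc.replace(tzinfo=None)
naive_min = aware_min.replace(tzinfo=None)
collapsed = naive_utc == naive_min
print(collapsed)  # True

# Instant-preserving alternative: normalize to UTC before storing.
norm_min = aware_min.astimezone(timezone.utc)
print(norm_min)  # 2020-03-11 18:12:49.405626+00:00
```

Normalizing to UTC keeps the instant but not the original offset, so it is one possible behavior to specify rather than a full round trip of the input objects.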



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
