[ https://issues.apache.org/jira/browse/ARROW-8066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057712#comment-17057712 ]

Joris Van den Bossche commented on ARROW-8066:
----------------------------------------------

At the very least we should normalize to UTC, I think (right now it seems to 
just take the "local" wall-clock time and discard the time zone). 
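
To make the difference concrete, here is a rough sketch using only the standard 
library. It does not go through pyarrow at all; the fixed timestamp is a 
stand-in for the datetime.now() calls in the report, and the variable names are 
made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Same wall-clock reading attached to two different zones
# (a fixed value replaces datetime.now() so the result is reproducible).
wall = datetime(2020, 3, 10, 18, 13, 49, 405598)
values = [
    wall.replace(tzinfo=timezone.utc),
    wall.replace(tzinfo=timezone.min),  # UTC-23:59
]

# What seems to happen now: the offset is dropped, the wall-clock time is kept.
discarded = [v.replace(tzinfo=None) for v in values]

# Suggested instead: convert each value to UTC first, then store it as naive.
normalized = [v.astimezone(timezone.utc).replace(tzinfo=None) for v in values]

print(discarded[0] == discarded[1])    # True: two different instants collapse
print(normalized[1] - normalized[0])   # 23:59:00: the instants stay distinct
```

With the offsets discarded, both rows become the same naive timestamp even 
though they are almost a day apart; normalizing to UTC keeps them apart.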

BTW, I don't think such a roundtrip can ever work (automatically). In 
addition to not being able to support different timezones within an array, as 
Wes mentioned, converting a TimestampArray back to pandas will also prefer 
datetime64 (and I don't think we have an option to specify that you want 
datetime.datetime objects?). 
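
A toy model of why the roundtrip is lossy, again standard library only. The 
"column" here (integer microseconds since the epoch plus a single timezone) is 
a hypothetical stand-in for a timestamp array, not pyarrow's actual 
implementation, and all names are illustrative.

```python
from datetime import datetime, timedelta, timezone

original = [
    datetime(2020, 3, 10, 18, 13, 49, tzinfo=timezone.utc),
    datetime(2020, 3, 10, 18, 13, 49, tzinfo=timezone.min),  # UTC-23:59
]

# Hypothetical storage: one integer per value, ONE timezone for the whole column.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
stored = [(v - epoch) // timedelta(microseconds=1) for v in original]
column_tz = timezone.utc

# Reading back: every value comes out in the column's single timezone.
restored = [
    (epoch + timedelta(microseconds=us)).astimezone(column_tz) for us in stored
]

same_instants = all(a == b for a, b in zip(original, restored))
same_offsets = all(
    a.utcoffset() == b.utcoffset() for a, b in zip(original, restored)
)
print(same_instants, same_offsets)  # True False
```

The instants survive, but the per-value offsets cannot, since there is only one 
timezone slot for the whole column.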

> [Python] Specify behavior for converting tz-aware datetime.datetime objects 
> to Arrow format
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8066
>                 URL: https://issues.apache.org/jira/browse/ARROW-8066
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>            Reporter: Markovtsev Vadim
>            Priority: Major
>
> The original description is at 
> [https://github.com/pandas-dev/pandas/issues/32587]
> h3. Code Sample, a copy-pastable example if possible
> {code:python}
> import pandas as pd
> from datetime import datetime, timezone
> df = pd.DataFrame.from_records([
>     (1, datetime.now().replace(tzinfo=timezone.utc)),
>     (2, datetime.now().replace(tzinfo=timezone.min))],
>     columns=["1", "2"])
> print(df["2"])
> print()
> df.to_feather("/tmp/1") 
> df2 = pd.read_feather("/tmp/1")
> print(df2["2"])
> {code}
> This code will output:
> {noformat}
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> 0   2020-03-10 18:13:49.405598
> 1   2020-03-10 18:13:49.405626
> Name: 2, dtype: datetime64[ns]
> {noformat}
> h3. Problem description
> The round-trip dtype changed from the correct `object` to incorrect 
> `datetime64`. Thus the timezones were discarded in Arrow and the timestamps 
> became invalid.
> h3. Expected Output
> (identical)
> {noformat}
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> {noformat}
> h3. Output of ``pd.show_versions()``
> {noformat}
> INSTALLED VERSIONS
> ------------------
> commit           : None
> python           : 3.7.5.final.0
> python-bits      : 64
> OS               : Linux
> OS-release       : 5.3.0-40-generic
> machine          : x86_64
> processor        : x86_64
> byteorder        : little
> LC_ALL           : None
> LANG             : en_US.UTF-8
> LOCALE           : en_US.UTF-8
> pandas           : 1.0.1
> numpy            : 1.17.4
> pytz             : 2019.2
> dateutil         : 2.7.3
> pip              : 19.3.1
> setuptools       : 42.0.1
> Cython           : 0.29.14
> pytest           : 5.3.1
> hypothesis       : None
> sphinx           : None
> blosc            : None
> feather          : None
> xlsxwriter       : None
> lxml.etree       : 4.5.0
> html5lib         : None
> pymysql          : None
> psycopg2         : 2.8.4 (dt dec pq3 ext lo64)
> jinja2           : 2.10.3
> IPython          : 7.10.0
> pandas_datareader: None
> bs4              : 4.8.1
> bottleneck       : None
> fastparquet      : None
> gcsfs            : None
> lxml.etree       : 4.5.0
> matplotlib       : 3.1.2
> numexpr          : None
> odfpy            : None
> openpyxl         : None
> pandas_gbq       : None
> pyarrow          : 0.16.0
> pytables         : None
> pytest           : 5.3.1
> pyxlsb           : None
> s3fs             : None
> scipy            : 1.2.1
> sqlalchemy       : 1.3.12
> tables           : None
> tabulate         : None
> xarray           : None
> xlrd             : None
> xlwt             : None
> xlsxwriter       : None
> numba            : None
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
