[ 
https://issues.apache.org/jira/browse/ARROW-8066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056405#comment-17056405
 ] 

Wes McKinney commented on ARROW-8066:
-------------------------------------

Thanks. 

I believe we don't have any handling of tz-aware datetime.datetime objects when 
converting to the Arrow format. In the best case all values have the same 
time zone, but we will need to decide what happens when they have different 
time zones. At the moment, though, we cannot store a separate time zone for 
each value.

Options for what should take place:

1. UTC-normalize tz-aware datetime.datetime
2. When all values have the same time zone, convert the tzinfo to a timezone 
string for storage in the Arrow metadata
3. When there are distinct tzinfos, we should do one of two things: either 
UTC-normalize (and set timezone utc) or raise an exception

Either way, there's a fair bit of work to do to accomplish any of these.
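
To make the options above concrete, here is a rough pure-Python sketch of the 
decision logic. It is not pyarrow code: plan_conversion, offset_string and the 
on_mixed parameter are hypothetical names used only for illustration, and the 
sketch assumes fixed-offset datetime.timezone tzinfos (named zones with DST 
would need a real time zone name rather than an offset string).

```python
from datetime import datetime, timedelta, timezone


def offset_string(tz):
    """Format a fixed-offset datetime.timezone as '+HH:MM' or '-HH:MM'.

    Assumption: whole-minute offsets; sub-minute parts are dropped.
    """
    minutes = int(tz.utcoffset(None).total_seconds() // 60)
    sign = "-" if minutes < 0 else "+"
    hours, mins = divmod(abs(minutes), 60)
    return f"{sign}{hours:02d}:{mins:02d}"


def plan_conversion(values, on_mixed="utc"):
    """Hypothetical helper, not a pyarrow API.

    Returns (utc_values, tz_string): the values normalized to UTC (option 1)
    plus the time zone string that would go into the column metadata.
    """
    if any(v.tzinfo is None for v in values):
        raise ValueError("expected tz-aware datetimes only")

    # Option 1: every value is expressed as a UTC instant.
    utc_values = [v.astimezone(timezone.utc) for v in values]

    distinct = {v.tzinfo for v in values}
    if len(distinct) <= 1:
        # Option 2: one shared tzinfo, keep it as the metadata string.
        tz = distinct.pop() if distinct else timezone.utc
        return utc_values, offset_string(tz)

    # Option 3: distinct tzinfos, either raise or fall back to UTC.
    if on_mixed == "raise":
        raise ValueError("values carry different time zones")
    return utc_values, "+00:00"


if __name__ == "__main__":
    a = datetime(2020, 3, 10, 18, 13, 49, tzinfo=timezone.utc)
    b = datetime(2020, 3, 10, 18, 13, 49, tzinfo=timezone.min)
    print(plan_conversion([b, b])[1])  # shared zone is preserved
    print(plan_conversion([a, b])[1])  # mixed zones fall back to UTC
```

With on_mixed="raise" the mixed-zone case from the report would fail loudly 
instead of silently dropping the offsets.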

> [Python] Specify behavior for converting tz-aware datetime.datetime objects
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-8066
>                 URL: https://issues.apache.org/jira/browse/ARROW-8066
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>            Reporter: Markovtsev Vadim
>            Priority: Major
>
> The original description is at 
> [https://github.com/pandas-dev/pandas/issues/32587]
> h3. Code Sample, a copy-pastable example if possible
> {code:python}
> import pandas as pd
> from datetime import datetime, timezone
> df = pd.DataFrame.from_records([
>     (1, datetime.now().replace(tzinfo=timezone.utc)),
>     (2, datetime.now().replace(tzinfo=timezone.min))],
>     columns=["1", "2"])
> print(df["2"])
> print()
> df.to_feather("/tmp/1") 
> df2 = pd.read_feather("/tmp/1")
> print(df2["2"])
> {code}
> This code will output:
> {noformat}
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> 0   2020-03-10 18:13:49.405598
> 1   2020-03-10 18:13:49.405626
> Name: 2, dtype: datetime64[ns]
> {noformat}
> h3. Problem description
> The round-trip dtype changed from the correct `object` to incorrect 
> `datetime64`. Thus the timezones were discarded in Arrow and the timestamps 
> became invalid.
> h3. Expected Output
> (identical)
> {noformat}
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> 0    2020-03-10 18:13:49.405598+00:00
> 1    2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> {noformat}
> h3. Output of ``pd.show_versions()``
> {noformat}
> INSTALLED VERSIONS
> ------------------
> commit           : None
> python           : 3.7.5.final.0
> python-bits      : 64
> OS               : Linux
> OS-release       : 5.3.0-40-generic
> machine          : x86_64
> processor        : x86_64
> byteorder        : little
> LC_ALL           : None
> LANG             : en_US.UTF-8
> LOCALE           : en_US.UTF-8
> pandas           : 1.0.1
> numpy            : 1.17.4
> pytz             : 2019.2
> dateutil         : 2.7.3
> pip              : 19.3.1
> setuptools       : 42.0.1
> Cython           : 0.29.14
> pytest           : 5.3.1
> hypothesis       : None
> sphinx           : None
> blosc            : None
> feather          : None
> xlsxwriter       : None
> lxml.etree       : 4.5.0
> html5lib         : None
> pymysql          : None
> psycopg2         : 2.8.4 (dt dec pq3 ext lo64)
> jinja2           : 2.10.3
> IPython          : 7.10.0
> pandas_datareader: None
> bs4              : 4.8.1
> bottleneck       : None
> fastparquet      : None
> gcsfs            : None
> lxml.etree       : 4.5.0
> matplotlib       : 3.1.2
> numexpr          : None
> odfpy            : None
> openpyxl         : None
> pandas_gbq       : None
> pyarrow          : 0.16.0
> pytables         : None
> pytest           : 5.3.1
> pyxlsb           : None
> s3fs             : None
> scipy            : 1.2.1
> sqlalchemy       : 1.3.12
> tables           : None
> tabulate         : None
> xarray           : None
> xlrd             : None
> xlwt             : None
> xlsxwriter       : None
> numba            : None
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
