[ https://issues.apache.org/jira/browse/ARROW-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-4965:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/21470

> [Python] Timestamp array type detection should use tzname of datetime.datetime objects
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-4965
>                 URL: https://issues.apache.org/jira/browse/ARROW-4965
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>         Environment: $ python --version
> Python 3.7.2
> $ pip freeze
> numpy==1.16.2
> pyarrow==0.12.1
> pytz==2018.9
> six==1.12.0
> $ sw_vers
> ProductName: Mac OS X
> ProductVersion: 10.14.3
> BuildVersion: 18D109
> (pyarrow)
>            Reporter: Tim Swast
>            Assignee: Krisztian Szucs
>            Priority: Major
>             Fix For: 2.0.0
>
>
> The type detection from datetime objects to array appears to ignore the
> presence of a tzinfo on the datetime object, instead storing them as naive
> timestamp columns.
> Python code:
> {code:python}
> import datetime
> import pytz
> import pyarrow as pa
>
> naive_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10)
> utc_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10, tzinfo=pytz.utc)
> tzaware_datetime = utc_datetime.astimezone(pytz.timezone('America/Los_Angeles'))
>
> def inspect(varname):
>     print(varname)
>     arr = globals()[varname]
>     print(arr.type)
>     print(arr)
>     print()
>
> auto_naive_arr = pa.array([naive_datetime])
> inspect("auto_naive_arr")
> auto_utc_arr = pa.array([utc_datetime])
> inspect("auto_utc_arr")
> auto_tzaware_arr = pa.array([tzaware_datetime])
> inspect("auto_tzaware_arr")
> auto_mixed_arr = pa.array([utc_datetime, tzaware_datetime])
> inspect("auto_mixed_arr")
>
> naive_type = pa.timestamp("us", naive_datetime.tzname())
> utc_type = pa.timestamp("us", utc_datetime.tzname())
> tzaware_type = pa.timestamp("us", tzaware_datetime.tzname())
>
> naive_arr = pa.array([naive_datetime], type=naive_type)
> inspect("naive_arr")
> utc_arr = pa.array([utc_datetime], type=utc_type)
> inspect("utc_arr")
> tzaware_arr = pa.array([tzaware_datetime], type=tzaware_type)
> inspect("tzaware_arr")
> mixed_arr = pa.array([utc_datetime, tzaware_datetime], type=utc_type)
> inspect("mixed_arr")
> {code}
> This prints:
> {noformat}
> $ python detect_timezone.py
> auto_naive_arr
> timestamp[us]
> [
>   1547381470000000
> ]
>
> auto_utc_arr
> timestamp[us]
> [
>   1547381470000000
> ]
>
> auto_tzaware_arr
> timestamp[us]
> [
>   1547352670000000
> ]
>
> auto_mixed_arr
> timestamp[us]
> [
>   1547381470000000,
>   1547352670000000
> ]
>
> naive_arr
> timestamp[us]
> [
>   1547381470000000
> ]
>
> utc_arr
> timestamp[us, tz=UTC]
> [
>   1547381470000000
> ]
>
> tzaware_arr
> timestamp[us, tz=PST]
> [
>   1547352670000000
> ]
>
> mixed_arr
> timestamp[us, tz=UTC]
> [
>   1547381470000000,
>   1547352670000000
> ]
> {noformat}
> But I would expect the following types instead:
> * {{naive_datetime}}: {{timestamp[us]}}
> * {{auto_utc_arr}}: {{timestamp[us, tz=UTC]}}
> * {{auto_tzaware_arr}}: {{timestamp[us, tz=PST]}} (Or maybe
> {{tz='America/Los_Angeles'}}. I'm not sure why {{pytz}} returns {{PST}} as
> the {{tzname}})
> * {{auto_mixed_arr}}: {{timestamp[us, tz=UTC]}}
> Also, in the "mixed" case, I'd expect the actual stored microseconds to be
> the same for both rows, since {{utc_datetime}} and {{tzaware_datetime}} both
> refer to the same point in time. It seems reasonable for any naive datetime
> objects mixed in with tz-aware datetimes to be interpreted as UTC.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
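
The normalization the description asks for in the "mixed" case can be sketched with the standard library alone. This is only an illustration of the expected behavior, not pyarrow's implementation: the helper names `to_utc_micros` and `infer_tz` are hypothetical, and a fixed -08:00 offset stands in for `pytz.timezone('America/Los_Angeles')` so the snippet has no third-party dependencies.

```python
import datetime

# Unix epoch as a tz-aware datetime, so subtraction works on aware values.
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_utc_micros(dt):
    """Hypothetical helper: microseconds since the Unix epoch, in UTC."""
    if dt.utcoffset() is None:
        # Assumption taken from the report: naive values are read as UTC.
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    # Aware datetimes subtract as instants, regardless of their offsets.
    return (dt - EPOCH) // datetime.timedelta(microseconds=1)


def infer_tz(values):
    """Hypothetical helper: tzname of the first tz-aware value, else None."""
    for dt in values:
        if dt.utcoffset() is not None:
            return dt.tzname()
    return None


naive_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10)
utc_datetime = naive_datetime.replace(tzinfo=datetime.timezone.utc)
# Fixed offset used as a stand-in for America/Los_Angeles in January.
pst = datetime.timezone(datetime.timedelta(hours=-8), "PST")
tzaware_datetime = utc_datetime.astimezone(pst)

mixed = [utc_datetime, tzaware_datetime, naive_datetime]
micros = [to_utc_micros(dt) for dt in mixed]
print(infer_tz(mixed), micros)
```

Under these assumptions all three values map to the same stored integer, 1547381470000000, which is the value the report shows for the UTC row, and the inferred label for the mixed list is `UTC`.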