[ https://issues.apache.org/jira/browse/ARROW-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518650#comment-17518650 ]
Joris Van den Bossche commented on ARROW-16022:
-----------------------------------------------

> If they must fail, it should be when the pyarrow.Timestamp is created.

I would like to point out that Arrow actually _does_ "validate" the time upon creation, in the sense that we convert the timezone-aware Python datetime object into an unambiguous UTC value (which is guaranteed to exist) when creating the pyarrow timestamp array.

The problem is only that the current implementation of temporal rounding works in local time: in your case the unambiguous UTC timestamp is converted to a wall-clock time that is ambiguous in that timezone, and the conversion back to an unambiguous UTC timestamp after rounding then fails. We can solve this specific issue by improving the implementation of the temporal rounding algorithm, and that is what ARROW-15251 ([https://github.com/apache/arrow/pull/12528]) is about.

To illustrate my first point, let me get back to your example of an ambiguous datetime:

{code:python}
import datetime
import zoneinfo

tz = zoneinfo.ZoneInfo(key='America/New_York')
# In the US, the 1:00am hour is ambiguous because the minute after 1:59am Daylight-Savings Time is 1:00am Standard Time
# however, these times exist and
date_ambig = datetime.datetime(2013, 11, 3, 1, 3, 14, tzinfo = tz)
{code}

This is in fact not an ambiguous datetime. As you show when printing this value, "native datetime object defaults to daylight time":

{code:python}
>>> print(date_ambig)
2013-11-03 01:03:14-04:00
{code}

but this is because the actual datetime defaults to {{fold=0}}, which corresponds to the offset of -04:00.
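To make the "ambiguous in local time" point concrete, here is a minimal standard-library sketch. It is not Arrow's implementation, and the variable names are purely illustrative. It takes two distinct UTC instants, converts them to New York time, and shows that both land on the same wall-clock reading, so going back to UTC from the wall-clock value alone requires choosing a fold.

```python
import datetime
import zoneinfo

tz = zoneinfo.ZoneInfo("America/New_York")
utc = datetime.timezone.utc

# Two distinct UTC instants, one hour apart, around the end of DST in 2013.
utc_first = datetime.datetime(2013, 11, 3, 5, 3, 14, tzinfo=utc)
utc_second = datetime.datetime(2013, 11, 3, 6, 3, 14, tzinfo=utc)

# Converting UTC -> local time is always well defined.
local_first = utc_first.astimezone(tz)
local_second = utc_second.astimezone(tz)
folds = (local_first.fold, local_second.fold)
offsets = (local_first.utcoffset(), local_second.utcoffset())

# Dropping the timezone and fold leaves only the wall-clock reading,
# which is identical for both instants.
wall_first = local_first.replace(tzinfo=None, fold=0)
wall_second = local_second.replace(tzinfo=None, fold=0)
same_wall_time = wall_first == wall_second

# Going back local -> UTC therefore needs an explicit choice of fold.
back_fold0 = wall_first.replace(tzinfo=tz, fold=0).astimezone(utc)
back_fold1 = wall_first.replace(tzinfo=tz, fold=1).astimezone(utc)

print(same_wall_time, folds)
print(back_fold0.isoformat(), back_fold1.isoformat())
```

An algorithm that rounds the bare wall-clock value has lost the fold by the time it converts back, which is the situation described above.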
This is something you control when _creating_ the actual datetime.datetime object, so we can explicitly construct the "other" value for this datetime with offset -05:00:

{code:python}
>>> date_ambig2 = datetime.datetime(2013, 11, 3, 1, 3, 14, tzinfo = tz, fold=1)
>>> print(date_ambig2)
2013-11-03 01:03:14-05:00
>>> pa.array([date_ambig, date_ambig2], pa.timestamp("us", tz="America/New_York"))
<pyarrow.lib.TimestampArray object at 0x7fa58edb79a0>
[
  2013-11-03 05:03:14.000000,
  2013-11-03 06:03:14.000000
]
{code}

So both datetime.datetime values actually represent a specific moment in time in this case, and are properly converted to UTC when creating the pyarrow array. See https://peps.python.org/pep-0495/ for more details on the "fold" attribute that was introduced for datetime.datetime to disambiguate local times.

> [C++] Temporal floor/ceil/round throws exception for timestamps ambiguous due to DST
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-16022
>                 URL: https://issues.apache.org/jira/browse/ARROW-16022
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 7.0.0
>            Reporter: Kevin Crouse
>            Priority: Major
>
> Running pyarrow.compute.floor_temporal for timestamps that exist will throw
> exceptions if the times are ambiguous during the daylight savings time
> transitions.
> As the *_temporal functions do not fundamentally change the times, it does
> not make sense that they would fail due to a timezone issue. If they must
> fail, it should be when the pyarrow.Timestamp is created.
>
> {code:java}
> import pyarrow
> import pyarrow.compute as pc
> import datetime
> import pytz
> t = pyarrow.timestamp('s', tz='America/New_York')
> dt = datetime.datetime(2013, 11, 3, 1, 3, 14, tzinfo = pytz.timezone('America/New_York'))
> # if a timestamp must be invalid, this could fail
> za = pyarrow.array([dt], t)
> # raises an exception, even though this is conceptually an identity function here
> pc.floor_temporal(za, unit = 'second')
> {code}
>
> And this actually works just fine (continued from above)
> {code:java}
> pc.cast(
>     pc.floor_temporal(
>         pc.cast(za, pyarrow.timestamp('s', 'UTC')),
>         unit='second'),
>     pyarrow.timestamp('s','America/New_York')
> )
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)