FYI:

I've just added:

https://github.com/apache/airflow/pull/34667

which documents how to use newer timezone information with Pendulum.

Also work seems to be progressing (albeit slowly) on Pendulum 3:

https://github.com/sdispater/pendulum/issues/600#issuecomment-1711299677

Bolke



On Thu, 28 Sept 2023 at 15:12, Bolke de Bruin <[email protected]> wrote:

> for serialization I am not too worried about ZoneInfo. We do not use
> pickling by default as we roll our own serialization format. We probably
> just need the key (zoneinfo.key).
>
> I'm not sure what happened about this:
>
> https://github.com/sdispater/pendulum/issues/590
>
> Bolke
>
> On Thu, 28 Sept 2023 at 14:59, Andrey Anshin <[email protected]>
> wrote:
>
>> I agree with all problems that you mention about datetime tz-aware data.
>> I lived for almost 30 years in a country which had in different periods of
>> time up to 10 time zones, and on a regular basis changed it
>> (merge/unmerge)
>> , disable DST, temporarily enable DST. In addition I also worked in a
>> different bank for about 10 years (legacy systems which don't update
>> tzdata
>> for ages) . I think I had most of the bad cases with time zones. And I
>> think everyone somehow has a problem with different time zones: Calendars
>> +
>> events, flight booking systems which don't know about timezones and you
>> might find that your connecting flight flew away an hour ago, etc.
>>
>> In addition the error might happen in different places, databases (not
>> updated tzdata, or db doesn't work correctly), client libraries, OS, etc.
>> The person who finally solves tz-aware data should be granted all awards
>> in
>> the World.
>>
>> > For example, we got recently bitten by datetime.tzname() (which is
>> supposed
>> to 'time zone name') returning short-hand notation timezones (e.g. PST)
>> > instead of full timezone names (e.g. "Europe/Amsterdam") which makes
>> deserialization non deterministic.
>>
>> Yeah, and even ZoneInfo doesn't solve the problem with `datetime.tzname`
>> because final implementation depends on different factors, tzinfo
>> implementation and internals of datetime.
>>
>> > moving to zoneinfo seems to make sense though and will also be in
>> Pendulum 3
>>
>> I've have a look couple days ago about zoneinfo, it also have some
>> "pitfalls", e.g. if timezone created from file it can't be easily
>> serialized
>> https://docs.python.org/3.9/library/zoneinfo.html#the-zoneinfo-class
>>
>> > Pendulum has proven us in the past, maybe we indeed should help the
>> project if possible and if that isn't possible verify formal correctness
>> of
>> any other library
>>
>> I guess all other libraries might have a different kind of issue including
>> compatibility with databases.
>> More close replacement it is dateutil, but it also maintained by one
>> person
>> last release was 2 years ago and contains quite a few issues with
>> timezones/DTS (no blame, that is just a fact)
>>
>>
>> On Thu, 28 Sept 2023 at 15:39, Bolke de Bruin <[email protected]> wrote:
>>
>> > Thanks for starting the discussion Andrey.
>> >
>> > Some background on the choice for Pendulum at the time. In the early
>> days
>> > of Airflow it wasn't timezone aware. Originating from Airbnb which had a
>> > reasonable mature data organization the view was everything needs to be
>> in
>> > UTC. According to Maxime the engineers would dream in UTC ;-). However,
>> in
>> > the real world which also needs to deal with legacy that didn't hold.
>> Often
>> > systems of record did not store timezone information but were localized
>> > nevertheless. Cutoff times in banks happen in localized time and if you
>> > want to meet those, Airflow needed to do better.
>> >
>> > Doing timezones and being timezone aware proved to be exceptionally
>> hard.
>> > Many libraries get it wrong [1] and fail silently (i.e. Arrow) or apply
>> DST
>> > transitions wrongly (pytz). When dealing with payments that stuff cannot
>> > happen. To make things worse, in Python timezone support is pretty
>> > convoluted, while some standardization happened in 3.9 by using IANA
>> > provided timezone information from the local system, its API is messy.
>> For
>> > example, we got recently bitten by datetime.tzname() (which is
>> > supposed to 'time
>> > zone name') returning short-hand notation timezones (e.g. PST) instead
>> of
>> > full timezone names (e.g. "Europe/Amsterdam") which makes
>> deserialization
>> > non deterministic.
>> >
>> > So, what I am trying to say, is tread carefully when doing changes as
>> > proposed in [2] (moving to zoneinfo seems to make sense though and will
>> > also be in Pendulum 3). Make sure those changes are formally correct and
>> > don't assume because they are now part of python itself (pytz was the
>> > defacto standard for a long time). Pendulum has proven us in the past,
>> > maybe we indeed should help the project if possible and if that isn't
>> > possible verify formal correctness of any other library.
>> >
>> > Bolke
>> >
>> > [1] https://pendulum.eustace.io/faq/
>> > [2] https://github.com/apache/airflow/issues/19450
>> >
>> > On Thu, 28 Sept 2023 at 11:03, Andrey Anshin <[email protected]>
>> > wrote:
>> >
>> > > This discussion is more about the known problem of pendulum and how we
>> > > could deal with it and maybe how we (as Community) might help autor.
>> > >
>> > > The library is mostly supported by a single author Sébastien Eustace (
>> > > https://github.com/sdispater) and it seems like we bump into the
>> > situation
>> > > which is described in xkcd #2347 (
>> > > https://imgs.xkcd.com/comics/dependency.png). To be honest it is not
>> > > something new when library mainly supported by one author so there is
>> > > always a risk that the library will no longer be supported / abandoned
>> > > And if takes in account that pendulum provides core functionality in
>> > > Airflow it could have dramatical impact in the future.
>> > >
>> > > Pendulum is a really nice library which helps a lot of developers to
>> work
>> > > with dates/datetimes. However there is one major problem, the last
>> > release
>> > > of this library happened more than 3 years ago (
>> > > https://pypi.org/project/pendulum/#history) in the time when Airflow
>> > > 1.10.11 was released
>> > >
>> > > Fortunately, the project is not abandoned and on a regular basis
>> commits
>> > > add into the master branch. However these commits are not included
>> into
>> > any
>> > > final release and that's why some things related to datetime don't
>> work
>> > as
>> > > expected in Airflow. There are list of known (for me) issues which are
>> > > affect Airflow
>> > >
>> > > *Memory Leak on parse*:
>> > > - https://github.com/sdispater/pendulum/issues/720, this one fixed  2
>> > > years
>> > > ago but not available yet (
>> > https://github.com/sdispater/pendulum/pull/563
>> > > ).
>> > > Since we use parse dates in airflow codebase: datetime parameters and
>> > > datetime in logs this one could be a reason for memory leakage in
>> > Airflow:
>> > > - https://github.com/apache/airflow/discussions/24694
>> > > - https://github.com/apache/airflow/discussions/28597
>> > >
>> > > *Incorrect time zones*, known issues and should be already fixed in
>> > master
>> > > branch
>> > > - https://github.com/sdispater/pendulum/issues/700, Mexico do not use
>> > DST
>> > > anymore
>> > > - https://github.com/sdispater/pendulum/issues/706, Egypt reinstate
>> DST
>> > >
>> > > We add clarification in https://github.com/apache/airflow/pull/30467,
>> > > however it seems like there is no other way rather than patching
>> Pendulum
>> > > right now.
>> > >
>> > > All these issues should be solved as soon as pendulum 3 is released.
>> The
>> > > current announced estimation is end of september/ beginning of
>> October:
>> > >
>> https://github.com/sdispater/pendulum/issues/600#issuecomment-1711299677
>> > >
>> > > So in theory we would have a fixed version of pendulum soon, and it
>> might
>> > > break something in Airflow but from my point of view it is better than
>> > > current status.
>> > >
>> > > However there might be a situation where the release of the pendulum
>> > would
>> > > be postponed, so maybe better to have a backup plan. What could we do
>> in
>> > > this case?
>> > >
>> > > Maybe we should start to use zoneinfo.ZoneInfo instead of pendulum
>> > > datetime? https://github.com/apache/airflow/issues/19450
>> > > Pros:
>> > > - stdlib (python 3.9+)
>> > > - In pendulum 3.0 Timezone based on zoneinfo.Zoneinfo
>> > >
>> > > Cons:
>> > > - Current serialization model can't deal with backport packages. E.g.
>> > > timezone which are serialized in backport_zoneinfo can't be
>> deserialized
>> > in
>> > > zoneinfo
>> > >
>> > > Maybe we should replace parse datetime with another solution. Does
>> anyone
>> > > know a good replacement?
>> > >
>> > > Maybe someone from Airflow Community could propose their help with
>> > > maintenance of library:
>> > > - https://github.com/sdispater/pendulum/issues/590
>> > >
>> > > Maybe we should get rid of the pendulum at all, as a last resort
>> > solution.
>> > > I can't imagine how we could do that, because a lot of stuff depends
>> on
>> > the
>> > > pendulum and removing it would be a breaking change.
>> > >
>> > > ----
>> > > Best Wishes
>> > > *Andrey Anshin*
>> > >
>> >
>> >
>> > --
>> >
>> > --
>> > Bolke de Bruin
>> > [email protected]
>> >
>>
>
>
> --
>
> --
> Bolke de Bruin
> [email protected]
>


-- 

--
Bolke de Bruin
[email protected]

Reply via email to