I believe some discussion on this has already occurred:
https://github.com/apache/airflow/issues/19450 (in fact, I raised some
concerns there:
https://github.com/apache/airflow/issues/19450#issuecomment-1536494966)

I guess I remain concerned that the zoneinfo API isn't sufficient for many of
the places where Pendulum is used, and that it isn't sufficiently mature (in the
sense that not enough people rely on it yet). That could be doubly bad in the
standard library, because bugs will remain permanently in specific versions of
Python.

I appreciate this is a balancing act: if Pendulum doesn't make new releases,
then it can't be relied on forever.

Damian

-----Original Message-----
From: Bolke de Bruin <[email protected]>
Sent: Thursday, September 28, 2023 10:03 AM
To: [email protected]
Subject: Re: [DISCUSS] Future of Pendulum in Airflow

FYI:

I've just added:

https://github.com/apache/airflow/pull/34667

which documents how to use newer timezone information with Pendulum.

Also, work on Pendulum 3 seems to be progressing (albeit slowly):

https://github.com/sdispater/pendulum/issues/600#issuecomment-1711299677

Bolke



On Thu, 28 Sept 2023 at 15:12, Bolke de Bruin <[email protected]> wrote:

> For serialization I am not too worried about ZoneInfo. We do not use
> pickling by default, as we roll our own serialization format. We
> probably just need the key (ZoneInfo.key).
>
> I'm not sure what happened with this:
>
> https://github.com/sdispater/pendulum/issues/590
>
> Bolke
>
> On Thu, 28 Sept 2023 at 14:59, Andrey Anshin <[email protected]>
> wrote:
>
>> I agree with all the problems you mention about tz-aware datetime data.
>> I lived for almost 30 years in a country which, in different periods,
>> had up to 10 time zones and regularly changed them (merging/unmerging
>> zones, disabling DST, temporarily enabling DST). In addition, I worked
>> at different banks for about 10 years (legacy systems which didn't
>> update tzdata for ages). I think I have hit most of the bad cases with
>> time zones. And I think everyone has had some problem with time zones:
>> calendars and events, flight booking systems which don't know about
>> time zones, so you might find that your connecting flight left an hour
>> ago, etc.
>>
>> In addition, the error might happen in different places: databases
>> (outdated tzdata, or the DB doesn't handle it correctly), client
>> libraries, the OS, etc.
>> The person who finally solves tz-aware data should be granted every
>> award in the world.
>>
>> > For example, we got recently bitten by datetime.tzname() (which is
>> > supposed to return the 'time zone name') returning short-hand notation
>> > timezones (e.g. PST) instead of full timezone names (e.g.
>> > "Europe/Amsterdam"), which makes deserialization non-deterministic.
>>
>> Yeah, and even ZoneInfo doesn't solve the problem with
>> `datetime.tzname`, because the final result depends on several
>> factors: the tzinfo implementation and the internals of datetime.
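
A minimal sketch of the tzname() problem, assuming Python 3.9+ and the stdlib zoneinfo module. The helper synthetic_zone() is hypothetical: it builds a fixed-offset zone from an in-memory TZif blob (whole-hour offsets, no DST) so the example runs without system tzdata. Real Europe/Amsterdam and Europe/Paris have DST rules that this stand-in omits, so a January date is used, when both observe CET (+01:00).

```python
import io
import struct
from datetime import datetime
from zoneinfo import ZoneInfo


def synthetic_zone(key: str, abbr: str, utc_offset_s: int) -> ZoneInfo:
    """Hypothetical helper: a fixed-offset ZoneInfo built from an in-memory
    TZif v2 blob, standing in for real tzdata (whole-hour offsets, no DST)."""
    chars = abbr.encode("ascii") + b"\x00"
    # Counts: isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt
    header = b"TZif2" + b"\x00" * 15 + struct.pack(">6l", 0, 0, 0, 0, 1, len(chars))
    # A single local time type: UTC offset, isdst=0, abbreviation index 0
    body = struct.pack(">lbb", utc_offset_s, 0, 0) + chars
    # POSIX TZ footer such as "CET-1" (POSIX offsets use the inverted sign)
    footer = f"\n{abbr}{-utc_offset_s // 3600}\n".encode("ascii")
    # v1 and v2 data blocks are identical here because there are no transitions
    blob = header + body + header + body + footer
    return ZoneInfo.from_file(io.BytesIO(blob), key=key)


amsterdam = synthetic_zone("Europe/Amsterdam", "CET", 3600)
paris = synthetic_zone("Europe/Paris", "CET", 3600)

dt_ams = datetime(2023, 1, 15, 12, 0, tzinfo=amsterdam)
dt_par = datetime(2023, 1, 15, 12, 0, tzinfo=paris)

# tzname() yields the abbreviation, which both zones share...
print(dt_ams.tzname(), dt_par.tzname())  # CET CET
# ...while only the key identifies the zone unambiguously
print(amsterdam.key, paris.key)  # Europe/Amsterdam Europe/Paris
```

Deserializing from "CET" alone cannot tell these two zones apart, which is why persisting the key rather than tzname() is the safer choice.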
>>
>> > moving to zoneinfo seems to make sense though and will also be in
>> > Pendulum 3
>>
>> I had a look at zoneinfo a couple of days ago; it also has some
>> "pitfalls", e.g. if a timezone is created from a file, it can't be
>> easily serialized:
>> https://docs.python.org/3.9/library/zoneinfo.html#the-zoneinfo-class
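
A minimal sketch of this pitfall and of serializing by key (the ZoneInfo.key approach suggested earlier in the thread), assuming Python 3.9+. Zones built with ZoneInfo.from_file() have no key unless one is passed, and they refuse to pickle either way. The helpers synthetic_tzif(), is_picklable(), tz_to_key() and key_to_tz() are hypothetical, not Airflow or Pendulum APIs. synthetic_tzif() fabricates a fixed-offset TZif file so the example runs without system tzdata.

```python
import io
import pickle
import struct
from zoneinfo import ZoneInfo


def synthetic_tzif(abbr: str, utc_offset_s: int) -> bytes:
    """Hypothetical helper: a minimal fixed-offset TZif v2 file
    (whole-hour offsets, no DST), standing in for a real tzdata file."""
    chars = abbr.encode("ascii") + b"\x00"
    header = b"TZif2" + b"\x00" * 15 + struct.pack(">6l", 0, 0, 0, 0, 1, len(chars))
    body = struct.pack(">lbb", utc_offset_s, 0, 0) + chars
    footer = f"\n{abbr}{-utc_offset_s // 3600}\n".encode("ascii")
    return header + body + header + body + footer


def is_picklable(obj) -> bool:
    # ZoneInfo raises pickle.PicklingError for zones built from a file
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True


def tz_to_key(tz: ZoneInfo) -> str:
    """Hypothetical serializer: store only the IANA key."""
    if tz.key is None:
        raise ValueError("zone has no key (built with ZoneInfo.from_file?)")
    return tz.key


def key_to_tz(key: str) -> ZoneInfo:
    # Needs the system tz database or the `tzdata` package to resolve the key
    return ZoneInfo(key)


blob = synthetic_tzif("CET", 3600)
anonymous = ZoneInfo.from_file(io.BytesIO(blob))
named = ZoneInfo.from_file(io.BytesIO(blob), key="Europe/Amsterdam")

print(anonymous.key)            # None: nothing to serialize by key
print(is_picklable(anonymous))  # False: from_file zones refuse to pickle
print(is_picklable(named))      # False, even when a key is supplied
print(tz_to_key(named))         # Europe/Amsterdam
```

Round-tripping with key_to_tz(tz_to_key(zone)) assumes the key resolves through the system tz database or the tzdata package. Because the stored value is a plain string, it is not tied to whether the writer used zoneinfo or backports.zoneinfo.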
>>
>> > Pendulum has proven itself to us in the past; maybe we indeed should
>> > help the project if possible, and if that isn't possible, verify the
>> > formal correctness of any other library
>>
>> I guess all other libraries might have different kinds of issues,
>> including compatibility with databases.
>> The closest replacement is dateutil, but it is also maintained by one
>> person, its last release was 2 years ago, and it contains quite a few
>> issues with timezones/DST (no blame, that is just a fact).
>>
>>
>> On Thu, 28 Sept 2023 at 15:39, Bolke de Bruin <[email protected]> wrote:
>>
>> > Thanks for starting the discussion Andrey.
>> >
>> > Some background on the choice of Pendulum at the time. In the early
>> > days, Airflow wasn't timezone aware. Originating from Airbnb, which had
>> > a reasonably mature data organization, the view was that everything
>> > needs to be in UTC. According to Maxime, the engineers would dream in
>> > UTC ;-). However, in the real world, which also needs to deal with
>> > legacy, that didn't hold. Often systems of record did not store
>> > timezone information but were localized nevertheless. Cutoff times in
>> > banks happen in localized time, and if you want to meet those, Airflow
>> > needed to do better.
>> >
>> > Doing timezones and being timezone aware proved to be exceptionally
>> > hard. Many libraries get it wrong [1] and fail silently (e.g. Arrow) or
>> > apply DST transitions wrongly (pytz). When dealing with payments, that
>> > cannot happen. To make things worse, timezone support in Python is
>> > pretty convoluted: while some standardization happened in 3.9 by using
>> > IANA-provided timezone information from the local system, its API is
>> > messy. For example, we got recently bitten by datetime.tzname() (which
>> > is supposed to return the 'time zone name') returning short-hand
>> > notation timezones (e.g. PST) instead of full timezone names (e.g.
>> > "Europe/Amsterdam"), which makes deserialization non-deterministic.
>> >
>> > So, what I am trying to say is: tread carefully when making changes
>> > as proposed in [2] (moving to zoneinfo seems to make sense though and
>> > will also be in Pendulum 3). Make sure those changes are formally
>> > correct, and don't assume they are correct because they are now part
>> > of Python itself (pytz was the de facto standard for a long time).
>> > Pendulum has proven itself to us in the past; maybe we indeed should
>> > help the project if possible, and if that isn't possible, verify the
>> > formal correctness of any other library.
>> >
>> > Bolke
>> >
>> > [1] https://pendulum.eustace.io/faq/
>> > [2] https://github.com/apache/airflow/issues/19450
>> >
>> > On Thu, 28 Sept 2023 at 11:03, Andrey Anshin <[email protected]>
>> > wrote:
>> >
>> > > This discussion is more about the known problems with Pendulum, how
>> > > we could deal with them, and maybe how we (as a community) might help
>> > > the author.
>> > >
>> > > The library is mostly supported by a single author, Sébastien
>> > > Eustace (https://github.com/sdispater), and it seems like we have
>> > > bumped into the situation described in xkcd #2347
>> > > (https://imgs.xkcd.com/comics/dependency.png). To be honest, this is
>> > > nothing new for a library maintained mainly by one author: there is
>> > > always a risk that it will no longer be supported, or be abandoned.
>> > > And taking into account that Pendulum provides core functionality in
>> > > Airflow, that could have a dramatic impact in the future.
>> > >
>> > > Pendulum is a really nice library which helps a lot of developers
>> > > work with dates/datetimes. However, there is one major problem: the
>> > > last release of this library happened more than 3 years ago
>> > > (https://pypi.org/project/pendulum/#history), around the time
>> > > Airflow 1.10.11 was released.
>> > >
>> > > Fortunately, the project is not abandoned, and commits are regularly
>> > > added to the master branch. However, these commits are not included
>> > > in any final release, and that's why some datetime-related things
>> > > don't work as expected in Airflow. Here is the list of issues known
>> > > to me which affect Airflow:
>> > >
>> > > *Memory leak on parse*:
>> > > - https://github.com/sdispater/pendulum/issues/720, this one was
>> > >   fixed 2 years ago but is not available yet
>> > >   (https://github.com/sdispater/pendulum/pull/563).
>> > > Since we parse dates in the Airflow codebase (datetime parameters
>> > > and datetimes in logs), this could be a cause of memory leaks in
>> > > Airflow:
>> > > - https://github.com/apache/airflow/discussions/24694
>> > > - https://github.com/apache/airflow/discussions/28597
>> > >
>> > > *Incorrect time zones* (known issues that should already be fixed in
>> > > the master branch):
>> > > - https://github.com/sdispater/pendulum/issues/700, Mexico no longer
>> > >   uses DST
>> > > - https://github.com/sdispater/pendulum/issues/706, Egypt reinstated
>> > >   DST
>> > >
>> > > We added a clarification in
>> > > https://github.com/apache/airflow/pull/30467; however, it seems there
>> > > is no other way right now than patching Pendulum.
>> > >
>> > > All these issues should be solved as soon as Pendulum 3 is released.
>> > > The currently announced estimate is the end of September / beginning
>> > > of October:
>> > > https://github.com/sdispater/pendulum/issues/600#issuecomment-1711299677
>> > >
>> > > So, in theory, we will have a fixed version of Pendulum soon. It might
>> > > break something in Airflow, but from my point of view that is better
>> > > than the current status.
>> > >
>> > > However, the Pendulum release might be postponed, so it may be better
>> > > to have a backup plan. What could we do in that case?
>> > >
>> > > Maybe we should start to use zoneinfo.ZoneInfo instead of
>> > > Pendulum datetime? https://github.com/apache/airflow/issues/19450
>> > > Pros:
>> > > - stdlib (Python 3.9+)
>> > > - In Pendulum 3.0, Timezone is based on zoneinfo.ZoneInfo
>> > >
>> > > Cons:
>> > > - The current serialization model can't deal with backport packages,
>> > >   e.g. timezones serialized with backports.zoneinfo can't be
>> > >   deserialized with zoneinfo
>> > >
>> > > Maybe we should replace datetime parsing with another solution. Does
>> > > anyone know a good replacement?
>> > >
>> > > Maybe someone from the Airflow community could offer to help with
>> > > maintenance of the library:
>> > > - https://github.com/sdispater/pendulum/issues/590
>> > >
>> > > Maybe we should get rid of Pendulum altogether, as a last-resort
>> > > solution. I can't imagine how we could do that, though, because a lot
>> > > of stuff depends on Pendulum and removing it would be a breaking
>> > > change.
>> > >
>> > > ----
>> > > Best Wishes
>> > > *Andrey Anshin*
>> > >
>> >
>> >
>> > --
>> >
>> > --
>> > Bolke de Bruin
>> > [email protected]
>> >
>>
>
>
>

