FYI: I've just added:
https://github.com/apache/airflow/pull/34667
which documents how to use newer timezone information with Pendulum.

Also, work seems to be progressing (albeit slowly) on Pendulum 3:
https://github.com/sdispater/pendulum/issues/600#issuecomment-1711299677

Bolke

On Thu, 28 Sept 2023 at 15:12, Bolke de Bruin <[email protected]> wrote:

> For serialization I am not too worried about ZoneInfo. We do not use
> pickling by default as we roll our own serialization format. We probably
> just need the key (zoneinfo.key).
>
> I'm not sure what happened about this:
>
> https://github.com/sdispater/pendulum/issues/590
>
> Bolke
>
> On Thu, 28 Sept 2023 at 14:59, Andrey Anshin <[email protected]>
> wrote:
>
>> I agree with all problems that you mention about datetime tz-aware data.
>> I lived for almost 30 years in a country which had, in different periods
>> of time, up to 10 time zones, and which on a regular basis changed them
>> (merge/unmerge), disabled DST, or temporarily enabled DST. In addition, I
>> also worked in a different bank for about 10 years (legacy systems which
>> don't update tzdata for ages). I think I had most of the bad cases with
>> time zones. And I think everyone somehow has a problem with different
>> time zones: calendars + events, flight booking systems which don't know
>> about timezones so you might find that your connecting flight flew away
>> an hour ago, etc.
>>
>> In addition, the error might happen in different places: databases (not
>> updated tzdata, or the db doesn't work correctly), client libraries, OS,
>> etc. The person who finally solves tz-aware data should be granted all
>> awards in the World.
>>
>> > For example, we got recently bitten by datetime.tzname() (which is
>> > supposed to return the 'time zone name') returning short-hand notation
>> > timezones (e.g. PST) instead of full timezone names (e.g.
>> > "Europe/Amsterdam") which makes deserialization non-deterministic.
>>
>> Yeah, and even ZoneInfo doesn't solve the problem with `datetime.tzname`
>> because the final implementation depends on different factors: the tzinfo
>> implementation and the internals of datetime.
>>
>> > moving to zoneinfo seems to make sense though and will also be in
>> > Pendulum 3
>>
>> I had a look at zoneinfo a couple of days ago; it also has some
>> "pitfalls", e.g. if a timezone is created from a file it can't be easily
>> serialized:
>> https://docs.python.org/3.9/library/zoneinfo.html#the-zoneinfo-class
>>
>> > Pendulum has proven itself to us in the past, maybe we indeed should
>> > help the project if possible and if that isn't possible verify formal
>> > correctness of any other library
>>
>> I guess all other libraries might have a different kind of issue,
>> including compatibility with databases.
>> The closest replacement is dateutil, but it is also maintained by one
>> person, its last release was 2 years ago, and it contains quite a few
>> issues with timezones/DST (no blame, that is just a fact).
>>
>>
>> On Thu, 28 Sept 2023 at 15:39, Bolke de Bruin <[email protected]> wrote:
>>
>> > Thanks for starting the discussion Andrey.
>> >
>> > Some background on the choice for Pendulum at the time. In the early
>> > days of Airflow it wasn't timezone aware. Originating from Airbnb, which
>> > had a reasonably mature data organization, the view was everything needs
>> > to be in UTC. According to Maxime the engineers would dream in UTC ;-).
>> > However, in the real world, which also needs to deal with legacy, that
>> > didn't hold. Often systems of record did not store timezone information
>> > but were localized nevertheless. Cutoff times in banks happen in
>> > localized time and if you want to meet those, Airflow needed to do
>> > better.
>> >
>> > Doing timezones and being timezone aware proved to be exceptionally
>> > hard. Many libraries get it wrong [1] and fail silently (e.g. Arrow) or
>> > apply DST transitions wrongly (pytz).
>> > When dealing with payments that stuff cannot happen. To make things
>> > worse, in Python timezone support is pretty convoluted: while some
>> > standardization happened in 3.9 by using IANA provided timezone
>> > information from the local system, its API is messy. For example, we got
>> > recently bitten by datetime.tzname() (which is supposed to return the
>> > 'time zone name') returning short-hand notation timezones (e.g. PST)
>> > instead of full timezone names (e.g. "Europe/Amsterdam") which makes
>> > deserialization non-deterministic.
>> >
>> > So, what I am trying to say is: tread carefully when doing changes as
>> > proposed in [2] (moving to zoneinfo seems to make sense though and will
>> > also be in Pendulum 3). Make sure those changes are formally correct and
>> > don't assume they are because they are now part of Python itself (pytz
>> > was the de facto standard for a long time). Pendulum has proven itself
>> > to us in the past, maybe we indeed should help the project if possible
>> > and if that isn't possible verify formal correctness of any other
>> > library.
>> >
>> > Bolke
>> >
>> > [1] https://pendulum.eustace.io/faq/
>> > [2] https://github.com/apache/airflow/issues/19450
>> >
>> > On Thu, 28 Sept 2023 at 11:03, Andrey Anshin <[email protected]>
>> > wrote:
>> >
>> > > This discussion is more about the known problem of pendulum and how we
>> > > could deal with it and maybe how we (as Community) might help the
>> > > author.
>> > >
>> > > The library is mostly supported by a single author, Sébastien Eustace
>> > > (https://github.com/sdispater), and it seems like we have bumped into
>> > > the situation which is described in xkcd #2347
>> > > (https://imgs.xkcd.com/comics/dependency.png).
>> > > To be honest it is not something new when a library is mainly
>> > > supported by one author, so there is always a risk that the library
>> > > will no longer be supported / abandoned. And if we take into account
>> > > that pendulum provides core functionality in Airflow, it could have a
>> > > dramatic impact in the future.
>> > >
>> > > Pendulum is a really nice library which helps a lot of developers to
>> > > work with dates/datetimes. However there is one major problem: the
>> > > last release of this library happened more than 3 years ago
>> > > (https://pypi.org/project/pendulum/#history), at the time when Airflow
>> > > 1.10.11 was released.
>> > >
>> > > Fortunately, the project is not abandoned and commits are added to the
>> > > master branch on a regular basis. However, these commits are not
>> > > included in any final release and that's why some things related to
>> > > datetime don't work as expected in Airflow. Here is a list of issues
>> > > known to me which affect Airflow:
>> > >
>> > > *Memory Leak on parse*:
>> > > - https://github.com/sdispater/pendulum/issues/720, this one was fixed
>> > >   2 years ago but the fix is not available yet
>> > >   (https://github.com/sdispater/pendulum/pull/563).
>> > > Since we parse dates in the Airflow codebase (datetime parameters and
>> > > datetimes in logs), this one could be a reason for memory leakage in
>> > > Airflow:
>> > > - https://github.com/apache/airflow/discussions/24694
>> > > - https://github.com/apache/airflow/discussions/28597
>> > >
>> > > *Incorrect time zones*, known issues which should already be fixed in
>> > > the master branch:
>> > > - https://github.com/sdispater/pendulum/issues/700, Mexico does not use
>> > >   DST anymore
>> > > - https://github.com/sdispater/pendulum/issues/706, Egypt reinstated
>> > >   DST
>> > >
>> > > We added a clarification in
>> > > https://github.com/apache/airflow/pull/30467, however it seems like
>> > > there is no other way than patching Pendulum right now.
>> > >
>> > > All these issues should be solved as soon as pendulum 3 is released.
>> > > The currently announced estimate is end of September / beginning of
>> > > October:
>> > > https://github.com/sdispater/pendulum/issues/600#issuecomment-1711299677
>> > >
>> > > So in theory we would have a fixed version of pendulum soon, and it
>> > > might break something in Airflow but from my point of view it is
>> > > better than the current status.
>> > >
>> > > However there might be a situation where the release of pendulum would
>> > > be postponed, so maybe it is better to have a backup plan. What could
>> > > we do in this case?
>> > >
>> > > Maybe we should start to use zoneinfo.ZoneInfo instead of pendulum
>> > > datetime? https://github.com/apache/airflow/issues/19450
>> > > Pros:
>> > > - stdlib (python 3.9+)
>> > > - In pendulum 3.0 Timezone is based on zoneinfo.ZoneInfo
>> > >
>> > > Cons:
>> > > - Current serialization model can't deal with backport packages. E.g.
>> > >   timezones which are serialized in backport_zoneinfo can't be
>> > >   deserialized in zoneinfo
>> > >
>> > > Maybe we should replace parse datetime with another solution. Does
>> > > anyone know a good replacement?
>> > >
>> > > Maybe someone from the Airflow Community could offer their help with
>> > > maintenance of the library:
>> > > - https://github.com/sdispater/pendulum/issues/590
>> > >
>> > > Maybe we should get rid of pendulum altogether, as a last resort
>> > > solution. I can't imagine how we could do that, because a lot of stuff
>> > > depends on pendulum and removing it would be a breaking change.
>> > >
>> > > ----
>> > > Best Wishes
>> > > *Andrey Anshin*
>> > >
>> >
>> >
>> > --
>> >
>> > --
>> > Bolke de Bruin
>> > [email protected]
>> >
>> >
>
> --
>
> --
> Bolke de Bruin
> [email protected]
>

--

--
Bolke de Bruin
[email protected]
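
For reference, a minimal sketch of the serialization point raised in this thread: store the IANA key (`zoneinfo.key`) rather than the abbreviation returned by `datetime.tzname()`. The helper names `serialize_tz` and `deserialize_tz` are hypothetical and are not Airflow's actual serialization code. The sketch assumes Python 3.9+ and that IANA tz data is available on the system.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def serialize_tz(tz: ZoneInfo) -> str:
    """Hypothetical helper: keep only the IANA key, e.g. "Europe/Amsterdam"."""
    if tz.key is None:
        # A ZoneInfo built with ZoneInfo.from_file() and no key has no
        # stable identifier, which is the "pitfall" mentioned in the thread.
        raise ValueError("cannot serialize a ZoneInfo that has no key")
    return tz.key


def deserialize_tz(key: str) -> ZoneInfo:
    """Hypothetical helper: rebuild the zone from its IANA key."""
    return ZoneInfo(key)


tz = ZoneInfo("Europe/Amsterdam")
dt = datetime(2023, 9, 28, 15, 12, tzinfo=tz)

# tzname() yields a short abbreviation for that instant, not the zone key,
# so it is not a reliable value to deserialize from.
abbreviation = dt.tzname()

restored = deserialize_tz(serialize_tz(tz))
```

The key round-trips deterministically, while the abbreviation changes with DST and can be shared by unrelated zones.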
