Good news, new pre-release of pendulum available: https://pypi.org/project/pendulum/3.0.0b1/ , https://github.com/sdispater/pendulum/releases/tag/3.0.0b1
1. I checked and there is no memory leak on the unsuccessful parse. 2. Also check some timezone which previously doesn't work well, and as expected it work now fine I do not check on Airflow codebase yet ---- Best Wishes *Andrey Anshin* On Thu, 28 Sept 2023 at 18:03, Bolke de Bruin <[email protected]> wrote: > FYI: > > I've just added: > > https://github.com/apache/airflow/pull/34667 > > which documents how to use newer timezone information with Pendulum. > > Also work seems to be progressing (albeit slowly) on Pendulum 3: > > https://github.com/sdispater/pendulum/issues/600#issuecomment-1711299677 > > Bolke > > > > On Thu, 28 Sept 2023 at 15:12, Bolke de Bruin <[email protected]> wrote: > > > for serialization I am not too worried about ZoneInfo. We do not use > > pickling by default as we roll our own serialization format. We probably > > just need the key (zoneinfo.key). > > > > I'm not sure what happened about this: > > > > https://github.com/sdispater/pendulum/issues/590 > > > > Bolke > > > > On Thu, 28 Sept 2023 at 14:59, Andrey Anshin <[email protected]> > > wrote: > > > >> I agree with all problems that you mention about datetime tz-aware data. > >> I lived for almost 30 years in a country which had in different periods > of > >> time up to 10 time zones, and on a regular basis changed it > >> (merge/unmerge) > >> , disable DST, temporarily enable DST. In addition I also worked in a > >> different bank for about 10 years (legacy systems which don't update > >> tzdata > >> for ages) . I think I had most of the bad cases with time zones. And I > >> think everyone somehow has a problem with different time zones: > Calendars > >> + > >> events, flight booking systems which don't know about timezones and you > >> might find that your connecting flight flew away an hour ago, etc. > >> > >> In addition the error might happen in different places, databases (not > >> updated tzdata, or db doesn't work correctly), client libraries, OS, > etc. > >> The person who finally solves tz-aware data should be granted all awards > >> in > >> the World. > >> > >> > For example, we got recently bitten by datetime.tzname() (which is > >> supposed > >> to 'time zone name') returning short-hand notation timezones (e.g. PST) > >> > instead of full timezone names (e.g. "Europe/Amsterdam") which makes > >> deserialization non deterministic. > >> > >> Yeah, and even ZoneInfo doesn't solve the problem with `datetime.tzname` > >> because final implementation depends on different factors, tzinfo > >> implementation and internals of datetime. > >> > >> > moving to zoneinfo seems to make sense though and will also be in > >> Pendulum 3 > >> > >> I've have a look couple days ago about zoneinfo, it also have some > >> "pitfalls", e.g. if timezone created from file it can't be easily > >> serialized > >> https://docs.python.org/3.9/library/zoneinfo.html#the-zoneinfo-class > >> > >> > Pendulum has proven us in the past, maybe we indeed should help the > >> project if possible and if that isn't possible verify formal correctness > >> of > >> any other library > >> > >> I guess all other libraries might have a different kind of issue > including > >> compatibility with databases. > >> More close replacement it is dateutil, but it also maintained by one > >> person > >> last release was 2 years ago and contains quite a few issues with > >> timezones/DTS (no blame, that is just a fact) > >> > >> > >> On Thu, 28 Sept 2023 at 15:39, Bolke de Bruin <[email protected]> > wrote: > >> > >> > Thanks for starting the discussion Andrey. > >> > > >> > Some background on the choice for Pendulum at the time. In the early > >> days > >> > of Airflow it wasn't timezone aware. Originating from Airbnb which > had a > >> > reasonable mature data organization the view was everything needs to > be > >> in > >> > UTC. According to Maxime the engineers would dream in UTC ;-). > However, > >> in > >> > the real world which also needs to deal with legacy that didn't hold. > >> Often > >> > systems of record did not store timezone information but were > localized > >> > nevertheless. Cutoff times in banks happen in localized time and if > you > >> > want to meet those, Airflow needed to do better. > >> > > >> > Doing timezones and being timezone aware proved to be exceptionally > >> hard. > >> > Many libraries get it wrong [1] and fail silently (i.e. Arrow) or > apply > >> DST > >> > transitions wrongly (pytz). When dealing with payments that stuff > cannot > >> > happen. To make things worse, in Python timezone support is pretty > >> > convoluted, while some standardization happened in 3.9 by using IANA > >> > provided timezone information from the local system, its API is messy. > >> For > >> > example, we got recently bitten by datetime.tzname() (which is > >> > supposed to 'time > >> > zone name') returning short-hand notation timezones (e.g. PST) instead > >> of > >> > full timezone names (e.g. "Europe/Amsterdam") which makes > >> deserialization > >> > non deterministic. > >> > > >> > So, what I am trying to say, is tread carefully when doing changes as > >> > proposed in [2] (moving to zoneinfo seems to make sense though and > will > >> > also be in Pendulum 3). Make sure those changes are formally correct > and > >> > don't assume because they are now part of python itself (pytz was the > >> > defacto standard for a long time). Pendulum has proven us in the past, > >> > maybe we indeed should help the project if possible and if that isn't > >> > possible verify formal correctness of any other library. > >> > > >> > Bolke > >> > > >> > [1] https://pendulum.eustace.io/faq/ > >> > [2] https://github.com/apache/airflow/issues/19450 > >> > > >> > On Thu, 28 Sept 2023 at 11:03, Andrey Anshin < > [email protected]> > >> > wrote: > >> > > >> > > This discussion is more about the known problem of pendulum and how > we > >> > > could deal with it and maybe how we (as Community) might help autor. > >> > > > >> > > The library is mostly supported by a single author Sébastien > Eustace ( > >> > > https://github.com/sdispater) and it seems like we bump into the > >> > situation > >> > > which is described in xkcd #2347 ( > >> > > https://imgs.xkcd.com/comics/dependency.png). To be honest it is > not > >> > > something new when library mainly supported by one author so there > is > >> > > always a risk that the library will no longer be supported / > abandoned > >> > > And if takes in account that pendulum provides core functionality in > >> > > Airflow it could have dramatical impact in the future. > >> > > > >> > > Pendulum is a really nice library which helps a lot of developers to > >> work > >> > > with dates/datetimes. However there is one major problem, the last > >> > release > >> > > of this library happened more than 3 years ago ( > >> > > https://pypi.org/project/pendulum/#history) in the time when > Airflow > >> > > 1.10.11 was released > >> > > > >> > > Fortunately, the project is not abandoned and on a regular basis > >> commits > >> > > add into the master branch. However these commits are not included > >> into > >> > any > >> > > final release and that's why some things related to datetime don't > >> work > >> > as > >> > > expected in Airflow. There are list of known (for me) issues which > are > >> > > affect Airflow > >> > > > >> > > *Memory Leak on parse*: > >> > > - https://github.com/sdispater/pendulum/issues/720, this one > fixed 2 > >> > > years > >> > > ago but not available yet ( > >> > https://github.com/sdispater/pendulum/pull/563 > >> > > ). > >> > > Since we use parse dates in airflow codebase: datetime parameters > and > >> > > datetime in logs this one could be a reason for memory leakage in > >> > Airflow: > >> > > - https://github.com/apache/airflow/discussions/24694 > >> > > - https://github.com/apache/airflow/discussions/28597 > >> > > > >> > > *Incorrect time zones*, known issues and should be already fixed in > >> > master > >> > > branch > >> > > - https://github.com/sdispater/pendulum/issues/700, Mexico do not > use > >> > DST > >> > > anymore > >> > > - https://github.com/sdispater/pendulum/issues/706, Egypt reinstate > >> DST > >> > > > >> > > We add clarification in > https://github.com/apache/airflow/pull/30467, > >> > > however it seems like there is no other way rather than patching > >> Pendulum > >> > > right now. > >> > > > >> > > All these issues should be solved as soon as pendulum 3 is released. > >> The > >> > > current announced estimation is end of september/ beginning of > >> October: > >> > > > >> > https://github.com/sdispater/pendulum/issues/600#issuecomment-1711299677 > >> > > > >> > > So in theory we would have a fixed version of pendulum soon, and it > >> might > >> > > break something in Airflow but from my point of view it is better > than > >> > > current status. > >> > > > >> > > However there might be a situation where the release of the pendulum > >> > would > >> > > be postponed, so maybe better to have a backup plan. What could we > do > >> in > >> > > this case? > >> > > > >> > > Maybe we should start to use zoneinfo.ZoneInfo instead of pendulum > >> > > datetime? https://github.com/apache/airflow/issues/19450 > >> > > Pros: > >> > > - stdlib (python 3.9+) > >> > > - In pendulum 3.0 Timezone based on zoneinfo.Zoneinfo > >> > > > >> > > Cons: > >> > > - Current serialization model can't deal with backport packages. > E.g. > >> > > timezone which are serialized in backport_zoneinfo can't be > >> deserialized > >> > in > >> > > zoneinfo > >> > > > >> > > Maybe we should replace parse datetime with another solution. Does > >> anyone > >> > > know a good replacement? > >> > > > >> > > Maybe someone from Airflow Community could propose their help with > >> > > maintenance of library: > >> > > - https://github.com/sdispater/pendulum/issues/590 > >> > > > >> > > Maybe we should get rid of the pendulum at all, as a last resort > >> > solution. > >> > > I can't imagine how we could do that, because a lot of stuff depends > >> on > >> > the > >> > > pendulum and removing it would be a breaking change. > >> > > > >> > > ---- > >> > > Best Wishes > >> > > *Andrey Anshin* > >> > > > >> > > >> > > >> > -- > >> > > >> > -- > >> > Bolke de Bruin > >> > [email protected] > >> > > >> > > > > > > -- > > > > -- > > Bolke de Bruin > > [email protected] > > > > > -- > > -- > Bolke de Bruin > [email protected] >
