Coming back to it - what about the "polypill" :)? What's different vs the "common.sql" way of doing it ? How do we think it can work ?
On Thu, Feb 22, 2024 at 1:58 PM Jarek Potiuk <ja...@potiuk.com> wrote: > > The symbolic link approach seems to disregard all the external > providers, unless I misunderstand it. > > Not really. It just does not make it easy for the external providers to > use it "fast". They can still - if they want to just manually copy those > utils from the latest version of Airflow if they want to use it. Almost by > definition, those will be small, independent modules that can be simply > copied as needed by whoever releases external providers - and they are also > free to copy any older version if they want. That is a nice feature that > makes them fully decoupled from the version of Airflow they are installed > in (same as community providers). Or - if they want they can just import > them from "airflow.provider_utils" - but then they have to add >= Airflow > 2.9 if that util appeared in Airflow 2.9 (which is the main reason we want > to use symbolic links - because due to our policies and promises, we do not > want community providers to depend on latest version of Airflow in vast > majority of cases. > > So this approach is also fully usable by external providers, but it > requires some manual effort to copy the modules to their providers. > > > I like the polypill idea. A backport provider that brings new interfaces > to providers without the actual functionalities. > > I would love to hear more about this, I think the "common.util" was > exactly the kind of polyfill approach (with its own versioning > complexities) but maybe I do not understand how such a polyfill provider > would work. Say we want to add a new "urlparse" method usable for all > providers. Could you explain how it would work - say: > > * we add "urlparse" in Airflow 2.9 > * some provider wants to use it in Airflow 2.7 > > What providers, with what code/interfaces we would have to release in this > case and what dependencies such providers that want to use it (both > community and Airflow should have)? I **think** that would mean exactly the > "common.<something>" approach we already have with "io" and "sql", but > maybe I do not understand it :) > > On Thu, Feb 22, 2024 at 1:45 PM Tzu-ping Chung <t...@astronomer.io.invalid> > wrote: > >> I like the polypill idea. A backport provider that brings new interfaces >> to providers without the actual functionalities. >> >> >> > On 22 Feb 2024, at 20:41, Maciej Obuchowski <mobuchow...@apache.org> >> wrote: >> > >> >> That's why I generally do >> > not like the "util" approach because common packaging introduces >> > unnecessary coupling (you have to upgrade independent utils together). >> > >> > From my experience with releasing OpenLineage where we do things >> similarly: >> > I think that's the advantage of it, but only _if_ you can release those >> > together. >> > With "build-in" providers it makes sense, but could be burdensome if >> > "external" >> > ones would depend on that functionality. >> > >> >> I know it's not been the original idea behind providers, but - after >> > testing common.sql and now also having common.io, seems like more and >> more >> > we would like to extract some common code that we would like providers >> to >> > use, but we refrain from it, because it will only be actually usable 6 >> > months after we introduce some common code. >> > >> > So, maybe better approach would be to introduce the functionality into >> > core, >> > and use common.X provider as "polyfill" (to borrow JS nomenclature) >> > to make sure providers could use that functionality now, where external >> > ones could depend on the Airflow ones? >> > >> > The symbolic link approach seems to disregard all the external >> providers, >> > unless >> > I misunderstand it. >> > >> > czw., 22 lut 2024 o 13:28 Jarek Potiuk <ja...@potiuk.com> napisał(a): >> > >> >>> Ideally utilities for each purpose (parsing URI, reading Object >> Storage, >> >> reading SQL, etc.) should each have its own utility package, so they >> can be >> >> released independently without dependency problems popping up if we >> need to >> >> break compatibility to one purpose. But more providers are >> exponentially >> >> more difficult to maintain, so I’d settle for one utility provider for >> now >> >> and split further if needed in the future. >> >> >> >> Very much agree with this general statement. That's why I generally do >> >> not like the "util" approach because common packaging introduces >> >> unnecessary coupling (you have to upgrade independent utils together). >> And >> >> when we have a common set of things that seem to make sense to be >> released >> >> together when upgraded we should package them together in >> >> "common.<something concrete" (like we have with common.io and >> common.sql). >> >> >> >> However - in this case, I think what Jens proposed (and I am happy to >> try >> >> as well) is to attempt to use symbolic links - i.e. add the code in >> >> `airflow.util` but then create a symbolic link in the provider. I >> tested >> >> it yesterday and it works as expected - i.e. such symbolic link is >> >> dereferenced and the provider package contains the python file, not >> >> symbolic link. That seems like a much more lightweight approach that >> will >> >> serve the purpose of "common.util" much better. The only thing we will >> have >> >> to take care of (and we can add it once the POC is successful) is to >> add >> >> some pre-commit protection that those kind of symbolically linked util >> >> modules are imported in providers, from inside of those provider, not >> from >> >> airlfow, and make sure they are "standalone" (i.e. - as you mentioned >> - not >> >> depend on anything in airflow code). We could create a new package for >> that >> >> in airlfow >> >> "airlfow.provider_utils" for example - and make sure (as next step) >> that >> >> anything from that package is never directly imported by any provider, >> and >> >> whenever provider uses it, it should be symbolic link inside that >> provider. >> >> That's all automatable and we can prevent mistakes via pre-commit. >> >> >> >> I think that might lead to a very lightweight approach where we >> introduce >> >> new common functionality which is immediately reusable in providers >> without >> >> the hassle of taking care about backwards compatibility, and managing >> the >> >> "common.util" provider. At the expense of a bit complex pre-commit that >> >> will guard the usage of it. >> >> >> >> Seems that it might be the "Eat cake and have it too" way that we've >> been >> >> looking for. >> >> >> >> J. >> >> >> >> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung >> <t...@astronomer.io.invalid> >> >> wrote: >> >> >> >>> It would help in the sense mentioned in previous posts, yes. But one >> >> thing >> >>> I want to point out is, for the provider to actually be helpful, it >> must >> >> be >> >>> treated a bit differently from normal providers, but more like a >> separate >> >>> third-party dependency. Specifically, the provider should not have a >> >>> dependency to Core Airflow, so it can be released and depended on more >> >>> flexibly. >> >>> >> >>> Ideally utilities for each purpose (parsing URI, reading Object >> Storage, >> >>> reading SQL, etc.) should each have its own utility package, so they >> can >> >> be >> >>> released independently without dependency problems popping up if we >> need >> >> to >> >>> break compatibility to one purpose. But more providers are >> exponentially >> >>> more difficult to maintain, so I’d settle for one utility provider for >> >> now >> >>> and split further if needed in the future. >> >>> >> >>> TP >> >>> >> >>> >> >>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) < >> >>> jens.scheff...@de.bosch.com.INVALID> wrote: >> >>>> >> >>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse and >> >>> validate URIs for datasets? >> >>>> >> >>>> Mit freundlichen Grüßen / Best regards >> >>>> >> >>>> Jens Scheffler >> >>>> >> >>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) >> >>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | >> >>> GERMANY | www.bosch.com >> >>>> Tel. +49 711 811-91508 | Mobil +49 160 90417410 | >> >>> jens.scheff...@de.bosch.com >> >>>> >> >>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000; >> >>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer; >> >>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. >> Markus >> >>> Forschner, >> >>>> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert >> >>>> >> >>>> -----Original Message----- >> >>>> From: Jarek Potiuk <ja...@potiuk.com> >> >>>> Sent: Donnerstag, 22. Februar 2024 00:53 >> >>>> To: dev@airflow.apache.org >> >>>> Subject: Re: [DISCUSS] Common.util provider? >> >>>> >> >>>> Yep. It could work with symbolic links. Tested it and with flit - >> both >> >>> wheel and sdist packaged code such symbolically linked file is >> >> dereferenced >> >>> and copy of the file is added there. It could be a nice way of doing >> it. >> >>>> >> >>>> Maybe then worth trying next time if someone has a need? >> >>>> >> >>>> J >> >>>> >> >>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) < >> >>> jens.scheff...@de.bosch.com.invalid> wrote: >> >>>> >> >>>>>>>> As of additional dependency complexity between providers actually >> >>>>>>>> the >> >>>>> additional dependency I think creates more problems than the >> benefit… >> >>>>> would be cool if there would be an option to „inline“ common code >> from >> >>>>> a single place but keep individual providers fully independent… >> >>>>> >> >>>>>> Well, we already do a lot of inlining, so if we think we should do >> >>>>>> more, >> >>>>> we have mechanisms for that. We have pre-commits and release >> commands >> >>>>> that do a lot of that. Pre commits are inlining scripts in >> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py >> files >> >>>>> and changelogs and index documentation .rst (partially) are >> generated >> >>>>> at release documentation preparation time, pyproject.toml for >> >>>>> providers are generated from common templates at package building >> time >> >>>>> and so on and so on :). So we can do more of that and generate >> common >> >>>>> code, it's just a matter of adding pre-commits or breeze scripts. >> But >> >>>>> again "can't have and eat cake" - this has the drawback that there >> are >> >>>>> extra steps involved and even if it's automated it does add friction >> >>>>> when you have to regenerate the code every time you change it and >> when >> >>>>> you change it in another place than where you use it. >> >>>>> >> >>>>> Yes, also thought a moment about pre-commit. I#d be okay if we >> really >> >>>>> in-line and have a pre-commit aligning the redundancy of python in >> >>> folders. >> >>>>> Might need to be an opt-in if only 10 of 85 providers are using >> common >> >>>>> stuff and if we change a common line we probably do not need to >> affect >> >>>>> all providers. >> >>>>> >> >>>>> As long as no Windows users trying to code for airflow (do we need >> to >> >>>>> consider?) would it also work to use symlinks? Git can cope with >> this, >> >>>>> I don't know if the python toolchain can de-reference a copy and are >> >>>>> not packaging a symlink? Would be worth a test... would save the >> >>>>> pre-commit and we even could selectively include common bla into >> >>>>> providers as needed :-D >> >>>>> >> >>>>> Mit freundlichen Grüßen / Best regards >> >>>>> >> >>>>> Jens Scheffler >> >>>>> >> >>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH | >> >>>>> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY | >> >>>>> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 | >> >>>>> jens.scheff...@de.bosch.com >> >>>>> >> >>>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000; >> >>>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer; >> >>>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. >> >>>>> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, >> Dr. >> >>>>> Tanja Rückert >> >>>>> >> >>>>> -----Original Message----- >> >>>>> From: Jarek Potiuk <ja...@potiuk.com> >> >>>>> Sent: Mittwoch, 21. Februar 2024 21:18 >> >>>>> To: dev@airflow.apache.org >> >>>>> Subject: Re: [DISCUSS] Common.util provider? >> >>>>> >> >>>>>> if we have a common piece then we are locking all depending >> >>>>>> providers >> >>>>> (potentially) together if common code changes >> >>>>> >> >>>>> Yes, coupling in this case is the drawback of this solution. You >> can't >> >>>>> have cake and eat it too and in this case you trade DRY with >> coupling. >> >>>>> >> >>>>>> As of additional dependency complexity between providers actually >> >>>>>> the >> >>>>> additional dependency I think creates more problems than the >> benefit… >> >>>>> would be cool if there would be an option to „inline“ common code >> from >> >>>>> a single place but keep individual providers fully independent… >> >>>>> >> >>>>> Well, we already do a lot of inlining, so if we think we should do >> >>>>> more, we have mechanisms for that. We have pre-commits and release >> >>>>> commands that do a lot of that. Pre commits are inlining scripts in >> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py >> files >> >>>>> and changelogs and index documentation .rst (partially) are >> generated >> >>>>> at release documentation preparation time, pyproject.toml for >> >>>>> providers are generated from common templates at package building >> time >> >>>>> and so on and so on :). So we can do more of that and generate >> common >> >>>>> code, it's just a matter of adding pre-commits or breeze scripts. >> But >> >>>>> again "can't have and eat cake" - this has the drawback that there >> are >> >>>>> extra steps involved and even if it's automated it does add friction >> >>>>> when you have to regenerate the code every time you change it and >> when >> >>>>> you change it in another place than where you use it. >> >>>>> >> >>>>> J. >> >>>>> >> >>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) < >> >>>>> jens.scheff...@de.bosch.com.invalid> wrote: >> >>>>> >> >>>>>> Hi Jarek, >> >>>>>> >> >>>>>> At reviewing the PR from uranusjr for AIP-60 I also had the feeling >> >>>>>> that a lot of very similar code is repeated in all the providers. >> >>>>>> But during review yesterday I dropped the ides because if we have a >> >>>>>> common piece then we are locking all depending providers >> >>>>>> (potentially) together if common code changes. >> >>>>>> As of additional dependency complexity between providers actually >> >>>>>> the additional dependency I think creates more prblems than the >> >>>>>> benefit… would be cool if tehere would be an option to „inline“ >> >>>>>> common code from a single place but keep individual providers fully >> >>>>>> independent… >> >>>>>> >> >>>>>> Jens >> >>>>>> >> >>>>>> Sent from Outlook for >> >>>>>> iOS< >> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F% >> >>>>>> 2F >> >>>>>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com >> %7C98c88 >> >>>>>> 97 >> >>>>>> >> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0 >> >>>>>> %7 >> >>>>>> >> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ >> >>>>>> Ij >> >>>>>> >> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn >> >>>>>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0> >> >>>>>> ________________________________ >> >>>>>> From: Jarek Potiuk <ja...@potiuk.com> >> >>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM >> >>>>>> To: dev@airflow.apache.org <dev@airflow.apache.org> >> >>>>>> Subject: [DISCUSS] Common.util provider? >> >>>>>> >> >>>>>> Hello everyone, >> >>>>>> >> >>>>>> How do we feel about introducing a common.util provider? >> >>>>>> >> >>>>>> I know it's not been the original idea behind providers, but - >> after >> >>>>>> testing common.sql and now also having common.io, seems like more >> >>>>>> and more we would like to extract some common code that we would >> >>>>>> like providers to use, but we refrain from it, because it will only >> >>>>>> be actually usable 6 months after we introduce some common code. >> >>>>>> >> >>>>>> However, if we introduce common.util, this problem is generally >> gone >> >>>>>> - at the expense of more complex maintenance and cross-provider >> >>>>> dependencies. >> >>>>>> We should be able to add a common util method and use it in a >> >>>>>> provider at the same time. >> >>>>>> >> >>>>>> Think Amazon provider using a new feature released in common.util >> >>>>>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it >> >>>>>> already for common.sql. We know how to do it, we know what to >> avoid, >> >>>>>> we know we cannot introduce backwards-incompatible changes, so we >> >>>>>> have to be very clear what is and what is not a public API there, >> We >> >>>>>> could rather easily add tests to prevent such >> backwards-incompatible >> >>>>>> changes. We even have a solution for chicken-egg providers where we >> >>>>>> need to release two providers at the same time if they depend on >> >>>>>> each other. Generally speaking it's quite workable but adds a bit >> of >> >>> overhead. >> >>>>>> >> >>>>>> Examples that we could implement as "common.util": >> >>>>>> >> >>>>>> - common versioning check with cache - where multiple providers >> >>>>>> could reuse "do we have pendulum 2" >> >>>>>> - more complex - some date management features (we have a few like >> >>>>>> date_ranges/round_time). But there are many more. >> >>>>>> >> >>>>>> I generally do not love the common "util" approach. It has a >> >>>>>> tendency to become a bag of everything over time. but if we limit >> it >> >>>>>> to a set of small, fully decoupled modules where each module is >> >>>>>> independent - it's OK. And we already have it in "airflow.util" and >> >>>>>> we seem to be >> >>>>> doing well. >> >>>>>> >> >>>>>> WDYT? Is it worth it ? >> >>>>>> >> >>>>>> J. >> >>>>>> >> >>>>> >> >>>> >> >>>> --------------------------------------------------------------------- >> >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >> >>>> For additional commands, e-mail: dev-h...@airflow.apache.org >> >>> >> >>> >> >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >> For additional commands, e-mail: dev-h...@airflow.apache.org >> >>