I like the polyfill idea: a backport provider that brings new interfaces to providers without the actual functionality.
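A minimal sketch of how a provider might consume such a polyfill, assuming the same name is exposed both by a newer core and by a backport provider. The helper `first_available` and the Airflow module paths in the comment are hypothetical and only illustrate the lookup order; the runnable part uses standard-library modules as stand-ins.

```python
from importlib import import_module
from typing import Any


def first_available(name: str, *module_paths: str) -> Any:
    """Return attribute `name` from the first importable module that defines it."""
    errors = []
    for path in module_paths:
        try:
            return getattr(import_module(path), name)
        except (ImportError, AttributeError) as exc:
            # Module missing (old core) or name not there yet: try the next candidate.
            errors.append(f"{path}: {exc!r}")
    raise ImportError(f"{name} not found in any of: " + "; ".join(errors))


# Hypothetical usage inside a provider (these module paths are made up):
#   parse_uri = first_available(
#       "parse_uri",
#       "airflow.utils.uri",               # newer core ships the real implementation
#       "airflow.providers.common.x.uri",  # polyfill provider for older core
#   )

# Runnable stand-in: the first module does not exist, so the second one is used.
sqrt = first_available("sqrt", "no_such_core_module", "math")
print(sqrt(16.0))
```

Core is listed first so that the polyfill is only used on Airflow versions that do not yet ship the functionality, and an external provider would only need a dependency on the polyfill provider.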
> On 22 Feb 2024, at 20:41, Maciej Obuchowski <mobuchow...@apache.org> wrote:
>
>> That's why I generally do not like the "util" approach because common
>> packaging introduces unnecessary coupling (you have to upgrade
>> independent utils together).
>
> From my experience with releasing OpenLineage, where we do things
> similarly: I think that's the advantage of it, but only _if_ you can
> release those together.
> With "built-in" providers it makes sense, but it could be burdensome if
> "external" ones would depend on that functionality.
>
>> I know it's not been the original idea behind providers, but - after
>> testing common.sql and now also having common.io, seems like more and
>> more we would like to extract some common code that we would like
>> providers to use, but we refrain from it, because it will only be
>> actually usable 6 months after we introduce some common code.
>
> So, maybe a better approach would be to introduce the functionality into
> core, and use a common.X provider as a "polyfill" (to borrow JS
> nomenclature) to make sure providers could use that functionality now,
> where external ones could depend on the Airflow ones?
>
> The symbolic link approach seems to disregard all the external
> providers, unless I misunderstand it.
>
> On Thu, 22 Feb 2024 at 13:28, Jarek Potiuk <ja...@potiuk.com> wrote:
>
>>> Ideally utilities for each purpose (parsing URI, reading Object
>>> Storage, reading SQL, etc.) should each have its own utility package,
>>> so they can be released independently without dependency problems
>>> popping up if we need to break compatibility to one purpose. But more
>>> providers are exponentially more difficult to maintain, so I’d settle
>>> for one utility provider for now and split further if needed in the
>>> future.
>>
>> Very much agree with this general statement. That's why I generally do
>> not like the "util" approach because common packaging introduces
>> unnecessary coupling (you have to upgrade independent utils together).
>> And when we have a common set of things that seem to make sense to be
>> released together when upgraded we should package them together in
>> "common.<something concrete>" (like we have with common.io and
>> common.sql).
>>
>> However - in this case, I think what Jens proposed (and I am happy to
>> try as well) is to attempt to use symbolic links - i.e. add the code in
>> `airflow.util` but then create a symbolic link in the provider. I tested
>> it yesterday and it works as expected - i.e. such a symbolic link is
>> dereferenced and the provider package contains the Python file, not the
>> symbolic link. That seems like a much more lightweight approach that
>> will serve the purpose of "common.util" much better. The only thing we
>> will have to take care of (and we can add it once the POC is
>> successful) is to add some pre-commit protection that those kinds of
>> symbolically linked util modules are imported in providers from inside
>> of those providers, not from airflow, and make sure they are
>> "standalone" (i.e. - as you mentioned - do not depend on anything in
>> airflow code). We could create a new package for that in airflow -
>> "airflow.provider_utils" for example - and make sure (as a next step)
>> that anything from that package is never directly imported by any
>> provider, and whenever a provider uses it, it should be a symbolic link
>> inside that provider. That's all automatable and we can prevent
>> mistakes via pre-commit.
>>
>> I think that might lead to a very lightweight approach where we
>> introduce new common functionality which is immediately reusable in
>> providers without the hassle of taking care of backwards compatibility
>> and managing the "common.util" provider. At the expense of a somewhat
>> complex pre-commit that will guard the usage of it.
>>
>> Seems that it might be the "Eat cake and have it too" way that we've
>> been looking for.
>>
>> J.
>>
>> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung <t...@astronomer.io.invalid>
>> wrote:
>>
>>> It would help in the sense mentioned in previous posts, yes. But one
>>> thing I want to point out is, for the provider to actually be helpful,
>>> it must be treated a bit differently from normal providers, more like
>>> a separate third-party dependency. Specifically, the provider should
>>> not have a dependency on Core Airflow, so it can be released and
>>> depended on more flexibly.
>>>
>>> Ideally utilities for each purpose (parsing URI, reading Object
>>> Storage, reading SQL, etc.) should each have its own utility package,
>>> so they can be released independently without dependency problems
>>> popping up if we need to break compatibility to one purpose. But more
>>> providers are exponentially more difficult to maintain, so I’d settle
>>> for one utility provider for now and split further if needed in the
>>> future.
>>>
>>> TP
>>>
>>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T)
>>>> <jens.scheff...@de.bosch.com.INVALID> wrote:
>>>>
>>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse and
>>>> validate URIs for datasets?
>>>>
>>>> Best regards
>>>>
>>>> Jens Scheffler
>>>>
>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
>>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
>>>> GERMANY | www.bosch.com
>>>> Tel. +49 711 811-91508 | Mobile +49 160 90417410 |
>>>> jens.scheff...@de.bosch.com
>>>>
>>>> Registered office: Stuttgart, Registration court: Amtsgericht
>>>> Stuttgart, HRB 14000;
>>>> Chairman of the Supervisory Board: Prof. Dr. Stefan Asenkerschbaumer;
>>>> Managing Directors: Dr. Stefan Hartung, Dr. Christian Fischer,
>>>> Dr. Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer,
>>>> Dr. Tanja Rückert
>>>>
>>>> -----Original Message-----
>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>> Sent: Thursday, 22 February 2024 00:53
>>>> To: dev@airflow.apache.org
>>>> Subject: Re: [DISCUSS] Common.util provider?
>>>>
>>>> Yep. It could work with symbolic links. Tested it with flit - in both
>>>> the wheel and the sdist, such a symbolically linked file is
>>>> dereferenced and a copy of the file is added there. It could be a
>>>> nice way of doing it.
>>>>
>>>> Maybe then worth trying next time if someone has a need?
>>>>
>>>> J
>>>>
>>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T)
>>>> <jens.scheff...@de.bosch.com.invalid> wrote:
>>>>
>>>>>>>> As for the additional dependency complexity between providers, I
>>>>>>>> think the additional dependency actually creates more problems
>>>>>>>> than benefit… would be cool if there would be an option to
>>>>>>>> "inline" common code from a single place but keep individual
>>>>>>>> providers fully independent…
>>>>>
>>>>>> Well, we already do a lot of inlining, so if we think we should do
>>>>>> more, we have mechanisms for that. We have pre-commits and release
>>>>>> commands that do a lot of that. Pre-commits are inlining scripts in
>>>>>> Dockerfiles and shortening the PyPI readme. The providers'
>>>>>> __init__.py files, changelogs and index documentation .rst
>>>>>> (partially) are generated at release documentation preparation
>>>>>> time, pyproject.toml files for providers are generated from common
>>>>>> templates at package building time, and so on and so on :). So we
>>>>>> can do more of that and generate common code, it's just a matter of
>>>>>> adding pre-commits or breeze scripts. But again "can't have and eat
>>>>>> cake" - this has the drawback that there are extra steps involved,
>>>>>> and even if it's automated it does add friction when you have to
>>>>>> regenerate the code every time you change it and when you change it
>>>>>> in another place than where you use it.
>>>>>
>>>>> Yes, also thought a moment about pre-commit.
>>>>> I'd be okay if we really in-line and have a pre-commit aligning the
>>>>> redundancy of Python in folders. Might need to be an opt-in if only
>>>>> 10 of 85 providers are using common stuff, and if we change a common
>>>>> line we probably do not need to affect all providers.
>>>>>
>>>>> As long as no Windows users are trying to code for Airflow (do we
>>>>> need to consider that?), would it also work to use symlinks? Git can
>>>>> cope with this, but I don't know if the Python toolchain
>>>>> de-references the link and packages a copy rather than a symlink.
>>>>> Would be worth a test... it would save the pre-commit and we could
>>>>> even selectively include common bla into providers as needed :-D
>>>>>
>>>>> Best regards
>>>>>
>>>>> Jens Scheffler
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>>> Sent: Wednesday, 21 February 2024 21:18
>>>>> To: dev@airflow.apache.org
>>>>> Subject: Re: [DISCUSS] Common.util provider?
>>>>>
>>>>>> if we have a common piece then we are locking all depending
>>>>>> providers (potentially) together if common code changes
>>>>>
>>>>> Yes, coupling in this case is the drawback of this solution. You can't
>>>>> have cake and eat it too and in this case you trade DRY with coupling.
>>>>>
>>>>>> As for the additional dependency complexity between providers, I
>>>>>> think the additional dependency actually creates more problems than
>>>>>> benefit… would be cool if there would be an option to "inline"
>>>>>> common code from a single place but keep individual providers fully
>>>>>> independent…
>>>>>
>>>>> Well, we already do a lot of inlining, so if we think we should do
>>>>> more, we have mechanisms for that. We have pre-commits and release
>>>>> commands that do a lot of that. Pre-commits are inlining scripts in
>>>>> Dockerfiles and shortening the PyPI readme. The providers'
>>>>> __init__.py files, changelogs and index documentation .rst
>>>>> (partially) are generated at release documentation preparation time,
>>>>> pyproject.toml files for providers are generated from common
>>>>> templates at package building time, and so on and so on :). So we can
>>>>> do more of that and generate common code, it's just a matter of
>>>>> adding pre-commits or breeze scripts. But again "can't have and eat
>>>>> cake" - this has the drawback that there are extra steps involved,
>>>>> and even if it's automated it does add friction when you have to
>>>>> regenerate the code every time you change it and when you change it
>>>>> in another place than where you use it.
>>>>>
>>>>> J.
>>>>>
>>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T)
>>>>> <jens.scheff...@de.bosch.com.invalid> wrote:
>>>>>
>>>>>> Hi Jarek,
>>>>>>
>>>>>> While reviewing the PR from uranusjr for AIP-60 I also had the
>>>>>> feeling that a lot of very similar code is repeated in all the
>>>>>> providers. But during review yesterday I dropped the idea because if
>>>>>> we have a common piece then we are locking all depending providers
>>>>>> (potentially) together if common code changes.
>>>>>> As for the additional dependency complexity between providers, I
>>>>>> think the additional dependency actually creates more problems than
>>>>>> benefit… would be cool if there would be an option to "inline"
>>>>>> common code from a single place but keep individual providers fully
>>>>>> independent…
>>>>>>
>>>>>> Jens
>>>>>>
>>>>>> Sent from Outlook for iOS
>>>>>> ________________________________
>>>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM
>>>>>> To: dev@airflow.apache.org <dev@airflow.apache.org>
>>>>>> Subject: [DISCUSS] Common.util provider?
>>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> How do we feel about introducing a common.util provider?
>>>>>>
>>>>>> I know it's not been the original idea behind providers, but - after
>>>>>> testing common.sql and now also having common.io, seems like more
>>>>>> and more we would like to extract some common code that we would
>>>>>> like providers to use, but we refrain from it, because it will only
>>>>>> be actually usable 6 months after we introduce some common code.
>>>>>>
>>>>>> However, if we introduce common.util, this problem is generally gone
>>>>>> - at the expense of more complex maintenance and cross-provider
>>>>>> dependencies. We should be able to add a common util method and use
>>>>>> it in a provider at the same time.
>>>>>>
>>>>>> Think Amazon provider using a new feature released in
>>>>>> common.util >=1.2.0 and google provider >= 1.1.0.
>>>>>> All manageable and we do it already for common.sql. We know how to
>>>>>> do it, we know what to avoid, we know we cannot introduce
>>>>>> backwards-incompatible changes, so we have to be very clear what is
>>>>>> and what is not a public API there. We could rather easily add tests
>>>>>> to prevent such backwards-incompatible changes. We even have a
>>>>>> solution for chicken-egg providers where we need to release two
>>>>>> providers at the same time if they depend on each other. Generally
>>>>>> speaking it's quite workable but adds a bit of overhead.
>>>>>>
>>>>>> Examples that we could implement as "common.util":
>>>>>>
>>>>>> - common versioning check with cache - where multiple providers
>>>>>>   could reuse "do we have pendulum 2"
>>>>>> - more complex - some date management features (we have a few like
>>>>>>   date_ranges/round_time). But there are many more.
>>>>>>
>>>>>> I generally do not love the common "util" approach. It has a
>>>>>> tendency to become a bag of everything over time, but if we limit it
>>>>>> to a set of small, fully decoupled modules where each module is
>>>>>> independent - it's OK. And we already have it in "airflow.util" and
>>>>>> we seem to be doing well.
>>>>>>
>>>>>> WDYT? Is it worth it?
>>>>>>
>>>>>> J.
>>>>>>
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>
>>>
>>
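For reference, the first example from the original proposal (a "common versioning check with cache", such as "do we have pendulum 2") could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names `installed_major` and `has_major` are hypothetical, and it relies solely on the standard library rather than on any existing Airflow helper.

```python
from functools import lru_cache
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


@lru_cache(maxsize=None)
def installed_major(distribution: str) -> Optional[int]:
    """Major version of an installed distribution, or None if unknown.

    The result is cached, so repeated checks from many providers only
    query the package metadata once per distribution name.
    """
    try:
        raw = version(distribution)
    except PackageNotFoundError:
        return None
    if not raw:
        return None
    # Take the part before the first dot; unusual schemes (e.g. epochs) yield None.
    head = raw.split(".", 1)[0]
    return int(head) if head.isdigit() else None


def has_major(distribution: str, major: int) -> bool:
    """True if `distribution` is installed with the given major version."""
    return installed_major(distribution) == major


# "Do we have pendulum 2?" - the answer depends on the environment.
print(has_major("pendulum", 2))
```

Because the helper has no dependency on Airflow code, it would fit the "small, fully decoupled modules" constraint mentioned in the proposal.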