Any other ideas or suggestions here? Can someone explain how the "polypill" approach would look like, maybe? How do we imagine this working?
Just to continue this discussion - another example. Small thing that David wanted to add for changes in some sql providers: @contextmanager def suppress_and_warn(*exceptions: type[BaseException]): """Context manager that suppresses the given exceptions and logs a warning message.""" try: yield except exceptions as e: warnings.warn(f"Exception suppressed: {e}\n{traceback.format_exc()}", category=UserWarning) https://github.com/apache/airflow/pull/38707/files#diff-6e1b2f961cb951d05d66d2d814ef5f6d8f8bf8f43c40fb5d40e27a031fed8dd7R115 This is a small thing - but adding it in `airflow` is problematic - because it will only be released in 1.10, so we cannot use it in providers if we do. Currently - since it is used in sql providers, I suggested using `common.sql` for that code (and add >= 1.12 for common-sql-providers for those providers that use it). And I will write a separate email about a proposed versioning approach there. Do we have a good proposal on how we can solve similar things in the future? Do we want it at all? It has some challenges - yes it DRY's the code but it also introduces coupling. J. On Sun, Mar 10, 2024 at 6:21 PM Jarek Potiuk <ja...@potiuk.com> wrote: > Coming back to it - what about the "polypill" :)? What's different vs the > "common.sql" way of doing it ? How do we think it can work ? > > On Thu, Feb 22, 2024 at 1:58 PM Jarek Potiuk <ja...@potiuk.com> wrote: > >> > The symbolic link approach seems to disregard all the external >> providers, unless I misunderstand it. >> >> Not really. It just does not make it easy for the external providers to >> use it "fast". They can still - if they want to just manually copy those >> utils from the latest version of Airflow if they want to use it. Almost by >> definition, those will be small, independent modules that can be simply >> copied as needed by whoever releases external providers - and they are also >> free to copy any older version if they want. That is a nice feature that >> makes them fully decoupled from the version of Airflow they are installed >> in (same as community providers). Or - if they want they can just import >> them from "airflow.provider_utils" - but then they have to add >= Airflow >> 2.9 if that util appeared in Airflow 2.9 (which is the main reason we want >> to use symbolic links - because due to our policies and promises, we do not >> want community providers to depend on latest version of Airflow in vast >> majority of cases. >> >> So this approach is also fully usable by external providers, but it >> requires some manual effort to copy the modules to their providers. >> >> > I like the polypill idea. A backport provider that brings new >> interfaces to providers without the actual functionalities. >> >> I would love to hear more about this, I think the "common.util" was >> exactly the kind of polyfill approach (with its own versioning >> complexities) but maybe I do not understand how such a polyfill provider >> would work. Say we want to add a new "urlparse" method usable for all >> providers. Could you explain how it would work - say: >> >> * we add "urlparse" in Airflow 2.9 >> * some provider wants to use it in Airflow 2.7 >> >> What providers, with what code/interfaces we would have to release in >> this case and what dependencies such providers that want to use it (both >> community and Airflow should have)? I **think** that would mean exactly the >> "common.<something>" approach we already have with "io" and "sql", but >> maybe I do not understand it :) >> >> On Thu, Feb 22, 2024 at 1:45 PM Tzu-ping Chung <t...@astronomer.io.invalid> >> wrote: >> >>> I like the polypill idea. A backport provider that brings new interfaces >>> to providers without the actual functionalities. >>> >>> >>> > On 22 Feb 2024, at 20:41, Maciej Obuchowski <mobuchow...@apache.org> >>> wrote: >>> > >>> >> That's why I generally do >>> > not like the "util" approach because common packaging introduces >>> > unnecessary coupling (you have to upgrade independent utils together). >>> > >>> > From my experience with releasing OpenLineage where we do things >>> similarly: >>> > I think that's the advantage of it, but only _if_ you can release those >>> > together. >>> > With "build-in" providers it makes sense, but could be burdensome if >>> > "external" >>> > ones would depend on that functionality. >>> > >>> >> I know it's not been the original idea behind providers, but - after >>> > testing common.sql and now also having common.io, seems like more and >>> more >>> > we would like to extract some common code that we would like providers >>> to >>> > use, but we refrain from it, because it will only be actually usable 6 >>> > months after we introduce some common code. >>> > >>> > So, maybe better approach would be to introduce the functionality into >>> > core, >>> > and use common.X provider as "polyfill" (to borrow JS nomenclature) >>> > to make sure providers could use that functionality now, where external >>> > ones could depend on the Airflow ones? >>> > >>> > The symbolic link approach seems to disregard all the external >>> providers, >>> > unless >>> > I misunderstand it. >>> > >>> > czw., 22 lut 2024 o 13:28 Jarek Potiuk <ja...@potiuk.com> napisał(a): >>> > >>> >>> Ideally utilities for each purpose (parsing URI, reading Object >>> Storage, >>> >> reading SQL, etc.) should each have its own utility package, so they >>> can be >>> >> released independently without dependency problems popping up if we >>> need to >>> >> break compatibility to one purpose. But more providers are >>> exponentially >>> >> more difficult to maintain, so I’d settle for one utility provider >>> for now >>> >> and split further if needed in the future. >>> >> >>> >> Very much agree with this general statement. That's why I generally do >>> >> not like the "util" approach because common packaging introduces >>> >> unnecessary coupling (you have to upgrade independent utils >>> together). And >>> >> when we have a common set of things that seem to make sense to be >>> released >>> >> together when upgraded we should package them together in >>> >> "common.<something concrete" (like we have with common.io and >>> common.sql). >>> >> >>> >> However - in this case, I think what Jens proposed (and I am happy to >>> try >>> >> as well) is to attempt to use symbolic links - i.e. add the code in >>> >> `airflow.util` but then create a symbolic link in the provider. I >>> tested >>> >> it yesterday and it works as expected - i.e. such symbolic link is >>> >> dereferenced and the provider package contains the python file, not >>> >> symbolic link. That seems like a much more lightweight approach that >>> will >>> >> serve the purpose of "common.util" much better. The only thing we >>> will have >>> >> to take care of (and we can add it once the POC is successful) is to >>> add >>> >> some pre-commit protection that those kind of symbolically linked util >>> >> modules are imported in providers, from inside of those provider, not >>> from >>> >> airlfow, and make sure they are "standalone" (i.e. - as you mentioned >>> - not >>> >> depend on anything in airflow code). We could create a new package >>> for that >>> >> in airlfow >>> >> "airlfow.provider_utils" for example - and make sure (as next step) >>> that >>> >> anything from that package is never directly imported by any >>> provider, and >>> >> whenever provider uses it, it should be symbolic link inside that >>> provider. >>> >> That's all automatable and we can prevent mistakes via pre-commit. >>> >> >>> >> I think that might lead to a very lightweight approach where we >>> introduce >>> >> new common functionality which is immediately reusable in providers >>> without >>> >> the hassle of taking care about backwards compatibility, and managing >>> the >>> >> "common.util" provider. At the expense of a bit complex pre-commit >>> that >>> >> will guard the usage of it. >>> >> >>> >> Seems that it might be the "Eat cake and have it too" way that we've >>> been >>> >> looking for. >>> >> >>> >> J. >>> >> >>> >> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung >>> <t...@astronomer.io.invalid> >>> >> wrote: >>> >> >>> >>> It would help in the sense mentioned in previous posts, yes. But one >>> >> thing >>> >>> I want to point out is, for the provider to actually be helpful, it >>> must >>> >> be >>> >>> treated a bit differently from normal providers, but more like a >>> separate >>> >>> third-party dependency. Specifically, the provider should not have a >>> >>> dependency to Core Airflow, so it can be released and depended on >>> more >>> >>> flexibly. >>> >>> >>> >>> Ideally utilities for each purpose (parsing URI, reading Object >>> Storage, >>> >>> reading SQL, etc.) should each have its own utility package, so they >>> can >>> >> be >>> >>> released independently without dependency problems popping up if we >>> need >>> >> to >>> >>> break compatibility to one purpose. But more providers are >>> exponentially >>> >>> more difficult to maintain, so I’d settle for one utility provider >>> for >>> >> now >>> >>> and split further if needed in the future. >>> >>> >>> >>> TP >>> >>> >>> >>> >>> >>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T) < >>> >>> jens.scheff...@de.bosch.com.INVALID> wrote: >>> >>>> >>> >>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse >>> and >>> >>> validate URIs for datasets? >>> >>>> >>> >>>> Mit freundlichen Grüßen / Best regards >>> >>>> >>> >>>> Jens Scheffler >>> >>>> >>> >>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) >>> >>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen >>> | >>> >>> GERMANY | www.bosch.com >>> >>>> Tel. +49 711 811-91508 | Mobil +49 160 90417410 | >>> >>> jens.scheff...@de.bosch.com >>> >>>> >>> >>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000; >>> >>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer; >>> >>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. >>> Markus >>> >>> Forschner, >>> >>>> Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert >>> >>>> >>> >>>> -----Original Message----- >>> >>>> From: Jarek Potiuk <ja...@potiuk.com> >>> >>>> Sent: Donnerstag, 22. Februar 2024 00:53 >>> >>>> To: dev@airflow.apache.org >>> >>>> Subject: Re: [DISCUSS] Common.util provider? >>> >>>> >>> >>>> Yep. It could work with symbolic links. Tested it and with flit - >>> both >>> >>> wheel and sdist packaged code such symbolically linked file is >>> >> dereferenced >>> >>> and copy of the file is added there. It could be a nice way of doing >>> it. >>> >>>> >>> >>>> Maybe then worth trying next time if someone has a need? >>> >>>> >>> >>>> J >>> >>>> >>> >>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) < >>> >>> jens.scheff...@de.bosch.com.invalid> wrote: >>> >>>> >>> >>>>>>>> As of additional dependency complexity between providers >>> actually >>> >>>>>>>> the >>> >>>>> additional dependency I think creates more problems than the >>> benefit… >>> >>>>> would be cool if there would be an option to „inline“ common code >>> from >>> >>>>> a single place but keep individual providers fully independent… >>> >>>>> >>> >>>>>> Well, we already do a lot of inlining, so if we think we should >>> do >>> >>>>>> more, >>> >>>>> we have mechanisms for that. We have pre-commits and release >>> commands >>> >>>>> that do a lot of that. Pre commits are inlining scripts in >>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py >>> files >>> >>>>> and changelogs and index documentation .rst (partially) are >>> generated >>> >>>>> at release documentation preparation time, pyproject.toml for >>> >>>>> providers are generated from common templates at package building >>> time >>> >>>>> and so on and so on :). So we can do more of that and generate >>> common >>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts. >>> But >>> >>>>> again "can't have and eat cake" - this has the drawback that there >>> are >>> >>>>> extra steps involved and even if it's automated it does add >>> friction >>> >>>>> when you have to regenerate the code every time you change it and >>> when >>> >>>>> you change it in another place than where you use it. >>> >>>>> >>> >>>>> Yes, also thought a moment about pre-commit. I#d be okay if we >>> really >>> >>>>> in-line and have a pre-commit aligning the redundancy of python in >>> >>> folders. >>> >>>>> Might need to be an opt-in if only 10 of 85 providers are using >>> common >>> >>>>> stuff and if we change a common line we probably do not need to >>> affect >>> >>>>> all providers. >>> >>>>> >>> >>>>> As long as no Windows users trying to code for airflow (do we need >>> to >>> >>>>> consider?) would it also work to use symlinks? Git can cope with >>> this, >>> >>>>> I don't know if the python toolchain can de-reference a copy and >>> are >>> >>>>> not packaging a symlink? Would be worth a test... would save the >>> >>>>> pre-commit and we even could selectively include common bla into >>> >>>>> providers as needed :-D >>> >>>>> >>> >>>>> Mit freundlichen Grüßen / Best regards >>> >>>>> >>> >>>>> Jens Scheffler >>> >>>>> >>> >>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T) Robert Bosch GmbH | >>> >>>>> Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen | GERMANY | >>> >>>>> www.bosch.com Tel. +49 711 811-91508 | Mobil +49 160 90417410 | >>> >>>>> jens.scheff...@de.bosch.com >>> >>>>> >>> >>>>> Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000; >>> >>>>> Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer; >>> >>>>> Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. >>> >>>>> Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, >>> Dr. >>> >>>>> Tanja Rückert >>> >>>>> >>> >>>>> -----Original Message----- >>> >>>>> From: Jarek Potiuk <ja...@potiuk.com> >>> >>>>> Sent: Mittwoch, 21. Februar 2024 21:18 >>> >>>>> To: dev@airflow.apache.org >>> >>>>> Subject: Re: [DISCUSS] Common.util provider? >>> >>>>> >>> >>>>>> if we have a common piece then we are locking all depending >>> >>>>>> providers >>> >>>>> (potentially) together if common code changes >>> >>>>> >>> >>>>> Yes, coupling in this case is the drawback of this solution. You >>> can't >>> >>>>> have cake and eat it too and in this case you trade DRY with >>> coupling. >>> >>>>> >>> >>>>>> As of additional dependency complexity between providers actually >>> >>>>>> the >>> >>>>> additional dependency I think creates more problems than the >>> benefit… >>> >>>>> would be cool if there would be an option to „inline“ common code >>> from >>> >>>>> a single place but keep individual providers fully independent… >>> >>>>> >>> >>>>> Well, we already do a lot of inlining, so if we think we should do >>> >>>>> more, we have mechanisms for that. We have pre-commits and release >>> >>>>> commands that do a lot of that. Pre commits are inlining scripts in >>> >>>>> Dockerfiles, shortening PyPI readme . The providers __init__.py >>> files >>> >>>>> and changelogs and index documentation .rst (partially) are >>> generated >>> >>>>> at release documentation preparation time, pyproject.toml for >>> >>>>> providers are generated from common templates at package building >>> time >>> >>>>> and so on and so on :). So we can do more of that and generate >>> common >>> >>>>> code, it's just a matter of adding pre-commits or breeze scripts. >>> But >>> >>>>> again "can't have and eat cake" - this has the drawback that there >>> are >>> >>>>> extra steps involved and even if it's automated it does add >>> friction >>> >>>>> when you have to regenerate the code every time you change it and >>> when >>> >>>>> you change it in another place than where you use it. >>> >>>>> >>> >>>>> J. >>> >>>>> >>> >>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) < >>> >>>>> jens.scheff...@de.bosch.com.invalid> wrote: >>> >>>>> >>> >>>>>> Hi Jarek, >>> >>>>>> >>> >>>>>> At reviewing the PR from uranusjr for AIP-60 I also had the >>> feeling >>> >>>>>> that a lot of very similar code is repeated in all the providers. >>> >>>>>> But during review yesterday I dropped the ides because if we have >>> a >>> >>>>>> common piece then we are locking all depending providers >>> >>>>>> (potentially) together if common code changes. >>> >>>>>> As of additional dependency complexity between providers actually >>> >>>>>> the additional dependency I think creates more prblems than the >>> >>>>>> benefit… would be cool if tehere would be an option to „inline“ >>> >>>>>> common code from a single place but keep individual providers >>> fully >>> >>>>>> independent… >>> >>>>>> >>> >>>>>> Jens >>> >>>>>> >>> >>>>>> Sent from Outlook for >>> >>>>>> iOS< >>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F% >>> >>>>>> 2F >>> >>>>>> aka.ms%2Fo0ukef&data=05%7C02%7CJens.Scheffler%40de.bosch.com >>> %7C98c88 >>> >>>>>> 97 >>> >>>>>> >>> 195d944d483ab08dc331a49bb%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0 >>> >>>>>> %7 >>> >>>>>> >>> C638441435197193656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ >>> >>>>>> Ij >>> >>>>>> >>> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=n6gk9fNn >>> >>>>>> WB SJOPYEgJ9WbriZ3H4id3RhLr16SguOuFA%3D&reserved=0> >>> >>>>>> ________________________________ >>> >>>>>> From: Jarek Potiuk <ja...@potiuk.com> >>> >>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM >>> >>>>>> To: dev@airflow.apache.org <dev@airflow.apache.org> >>> >>>>>> Subject: [DISCUSS] Common.util provider? >>> >>>>>> >>> >>>>>> Hello everyone, >>> >>>>>> >>> >>>>>> How do we feel about introducing a common.util provider? >>> >>>>>> >>> >>>>>> I know it's not been the original idea behind providers, but - >>> after >>> >>>>>> testing common.sql and now also having common.io, seems like more >>> >>>>>> and more we would like to extract some common code that we would >>> >>>>>> like providers to use, but we refrain from it, because it will >>> only >>> >>>>>> be actually usable 6 months after we introduce some common code. >>> >>>>>> >>> >>>>>> However, if we introduce common.util, this problem is generally >>> gone >>> >>>>>> - at the expense of more complex maintenance and cross-provider >>> >>>>> dependencies. >>> >>>>>> We should be able to add a common util method and use it in a >>> >>>>>> provider at the same time. >>> >>>>>> >>> >>>>>> Think Amazon provider using a new feature released in common.util >>> >>>>>>> =1.2.0 and google provider >= 1.1.0. All manageable and we do it >>> >>>>>> already for common.sql. We know how to do it, we know what to >>> avoid, >>> >>>>>> we know we cannot introduce backwards-incompatible changes, so we >>> >>>>>> have to be very clear what is and what is not a public API there, >>> We >>> >>>>>> could rather easily add tests to prevent such >>> backwards-incompatible >>> >>>>>> changes. We even have a solution for chicken-egg providers where >>> we >>> >>>>>> need to release two providers at the same time if they depend on >>> >>>>>> each other. Generally speaking it's quite workable but adds a bit >>> of >>> >>> overhead. >>> >>>>>> >>> >>>>>> Examples that we could implement as "common.util": >>> >>>>>> >>> >>>>>> - common versioning check with cache - where multiple providers >>> >>>>>> could reuse "do we have pendulum 2" >>> >>>>>> - more complex - some date management features (we have a few like >>> >>>>>> date_ranges/round_time). But there are many more. >>> >>>>>> >>> >>>>>> I generally do not love the common "util" approach. It has a >>> >>>>>> tendency to become a bag of everything over time. but if we limit >>> it >>> >>>>>> to a set of small, fully decoupled modules where each module is >>> >>>>>> independent - it's OK. And we already have it in "airflow.util" >>> and >>> >>>>>> we seem to be >>> >>>>> doing well. >>> >>>>>> >>> >>>>>> WDYT? Is it worth it ? >>> >>>>>> >>> >>>>>> J. >>> >>>>>> >>> >>>>> >>> >>>> >>> >>>> >>> --------------------------------------------------------------------- >>> >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>> >>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>> >>> >>> >>> >>> >> >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>> For additional commands, e-mail: dev-h...@airflow.apache.org >>> >>>