I like the polyfill idea: a backport provider that brings new interfaces to
providers without the actual functionality.
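One way to read the polyfill idea in Python is a module in the backport provider that re-exports the core implementation when the installed Airflow already has it, and otherwise supplies the same interface itself. This is a minimal sketch. The module path `airflow.utils.uri_helpers` and the function `normalize_uri` are hypothetical placeholders, not existing Airflow APIs.

```python
# Hypothetical module inside a backport ("polyfill") provider.
# `airflow.utils.uri_helpers` and `normalize_uri` are placeholder names
# used only for this sketch; they are not existing Airflow APIs.
from urllib.parse import urlsplit, urlunsplit

try:
    # Airflow core is new enough to ship the helper: re-export it unchanged.
    from airflow.utils.uri_helpers import normalize_uri
except ImportError:
    # Older core (or Airflow not installed): provide the same interface here.
    def normalize_uri(uri: str) -> str:
        """Lower-case the scheme and network location, keep everything else."""
        parts = urlsplit(uri)
        return urlunsplit(
            (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
        )
```

Providers would import `normalize_uri` only from the polyfill module. Once the minimum supported Airflow version ships the core helper, the fallback branch could be dropped without touching provider code.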


> On 22 Feb 2024, at 20:41, Maciej Obuchowski <mobuchow...@apache.org> wrote:
> 
>> That's why I generally do not like the "util" approach because common
>> packaging introduces unnecessary coupling (you have to upgrade independent
>> utils together).
> 
> From my experience with releasing OpenLineage, where we do things similarly:
> I think that's the advantage of it, but only _if_ you can release those
> together. With "built-in" providers it makes sense, but it could be
> burdensome if "external" ones depended on that functionality.
> 
>> I know it's not been the original idea behind providers, but - after
>> testing common.sql and now also having common.io, seems like more and more
>> we would like to extract some common code that we would like providers to
>> use, but we refrain from it, because it will only be actually usable 6
>> months after we introduce some common code.
> 
> So, maybe a better approach would be to introduce the functionality into
> core, and use a common.X provider as a "polyfill" (to borrow JS
> nomenclature) to make sure providers can use that functionality now, while
> external ones could depend on the Airflow ones?
> 
> The symbolic link approach seems to disregard all the external providers,
> unless I misunderstand it.
> 
> On Thu, 22 Feb 2024 at 13:28, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
>>> Ideally utilities for each purpose (parsing URI, reading Object Storage,
>>> reading SQL, etc.) should each have its own utility package, so they can be
>>> released independently without dependency problems popping up if we need to
>>> break compatibility for one purpose. But more providers are exponentially
>>> more difficult to maintain, so I’d settle for one utility provider for now
>>> and split further if needed in the future.
>> 
>> Very much agree with this general statement. That's why I generally do
>> not like the "util" approach because common packaging introduces
>> unnecessary coupling (you have to upgrade independent utils together). And
>> when we have a common set of things that make sense to be released
>> together when upgraded, we should package them together in
>> "common.<something concrete>" (like we have with common.io and common.sql).
>> 
>> However - in this case, I think what Jens proposed (and I am happy to try
>> as well) is to attempt to use symbolic links - i.e. add the code in
>> `airflow.utils` but then create a symbolic link in the provider. I tested
>> it yesterday and it works as expected - i.e. such a symbolic link is
>> dereferenced and the provider package contains the Python file, not the
>> symbolic link. That seems like a much more lightweight approach that will
>> serve the purpose of "common.util" much better. The only thing we will have
>> to take care of (and we can add it once the POC is successful) is a
>> pre-commit protection ensuring that such symbolically linked util modules
>> are imported in providers from inside the provider itself, not from
>> airflow, and that they are "standalone" (i.e. - as you mentioned - do not
>> depend on anything in the airflow code). We could create a new package for
>> that in airflow, "airflow.provider_utils" for example, and make sure (as a
>> next step) that nothing from that package is ever directly imported by any
>> provider; whenever a provider uses it, it should be through a symbolic link
>> inside that provider. That's all automatable and we can prevent mistakes
>> via pre-commit.
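The pre-commit guard described above could start from something like the following minimal sketch, assuming the shared code sits in the proposed `airflow.provider_utils` package (a name suggested in this thread, not an existing module). It parses each provider file with the standard-library `ast` module and flags imports that reach into that package instead of the provider's own symlinked copy. The "standalone" requirement (shared modules must not import other airflow code) would need a second, similar check.

```python
from __future__ import annotations

import ast
from pathlib import Path

# Proposed (not yet existing) home of the shared, symlinked util modules.
FORBIDDEN = "airflow.provider_utils"


def forbidden_imports(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return (line number, module) for every import that reaches into FORBIDDEN."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Cover both "from airflow.provider_utils import x"
            # and "from airflow import provider_utils".
            candidates = [node.module] + [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for module in candidates:
            if module == FORBIDDEN or module.startswith(FORBIDDEN + "."):
                hits.append((node.lineno, module))
                break  # report each import statement once
    return hits


def main(paths: list[str]) -> int:
    """Print offending imports in the given provider files; return 1 if any were found."""
    status = 0
    for path in paths:
        for lineno, module in forbidden_imports(Path(path).read_text(), path):
            print(f"{path}:{lineno}: use the provider's own symlinked copy instead of {module}")
            status = 1
    return status
```

A local pre-commit hook could call `sys.exit(main(sys.argv[1:]))` with the staged provider files.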
>> 
>> I think that might lead to a very lightweight approach where we introduce
>> new common functionality which is immediately reusable in providers without
>> the hassle of taking care of backwards compatibility and managing the
>> "common.util" provider, at the expense of a somewhat complex pre-commit
>> that will guard its usage.
>> 
>> Seems that it might be the "Eat cake and have it too" way that we've been
>> looking for.
>> 
>> J.
>> 
>> On Thu, Feb 22, 2024 at 6:14 AM Tzu-ping Chung <t...@astronomer.io.invalid>
>> wrote:
>> 
>>> It would help in the sense mentioned in previous posts, yes. But one thing
>>> I want to point out is that, for the provider to actually be helpful, it
>>> must be treated a bit differently from normal providers, more like a
>>> separate third-party dependency. Specifically, the provider should not have
>>> a dependency on core Airflow, so it can be released and depended on more
>>> flexibly.
>>> 
>>> Ideally utilities for each purpose (parsing URI, reading Object Storage,
>>> reading SQL, etc.) should each have its own utility package, so they can be
>>> released independently without dependency problems popping up if we need to
>>> break compatibility for one purpose. But more providers are exponentially
>>> more difficult to maintain, so I’d settle for one utility provider for now
>>> and split further if needed in the future.
>>> 
>>> TP
>>> 
>>> 
>>>> On 22 Feb 2024, at 10:10, Scheffler Jens (XC-AS/EAE-ADA-T)
>>>> <jens.scheff...@de.bosch.com.INVALID> wrote:
>>>> 
>>>> @Uranusjr would this help as a pilot in your AIP-60 code to parse and
>>>> validate URIs for datasets?
>>>> 
>>>> Mit freundlichen Grüßen / Best regards
>>>> 
>>>> Jens Scheffler
>>>> 
>>>> Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
>>>> Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
>>>> GERMANY | www.bosch.com
>>>> Tel. +49 711 811-91508 | Mobile +49 160 90417410 |
>>>> jens.scheff...@de.bosch.com
>>>>
>>>> Registered office: Stuttgart, Registration court: Amtsgericht Stuttgart,
>>>> HRB 14000; Chairman of the Supervisory Board: Prof. Dr. Stefan
>>>> Asenkerschbaumer; Managing Directors: Dr. Stefan Hartung, Dr. Christian
>>>> Fischer, Dr. Markus Forschner, Stefan Grosch, Dr. Markus Heyn, Dr. Frank
>>>> Meyer, Dr. Tanja Rückert
>>>> 
>>>> -----Original Message-----
>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>> Sent: Thursday, 22 February 2024 00:53
>>>> To: dev@airflow.apache.org
>>>> Subject: Re: [DISCUSS] Common.util provider?
>>>> 
>>>> Yep. It could work with symbolic links. I tested it with flit: in both the
>>>> wheel and the sdist, such a symbolically linked file is dereferenced and a
>>>> copy of the file is added. It could be a nice way of doing it.
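To turn that manual test into a repeatable check on the built artifacts (rather than relying on one build backend's behaviour), a sketch like the following could inspect them. It lists wheel (zip) members whose stored Unix mode marks them as symlinks, and sdist (tar) members stored as symbolic or hard links. The function names are illustrative only. Empty results are what we would expect if the backend copied the file contents.

```python
from __future__ import annotations

import stat
import tarfile
import zipfile


def wheel_symlink_entries(wheel_path) -> list[str]:
    """Names of wheel (zip) members whose stored Unix mode marks them as symlinks."""
    with zipfile.ZipFile(wheel_path) as zf:
        # The upper 16 bits of external_attr hold the Unix st_mode, when present.
        return [info.filename for info in zf.infolist() if stat.S_ISLNK(info.external_attr >> 16)]


def sdist_link_entries(sdist_path) -> list[str]:
    """Names of sdist (tar) members stored as symbolic or hard links, not file content."""
    with tarfile.open(sdist_path) as tf:
        return [member.name for member in tf.getmembers() if member.issym() or member.islnk()]
```

Running both against the `dist/` output of a provider build and expecting empty lists would catch a backend that starts packaging the link itself.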
>>>> 
>>>> Maybe then worth trying next time if someone has a need?
>>>> 
>>>> J
>>>> 
>>>> On Thu, Feb 22, 2024 at 12:39 AM Scheffler Jens (XC-AS/EAE-ADA-T) <
>>>> jens.scheff...@de.bosch.com.invalid> wrote:
>>>> 
>>>>>>> As for the additional dependency complexity between providers, I think
>>>>>>> the additional dependency actually creates more problems than benefits…
>>>>>>> It would be cool if there were an option to "inline" common code from a
>>>>>>> single place but keep individual providers fully independent…
>>>>> 
>>>>>> Well, we already do a lot of inlining, so if we think we should do more,
>>>>>> we have mechanisms for that. We have pre-commits and release commands
>>>>>> that do a lot of that. Pre-commits inline scripts into Dockerfiles and
>>>>>> shorten the PyPI README. The providers' __init__.py files, changelogs and
>>>>>> (partially) index documentation .rst files are generated at release
>>>>>> documentation preparation time, pyproject.toml files for providers are
>>>>>> generated from common templates at package building time, and so on and
>>>>>> so on :). So we can do more of that and generate common code; it's just a
>>>>>> matter of adding pre-commits or breeze scripts. But again, "you can't have
>>>>>> your cake and eat it too" - this has the drawback that there are extra
>>>>>> steps involved, and even if it's automated it does add friction when you
>>>>>> have to regenerate the code every time you change it, and when you change
>>>>>> it in a different place from where you use it.
>>>>> 
>>>>> Yes, I also thought for a moment about pre-commit. I'd be okay if we
>>>>> really inline the code and have a pre-commit keeping the redundant Python
>>>>> copies in the provider folders aligned. It might need to be opt-in: if
>>>>> only 10 of 85 providers are using common stuff and we change a common
>>>>> line, we probably do not need to affect all providers.
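One possible shape for such an opt-in alignment step is sketched below, assuming a hypothetical layout where canonical modules live in one directory and each opted-in provider keeps a copy at a path listed in a mapping. It reports copies that drifted from the canonical file and, with `fix=True`, overwrites them. Neither the layout nor the function name comes from the Airflow repository.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def align_copies(canonical_dir: Path, inlined: dict[str, list[Path]], fix: bool = False) -> list[Path]:
    """Return copies that differ from (or are missing versus) the canonical module.

    ``inlined`` maps a canonical module file name to the provider copies that
    opted in. With ``fix=True`` each stale copy is overwritten with the canonical file.
    """
    stale = []
    for module_name, copies in inlined.items():
        source = canonical_dir / module_name
        expected = source.read_bytes()
        for copy in copies:
            if copy.is_file() and copy.read_bytes() == expected:
                continue  # copy is already in sync
            stale.append(copy)
            if fix:
                copy.parent.mkdir(parents=True, exist_ok=True)
                shutil.copyfile(source, copy)
    return stale
```

Only providers listed in the mapping are ever touched, which keeps the opt-in property: changing a common line does not ripple into providers that never asked for the module.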
>>>>> 
>>>>> As long as no Windows users are trying to code for Airflow (do we need to
>>>>> consider them?), would it also work to use symlinks? Git can cope with
>>>>> this; I don't know whether the Python toolchain dereferences the link and
>>>>> packages a copy rather than a symlink. It would be worth a test... it
>>>>> would save the pre-commit, and we could even selectively include common
>>>>> stuff into providers as needed :-D
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>>> Sent: Wednesday, 21 February 2024 21:18
>>>>> To: dev@airflow.apache.org
>>>>> Subject: Re: [DISCUSS] Common.util provider?
>>>>> 
>>>>>> if we have a common piece then we are locking all depending providers
>>>>>> (potentially) together if common code changes
>>>>> 
>>>>> Yes, coupling in this case is the drawback of this solution. You can't
>>>>> have your cake and eat it too, and in this case you trade DRY for coupling.
>>>>> 
>>>>>> As for the additional dependency complexity between providers, I think
>>>>>> the additional dependency actually creates more problems than benefits…
>>>>>> It would be cool if there were an option to "inline" common code from a
>>>>>> single place but keep individual providers fully independent…
>>>>> 
>>>>> Well, we already do a lot of inlining, so if we think we should do more,
>>>>> we have mechanisms for that. We have pre-commits and release commands
>>>>> that do a lot of that. Pre-commits inline scripts into Dockerfiles and
>>>>> shorten the PyPI README. The providers' __init__.py files, changelogs and
>>>>> (partially) index documentation .rst files are generated at release
>>>>> documentation preparation time, pyproject.toml files for providers are
>>>>> generated from common templates at package building time, and so on and
>>>>> so on :). So we can do more of that and generate common code; it's just a
>>>>> matter of adding pre-commits or breeze scripts. But again, "you can't have
>>>>> your cake and eat it too" - this has the drawback that there are extra
>>>>> steps involved, and even if it's automated it does add friction when you
>>>>> have to regenerate the code every time you change it, and when you change
>>>>> it in a different place from where you use it.
>>>>> 
>>>>> J.
>>>>> 
>>>>> On Wed, Feb 21, 2024 at 9:02 PM Scheffler Jens (XC-AS/EAE-ADA-T) <
>>>>> jens.scheff...@de.bosch.com.invalid> wrote:
>>>>> 
>>>>>> Hi Jarek,
>>>>>> 
>>>>>> While reviewing the PR from uranusjr for AIP-60 I also had the feeling
>>>>>> that a lot of very similar code is repeated in all the providers.
>>>>>> But during review yesterday I dropped the idea, because if we have a
>>>>>> common piece then we are locking all depending providers
>>>>>> (potentially) together if common code changes.
>>>>>> As for the additional dependency complexity between providers, I think
>>>>>> the additional dependency actually creates more problems than benefits…
>>>>>> It would be cool if there were an option to "inline" common code from a
>>>>>> single place but keep individual providers fully independent…
>>>>>> 
>>>>>> Jens
>>>>>> 
>>>>>> ________________________________
>>>>>> From: Jarek Potiuk <ja...@potiuk.com>
>>>>>> Sent: Wednesday, February 21, 2024 5:42:20 PM
>>>>>> To: dev@airflow.apache.org <dev@airflow.apache.org>
>>>>>> Subject: [DISCUSS] Common.util provider?
>>>>>> 
>>>>>> Hello everyone,
>>>>>> 
>>>>>> How do we feel about introducing a common.util provider?
>>>>>> 
>>>>>> I know it's not been the original idea behind providers, but - after
>>>>>> testing common.sql and now also having common.io, seems like more
>>>>>> and more we would like to extract some common code that we would
>>>>>> like providers to use, but we refrain from it, because it will only
>>>>>> be actually usable 6 months after we introduce some common code.
>>>>>> 
>>>>>> However, if we introduce common.util, this problem is generally gone -
>>>>>> at the expense of more complex maintenance and cross-provider
>>>>>> dependencies. We should be able to add a common util method and use it
>>>>>> in a provider at the same time.
>>>>>> 
>>>>>> Think of the Amazon provider using a new feature released in
>>>>>> common.util >= 1.2.0 and the Google provider requiring >= 1.1.0. All
>>>>>> manageable, and we do it already for common.sql. We know how to do it, we
>>>>>> know what to avoid, we know we cannot introduce backwards-incompatible
>>>>>> changes, so we have to be very clear about what is and what is not a
>>>>>> public API there. We could rather easily add tests to prevent such
>>>>>> backwards-incompatible changes. We even have a solution for
>>>>>> chicken-and-egg providers, where we need to release two providers at the
>>>>>> same time if they depend on each other. Generally speaking it's quite
>>>>>> workable but adds a bit of overhead.
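The tests against backwards-incompatible changes could take many forms. One minimal sketch, assuming the public API of the util package is recorded as a snapshot that maps function names to parameter names (a hypothetical format chosen for this example), compares current signatures against it with `inspect.signature`. It only catches removed functions, removed parameters and new required parameters. Reordering or type changes would need more.

```python
from __future__ import annotations

import inspect
from types import ModuleType

# Adding *args / **kwargs never makes an existing call fail.
_VARIADIC = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def incompatible_changes(module: ModuleType, snapshot: dict[str, list[str]]) -> list[str]:
    """Compare a module's public functions with a recorded snapshot of their parameters."""
    problems = []
    for name, old_params in snapshot.items():
        func = getattr(module, name, None)
        if func is None:
            problems.append(f"{name}: removed")
            continue
        params = inspect.signature(func).parameters
        for old in old_params:
            if old not in params:
                problems.append(f"{name}: parameter {old!r} removed")
        for new_name, param in params.items():
            if (
                new_name not in old_params
                and param.default is inspect.Parameter.empty
                and param.kind not in _VARIADIC
            ):
                problems.append(f"{name}: new required parameter {new_name!r}")
    return problems
```

The snapshot would be committed next to the package and only updated deliberately, so a failing test signals an API break before a release rather than after it.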
>>>>>> 
>>>>>> Examples that we could implement as "common.util":
>>>>>> 
>>>>>> - common versioning check with cache - where multiple providers
>>>>>> could reuse "do we have pendulum 2"
>>>>>> - more complex - some date management features (we have a few like
>>>>>> date_ranges/round_time). But there are many more.
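A minimal sketch of the first example, assuming only the standard library: `importlib.metadata.version` reads the installed distribution's version and `functools.lru_cache` memoises the answer, so several providers asking the same question do not re-read package metadata. The function names are illustrative, not an existing Airflow API.

```python
from __future__ import annotations

from functools import lru_cache
from importlib.metadata import PackageNotFoundError, version


@lru_cache(maxsize=None)
def installed_major_version(distribution: str) -> int | None:
    """Major version of an installed distribution, or None if it is not installed.

    Cached, so repeated checks from many providers read metadata once per process.
    """
    try:
        raw = version(distribution)
    except PackageNotFoundError:
        return None
    # Simplistic parse of e.g. "2.1.2"; packaging.version would handle epochs etc.
    return int(raw.split(".")[0])


def has_pendulum_2() -> bool:
    """Illustrative helper for the "do we have pendulum 2" question."""
    return installed_major_version("pendulum") == 2
```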
>>>>>> 
>>>>>> I generally do not love the common "util" approach. It has a
>>>>>> tendency to become a bag of everything over time. But if we limit it
>>>>>> to a set of small, fully decoupled modules where each module is
>>>>>> independent - it's OK. And we already have it in "airflow.utils" and
>>>>>> we seem to be doing well.
>>>>>> 
>>>>>> WDYT? Is it worth it?
>>>>>> 
>>>>>> J.
>>>>>> 
>>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>> 
>>> 
>> 

