> 1. https://registry.astronomer.io/
> 2. Using the new classifier
> https://pypi.org/search/?o=&c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider

Yep. Precisely what I thought to place at the top of the ecosystem page.

> On 25 April 2022 18:08:49 BST, "Ferruzzi, Dennis"
> <ferru...@amazon.com.INVALID> wrote:
>>
>> I still think that easy inclusion with a defined pruning process is best,
>> but it's looking like that is the minority opinion. In which case, IFF we
>> are going to be keeping them separate then I definitely agree that there
>> needs to be a fast/easy/convenient way to find them.
>> ________________________________
>> From: Jarek Potiuk <ja...@potiuk.com>
>> Sent: Monday, April 25, 2022 7:17 AM
>> To: dev@airflow.apache.org
>> Subject: RE: [EXTERNAL][DISCUSS] Approach for new providers of the community
>>
>> Just to come back to it (please everyone a little patience - I think
>> some people have not chimed in yet due to 2.3.0 "focus" so this
>> discussion might take a little more time).
>>
>> My current thinking on it so far:
>>
>> * I am not really in the camp of "let's not add any more providers at
>> all" and also not in the "let's accept all that are good quality code
>> providers". I think there are a few providers which "after fulfilling
>> all the criteria" could be added - mostly open-source standards,
>> generic, established technologies - but it should be a rather limited
>> and rare event.
>>
>> * when there is a proprietary service which has not too broad reach
>> and it's not likely that we will have some committers who will be
>> maintaining it - because they are users - the default option should
>> be to make standalone per-service providers. The difficulty here is
>> to set the right "non-quality" criteria - but I think we really want
>> to limit any new code to maintain.
>> Here maybe we can have some more
>> concrete criteria proposed - so that we do not have to vote
>> individually on each proposed provider - and so that those who want
>> to propose a provider could check themselves by reading the criteria,
>> what's best for them.
>>
>> * we might improve our "providers" list at the "ecosystem" to make
>> providers stand out a bit more (maybe simply put them on top and make
>> a clearly visible section). We are not going to maintain and keep the
>> nice "registry" similar to Astronomer's one (we could even actually
>> make the link to the Astronomer registry more prominent as the way to
>> "search" for providers on our Ecosystem Page). We could also add a link
>> to PyPI with the "airflow provider" classifier at the ecosystem page
>> as another way of searching for providers. All that is perfectly fine,
>> I think, with the ASF Policies and spirit. And it will be good for
>> discovery.
>>
>> WDYT?
>>
>> J.
>>
>> On Mon, Apr 18, 2022 at 3:59 PM Samhita Alla <samh...@union.ai> wrote:
>>>
>>> Hello!
>>>
>>> The reason behind submitting Flyte provider to the Airflow repository is
>>> because we felt it'd be effortless for the Airflow users to use the
>>> integration. Moreover, since it'd be under the umbrella of Airflow, we
>>> estimated that the Airflow users would not hesitate to use the provider.
>>>
>>> We could definitely have this as a standalone provider, but the
>>> easy-to-get-started incentive of Airflow providers seemed like a better
>>> option.
>>>
>>> If there's a sophisticated plan in place for having standalone providers in
>>> PyPI, we're up for it.
>>>
>>> Thanks,
>>> Samhita
>>>
>>> On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <alex...@gmail.com> wrote:
>>>>
>>>> Hello all
>>>>
>>>> I want to try to explain the motivation behind the submission of the Delta
>>>> Sharing provider:
>>>>
>>>> Let me start with the fact that the original issue was created against
>>>> the Airflow repository, and it was accepted as potential new functionality.
>>>> And discussion about new providers started almost on the day when the PR
>>>> was submitted :-)
>>>> Delta Sharing is an OSS project under the umbrella of the Linux Foundation
>>>> that defines a protocol and reference implementations. It was started by
>>>> Databricks, but has other contributors as well - that's why it wasn't
>>>> pushed into a Databricks provider, as it's not specific to Databricks.
>>>> Another thought about submitting it as a separate provider was to get more
>>>> people interested in this functionality and build additional integrations
>>>> on top of it.
>>>> Another important aspect of having providers in the Airflow repository is
>>>> that they are tested together with changes in the core of Airflow.
>>>>
>>>> I completely understand the concerns about more maintenance effort, but my
>>>> personal point of view (about it below) is similar to Rafal's & John's -
>>>> if there are well defined criteria & plans for decommissioning or
>>>> something like that, then providers could be part of the releases, etc.
>>>>
>>>> I just want to add that although I'm employed by Databricks, I'm not a
>>>> part of the development team - I'm in the field team that works with
>>>> customers, sees how they are using different tools, sees pain points,
>>>> etc. Most of the work so far was done on my own time - I'm doing some
>>>> coordination, but most of the new functionality (AAD tokens support, Repos,
>>>> Databricks SQL operators, etc.) is coming from seeing customers using
>>>> Airflow together with Databricks.
>>>>
>>>> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz
>>>> <rafalbieg...@google.com.invalid> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I think that we will need to find some middle ground here - we are trying
>>>>> to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I would
>>>>> also add another 4th dimension - Airflow Service Provider, :).
>>>>>
>>>>> Airflow users - whether they do self-managed Airflow or use "managed
>>>>> Airflow" provided by others - are beneficiaries of the fact that Airflow has
>>>>> a decent portfolio of providers.
>>>>> It's not only a guarantee that these providers should work fine and they
>>>>> meet Airflow coding/testing standards. It's also a kind of guarantee
>>>>> that once they start using Airflow
>>>>> with providers backed by the Airflow community they won't be on their own
>>>>> when it comes to troubleshooting/updating/etc. It will be much easier for
>>>>> them to convince their companies to use Airflow for production use cases
>>>>> as the Airflow platform (core + providers) is tested/maintained by the
>>>>> Airflow community.
>>>>>
>>>>> Keeping providers within the Airflow repository generates integration and
>>>>> maintenance work on the Airflow community side. On the other hand, if
>>>>> this work is not done within the community then this effort would need to
>>>>> be done by all users to a certain extent. So from this perspective it's
>>>>> more optimal for the community to do it so users can use off-the-shelf
>>>>> Airflow for the majority of their use cases.
>>>>>
>>>>> When it comes to accepting new providers - I like John's suggestions:
>>>>> a) well defined standard that needs to be met by providers - passing the
>>>>> "provider qualification" would be some effort so each service provider
>>>>> would need to decide if it wouldn't be easier to maintain their provider
>>>>> on their own.
>>>>> b) well-defined lifecycle for providers - which would allow us to identify
>>>>> providers that are obsolete/not popular any more and make them obsolete.
>>>>>
>>>>> Regards, Rafal.
>>>>>
>>>>> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>
>>>>>> I've been thinking about it - to make up my mind a little. The good
>>>>>> thing for me is that I have no strong opinion and I can rather easily
>>>>>> see (or so I think) both sides.
>>>>>>
>>>>>> TL;DR; I think we need an explanation from the "Service Providers" -
>>>>>> what they want to achieve by contributing providers to the community and
>>>>>> see if we can achieve similar results differently.
>>>>>>
>>>>>> Obviously I am a bit biased from the maintainer point of view, but since
>>>>>> I cooperate with various stakeholders I spoke to some of them just to see
>>>>>> their point of view and this is what I got:
>>>>>>
>>>>>> Seems that we have really three types of stakeholders that are really
>>>>>> interested in "providers":
>>>>>>
>>>>>> 1) "Maintainers" - those who mostly maintain Airflow and have to take
>>>>>> care about its future and development and "grand vision" of where we
>>>>>> want to be in a few years
>>>>>> 2) "Users" - those who use Airflow and integration with the Service
>>>>>> Provider
>>>>>> 3) "Service providers" - those who run the services that Airflow
>>>>>> integrates with - via providers (that group might also contain those
>>>>>> stakeholders that run Airflow "as a service")
>>>>>>
>>>>>> Let me see it from all the different POVs:
>>>>>>
>>>>>> From 1) Maintainer POV
>>>>>>
>>>>>> More providers mean slower growth of the platform overall as the more
>>>>>> providers we add and manage as a community, the less time we can spend
>>>>>> on improving Airflow as a core.
>>>>>> Also the vision I think we all share is that Airflow is not a
>>>>>> "standalone orchestrator" any more - due to its popularity, reach and
>>>>>> power, it became an "orchestrating platform" and this is the vision that
>>>>>> keeps us - maintainers - busy.
>>>>>>
>>>>>> Over the last 2 years pretty much everything we do is to make Airflow "more
>>>>>> extensible". You can add custom "secrets managers", "timetables",
>>>>>> "deferrers" etc. "Customizability" is now built-in and a "theme" of being
>>>>>> a modern platform.
>>>>>> Hell - we even recently added the "Airflow Provider" trove classifier in
>>>>>> PyPI:
>>>>>> https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
>>>>>> and the main justification in the discussion was that we expect MORE
>>>>>> 3rd-parties to use it, rather than relying on "apache-airflow-provider"
>>>>>> package name.
>>>>>> So from maintainer POV - having 3rd-party providers as "extensions" to
>>>>>> Airflow makes perfect sense and is the way to go.
>>>>>>
>>>>>> From 2) User POV
>>>>>>
>>>>>> Users want to use Airflow with all the integrations they use together.
>>>>>> But only with those that they actually use. Similarly as maintainers -
>>>>>> supporting and needing all 70+ providers is something they usually do
>>>>>> not REALLY care about.
>>>>>> They literally care about the few providers they use. We even taught the
>>>>>> users that they can upgrade and install providers separately from the
>>>>>> core. So they already know they can mix and match Airflow + Providers to
>>>>>> get what they want.
>>>>>>
>>>>>> And they do use it - even if they use our image, the image only contains
>>>>>> a handful of the providers and when they need to install
>>>>>> new providers - they just install them from PyPI. And for that the
>>>>>> difference of "community providers" vs. 3rd party providers - except the
>>>>>> stamp of approval of the ASF - is not really visible.
>>>>>> Surely they can use [extras] to install the providers but that is just a
>>>>>> convenience and is definitely not needed by the users.
>>>>>> For example when they build a custom image they usually extend Airflow
>>>>>> and simply 'pip install <PROVIDER>'
>>>>>> As long as someone makes sure that the provider can be installed on
>>>>>> certain versions of Airflow - it does not matter.
>>>>>>
>>>>>> Also from the users' perspective Airflow became "popular" enough that it
>>>>>> no longer needed "more integrations" to be more "appealing" for the
>>>>>> users.
>>>>>> They already use Airflow. They like it (hopefully) and the fact that
>>>>>> this or that provider is part of the community makes no difference any
>>>>>> more.
>>>>>>
>>>>>> From 3) "Service providers" POV
>>>>>>
>>>>>> Here I am not sure. It's not very clear what service providers get from
>>>>>> being part of the "community providers".
>>>>>>
>>>>>> I hear that some big services (cloud providers) find it cool that we give
>>>>>> it the ASF "Stamp of Approval". And they are willing to pay the price of
>>>>>> a slower merge process, dependence on the community and following strict
>>>>>> rules of the ASF.
>>>>>> And the community also is happy to pay the price of maintaining those
>>>>>> (including the dependencies which Elad mentioned) to make sure that all
>>>>>> the community providers work in concert - because those "Services" are
>>>>>> hugely popular and we "want" as a community to invest there.
>>>>>> But maintaining those deps in sync is a huge effort and it will become
>>>>>> even worse - the more we add. On the other hand for 3rd party providers
>>>>>> it will be EASIER to keep up.
>>>>>> They don't have to care about all the community providers to work
>>>>>> together, they can choose a subset. And when they release their
>>>>>> libraries they can take care about making sure the dependencies are not
>>>>>> broken.
>>>>>>
>>>>>> There are other "drawbacks" for being a "community" provider. For
>>>>>> example we have the rule that we support the min-Airflow version for
>>>>>> providers from the community 12 months after Airflow release.
>>>>>> This means that users of Airflow 2.1 will not receive updates for the
>>>>>> providers after 21st of May. This is the price to pay for
>>>>>> community-managed providers. We will not release bug fixes in providers
>>>>>> or changes for Airflow 2.1 users after 21st of May.
>>>>>> But if you manage your own provider - you still can support 2.0 or even
>>>>>> 1.10 if you want.
>>>>>>
>>>>>> I cannot really see why a Service Provider would want to become an
>>>>>> Airflow Community Provider.
>>>>>>
>>>>>> And I am not really sure what Flyte, Delta Sharing, Versatile Data Kit,
>>>>>> and Cloudera people think and why they think this is the best choice.
>>>>>>
>>>>>> I think when we understand what the "Service Providers" want to achieve
>>>>>> this way, maybe we will be able to come up with some middle ground and
>>>>>> at least set some rules when it makes sense and when it does not make
>>>>>> sense.
>>>>>> Maybe 'contributing provider' is the way to achieve something else and
>>>>>> we simply do not realize that in the new "Airflow as a Platform" world,
>>>>>> all the stakeholders can achieve very similar results using different
>>>>>> approaches.
>>>>>>
>>>>>> * For example we could think about how we can make it easier for Airflow
>>>>>> users to discover and install their providers - without actually taking
>>>>>> ownership of the code by the community.
>>>>>> * Or maybe we could introduce a tool to make a 3rd-party provider pass a
>>>>>> "compliance check" as suggested above
>>>>>> * Or maybe we could introduce a "breeze" extension to be able to install
>>>>>> and test a provider in the "latest airflow" so that the service providers
>>>>>> could check it before we even release airflow and dependencies
>>>>>>
>>>>>> So what I think we really need - Alex, Samhita, Andon, Philippe (I
>>>>>> think) - could you tell us (every one of you separately) - what were your
>>>>>> goals when you came up with the "contribute the new provider" idea?
>>>>>>
>>>>>> J.
>>>>>>
>>>>>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <elad...@apache.org> wrote:
>>>>>>>
>>>>>>> Ash, what is your recommendation for the users, should we follow your
>>>>>>> suggestion?
>>>>>>> This means that the big big big joy of using airflow constraints and
>>>>>>> getting a working environment with all required providers will be no
>>>>>>> more.
>>>>>>> So users will get a working "Vanilla" Airflow and then will need to
>>>>>>> figure out how they are going to tackle independent providers that may
>>>>>>> not be able to coexist with one another.
>>>>>>> This means that users will need to create their own constraints
>>>>>>> mechanism and maintain it.
>>>>>>>
>>>>>>> From my perspective this increases the complexity of getting Airflow to
>>>>>>> be production ready.
>>>>>>> I know that we say providers vs core but I think that from the users'
>>>>>>> perspective providers are an integral part of Airflow.
>>>>>>> Having the best scheduler and the best UI is not enough. Providers are
>>>>>>> a crucial part that completes the set.
>>>>>>>
>>>>>>> Maybe eventually there should be something like a provider store where
>>>>>>> there can be official providers and 3rd party providers.
>>>>>>>
>>>>>>> This may be an even greater discussion than what we are having here. It
>>>>>>> feels more like Airflow as a product vs Airflow as an ecosystem.
>>>>>>>
>>>>>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty
>>>>>>> <col...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>> I agree with Ash and Tomasz. If it were not for the history, I think
>>>>>>>> in an ideal world even the providers currently part of the Airflow
>>>>>>>> repo would be managed separately. (I'm not actually suggesting
>>>>>>>> removing any providers.) I don't think it's a matter of gatekeeping, I
>>>>>>>> just think it's actually kind of odd to have providers in the same
>>>>>>>> repo as core Airflow, and it increases confusion about Airflow
>>>>>>>> versions vs provider package versions.
>>>>>>>>
>>>>>>>> Collin McNulty
>>>>>>>>
>>>>>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <turbas...@apache.org>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I’m leaning toward Ash's approach. Having providers maintaining the
>>>>>>>>> packages may streamline many aspects for providers/companies.
>>>>>>>>>
>>>>>>>>> 1. They are owners so they can merge and release whenever they need.
>>>>>>>>> 2. It’s easier for them to add E2E tests and manage the resources
>>>>>>>>> needed for running them.
>>>>>>>>> 3. The development of the package can be incorporated into their
>>>>>>>>> company processes - not every company is used to OSS mode.
>>>>>>>>>
>>>>>>>>> Whatever way we go - we should have some basic guidelines and
>>>>>>>>> requirements (for example to brand a provider as “recommended by
>>>>>>>>> community” or something).
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Tomsk
>>>>
>>>> --
>>>> With best wishes, Alex Ott
>>>> http://alexott.net/
>>>> Twitter: alexott_en (English), alexott (Russian)