Hello everyone,

I think we have a series of things that make it difficult to focus on such long-term discussions - 2.3.0 is out, many people are busy with 2.3.1, which is going to focus on "teething" problems, and we have Airflow Summit next week (yay!), and I know how many people in our community are busy preparing either the local events or their talks :).

I have some ideas and proposals on how we can approach the subject and would like to continue the discussion (I would still love to hear more voices), but I think it would be great if we can resume the discussion after the Summit.

But - Summit is not only a "disruption" - it's also an opportunity to make the discussion better. I think the Summit with the local events is a great opportunity to discuss this in person - and in at least 13 separate locations :). So I have a kind request to everyone - let's talk about it at the local events. I will be in both London and Warsaw, so if you happen to be there - happy to share my thoughts with anyone interested and hear what you have to say :) - and I encourage similar discussions elsewhere.

I think the decision on how we approach providers in the future is a very important one, and we should take it very seriously and give everyone a chance to participate. It will define, to some extent, the future of the whole Airflow Ecosystem.

J.

On Tue, Apr 26, 2022 at 12:43 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> I think this is a different story (and different discussion). And I think we should have good reasons to split the repo. I think we do have them, but for different reasons, and many people think we will get there sooner rather than later - but I think we should not hijack this discussion for it. This discussion is more about governance of providers than about which repo they are in.
>
> Unless I am mistaken - moving providers to a separate repo does not really solve any of the "should we have more or less community providers". It's really a technical split of code, but if we have a separate repo and we still add more providers from the community we will still have to make sure all of them can be installed, run the tests on the code, make sure they run with Airflow (released and main) and make sure that Airflow changes do not break them.
>
> It means about the same amount of safeguards and protection, CI overhead we have now - only the code will be somewhere else, but the amount of CI tests, when they are executing, who is allowed to merge the code, approval process will remain the same as long as this will be "apache Airflow PMC" project.
>
> J.
>
> On Tue, Apr 26, 2022 at 12:21 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > Hey all,
> >
> > Another alternative is separating out core providers from the Core Airflow Repo into a separate repo within the Apache Org itself, maybe: apache-airflow-providers.
> >
> > That will not decrease the maintenance from the Committers but the Core work and release will be completely separate and untangled from Apache Airflow repo and can move at a faster pace.
> >
> > The benefit and compromise for the community is that all the providers are still officially maintained and released by the committers. However, over time we can invite more committers who show active participation in apache-airflow-providers repo too.
> >
> > This is a compromise to the arguments about Providers being integral to the success of Airflow and as such should be maintained and released officially.
> >
> > Regards,
> > Kaxil
> >
> > On Mon, 25 Apr 2022 at 19:17, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >> > 1. https://registry.astronomer.io/
> >> > 2. Using the new classifier
> >> > https://pypi.org/search/?o=&c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
> >>
> >> Yep. precisely what I thought to place at the top of the ecosystem page.
> >>
> >> > On 25 April 2022 18:08:49 BST, "Ferruzzi, Dennis" <ferru...@amazon.com.INVALID> wrote:
> >> >>
> >> >> I still think that easy inclusion with a defined pruning process is best, but it's looking like that is the minority opinion.
> >> >> In which case, IFF we are going to be keeping them separate then I definitely agree that there needs to be a fast/easy/convenient way to find them.
> >> >> ________________________________
> >> >> From: Jarek Potiuk <ja...@potiuk.com>
> >> >> Sent: Monday, April 25, 2022 7:17 AM
> >> >> To: dev@airflow.apache.org
> >> >> Subject: RE: [EXTERNAL][DISCUSS] Approach for new providers of the community
> >> >>
> >> >> Just to come back to it (please everyone a little patience - I think some people have not chimed in yet due to 2.3.0 "focus" so this discussion might take a little more time).
> >> >>
> >> >> My current thinking on it so far:
> >> >>
> >> >> * I am not really in the camp of "let's not add any more providers at all" and also not in the "let's accept all that are good quality code providers". I think there are a few providers which "after fulfilling all the criteria" could be added - mostly open-source standards, generic, established technologies - but it should be a rather limited and rare event.
> >> >>
> >> >> * When there is a proprietary service which has not too broad reach and it's not likely that we will have some committers who will be maintaining it - because they are users - the default option should be to make standalone per-service providers. The difficulty here is to set the right "non-quality" criteria - but I think we really want to limit any new code to maintain.
> >> >> Here maybe we can have some more concrete criteria proposed - so that we do not have to vote individually on each proposed provider - and so that those who want to propose a provider could check for themselves, by reading the criteria, what's best for them.
> >> >>
> >> >> * We might improve our "providers" list at the "ecosystem" page to make providers stand out a bit more (maybe simply put them on top and make a clearly visible section). We are not going to maintain and keep a nice "registry" similar to Astronomer's one (we could even actually make the link to the Astronomer registry more prominent as the way to "search" for providers on our Ecosystem Page). We could also add a link to PyPI with the "airflow provider" classifier at the ecosystem page as another way of searching for providers. All that is perfectly fine, I think, with the ASF Policies and spirit. And it will be good for discovery.
> >> >>
> >> >> WDYT?
> >> >>
> >> >> J.
> >> >>
> >> >> On Mon, Apr 18, 2022 at 3:59 PM Samhita Alla <samh...@union.ai> wrote:
> >> >>>
> >> >>> Hello!
> >> >>>
> >> >>> The reason behind submitting the Flyte provider to the Airflow repository is that we felt it'd be effortless for the Airflow users to use the integration. Moreover, since it'd be under the umbrella of Airflow, we estimated that the Airflow users would not hesitate to use the provider.
> >> >>>
> >> >>> We could definitely have this as a standalone provider, but the easy-to-get-started incentive of Airflow providers seemed like a better option.
> >> >>>
> >> >>> If there's a sophisticated plan in place for having standalone providers in PyPI, we're up for it.
> >> >>>
> >> >>> Thanks,
> >> >>> Samhita
> >> >>>
> >> >>> On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <alex...@gmail.com> wrote:
> >> >>>>
> >> >>>> Hello all
> >> >>>>
> >> >>>> I want to try to explain the motivation behind the submission of the Delta Sharing provider:
> >> >>>>
> >> >>>> Let me start with the fact that the original issue was created against the Airflow repository, and it was accepted as potential new functionality. And discussion about new providers has started almost on the day when the PR was submitted :-)
> >> >>>> Delta Sharing is the OSS project under the umbrella of the Linux Foundation that defines a protocol and reference implementations. It was started by Databricks, but has other contributors as well - that's why it wasn't pushed into a Databricks provider, as it's not specific to Databricks.
> >> >>>> Another thought about submitting it as a separate provider was to get more people interested in this functionality and build additional integrations on top of it.
> >> >>>> Another important aspect of having providers in the Airflow repository is that they are tested together with changes in the core of Airflow.
> >> >>>>
> >> >>>> I completely understand the concerns about more maintenance effort, but my personal point of view (about it below) is similar to Rafal's & John's - if there are well defined criteria & plans for decommissioning or something like that, then providers could be part of the releases, etc.
> >> >>>>
> >> >>>> I just want to add that although I'm employed by Databricks, I'm not a part of the development team - I'm in the field team who work with customers, see how they are using different tools, see pain points, etc.
> >> >>>> Most of the work so far was done on my own time - I'm doing some coordination, but most of the new functionality (AAD tokens support, Repos, Databricks SQL operators, etc.) is coming from seeing customers using Airflow together with Databricks.
> >> >>>>
> >> >>>> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz <rafalbieg...@google.com.invalid> wrote:
> >> >>>>>
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> I think that we will need to find some middle ground here - we are trying to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I would also add a 4th dimension - Airflow Service Provider, :).
> >> >>>>>
> >> >>>>> Airflow users - whether they do self-managed Airflow or use "managed Airflow" provided by others - are beneficiaries of the fact that Airflow has a decent portfolio of providers. It's not only a guarantee that these providers should work fine and that they meet Airflow coding/testing standards. It's also a kind of guarantee that once they start using Airflow with providers backed by the Airflow community they won't be on their own when it comes to troubleshooting/updating/etc. It will be much easier for them to convince their companies to use Airflow for production use cases as the Airflow platform (core + providers) is tested/maintained by the Airflow community.
> >> >>>>>
> >> >>>>> Keeping providers within the Airflow repository generates integration and maintenance work on the Airflow community side. On the other hand, if this work is not done within the community then this effort would need to be done by all users to a certain extent.
> >> >>>>> So from this perspective it's more optimal for the community to do it so users can use off-the-shelf Airflow for the majority of their use cases.
> >> >>>>>
> >> >>>>> When it comes to accepting new providers - I like John's suggestions:
> >> >>>>> a) a well-defined standard that needs to be met by providers - passing the "provider qualification" would be some effort, so each service provider would need to decide if it wouldn't be easier to maintain their provider on their own.
> >> >>>>> b) a well-defined lifecycle for providers - which would allow us to identify providers that are obsolete/not popular any more and mark them as obsolete.
> >> >>>>>
> >> >>>>> Regards, Rafal.
> >> >>>>>
> >> >>>>> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >> >>>>>>
> >> >>>>>> I've been thinking about it - to make up my mind a little. The good thing for me is that I have no strong opinion and I can rather easily see (or so I think) both sides.
> >> >>>>>>
> >> >>>>>> TL;DR; I think we need an explanation from the "Service Providers" - what they want to achieve by contributing providers to the community - and see if we can achieve similar results differently.
> >> >>>>>>
> >> >>>>>> Obviously I am a bit biased from the maintainer point of view, but since I cooperate with various stakeholders I spoke to some of them just to see their point of view and this is what I got:
> >> >>>>>>
> >> >>>>>> Seems that we have really three types of stakeholders that are really interested in "providers":
> >> >>>>>>
> >> >>>>>> 1) "Maintainers" - those who mostly maintain Airflow and have to take care of its future and development and the "grand vision" of where we want to be in a few years
> >> >>>>>> 2) "Users" - those who use Airflow and integration with the Service Provider
> >> >>>>>> 3) "Service providers" - those who run the services that Airflow integrates with - via providers (that group might also contain those stakeholders that run Airflow "as a service")
> >> >>>>>>
> >> >>>>>> Let me see it from all the different POVs:
> >> >>>>>>
> >> >>>>>> From 1) Maintainer POV
> >> >>>>>>
> >> >>>>>> More providers mean slower growth of the platform overall, as the more providers we add and manage as a community, the less time we can spend on improving Airflow as a core. Also the vision I think we all share is that Airflow is not a "standalone orchestrator" any more - due to its popularity, reach and power, it became an "orchestrating platform" and this is the vision that keeps us - maintainers - busy.
> >> >>>>>>
> >> >>>>>> Over the last 2 years pretty much everything we do makes Airflow "more extensible". You can add custom "secrets managers", "timetables", "deferrers" etc. "Customizability" is now built-in and a "theme" of being a modern platform.
> >> >>>>>> Hell - we even recently added the "Airflow Provider" trove classifier in PyPI:
> >> >>>>>> https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
> >> >>>>>> and the main justification in the discussion was that we expect MORE 3rd-parties to use it, rather than relying on the "apache-airflow-provider" package name.
> >> >>>>>> So from the maintainer POV - having 3rd-party providers as "extensions" to Airflow makes perfect sense and is the way to go.
> >> >>>>>>
> >> >>>>>> From 2) User POV
> >> >>>>>>
> >> >>>>>> Users want to use Airflow with all the integrations they use together. But only with those that they actually use. Similarly to maintainers - supporting and needing all 70+ providers is something they usually do not REALLY care about. They literally care about the few providers they use. We even taught the users that they can upgrade and install providers separately from the core. So they already know they can mix and match Airflow + Providers to get what they want.
> >> >>>>>>
> >> >>>>>> And they do use it - even if they use our image, the image only contains a handful of the providers and when they need to install new providers - they just install them from PyPI. And for that the difference of "community providers" vs. 3rd party providers - except the stamp of approval of the ASF - is not really visible. Surely they can use [extras] to install the providers but that is just a convenience and is definitely not needed by the users. For example when they build a custom image they usually extend Airflow and simply 'pip install <PROVIDER>'. As long as someone makes sure that the provider can be installed on certain versions of Airflow - it does not matter.
> >> >>>>>>
> >> >>>>>> Also from the users' perspective Airflow became "popular" enough that it no longer needs "more integrations" to be more "appealing" for the users. They already use Airflow. They like it (hopefully) and the fact that this or that provider is part of the community makes no difference any more.
> >> >>>>>>
> >> >>>>>> From 3) "Service providers" POV
> >> >>>>>>
> >> >>>>>> Here I am not sure. It's not very clear what service providers get from being part of the "community providers".
> >> >>>>>>
> >> >>>>>> I hear that some big service providers (cloud providers) find it cool that we give them the ASF "Stamp of Approval". And they are willing to pay the price of a slower merge process, dependence on the community and following strict rules of the ASF. And the community also is happy to pay the price of maintaining those (including the dependencies which Elad mentioned) to make sure that all the community providers work in concert - because those "Services" are hugely popular and we "want" as a community to invest there. But maintaining those deps in sync is a huge effort and it will become even worse - the more we add. On the other hand for 3rd party providers it will be EASIER to keep up. They don't have to care about all the community providers working together, they can choose a subset. And when they release their libraries they can take care of making sure the dependencies are not broken.
> >> >>>>>>
> >> >>>>>> There are other "drawbacks" of being a "community" provider. For example we have the rule that we support the min-Airflow version for providers from the community 12 months after Airflow release.
> >> >>>>>> This means that users of Airflow 2.1 will not receive updates for the providers after 21st of May. This is the price to pay for community-managed providers. We will not release bug fixes in providers or changes for Airflow 2.1 users after 21st of May. But if you manage your own provider - you still can support 2.0 or even 1.10 if you want.
> >> >>>>>>
> >> >>>>>> I cannot really see why a Service Provider would want to become an Airflow Community Provider.
> >> >>>>>>
> >> >>>>>> And I am not really sure what Flyte, Delta Sharing, Versatile Data Kit, and Cloudera people think and why they think this is the best choice.
> >> >>>>>>
> >> >>>>>> I think when we understand what the "Service Providers" want to achieve this way, maybe we will be able to come up with some middle ground and at least set some rules when it makes sense and when it does not make sense. Maybe 'contributing provider' is the way to achieve something else and we simply do not realize that in the new "Airflow as a Platform" world, all the stakeholders can achieve very similar results using different approaches.
> >> >>>>>>
> >> >>>>>> * For example we could think about how we can make it easier for Airflow users to discover and install their providers - without actually taking ownership of the code by the community.
> >> >>>>>> * Or maybe we could introduce a tool to make a 3rd-party provider pass a "compliance check" as suggested above
> >> >>>>>> * Or maybe we could introduce a "breeze" extension to be able to install and test a provider in the "latest airflow" so that the service providers could check it before we even release Airflow and dependencies
> >> >>>>>>
> >> >>>>>> So what I think we really need - Alex, Samhita, Andon, Philippe (I think) - could you tell us (every one of you separately) - what were your goals when you came up with the "contribute the new provider" idea?
> >> >>>>>>
> >> >>>>>> J.
> >> >>>>>>
> >> >>>>>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <elad...@apache.org> wrote:
> >> >>>>>>>
> >> >>>>>>> Ash, what is your recommendation for the users, should we follow your suggestion? This means that the big big big joy of using Airflow constraints and getting a working environment with all required providers will be no more. So users will get a working "Vanilla" Airflow and then will need to figure out how they are going to tackle independent providers that may not be able to coexist with one another. This means that users will need to create their own constraints mechanism and maintain it.
> >> >>>>>>>
> >> >>>>>>> From my perspective this increases the complexity of getting Airflow to be production ready. I know that we say providers vs core but I think that from the users' perspective providers are an integral part of Airflow. Having the best scheduler and the best UI is not enough. Providers are a crucial part that completes the set.
> >> >>>>>>>
> >> >>>>>>> Maybe eventually there should be something like a provider store where there can be official providers and 3rd party providers.
> >> >>>>>>>
> >> >>>>>>> This may be an even greater discussion than what we are having here. It feels more like Airflow as a product vs Airflow as an ecosystem.
> >> >>>>>>>
> >> >>>>>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty <col...@astronomer.io.invalid> wrote:
> >> >>>>>>>>
> >> >>>>>>>> I agree with Ash and Tomasz. If it were not for the history, I think in an ideal world even the providers currently part of the Airflow repo would be managed separately. (I'm not actually suggesting removing any providers.) I don't think it's a matter of gatekeeping, I just think it's actually kind of odd to have providers in the same repo as core Airflow, and it increases confusion about Airflow versions vs provider package versions.
> >> >>>>>>>>
> >> >>>>>>>> Collin McNulty
> >> >>>>>>>>
> >> >>>>>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <turbas...@apache.org> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> I’m leaning toward Ash’s approach. Having providers maintaining the packages may streamline many aspects for providers/companies.
> >> >>>>>>>>>
> >> >>>>>>>>> 1. They are owners so they can merge and release whenever they need.
> >> >>>>>>>>> 2. It’s easier for them to add E2E tests and manage the resources needed for running them.
> >> >>>>>>>>> 3. The development of the package can be incorporated into their company processes - not every company is used to OSS mode.
> >> >>>>>>>>>
> >> >>>>>>>>> Whatever way we go - we should have some basic guidelines and requirements (for example to brand a provider as “recommended by community” or something).
> >> >>>>>>>>>
> >> >>>>>>>>> Cheers,
> >> >>>>>>>>> Tomsk
> >> >>>>
> >> >>>> --
> >> >>>> With best wishes, Alex Ott
> >> >>>> http://alexott.net/
> >> >>>> Twitter: alexott_en (English), alexott (Russian)