Hello everyone,

I think we have a series of things that make it difficult to focus on
such long-term discussions - 2.3.0 is out, many people are busy
with 2.3.1 which is going to focus on "teething" problems and we have
Airflow Summit next week (yay!) and I know how many people in our
community are either busy preparing the local events or their talks
:).

I have some ideas and proposals on how we can approach the subject and
would like to continue the discussion (I would still love to hear more
voices), but I think it would be great if we can resume the discussion
after the Summit.

But - Summit is not only a "disruption" - it's also an opportunity to
make the discussion better. I think the summit with the local events
is a great opportunity to discuss this in person - and at least in 13
separate locations :).

So I have a kind request to everyone - let's talk about it at the
local events. I will be in both - London and Warsaw, so if you happen
to be there - happy to share my thoughts with anyone interested and
hear what you have to say :) - and I encourage similar discussions
elsewhere.

I think the decision on how we approach providers in the future is a
very important one and we should take it very seriously and we should
give anyone a chance to participate. It will define a bit the future
of the whole Airflow Ecosystem.

J.

On Tue, Apr 26, 2022 at 12:43 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> I think this is a different story (and different discussion).
> And I think we should have good reasons to split the repo. I think we
> do have it but for different reasons many people think we will get
> there sooner rather than later - but I think we should not hijack the
> discussion for it.
> This discussion is more for governance of providers rather than which
> repo they are.
>
> Unless I am mistaken - moving providers to separate repo does not
> really solve any of the "should we have more or less community
> providers" questions. It's really a technical split of code, but if we
> have a separate repo and we still add more providers from the community
> we will still have to make sure all of them can be installed, run the
> tests for the code, make sure they run with Airflow (released and main)
> and make sure that Airflow changes do not break them.
>
> It means about the same amount of safeguards and protection, CI
> overhead we have now - only the code will be somewhere else, but the
> amount of CI tests, when they are executing, who is allowed to merge
> the code, approval process will remain the same as long as this will
> be "apache Airflow PMC" project.
>
> J.
>
> On Tue, Apr 26, 2022 at 12:21 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > Hey all,
> >
> > Another alternative is separating out core providers from the Core Airflow 
> > Repo into a separate repo within the Apache Org itself, maybe: 
> > apache-airflow-providers.
> >
> > That will not decrease the maintenance from the Committers but the Core 
> > work and release will be completely separate and untangled from Apache 
> > Airflow repo and can move at a faster pace.
> >
> > The benefit and compromise for the community is that all the providers are 
> > still officially maintained and released by the committers. However, over 
> > time we can invite more committers who show active participation in 
> > apache-airflow-providers repo too.
> >
> > This is a compromise to the arguments about Providers being integral to the 
> > success of Airflow and as such should be maintained and released officially.
> >
> > Regards,
> > Kaxil
> >
> > On Mon, 25 Apr 2022 at 19:17, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >> > 1. https://registry.astronomer.io/
> >> > 2. Using the new classifier 
> >> > https://pypi.org/search/?o=&c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
> >>
> >> Yep. precisely what I thought to place at the top of the ecosystem page.
> >>
> >> > On 25 April 2022 18:08:49 BST, "Ferruzzi, Dennis" 
> >> > <ferru...@amazon.com.INVALID> wrote:
> >> >>
> >> >> I still think that easy inclusion with a defined pruning process is 
> >> >> best, but it's looking like that is the minority opinion.  In which 
> >> >> case, IFF we are going to be keeping them separate then I definitely 
> >> >> agree that there needs to be a fast/easy/convenient way to find them.
> >> >> ________________________________
> >> >> From: Jarek Potiuk <ja...@potiuk.com>
> >> >> Sent: Monday, April 25, 2022 7:17 AM
> >> >> To: dev@airflow.apache.org
> >> >> Subject: RE: [EXTERNAL][DISCUSS] Approach for new providers of the 
> >> >> community
> >> >>
> >> >> Just to come back to it (please everyone a little patience - I think
> >> >> some people have not chimed in yet due to 2.3.0 "focus" so this
> >> >> discussion might take a little more time).
> >> >>
> >> >> My current thinking on it so far:
> >> >>
> >> >> * I am not really in the camp of "lets not add any more providers at
> >> >> all" and also not in the "let's accept all that are good quality code
> >> >> providers". I think there are a few providers which "after fulfilling
> >> >> all the criteria" could be added - mostly open-source standards,
> >> >> generic, established technologies - but it should be rather limited
> >> >> and rare event.
> >> >>
> >> >> * when there is a proprietary service which has not too broad reach
> >> >> and it's not likely that we will have some committers who will be
> >> >> maintaining it - because they are users - the default option should
> >> >> be to make standalone per-service providers. The difficulty here is
> >> >> to set the right "non-quality" criteria - but I think we really want
> >> >> to limit any new code to maintain. Here maybe we can have some more
> >> >> concrete criteria proposed - so that we do not have to vote
> >> >> individually on each proposed providers - and so that those who want
> >> >> to propose a provider could check themselves by reading the criteria,
> >> >> what's best for them.
> >> >>
> >> >> * we might improve our "providers" list at the "ecosystem" to make
> >> >> providers stand out a bit more (maybe simply put them on top and make
> >> >> a clearly visible section). We are not going to maintain and keep the
> >> >> nice "registry" similar to Astronomer's one (we could even actually
> >> >> make the link to the Astronomer registry more prominent as the way to
> >> >> "search" for providers on our Ecosystem Page). We could also add a link
> >> >> to PyPI with the "airflow provider" classifier at the ecosystem page
> >> >> as another way of searching for providers. All that is perfectly fine,
> >> >> I think with the ASF Policies and spirit. And it will be good for
> >> >> discovery.
> >> >>
> >> >> WDYT?
> >> >>
> >> >> J.
> >> >>
> >> >> On Mon, Apr 18, 2022 at 3:59 PM Samhita Alla <samh...@union.ai> wrote:
> >> >>>
> >> >>>
> >> >>> Hello!
> >> >>>
> >> >>> The reason behind submitting the Flyte provider to the Airflow 
> >> >>> repository is that we felt it'd be effortless for the Airflow users 
> >> >>> to use the integration. Moreover, since it'd be under the umbrella of 
> >> >>> Airflow, we estimated that the Airflow users would not hesitate to 
> >> >>> use the provider.
> >> >>>
> >> >>> We could definitely have this as a standalone provider, but the 
> >> >>> easy-to-get-started incentive of Airflow providers seemed like a 
> >> >>> better option.
> >> >>>
> >> >>> If there's a sophisticated plan in place for having standalone 
> >> >>> providers in PyPI, we're up for it.
> >> >>>
> >> >>> Thanks,
> >> >>> Samhita
> >> >>>
> >> >>> On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <alex...@gmail.com> wrote:
> >> >>>>
> >> >>>>
> >> >>>> Hello all
> >> >>>>
> >> >>>> I want to try to explain a motivation behind submission of the Delta 
> >> >>>> Sharing provider:
> >> >>>>
> >> >>>> Let me start with the fact that the original issue was created 
> >> >>>> against Airflow repository, and it was accepted as potential new 
> >> >>>> functionality. And discussion about new providers has started almost 
> >> >>>> on the day when PR was submitted :-)
> >> >>>> Delta Sharing is the OSS project under umbrella of the Linux 
> >> >>>> Foundation that defines a protocol and reference implementations. It 
> >> >>>> was started by the Databricks, but has other contributors as well - 
> >> >>>> that's why it wasn't pushed into a Databricks provider, as it's not 
> >> >>>> specific to Databricks.
> >> >>>> Another thought about submitting it as a separate provider was to get 
> >> >>>> more people interested in this functionality and build additional 
> >> >>>> integrations on top of it.
> >> >>>> Another important aspect of having providers in the Airflow 
> >> >>>> repository is that they are tested together with changes in the core 
> >> >>>> of the Airflow.
> >> >>>>
> >> >>>> I completely understand the concerns about more maintenance effort, 
> >> >>>> but my personal point of view (about it below) is similar to Rafal's 
> >> >>>> & John's - if there are well defined criteria & plans for 
> >> >>>> decommissioning or something like that, then providers could be part of 
> >> >>>> the releases, etc.
> >> >>>>
> >> >>>> I just want to add that although I'm employed by Databricks, I'm not 
> >> >>>> a part of the development team - I'm in the field team who work with 
> >> >>>> customers, see how they are using different tools, see pain 
> >> >>>> points, etc.  Most of the work so far was done on my own time - I'm doing 
> >> >>>> some coordination, but most of new functionality (AAD tokens support, 
> >> >>>> Repos, Databricks SQL operators, etc.) is coming from seeing 
> >> >>>> customers using Airflow together with Databricks.
> >> >>>>
> >> >>>>
> >> >>>> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz 
> >> >>>> <rafalbieg...@google.com.invalid> wrote:
> >> >>>>>
> >> >>>>>
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> I think that we will need to find some middle ground here - we are 
> >> >>>>> trying to optimize in many dimensions (Jarek mentioned 3 of them). 
> >> >>>>> Maybe I would also add another 4th dimension - Airflow Service 
> >> >>>>> Provider, :).
> >> >>>>>
> >> >>>>> Airflow users - whether they do self-managed Airflow or use "managed 
> >> >>>>> Airflow" provided by others - are beneficiaries of the fact that 
> >> >>>>> Airflow has a decent portfolio of providers.
> >> >>>>> It's not only a guarantee that these providers should work fine and 
> >> >>>>> they meet Airflow coding/testing standards. It's also a kind of 
> >> >>>>> guarantee, that once they start using Airflow
> >> >>>>> with providers backed by the Airflow community they won't be on 
> >> >>>>> their own when it comes to troubleshooting/updating/etc. It will be 
> >> >>>>> much easier for them to convince their companies to use Airflow for 
> >> >>>>> production use cases as the Airflow platform (core + providers) is 
> >> >>>>> tested/maintained by the Airflow community.
> >> >>>>>
> >> >>>>> Keeping providers within the Airflow repository generates 
> >> >>>>> integration and maintenance work on the Airflow community side. On 
> >> >>>>> the other hand, if this work is not done within the community then 
> >> >>>>> this effort would need to be done by all users to a certain extent. 
> >> >>>>> So from this perspective it's more optimal for the community to do 
> >> >>>>> it so users can use off-the-shelf Airflow for the majority of their 
> >> >>>>> use cases
> >> >>>>>
> >> >>>>> When it comes to accepting new providers - I like John's suggestions:
> >> >>>>> a) well defined standard that needs to be met by providers - passing 
> >> >>>>> the "provider qualification" would be some effort so each service 
> >> >>>>> provider would need to decide if it wouldn't be easier to maintain 
> >> >>>>> their provider on their own.
> >> >>>>> b) well defined lifecycle for providers - which would allow us to 
> >> >>>>> identify providers that are obsolete/not popular any more and make 
> >> >>>>> them obsolete.
> >> >>>>>
> >> >>>>> Regards, Rafal.
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> 
> >> >>>>> wrote:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> I've been thinking about it - to make up my mind a little. The good 
> >> >>>>>> thing for me is that I have no strong opinion and I can rather 
> >> >>>>>> easily see (or so I think) both sides.
> >> >>>>>>
> >> >>>>>> TL;DR; I think we need an explanation from the "Service Providers" 
> >> >>>>>> - what they want to achieve by contributing providers to the 
> >> >>>>>> community and see if we can achieve similar results differently.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Obviously I am a bit biased from the maintainer point of view, but 
> >> >>>>>> since I cooperate with various stakeholders I spoke to some of them 
> >> >>>>>> just to see their point of view and this is what I got:
> >> >>>>>>
> >> >>>>>> Seems that we have really three types of stakeholders that are 
> >> >>>>>> really interested in "providers":
> >> >>>>>>
> >> >>>>>> 1) "Maintainers" - those who mostly maintain Airflow and have to 
> >> >>>>>> take care of its future and development and the "grand vision" of 
> >> >>>>>> where we want to be in a few years
> >> >>>>>> 2) "Users" - those who use Airflow and integration with the Service 
> >> >>>>>> Provider
> >> >>>>>> 3) "Service providers" - those who run the services that Airflow 
> >> >>>>>> integrates with - via providers (that group might also contain 
> >> >>>>>> those stakeholders that run Airflow "as a service")
> >> >>>>>>
> >> >>>>>> Let me see it from all the different POVs:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> From 1) Maintainer POV
> >> >>>>>>
> >> >>>>>> More providers mean slower growth of the platform overall as the 
> >> >>>>>> more providers we add and manage as a community, the less time we 
> >> >>>>>> can spend on improving Airflow as a core.
> >> >>>>>> Also the vision I think we all share is that Airflow is not a 
> >> >>>>>> "standalone orchestrator" any more - due to its popularity, reach 
> >> >>>>>> and power, it became an "orchestrating platform" and this is the 
> >> >>>>>> vision that keeps us - maintainers - busy.
> >> >>>>>>
> >> >>>>>> Over the last 2 years pretty much everything we do is to make Airflow 
> >> >>>>>> "more extensible". You can add custom "secrets managers", 
> >> >>>>>> "timetables", "deferrers" etc. "Customizability" is now built-in 
> >> >>>>>> and a "theme" of being a modern platform.
> >> >>>>>> Hell - we even recently added the "Airflow Provider" trove classifier 
> >> >>>>>> in PyPI: 
> >> >>>>>> https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
> >> >>>>>>  and the main justification in the discussion was that we expect 
> >> >>>>>> MORE 3rd-parties to use it, rather than relying on 
> >> >>>>>> "apache-airflow-provider" package name.
> >> >>>>>> So from maintainer POV - having 3rd-party providers as "extensions" 
> >> >>>>>> to Airflow makes perfect sense and is the way to go.
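
The classifier search link quoted above can be rebuilt from the classifier string itself. A minimal sketch, assuming only that PyPI's search page takes the classifier in the `c` query parameter (as the URL in the message shows); the function name is hypothetical and not part of any Airflow or PyPI tooling:

```python
from urllib.parse import urlencode

# Trove classifier string, as it appears (URL-encoded) in the link above.
PROVIDER_CLASSIFIER = "Framework :: Apache Airflow :: Provider"


def pypi_classifier_search_url(classifier: str = PROVIDER_CLASSIFIER) -> str:
    """Build a PyPI search URL filtered by a single trove classifier."""
    # urlencode applies quote_plus: spaces become "+", colons become "%3A".
    return "https://pypi.org/search/?" + urlencode({"c": classifier})


print(pypi_classifier_search_url())
```

A 3rd-party provider would show up in that search only if its package metadata declares the same classifier.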
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> From 2) User POV
> >> >>>>>>
> >> >>>>>> Users want to use Airflow with all the integrations they use 
> >> >>>>>> together. But only with those that they actually use. Similarly as 
> >> >>>>>> maintainers - supporting and needing all 70+ providers is something 
> >> >>>>>> they usually do not REALLY care about.
> >> >>>>>> They literally care about the few providers they use. We even 
> >> >>>>>> taught the users that they can upgrade and install providers 
> >> >>>>>> separately from the core. So they already know they can mix and 
> >> >>>>>> match Airflow + Providers to get what they want.
> >> >>>>>>
> >> >>>>>> And they do use it - even if they use our image, the image only 
> >> >>>>>> contains a handful of the providers and when they need to install
> >> >>>>>> new providers - they just install it from PyPI. And for that the 
> >> >>>>>> difference of "community providers" vs. 3rd party providers - 
> >> >>>>>> except the stamp of approval of the ASF, is not really visible.
> >> >>>>>> Surely they can use [extras] to install the providers but that is 
> >> >>>>>> just a convenience and is definitely not needed by the users.
> >> >>>>>> For example when they build a custom image they usually extend 
> >> >>>>>> Airflow and simply 'pip install <PROVIDER>'
> >> >>>>>> As long as someone makes sure that the provider can be installed on 
> >> >>>>>> certain versions of Airflow - it does not matter.
> >> >>>>>>
> >> >>>>>> Also from the users perspective Airflow became "popular" enough 
> >> >>>>>> that it no longer needed "more integrations" to be more "appealing" 
> >> >>>>>> for the users.
> >> >>>>>> They already use Airflow. They like it (hopefully) and the fact 
> >> >>>>>> that this or that provider is part of the community makes no 
> >> >>>>>> difference any more.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> From 3) "Service providers" POV
> >> >>>>>>
> >> >>>>>> Here I am not sure. It's not very clear what service providers get 
> >> >>>>>> from being part of the "community providers".
> >> >>>>>>
> >> >>>>>> I hear that some big services (cloud providers) find it cool that we 
> >> >>>>>> give it the ASF "Stamp of Approval". And they are willing to pay 
> >> >>>>>> the price of a slower merge process, dependence on the community 
> >> >>>>>> and following strict rules of the ASF.
> >> >>>>>> And the community also is happy to pay the price of maintaining 
> >> >>>>>> those (including the dependencies which Elad mentioned) to make sure 
> >> >>>>>> that all the community providers work in concert - because those 
> >> >>>>>> "Services" are hugely popular and we "want" as a community to 
> >> >>>>>> invest there.
> >> >>>>>> But maintaining those deps in sync is a huge effort and it will 
> >> >>>>>> become even worse - the more we add. On the other hand for 3rd 
> >> >>>>>> party providers it will be EASIER to keep up.
> >> >>>>>> They don't have to care about all the community providers to work 
> >> >>>>>> together, they can choose a subset. And when they release their 
> >> >>>>>> libraries they can take care about making sure the dependencies are 
> >> >>>>>> not broken.
> >> >>>>>>
> >> >>>>>> There are other "drawbacks" for being a "community" provider. For 
> >> >>>>>> example we have the rule that we support the min-Airflow version 
> >> >>>>>> for providers from the community 12 months after Airflow release.
> >> >>>>>> This means that users of Airflow 2.1 will not receive updates for 
> >> >>>>>> the providers after 21st of May. This is the price to pay for 
> >> >>>>>> community-managed providers. We will not release bug fixes in 
> >> >>>>>> providers or changes for Airflow 2.1 users after 21st of May.
> >> >>>>>> But if you manage your own provider - you still can support 2.0 or 
> >> >>>>>> even 1.10 if you want.
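
The 12-month rule above is simple date arithmetic. A rough sketch, assuming Airflow 2.1 was released on 21 May 2021 (inferred from the "21st of May" cutoff in the message, not stated there explicitly); the helper name is hypothetical:

```python
from datetime import date


def provider_support_cutoff(airflow_release: date, months: int = 12) -> date:
    """Return the date `months` calendar months after an Airflow release."""
    # Shift by whole months and keep the day of month. This sketch does not
    # handle days that are missing in the target month (e.g. the 31st).
    total = airflow_release.month - 1 + months
    return airflow_release.replace(
        year=airflow_release.year + total // 12,
        month=total % 12 + 1,
    )


# Assumed release date for Airflow 2.1: 21 May 2021.
print(provider_support_cutoff(date(2021, 5, 21)))
```

A 3rd-party provider is not bound by this window and can pick its own minimum Airflow version.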
> >> >>>>>>
> >> >>>>>> I cannot really see why a Service Provider would want to become an 
> >> >>>>>> Airflow Community Provider.
> >> >>>>>>
> >> >>>>>> And I am not really sure what Flyte, Delta Sharing, Versatile Data 
> >> >>>>>> Kit, and Cloudera people think and why they think this is the best 
> >> >>>>>> choice.
> >> >>>>>>
> >> >>>>>> I think when we understand what the "Service Providers" want to 
> >> >>>>>> achieve this way, maybe we will be able to come up with some middle 
> >> >>>>>> ground and at least set some rules when it makes sense and when it 
> >> >>>>>> does not make sense.
> >> >>>>>> Maybe 'contributing provider' is the way to achieve something else 
> >> >>>>>> and we simply do not realize that in the new "Airflow as a 
> >> >>>>>> Platform" world, all the stakeholders can achieve very similar 
> >> >>>>>> results using different approaches.
> >> >>>>>>
> >> >>>>>> * For example we could think about how we can make it easier for 
> >> >>>>>> Airflow users to discover and install their providers - without 
> >> >>>>>> actually taking ownership of the code by the community.
> >> >>>>>> * Or maybe we could introduce a tool to make a 3rd-party provider 
> >> >>>>>> pass a "compliance check" as suggested above
> >> >>>>>> * Or maybe we could introduce a "breeze" extension to be able to 
> >> >>>>>> install and test provider in the "latest airflow" so that the 
> >> >>>>>> service providers could check it before we even release airflow and 
> >> >>>>>> dependencies
> >> >>>>>>
> >> >>>>>> So what I think we really need - Alex, Samhita, Andon, Philippe (I 
> >> >>>>>> think) - could you tell us (every one of you separately) - what are 
> >> >>>>>> your goals when you came up with the "contribute the new provider" 
> >> >>>>>> idea?
> >> >>>>>>
> >> >>>>>> J.
> >> >>>>>>
> >> >>>>>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <elad...@apache.org> 
> >> >>>>>> wrote:
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> Ash, what is your recommendation for the users, should we follow 
> >> >>>>>>> your suggestion?
> >> >>>>>>> This means that the big big big joy of using airflow constraints 
> >> >>>>>>> and getting a working environment with all required providers will 
> >> >>>>>>> be no more.
> >> >>>>>>> So users will get a working "Vanilla" Airflow and then will need 
> >> >>>>>>> to figure out how they are going to tackle independent providers 
> >> >>>>>>> that may not be able to coexist one with another.
> >> >>>>>>> This means that users will need to create their own constraints 
> >> >>>>>>> mechanism and maintain it.
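
For context on the constraints mechanism mentioned above: the community publishes one constraints file per Airflow version and Python version, and users pass it to pip. A minimal sketch of how that install command is put together; the helper names are hypothetical and the versions and provider are example values only:

```python
from typing import List


def constraints_url(airflow_version: str, python_version: str) -> str:
    """URL of the published constraints file for an Airflow/Python pair."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_args(
    airflow_version: str, python_version: str, providers: List[str]
) -> List[str]:
    """Arguments for one pip call installing Airflow plus providers."""
    return [
        "pip", "install",
        f"apache-airflow=={airflow_version}",
        *providers,
        "--constraint", constraints_url(airflow_version, python_version),
    ]


# Example values only: Airflow 2.3.0 on Python 3.8 with one community provider.
print(" ".join(pip_install_args("2.3.0", "3.8", ["apache-airflow-providers-google"])))
```

The constraints file only pins what the community tests together, so a provider outside that set is installed without any such guarantee.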
> >> >>>>>>>
> >> >>>>>>> From my perspective this increases the complexity of getting 
> >> >>>>>>> Airflow to be production ready.
> >> >>>>>>> I know that we say providers vs core but I think that from users 
> >> >>>>>>> perspective providers are an integral part of Airflow.
> >> >>>>>>> Having the best scheduler and the best UI is not enough. Providers 
> >> >>>>>>> are a crucial part that complete the set.
> >> >>>>>>>
> >> >>>>>>> Maybe eventually there should be something like a provider store 
> >> >>>>>>> where there can be official providers and 3rd party providers.
> >> >>>>>>>
> >> >>>>>>> This may be even greater discussion than what we are having here. 
> >> >>>>>>> It feels more like Airflow as a product vs Airflow as an ecosystem.
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty 
> >> >>>>>>> <col...@astronomer.io.invalid> wrote:
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> I agree with Ash and Tomasz. If it were not for the history, I 
> >> >>>>>>>> think in an ideal world even the providers currently part of the 
> >> >>>>>>>> Airflow repo would be managed separately. (I'm not actually 
> >> >>>>>>>> suggesting removing any providers.) I don't think it's a matter 
> >> >>>>>>>> of gatekeeping, I just think it's actually kind of odd to have 
> >> >>>>>>>> providers in the same repo as core Airflow, and it increases 
> >> >>>>>>>> confusion about Airflow versions vs provider package versions.
> >> >>>>>>>>
> >> >>>>>>>> Collin McNulty
> >> >>>>>>>>
> >> >>>>>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek 
> >> >>>>>>>> <turbas...@apache.org> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> I’m leaning toward Ash approach. Having providers maintaining 
> >> >>>>>>>>> the packages may streamline many aspects for providers/companies.
> >> >>>>>>>>>
> >> >>>>>>>>> 1. They are owners so they can merge and release whenever they 
> >> >>>>>>>>> need.
> >> >>>>>>>>> 2. It’s easier for them to add E2E tests and manage the 
> >> >>>>>>>>> resources needed for running them.
> >> >>>>>>>>> 3. The development of the package can be incorporated into their 
> >> >>>>>>>>> company processes - not every company is used to OSS mode.
> >> >>>>>>>>>
> >> >>>>>>>>> Whatever way we go - we should have some basic guidelines and 
> >> >>>>>>>>> requirements (for example to brand a provider as “recommended by 
> >> >>>>>>>>> community” or something).
> >> >>>>>>>>>
> >> >>>>>>>>> Cheers,
> >> >>>>>>>>> Tomsk
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> With best wishes,                    Alex Ott
> >> >>>> http://alexott.net/
> >> >>>> Twitter: alexott_en (English), alexott (Russian)
