I've been thinking about it - to make up my mind a little. The good thing
for me is that I have no strong opinion and I can rather easily see (or so
I think) both sides.

TL;DR: I think we need an explanation from the "Service Providers" - what
they want to achieve by contributing providers to the community - and then
we can see if similar results can be achieved differently.


Obviously I am a bit biased towards the maintainer point of view, but since
I cooperate with various stakeholders, I spoke to some of them just to see
their point of view, and this is what I got:

It seems that we really have three types of stakeholders that are
interested in "providers":

1) "Maintainers" - those who mostly maintain Airflow and have to take care
of its future, its development and the "grand vision" of where we want to
be in a few years
2) "Users" - those who use Airflow and its integration with the Service
Providers
3) "Service providers" - those who run the services that Airflow integrates
with via providers (that group might also contain those stakeholders that
run Airflow "as a service")

Let me see it from all the different POVs:


From 1) Maintainer POV

More providers mean slower growth of the platform overall as the more
providers we add and manage as a community, the less time we can spend on
improving Airflow as a core.
Also the vision I think we all share is that Airflow is not a "standalone
orchestrator" any more - due to its popularity, reach and power, it became
an "orchestrating platform" and this is the vision that keeps us -
maintainers - busy.

Over the last 2 years pretty much everything we have done is to make
Airflow "more extensible". You can add custom "secrets managers",
"timetables", "deferrers" etc. "Customizability" is now built in and is the
"theme" of being a modern platform.
Hell - we even recently added an "Airflow Provider" trove classifier on PyPI:
https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
and the main justification in the discussion was that we expect MORE
3rd-parties to use it, rather than relying on the "apache-airflow-provider"
package name.
So from the maintainer POV - having 3rd-party providers as "extensions" to
Airflow makes perfect sense and is the way to go.


From 2) User POV

Users want to use Airflow with all the integrations they use together. But
only with those that they actually use. Similarly to maintainers -
supporting and needing all 70+ providers is something they usually do not
REALLY care about.
They literally care about the few providers they use. We even taught the
users that they can upgrade and install providers separately from the core.
So they already know they can mix and match Airflow + Providers to get what
they want.

And they do use it - even if they use our image, the image only contains a
handful of the providers, and when they need to install new providers they
just install them from PyPI. And for that, the difference between
"community providers" and 3rd-party providers - except for the ASF stamp of
approval - is not really visible.
Surely they can use [extras] to install the providers, but that is just a
convenience and is definitely not needed by the users.
For example when they build a custom image they usually extend the Airflow
image and simply 'pip install <PROVIDER>'.
As long as someone makes sure that the provider can be installed on certain
versions of Airflow - it does not matter where it comes from.

Also from the users' perspective Airflow has become "popular" enough that
it no longer needs "more integrations" to be more "appealing" to the users.
They already use Airflow. They like it (hopefully) and the fact that this
or that provider is part of the community makes no difference any more.


From 3) "Service providers" POV

Here I am not sure. It's not very clear what service providers get from
being part of the "community providers".

I hear that some big services (cloud providers) find it cool that we give
them the ASF "Stamp of Approval". And they are willing to pay the price of a
slower merge process, dependence on the community and following the strict
rules of the ASF.
And the community is also happy to pay the price of maintaining those
(including the dependencies which Elad mentioned) to make sure that all the
community providers work in concert - because those "Services" are hugely
popular and we "want" as a community to invest there.
But keeping those deps in sync is a huge effort and it will become even
worse the more we add. On the other hand, for 3rd-party providers it will
be EASIER to keep up.
They don't have to care about all the community providers working together,
they can choose a subset. And when they release their libraries they can
make sure that the dependencies are not broken.

There are other "drawbacks" to being a "community" provider. For example
we have the rule that community providers support a minimum Airflow version
for 12 months after that Airflow release.
This means that users of Airflow 2.1 will not receive provider updates
after the 21st of May: we will not release bug fixes or other changes in
providers for Airflow 2.1 users after that date. This is the price to pay
for community-managed providers.
But if you manage your own provider - you can still support 2.0 or even
1.10 if you want.
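
To make the 12-month rule concrete, here is a minimal sketch of the
arithmetic. The function name is made up for illustration, and the Airflow
2.1 release date (21st of May 2021) is an assumption inferred from the
cutoff date mentioned above - this is not code from Airflow itself.

```python
import calendar
from datetime import date


def provider_support_cutoff(airflow_release: date, months: int = 12) -> date:
    """Return the date `months` calendar months after an Airflow release.

    Hypothetical helper, only illustrating the rule described in this mail.
    """
    # Count months from zero so the year rollover is a plain integer division.
    total = airflow_release.month - 1 + months
    year = airflow_release.year + total // 12
    month = total % 12 + 1
    # Clamp the day for shorter target months (e.g. a release on the 31st).
    day = min(airflow_release.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Assumed release date of Airflow 2.1, inferred from the cutoff quoted above.
AIRFLOW_2_1_RELEASE = date(2021, 5, 21)
print(provider_support_cutoff(AIRFLOW_2_1_RELEASE))
```

A 3rd-party provider is free to pick a longer window (or none at all),
which is exactly the flexibility described above.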

I cannot really see why a Service Provider would want to become an Airflow
Community Provider.

And I am not really sure what the Flyte, Delta Sharing, Versatile Data Kit,
and Cloudera people think and why they think this is the best choice.

I think when we understand what the "Service Providers" want to achieve
this way, maybe we will be able to come up with some middle ground and at
least set some rules on when it makes sense and when it does not.
Maybe 'contributing a provider' is a way to achieve something else and we
simply do not realize that in the new "Airflow as a Platform" world, all
the stakeholders can achieve very similar results using different
approaches.

* For example we could think about how we can make it easier for Airflow
users to discover and install their providers - without the community
actually taking ownership of the code.
* Or maybe we could introduce a tool to make a 3rd-party provider pass a
"compliance check" as suggested above.
* Or maybe we could introduce a "breeze" extension to be able to install
and test a provider against the "latest airflow" so that the service
providers could check it before we even release airflow and dependencies.
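
Just to illustrate what such a "compliance check" could look like, here is
a rough sketch. No such tool exists today and the three rules below are my
assumptions about what we might want to verify; `PackageMetadata` and
`compliance_problems` are hypothetical names. The classifier string comes
from the PyPI URL above, and "apache_airflow_provider" is the entry point
group Airflow uses to discover providers.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Classifier taken from the PyPI search URL quoted earlier in this mail.
PROVIDER_CLASSIFIER = "Framework :: Apache Airflow :: Provider"
# Entry point group Airflow uses to discover installed provider packages.
PROVIDER_ENTRY_POINT_GROUP = "apache_airflow_provider"


@dataclass
class PackageMetadata:
    """Hypothetical, simplified view of a provider distribution's metadata."""

    name: str
    classifiers: list[str] = field(default_factory=list)
    entry_point_groups: list[str] = field(default_factory=list)
    requires: list[str] = field(default_factory=list)


def requirement_name(requirement: str) -> str:
    """Extract the bare project name from a requirement such as 'foo>=1.0'."""
    return re.split(r"[\s<>=!~;\[(]", requirement.strip(), maxsplit=1)[0].lower()


def compliance_problems(pkg: PackageMetadata) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    if PROVIDER_CLASSIFIER not in pkg.classifiers:
        problems.append(f"missing trove classifier {PROVIDER_CLASSIFIER!r}")
    if PROVIDER_ENTRY_POINT_GROUP not in pkg.entry_point_groups:
        problems.append(f"missing entry point group {PROVIDER_ENTRY_POINT_GROUP!r}")
    if "apache-airflow" not in {requirement_name(r) for r in pkg.requires}:
        problems.append("no declared dependency on apache-airflow")
    return problems
```

A real tool would read this metadata from the built package rather than
from a hand-filled dataclass, and the list of rules is exactly the thing we
would need to agree on as a community.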

So what I think we really need - Alex, Samhita, Andon, Philippe (I think)
- could you tell us (each of you separately) what your goals were when you
came up with the "contribute the new provider" idea?

J.

On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <elad...@apache.org> wrote:

> Ash, what is your recommendation for the users should we follow your
> suggestion?
> This means that the big big big joy of using airflow constraints and
> getting a working environment with all required providers will be no more.
> So users will get a working "Vanilla" Airflow and then will need to figure
> out how they are going to tackle independent providers that may not be able
> to coexist one with another.
> This means that users will need to create their own constraints mechanism
> and maintain it.
>
> From my perspective this increases the complexity of getting Airflow to be
> production ready.
> I know that we say providers vs core but I think that from the users'
> perspective providers are an integral part of Airflow.
> Having the best scheduler and the best UI is not enough. Providers are a
> crucial part that completes the set.
>
> Maybe eventually there should be something like a provider store where
> there can be official providers and 3rd party providers.
>
> This may be an even greater discussion than what we are having here. It
> feels more like Airflow as a product vs Airflow as an ecosystem.
>
>
>
>
>
> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty
> <col...@astronomer.io.invalid> wrote:
>
>> I agree with Ash and Tomasz. If it were not for the history, I think in
>> an ideal world even the providers currently part of the Airflow repo would
>> be managed separately. (I'm not actually suggesting removing any
>> providers.) I don't think it's a matter of gatekeeping, I just think it's
>> actually kind of odd to have providers in the same repo as core Airflow,
>> and it increases confusion about Airflow versions vs provider package
>> versions.
>>
>> Collin McNulty
>>
>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <turbas...@apache.org>
>> wrote:
>>
>>> I’m leaning toward Ash’s approach. Having providers maintaining the
>>> packages may streamline many aspects for providers/companies.
>>>
>>> 1. They are owners so they can merge and release whenever they need.
>>> 2. It’s easier for them to add E2E tests and manage the resources needed
>>> for running them.
>>> 3. The development of the package can be incorporated into their company
>>> processes - not every company is used to OSS mode.
>>>
>>> Whatever way we go - we should have some basic guidelines and
>>> requirements (for example to brand a provider as “recommended by community”
>>> or something).
>>>
>>> Cheers,
>>> Tomsk
>>>
>>
