Hi all,
I’m Shubham, Sr. Product Manager at AWS, working closely with John and the MWAA 
team. Glad to see the Airflow community openly discussing this topic, which will 
likely shape Airflow’s growth in the future.

First, I agree with Elad and Dennis that we shouldn’t be gatekeeping new 
providers. At the same time, I empathize with Jarek’s concern about taking 
responsibility for maintaining them. It is important to set the 
right expectations for Airflow users when they pick an Airflow 
provider to meet their development needs.

Borrowing the “verified” feature from Twitter, I believe Airflow can provide a 
list of providers that meet our community guidelines, are well maintained, and 
are healthy. We can leverage AIP-47, New design of Airflow System Tests 
(https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests),
to build a public-facing dashboard (something that Niko has been a big 
proponent of internally for the AWS provider) that shows the status of system 
tests for all providers. This would improve the experience of Airflow users when 
they start using any provider package and reduce the number of issues we get.

Deprecation will be difficult once a provider is added as there might be some 
users who depend on it. A list of "verified" Airflow providers and a dashboard 
with system tests will reduce the need for deprecation.

Shubham

From: "Jackson, John" <jacn...@amazon.com.INVALID>
Reply-To: "dev@airflow.apache.org" <dev@airflow.apache.org>
Date: Tuesday, April 5, 2022 at 10:56 AM
To: "dev@airflow.apache.org" <dev@airflow.apache.org>
Subject: RE: [EXTERNAL] Re: [DISCUSS] Approach for new providers of the 
community


Hi Folks,

This is a great topic and indeed important as Airflow’s popularity continues to 
grow.

One thing that will help is to provide clear, unambiguous community guidelines 
for providers--both existing and new.  They should cover items such as:


  *   What qualifies as a “new provider” vs extending an existing provider or 
releasing as an independent project.
  *   Rules about Python dependencies and other install actions that providers 
can take, and how it interacts with Airflow core code (for example, providers 
or their dependencies should not be allowed to monkey-patch core code, or force 
an Airflow/DB upgrade).
  *   The minimum standards for unit tests, system tests, examples, and 
documentation with consistent naming conventions (I’m looking at you 
“examples”) and technology stacks (i.e. “mock” usage) for each.
  *   Clear direction as to when to create a hook vs operator vs sensor, and 
minimum required functionality for each.
  *   A deprecation plan, for example that a provider is guaranteed to be 
supported for x releases, however if it goes through n releases without update 
it goes into a “quarantined” state, and if not verified it moves to “retired” 
(or “moved to the attic” as Jarek stated).
  *   A bar-raising plan, to get all existing providers either up to the 
current feature bar by a certain date, or retired.

This should make accepting new providers/operators/etc PRs easier, as there 
will be an unambiguous checklist that needs to be met before it’s even reviewed 
(which could maybe even be automated).  It will also ensure user confidence in 
Airflow providers as a whole, as there will be a consistent level of features, 
functionality, and quality regardless of which provider the user chooses to 
deploy.
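
As a rough illustration of the deprecation plan in the list above, the 
supported / quarantined / retired lifecycle could be sketched as a small state 
function. This is only a sketch under assumptions: the names ProviderState, 
RetirementPolicy and next_state are hypothetical and not part of Airflow, the 
values of x and n are left open, and moving a verified provider from 
"quarantined" back to "supported" is an assumption the plan does not spell out.

```python
from dataclasses import dataclass
from enum import Enum


class ProviderState(Enum):
    SUPPORTED = "supported"
    QUARANTINED = "quarantined"
    RETIRED = "retired"


@dataclass(frozen=True)
class RetirementPolicy:
    # Hypothetical policy knobs; the thread only calls them "x" and "n".
    guaranteed_releases: int  # "x": releases during which support is guaranteed
    max_stale_releases: int   # "n": releases without update before quarantine


def next_state(current, policy, releases_since_accepted,
               releases_since_update, verified=False):
    """Return the provider's state after evaluating one release cycle."""
    if current is ProviderState.RETIRED:
        # Retired providers stay retired in this sketch.
        return ProviderState.RETIRED
    if current is ProviderState.QUARANTINED:
        # Assumption: verification returns the provider to "supported";
        # without verification it moves to "retired".
        return ProviderState.SUPPORTED if verified else ProviderState.RETIRED
    # current is SUPPORTED
    if releases_since_accepted < policy.guaranteed_releases:
        # Still inside the guaranteed support window.
        return ProviderState.SUPPORTED
    if releases_since_update >= policy.max_stale_releases:
        return ProviderState.QUARANTINED
    return ProviderState.SUPPORTED


# Example values for x and n, chosen only for illustration.
policy = RetirementPolicy(guaranteed_releases=4, max_stale_releases=3)
state = next_state(ProviderState.SUPPORTED, policy,
                   releases_since_accepted=6, releases_since_update=3)
print(state.value)
```

A check like this could feed the same automated checklist, since it needs only 
release counts and a verification flag as input.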

John

On 2022/04/05 08:58:17 Jarek Potiuk wrote:
> One more Provider in progress I forgot :). Cloudera:
> https://github.com/apache/airflow/pull/22659
>
> Just wanted to stress how important the result of this discussion is.
> The number of PRs for new providers we get is kind of unprecedented.
> This month we have started discussions (and actual PRs) about adding
> at least 4 biggish providers.
>
> It's either a coincidence, or we simply reached the status that a
> "lot" of 3rd parties want to integrate with Airflow as Airflow is
> really a de-facto Platform for Orchestration for "Everyone" :D :D.
>
> This is a great thing if it's the latter.
>
> I just want to make sure we get it right when it comes to "embracing"
> them as a community. It's not really about gatekeeping but more about
> "taking responsibility" for the code. If we accept code to the
> community we take responsibility for maintaining it too. Of course
> there are various stakeholders there and I am sure "Cloudera" people
> will maintain their provider and provide bug fixes - but the issues
> will also come our way if the Cloudera provider does not work (and
> with the ASF "stamp of approval" we give our users some kind of
> expectations that we have to fulfill).
>
> Unlike in many technical decisions :) I have no very strong opinion
> about this and I am really interested to hear what the community
> members think.
>
> We are prepared to handle literally hundreds of providers if need be
> (with some small automation improvements) - so there are no technical
> reasons to limit the number of providers.
> In the (near) future we might even decide to split them into separate
> repositories (there are some discussions about that and it's likely to
> happen) to make some housekeeping easier and to make sure it does not
> hold us back when we develop some core features.
>
> I am however leaning towards what both Elad and Dennis wrote: accepting
> new providers should be easy and it should only be gated by the
> technical code quality bar, but there should also be some expectations
> for the provider being maintained.
> And as Dennis wrote - rather than "voting" for approval, there should
> rather be a clear road (and possibly a vote) to "retire" a provider if
> it is not maintained any more (this is called "moving to the attic" in
> ASF terminology).
>
> But maybe there are others who think differently. Would love to hear it.
>
> J
>
>
> On Mon, Apr 4, 2022 at 9:58 PM Ferruzzi, Dennis
> <fe...@amazon.com.invalid> wrote:
> >
> > I think I'd just +1 Elad's comments.  I don't know if we (the community) 
> > really need to be gatekeeping which providers get first class status like 
> > that.  In the end, the users of any given provider become responsible for 
> > maintaining it, so I feel it sorts itself out without added bureaucracy.  
> > Perhaps some form of formalized decision tree on when to drop a provider 
> > package as "no longer maintained/supported", but I don't feel there should 
> > be a high barrier to entry on adding a new one provided the code doesn't 
> > break any existing packages and meets community quality standards.
> >
> >
> > ________________________________
> > From: Elad Kalif <el...@apache.org>
> > Sent: Monday, April 4, 2022 7:24 AM
> > To: dev@airflow.apache.org
> > Subject: RE: [EXTERNAL] [DISCUSS] Approach for new providers of the 
> > community
> >
> >
> > Interesting topic!
> >
> > I think the most important thing for us is that we are able to maintain the 
> > provider (in terms of not causing problems for Airflow core or other 
> > providers).
> > Some of the maintained providers (Google for example) have had open bugs 
> > for 2 years. So even if we have many provider maintainers it doesn't 
> > guarantee that problems get fixed.
> > I am not worried about provider internal issues (operator not working 
> > properly, etc..)  - it affects only the users of the provider itself and 
> > the users of the provider are always welcome to submit PRs with fixes.
> >
> > I don't feel comfortable blocking a new provider just because it has a 
> > small market / competitors' tools also don't support it etc...
> >
> > I guess my take is:
> >
> > Accept any new provider that meets quality/requirements (just as we have 
> > done so far).
> > Since providers are independent packages - in the rare case (I say rare as 
> > it has never happened till now) where a provider causes problems with 
> > core/other providers and no one is willing to address it, being able to 
> > terminate the provider/mark it as not maintained on PyPI should be enough, 
> > I think.
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Apr 4, 2022 at 4:39 PM Jarek Potiuk 
> > <ja...@potiuk.com> wrote:
> >>
> >> Hey all,
> >>
> >> We seem to have an influx of new providers coming our way:
> >>
> >> * Delta Sharing:
> >> https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
> >> * Flyte:  https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
> >> * Versatile Data Kit:
> >> https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0
> >>
> >> I think it might be a good idea to bring the discussion in one place
> >> (here) and decide on what our approach is for accepting new providers.
> >> The original discussion from Andon was focused mostly on VDK's
> >> case, but maybe we could work out a general approach and "guidelines"
> >> on what approach is best, so that we do not have to discuss it
> >> separately for each proposal, but have some more (or less) clear
> >> rules on when we think it's good to accept providers as a community.
> >>
> >> Generally speaking we have two approaches:
> >> * providers managed by the Apache Airflow community
> >> * providers managed by 3rd-parties
> >>
> >> I think my email here nicely summarizes the two:
> >> https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n
> >>
> >> I tried to look for earlier devlist discussions about the subject
> >> (maybe someone can find them :)). I think we have never formalized it
> >> nor written it down, but I do recall some (slack??) discussions about
> >> it from the past.
> >>
> >> While we have no control/influence (and we do not want to have any) over
> >> 3rd-party providers, we definitely have both for the community-managed
> >> ones - and there should be some rules defined to decide when we are
> >> "ok" to accept a provider. Having "more" providers in the
> >> "community" area is not always better. More often than not, code is a
> >> liability rather than an asset.
> >>
> >> From the discussions I had, I recall points such as:
> >>
> >> * likelihood of the provider being used by many users
> >> * possibility to test/support the providers by maintainers or
> >> dedicated "stakeholders"
> >> * quality of the code and following our expectations (docs/how to
> >> guides, unit/system test)
> >> * competing (?) with Airflow - there could be some providers of
> >> "competing" products maybe (I am not sure if this is a concern of
> >> ours) which we simply might decide to not maintain in the community
> >>
> >> I am happy to write it down and propose such rules revolving around
> >> those - but I would like to hear what people think first.
> >>
> >> What are your thoughts here?
> >>
> >> J
>
