I very much like some of the points there :).

I think we have indeed been missing clear guidance on what criteria a new
provider needs to meet. Even if we actually had some of that in our heads,
it was more "tribal knowledge": you could likely figure it out
by looking at other providers, but we did not have it hashed out.

And yeah, absolutely: AIP-47 as an enabler for finishing AIP-4 (automating
system tests for external systems), and specifically the dashboard showing
status, is very, very, very dear to my heart :). I wrote AIP-4 in
September 2018 as my first AIP proposal, which I created about a month after I
started my first contributions to Airflow :).

Looks like we might be finally completing it :D.

I think we should wait a bit for more comments and I might try to start
drafting a proposed PR describing the policy.

J.

On Tue, Apr 5, 2022 at 10:34 PM Mehta, Shubham <shu...@amazon.com.invalid>
wrote:

> Hi all,
>
> I’m Shubham, Sr. Product Manager at AWS, working closely with John and the
> MWAA team. Glad to see the Airflow community openly discussing this topic,
> which will likely shape Airflow’s growth in the future.
>
> Firstly, I agree with Elad and Dennis that we shouldn’t be gatekeeping
> new providers. At the same time, I empathize with Jarek’s concern about
> taking responsibility for maintaining the new providers. It is important to
> set the right expectations for our Airflow users when they try to use any
> Airflow provider to meet their development needs.
>
> Borrowing the “verified” feature from Twitter, I believe Airflow can
> provide a list of providers that meet our community guidelines, are well
> maintained, and are healthy. We can leverage AIP-47 Airflow System Tests (
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests)
> to build a public-facing dashboard (something that Niko has been a big
> proponent of internally for the AWS provider) that shows the status of system
> tests for all providers. It will improve the experience of Airflow users
> when they start using any provider package and reduce the number of issues
> we get.
>
> Deprecation will be difficult once a provider is added, as some users
> might depend on it. A list of "verified" Airflow providers and a
> dashboard with system tests will reduce the need for deprecation.
>
> Shubham
>
> *From: *"Jackson, John" <jacn...@amazon.com.INVALID>
> *Reply-To: *"dev@airflow.apache.org" <dev@airflow.apache.org>
> *Date: *Tuesday, April 5, 2022 at 10:56 AM
> *To: *"dev@airflow.apache.org" <dev@airflow.apache.org>
> *Subject: *RE: [EXTERNAL] Re: [DISCUSS] Approach for new providers of the
> community
>
> Hi Folks,
>
> This is a great topic and indeed important as Airflow’s popularity
> continues to grow.
>
> One thing that will help is to provide clear, unambiguous community
> guidelines for providers, both existing and new. They should cover
> items such as:
>
>    - What qualifies as a “new provider” vs. extending an existing provider
>    or releasing as an independent project.
>    - Rules about Python dependencies and other install actions that
>    providers can take, and how they interact with Airflow core code (for
>    example, providers or their dependencies should not be allowed to
>    monkey-patch core code, or force an Airflow/DB upgrade).
>    - The minimum standards for unit tests, system tests, examples, and
>    documentation, with consistent naming conventions (I’m looking at you,
>    “examples”) and technology stacks (e.g. “mock” usage) for each.
>    - Clear direction as to when to create a hook vs. operator vs. sensor,
>    and the minimum required functionality for each.
>    - A deprecation plan: for example, a provider is guaranteed to be
>    supported for x releases; however, if it goes through n releases without
>    an update, it goes into a “quarantined” state, and if not verified, it
>    moves to “retired” (or “moved to the attic”, as Jarek stated).
>    - A bar-raising plan, to get all existing providers either up to the
>    current feature bar by a certain date, or retired.
>
> This should make accepting new provider/operator/etc. PRs easier, as
> there will be an unambiguous checklist that needs to be met before a PR is
> even reviewed (which could maybe even be automated). It will also ensure
> user confidence in Airflow providers as a whole, as there will be a
> consistent level of features, functionality, and quality regardless of
> which provider the user chooses to deploy.
>
> John
>
> On 2022/04/05 08:58:17 Jarek Potiuk wrote:
> > One more provider in progress I forgot :). Cloudera:
> > https://github.com/apache/airflow/pull/22659
> >
> > Just wanted to stress how important the result of this discussion is.
> > The number of PRs for new providers we get is kind of unprecedented.
> > This month we have started discussions (and actual PRs) about adding
> > at least 4 biggish providers.
> >
> > It's either a coincidence, or we have simply reached the point where a
> > "lot" of 3rd parties want to integrate with Airflow, as Airflow is
> > really a de-facto Platform for Orchestration for "Everyone" :D :D.
> >
> > This is a great thing if it's the latter.
> >
> > I just want to make sure we get it right when it comes to "embracing"
> > them as a community. It's not really about gatekeeping but more about
> > "taking responsibility" for the code. If we accept code into the
> > community, we take responsibility for maintaining it too. Of course
> > there are various stakeholders there and I am sure "Cloudera" people
> > will maintain their provider and provide bug fixes - but the issues
> > will also come our way if the Cloudera provider does not work (and
> > with the ASF "stamp of approval" we give our users some kind of
> > expectations that we have to fulfill).
> >
> > Unlike in many technical decisions :) I do not have a very strong opinion
> > about this and I am really interested to hear what the community
> > members think.
> >
> > We are prepared to handle literally hundreds of providers if need be
> > (with some small automation improvements) - so there are no technical
> > reasons to limit the number of providers.
> > In the (near) future we might even decide to split them into separate
> > repositories (there are some discussions about that and it's likely to
> > happen) to make some housekeeping easier and to make sure it does not
> > hold us back when we develop some core features.
> >
> > I am, however, leaning towards what both Elad and Dennis wrote: accepting
> > new providers should be easy and it should only be gated by the
> > technical code quality bar, but there should also be some expectations
> > for the provider being maintained.
> > And as Dennis wrote - rather than "voting" for approval, there should
> > rather be a clear road (and possibly voting) to "retire" a provider if
> > it is no longer maintained (this is called "Moving to attic" in
> > ASF terminology).
> >
> > But maybe there are others who think differently. Would love to hear it.
> >
> > J
> >
> > On Mon, Apr 4, 2022 at 9:58 PM Ferruzzi, Dennis
> > <fe...@amazon.com.invalid> wrote:
> > >
> > > I think I'd just +1 Elad's comments. I don't know if we (the
> > > community) really need to be gatekeeping which providers get first-class
> > > status like that. In the end, the users of any given provider become
> > > responsible for maintaining it, so I feel it sorts itself out without
> > > added bureaucracy. Perhaps some form of formalized decision tree on when
> > > to drop a provider package as "no longer maintained/supported", but I
> > > don't feel there should be a high barrier to entry for adding a new one,
> > > provided the code doesn't break any existing packages and meets
> > > community quality standards.
> > >
> > > ________________________________
> > > From: Elad Kalif <el...@apache.org>
> > > Sent: Monday, April 4, 2022 7:24 AM
> > > To: dev@airflow.apache.org
> > > Subject: RE: [EXTERNAL] [DISCUSS] Approach for new providers of the
> > > community
> > >
> > > Interesting topic!
> > >
> > > I think the most important thing for us is that we are able to
> > > maintain the provider (in terms of not causing problems for Airflow core
> > > or other providers).
> > > Some of the maintained providers (Google, for example) have had open
> > > bugs for 2 years. So even having many provider maintainers doesn't
> > > guarantee that problems get fixed.
> > > I am not worried about provider-internal issues (operator not working
> > > properly, etc.) - they affect only the users of the provider itself, and
> > > the users of the provider are always welcome to submit PRs with fixes.
> > >
> > > I don't feel comfortable blocking a new provider just because it has a
> > > small market, competitors' tools also don't support it, etc.
> > >
> > > I guess my take is:
> > >
> > > Accept any new provider that meets quality/requirements (just as we
> > > have so far).
> > > Since providers are independent packages, in the rare case (I say rare
> > > as it has never happened so far) where the provider causes problems with
> > > core/other providers and no one is willing to address it, I think it
> > > should be enough if we can terminate the provider/mark it as not
> > > maintained on PyPI.
> > >
> > > On Mon, Apr 4, 2022 at 4:39 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >>
> > >> Hey all,
> > >>
> > >> We seem to have an influx of new providers coming our way:
> > >>
> > >> * Delta Sharing:
> > >> https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
> > >> * Flyte:
> > >> https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
> > >> * Versatile Data Kit:
> > >> https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0
> > >>
> > >> I think it might be a good idea to bring the discussion to one place
> > >> (here) and decide on our approach for accepting new providers
> > >> (the original discussion from Andon was focused mostly on VDK's
> > >> case, but maybe we could work out a general approach and "guidelines"
> > >> for what is best, so that we do not have to discuss it
> > >> separately for each proposal, but instead have more (or less) clear
> > >> rules on when we think it's good to accept providers as a community).
> > >>
> > >> Generally speaking, we have two approaches:
> > >> * providers managed by the Apache Airflow community
> > >> * providers managed by 3rd parties
> > >>
> > >> I think my email here nicely summarizes what is in
> > >> https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n
> > >>
> > >> I tried to look for earlier devlist discussions about the subject
> > >> (maybe someone can find them :). I think we have never formalized or
> > >> written it down, but I do recall some (Slack??) discussions about it
> > >> from the past.
> > >>
> > >> While we have no control/influence over 3rd-party providers (and we do
> > >> not want to have any), we definitely have both for the community-managed
> > >> ones - and there should be some rules defined to decide when we are
> > >> "ok" to accept a provider. Having "more" providers in the
> > >> "community" area is not always better. More often than not, code is a
> > >> liability rather than an asset.
> > >>
> > >> From the discussions I had, I recall points such as:
> > >>
> > >> * likelihood of the provider being used by many users
> > >> * possibility to test/support the providers by maintainers or
> > >> dedicated "stakeholders"
> > >> * quality of the code and following our expectations (docs/how-to
> > >> guides, unit/system tests)
> > >> * competing (?) with Airflow - there could be some providers of
> > >> "competing" products (I am not sure if this is a concern of
> > >> ours) which we might simply decide not to maintain in the community
> > >>
> > >> I am happy to write it down and propose such rules revolving around
> > >> those - but I would like to hear what people think first.
> > >>
> > >> What are your thoughts here?
> > >>
> > >> J
> > >>
> >
>
