I very much like some of the points there :). I think we have indeed lacked clear guidance on what criteria a new provider needs to meet. Even if we had some of that in our heads, it was more "tribal knowledge": you could likely figure it out by looking at other providers, but we had never hashed it out.
And yeah, absolutely: AIP-47 as an enabler for finishing AIP-4 (automating system tests for external systems), and specifically the dashboard showing their status, is very, very, very dear to my heart :). I wrote AIP-4 in September 2018 as my first AIP proposal, about a month after I started contributing to Airflow :). Looks like we might finally be completing it :D.

I think we should wait a bit for more comments, and then I might try to start drafting a PR proposing the policy.

J.

On Tue, Apr 5, 2022 at 10:34 PM Mehta, Shubham <shu...@amazon.com.invalid> wrote:
> Hi all,
>
> I’m Shubham, Sr. Product Manager at AWS, working closely with John and the MWAA team. Glad to see the Airflow community openly discussing this topic, which will likely shape Airflow’s growth in the future.
>
> Firstly, I agree with Elad and Dennis that we shouldn’t be gatekeeping new providers. At the same time, I empathize with Jarek’s concern about taking responsibility for maintaining the new providers. It is important to set the right expectations for our Airflow users when they try to use any Airflow provider to meet their development needs.
>
> Borrowing the “verified” feature from Twitter, I believe Airflow can provide a list of providers that meet our community guidelines, are well maintained, and are healthy. We can leverage AIP-47 Airflow System Tests (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests) to build a public-facing dashboard (something that Niko has been a big proponent of internally for the AWS provider) that shows the status of system tests for all providers. It will improve the experience of Airflow users when they start using any provider package and reduce the number of issues we get.
>
> Deprecation will be difficult once a provider is added, as there might be some users who depend on it.
> A list of "verified" Airflow providers and a dashboard with system test results will reduce the need for deprecation.
>
> Shubham
>
> *From: *"Jackson, John" <jacn...@amazon.com.INVALID>
> *Reply-To: *"dev@airflow.apache.org" <dev@airflow.apache.org>
> *Date: *Tuesday, April 5, 2022 at 10:56 AM
> *To: *"dev@airflow.apache.org" <dev@airflow.apache.org>
> *Subject: *RE: [EXTERNAL] Re: [DISCUSS] Approach for new providers of the community
>
> Hi Folks,
>
> This is a great topic, and an important one as Airflow’s popularity continues to grow.
>
> One thing that will help is to provide clear, unambiguous community guidelines for providers, both existing and new. They should cover items such as:
>
> - What qualifies as a “new provider” vs. extending an existing provider or releasing as an independent project.
> - Rules about Python dependencies and other install actions that providers can take, and how they interact with Airflow core code (for example, providers or their dependencies should not be allowed to monkey-patch core code or force an Airflow/DB upgrade).
> - The minimum standards for unit tests, system tests, examples, and documentation, with consistent naming conventions (I’m looking at you, “examples”) and technology stacks (e.g. “mock” usage) for each.
> - Clear direction as to when to create a hook vs. an operator vs. a sensor, and the minimum required functionality for each.
> - A deprecation plan: for example, a provider is guaranteed to be supported for x releases; however, if it goes through n releases without an update it goes into a “quarantined” state, and if it is not verified it moves to “retired” (or “moved to the attic”, as Jarek put it).
> - A bar-raising plan to bring all existing providers up to the current feature bar by a certain date, or retire them.
>
> This should make accepting PRs for new providers/operators/etc. easier, as there will be an unambiguous checklist that needs to be met before a PR is even reviewed (which could maybe even be automated). It will also ensure user confidence in Airflow providers as a whole, as there will be a consistent level of features, functionality, and quality regardless of which provider the user chooses to deploy.
>
> John
>
> On 2022/04/05 08:58:17 Jarek Potiuk wrote:
> > > One more provider in progress I forgot :). Cloudera:
> > > https://github.com/apache/airflow/pull/22659
> > >
> > > Just wanted to stress how important the result of this discussion is. The number of PRs for new providers we get is kind of unprecedented. This month we have started discussions (and actual PRs) about adding at least 4 biggish providers.
> > >
> > > It's either a coincidence, or we have simply reached the point where a "lot" of 3rd parties want to integrate with Airflow, as Airflow is really a de facto Platform for Orchestration for "Everyone" :D :D.
> > >
> > > This is a great thing if it's the latter.
> > >
> > > I just want to make sure we get it right when it comes to "embracing" them as a community. It's not really about gatekeeping but more about "taking responsibility" for the code. If we accept code into the community, we take responsibility for maintaining it too. Of course there are various stakeholders, and I am sure the Cloudera people will maintain their provider and provide bug fixes, but the issues will also come our way if the Cloudera provider does not work (and with the ASF "stamp of approval" we set expectations for our users that we have to fulfill).
> > >
> > > Unlike with many technical decisions :), I have no very strong opinion about this, and I am really interested to hear what community members think.
> > >
> > > We are prepared to handle literally hundreds of providers if need be (with some small automation improvements), so there are no technical reasons to limit the number of providers. In the (near) future we might even decide to split them into separate repositories (there are some discussions about that and it's likely to happen) to make some housekeeping easier and to make sure providers do not hold us back when we develop core features.
> > >
> > > I am, however, leaning towards what both Elad and Dennis wrote: accepting new providers should be easy and gated only by the technical code quality bar, but there should also be some expectation that the provider will be maintained. And as Dennis wrote, rather than "voting" for approval, there should be a clear road (possibly with a vote) to "retire" a provider if it is no longer maintained (this is called "moving to the Attic" in ASF terminology).
> > >
> > > But maybe there are others who think differently. Would love to hear it.
> > >
> > > J
> > >
> > > On Mon, Apr 4, 2022 at 9:58 PM Ferruzzi, Dennis <fe...@amazon.com.invalid> wrote:
> > > >
> > > > I think I'd just +1 Elad's comments. I don't know if we (the community) really need to be gatekeeping which providers get first-class status like that. In the end, the users of any given provider become responsible for maintaining it, so I feel it sorts itself out without added bureaucracy.
> > > > Perhaps we need some form of formalized decision tree on when to drop a provider package as "no longer maintained/supported", but I don't feel there should be a high barrier to entry for adding a new one, provided the code doesn't break any existing packages and meets community quality standards.
> > > >
> > > > ________________________________
> > > > From: Elad Kalif <el...@apache.org>
> > > > Sent: Monday, April 4, 2022 7:24 AM
> > > > To: dev@airflow.apache.org
> > > > Subject: RE: [EXTERNAL] [DISCUSS] Approach for new providers of the community
> > > >
> > > > Interesting topic!
> > > >
> > > > I think the most important thing for us is that we are able to maintain the provider (in terms of it not causing problems for Airflow core or other providers).
> > > > Some of the maintained providers (Google, for example) have had open bugs for 2 years. So even having many provider maintainers doesn't guarantee that problems get fixed.
> > > > I am not worried about issues internal to a provider (an operator not working properly, etc.): they affect only the users of that provider, and those users are always welcome to submit PRs with fixes.
> > > >
> > > > I don't feel comfortable blocking a new provider just because it has a small market, or because competitors' tools don't support it either, etc.
> > > >
> > > > I guess my take is:
> > > >
> > > > Accept any new provider that meets our quality requirements (just as we have done so far).
> > > > Since providers are independent packages, in the rare case (I say rare because it has never happened so far) where a provider causes problems with core or other providers and no one is willing to address it, being able to terminate the provider or mark it as not maintained on PyPI should be enough, I think.
> > > >
> > > > On Mon, Apr 4, 2022 at 4:39 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > >>
> > > >> Hey all,
> > > >>
> > > >> We seem to have an influx of new providers coming our way:
> > > >>
> > > >> * Delta Sharing: https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
> > > >> * Flyte: https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
> > > >> * Versatile Data Kit: https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0
> > > >>
> > > >> I think it might be a good idea to bring the discussion into one place (here) and decide on our approach for accepting new providers. The original discussion from Andon focused mostly on VDK's case, but maybe we could work out a general approach and "guidelines", so that we do not have to discuss it separately for each proposal but instead have some more (or less) clear rules on when we think it's good to accept providers as a community.
> > > >>
> > > >> Generally speaking, we have two approaches:
> > > >> * providers managed by the Apache Airflow community
> > > >> * providers managed by 3rd parties
> > > >>
> > > >> I think my email here nicely summarizes what is in https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n
> > > >>
> > > >> I tried to look for earlier devlist discussions about the subject (maybe someone can find them :)). I think we have never formalized or written it down, but I do recall some (Slack??) discussions about it from the past.
> > > >>
> > > >> While we have no control or influence over 3rd-party providers (and we do not want to have any), we definitely have both for the community-managed ones, and there should be some rules defined to decide when we are "ok" to accept a provider.
> > > >> Having "more" providers in the "community" area is not always better. More often than not, code is a liability rather than an asset.
> > > >>
> > > >> From those discussions, I recall points such as:
> > > >>
> > > >> * likelihood of the provider being used by many users
> > > >> * possibility for maintainers or dedicated "stakeholders" to test/support the provider
> > > >> * quality of the code and adherence to our expectations (docs/how-to guides, unit/system tests)
> > > >> * competing (?) with Airflow: there could be providers of "competing" products (I am not sure if this is a concern of ours) which we might simply decide not to maintain in the community
> > > >>
> > > >> I am happy to write it down and propose such rules revolving around those, but I would like to hear what people think first.
> > > >>
> > > >> What are your thoughts here?
> > > >>
> > > >> J