One more provider in progress that I forgot :). Cloudera: https://github.com/apache/airflow/pull/22659

Just wanted to stress how important the result of this discussion is. The number of PRs for new providers we are getting is kind of unprecedented. This month we have started discussions (and actual PRs) about adding at least 4 biggish providers. It's either a coincidence, or we have simply reached the point where a "lot" of 3rd parties want to integrate with Airflow, as Airflow is really a de-facto Platform for Orchestration for "Everyone" :D :D.

This is a great thing if it's the latter. I just want to make sure we get it right when it comes to "embracing" them as a community. It's not really about gatekeeping but more about "taking responsibility" for the code. If we accept code into the community, we take responsibility for maintaining it too. Of course there are various stakeholders there and I am sure the "Cloudera" people will maintain their provider and provide bug fixes - but the issues will also come our way if the Cloudera provider does not work (and with the ASF "stamp of approval" we set expectations for our users that we have to fulfill).

Unlike in many technical decisions :) I have no very strong opinion about this and I am really interested to hear what the community members think. We are prepared to handle literally hundreds of providers if need be (with some small automation improvements) - so there are no technical reasons to limit the number of providers. In the (near) future we might even decide to split them into separate repositories (there are some discussions about that and it's likely to happen) to make some housekeeping easier and to make sure providers do not hold us back when we develop core features.

I am however leaning towards what both Elad and Dennis wrote: accepting new providers should be easy and gated only by the technical code quality bar, but there should also be some expectation that the provider is maintained.
And as Dennis wrote - rather than "voting" for approval, there should be a clear road (and possibly a vote) to "retire" a provider if it is no longer maintained (this is called "moving to the Attic" in ASF terminology).

But maybe there are others who think differently. Would love to hear it.

J

On Mon, Apr 4, 2022 at 9:58 PM Ferruzzi, Dennis <ferru...@amazon.com.invalid> wrote:
>
> I think I'd just +1 Elad's comments. I don't know if we (the community)
> really need to be gatekeeping which providers get first class status like
> that. In the end, the users of any given provider become responsible for
> maintaining it, so I feel it sorts itself out without added bureaucracy.
> Perhaps some form of formalized decision tree on when to drop a provider
> package as "no longer maintained/supported", but I don't feel there should be
> a high barrier to entry on adding a new one provided the code doesn't break
> any existing packages and meets community quality standards.
>
> ________________________________
> From: Elad Kalif <elad...@apache.org>
> Sent: Monday, April 4, 2022 7:24 AM
> To: dev@airflow.apache.org
> Subject: RE: [EXTERNAL] [DISCUSS] Approach for new providers of the community
>
> Interesting topic!
>
> I think the most important thing for us is that we are able to maintain the
> provider (in terms of not causing problems for Airflow core or other
> providers).
> Some of the maintained providers (Google for example) have had bugs open for 2
> years. So even if we have many provider maintainers, it doesn't guarantee
> that problems get fixed.
> I am not worried about provider internal issues (operator not working
> properly, etc.) - they affect only the users of the provider itself, and the
> users of the provider are always welcome to submit PRs with fixes.
>
> I don't feel comfortable blocking a new provider just because it has a small
> market / competitors' tools also don't support it etc...
>
> I guess my take is:
>
> Accept any new provider that meets quality/requirements (just as we did so
> far).
> Since providers are independent packages - in the rare case (I say rare as it
> never happened till now) where the provider causes problems with core/other
> providers and no one is willing to address it,
> if we can terminate the provider/mark it as not maintained in PyPI - it
> should be enough I think.
>
> On Mon, Apr 4, 2022 at 4:39 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> Hey all,
>>
>> We seem to have an influx of new providers coming our way:
>>
>> * Delta Sharing:
>> https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
>> * Flyte: https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
>> * Versatile Data Kit:
>> https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0
>>
>> I think it might be a good idea to bring the discussion in one place
>> (here) and decide on what our approach is for accepting new providers.
>> The original discussion from Andon was focused mostly on VDK's
>> case, but maybe we could work out a general approach and "guidelines"
>> - what approach is best - so that we do not have to discuss it
>> separately for each proposal, but have some more (or less) clear
>> rules on when we think it's good to accept providers as a community.
>>
>> Generally speaking we have two approaches:
>> * providers managed by the Apache Airflow community
>> * providers managed by 3rd-parties
>>
>> I think my email here nicely summarizes them:
>> https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n
>>
>> I tried to look for earlier devlist discussions about the subject
>> (maybe someone can find them :). I think we have never formalized it nor
>> written it down, but I do recall some (slack??) discussions about it from
>> the past.
>>
>> While we have no control/influence (and we do not want to have any) over
>> 3rd-party providers, we definitely have both for the community-managed
>> ones - and there should be some rules defined to decide when we are
>> "ok" to accept a provider. Having "more" providers in the
>> "community" area is not always better. More often than not, code is a
>> liability rather than an asset.
>>
>> From the discussions I had, I recall points such as:
>>
>> * likelihood of the provider being used by many users
>> * possibility to test/support the providers by maintainers or
>> dedicated "stakeholders"
>> * quality of the code and following our expectations (docs/how-to
>> guides, unit/system tests)
>> * competing (?) with Airflow - there could be some providers of
>> "competing" products maybe (I am not sure if this is a concern of
>> ours) which we simply might decide to not maintain in the community
>>
>> I am happy to write it down and propose such rules revolving around
>> those - but I would like to hear what people think first.
>>
>> What are your thoughts here?
>>
>> J