So I think my opinion is the opposite of Elad's: we should accept almost no new providers, and instead encourage people to create and release their own packages directly.

I want _less_ code in the apache/airflow repo, not more. The more we have, the more combinations we have to test on every commit, and the longer the list of extras and providers we have to maintain.

On Wed, Apr 6 2022 at 22:08:28 +0100, Ash Berlin-Taylor <a...@apache.org> wrote:
My general thoughts: have as much as possible outside of Airflow.

If a provider is being contributed by the "owner" of the service (e.g. a Cloudera provider being contributed by Cloudera) then it shouldn't live in Airflow, and that company/project should release to PyPI directly.

The only time we should accept a new provider is if it is contributed by a user of the service, is likely to be popular, and is possible for us (the Airflow team) to run (i.e. no paid-for accounts needed).

-Ash

On 4 April 2022 14:39:34 BST, Jarek Potiuk <ja...@potiuk.com> wrote:
Hey all,

We seem to have an influx of new providers coming our way:

* Delta Sharing:
<https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c>
* Flyte: <https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x>
* Versatile Data Kit:
<https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0>

I think it might be a good idea to bring the discussion into one place
(here) and decide on what our approach is for accepting new providers.
The original discussion from Andon was focused mostly on VDK's case,
but maybe we could work out a general approach and "guidelines", so
that we do not have to discuss it separately for each proposal, and
instead have some more (or less) clear rules on when we think it's
good to accept providers as a community.

Generally speaking we have two approaches:
* providers managed by the Apache Airflow community
* providers managed by 3rd-parties

I think my email here nicely summarizes it:
<https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n>

I tried to look for earlier devlist discussions about the subject
(maybe someone can find them :)). I think we have never formalized
this or written it down, but I do recall some (Slack??) discussions
about it from the past.

While we have no control/influence over 3rd-party providers (and we
do not want to have any), we definitely have both for the
community-managed ones - and there should be some rules defined to
decide when we are "ok" to accept a provider. Having "more" providers
in the "community" area is not always better. More often than not,
code is a liability rather than an asset.

From the discussions I had, I recall points such as:

* likelihood of the provider being used by many users
* possibility to test/support the providers by maintainers or
dedicated "stakeholders"
* quality of the code and following our expectations (docs/how-to
guides, unit/system tests)
* competing (?) with Airflow - there could be some providers of
"competing" products maybe (I am not sure if this is a concern of
ours) which we simply might decide to not maintain in the community

I am happy to write it down and propose such rules revolving around
those - but I would like to hear what people think first.

What are your thoughts here?

J
