My general thoughts: have as much as possible outside of Airflow. If a provider is being contributed by the "owner" of the service (e.g. the Cloudera provider being contributed by Cloudera), then it shouldn't live in Airflow, and that company/project should release to PyPI directly.
We should only accept a new provider if it is contributed by a user of the service, is likely to be popular, and is possible for us (the Airflow team) to run (i.e. no paid-for accounts needed).

-Ash

On 4 April 2022 14:39:34 BST, Jarek Potiuk <ja...@potiuk.com> wrote:
>Hey all,
>
>We seem to have an influx of new providers coming our way:
>
>* Delta Sharing:
>https://lists.apache.org/thread/kny1f23noqf1ssh7l9ys607m5wk3ff8c
>* Flyte: https://lists.apache.org/thread/b55g3gydgmqmhow6f7xzzbm5t0gmhs2x
>* Versatile Data Kit:
>https://lists.apache.org/thread/t1k3d0518v4kxz1pqsprdc78h0wxobg0
>
>I think it might be a good idea to bring the discussion in one place
>(here) and decide on what our approach is for accepting new providers.
>The original discussion from Andon was focused mostly on VDK's case,
>but maybe we could work out a general approach and "guidelines" so
>that we do not have to discuss it separately for each proposal, and
>instead have some more (or less) clear rules on when we think it's
>good to accept providers as a community.
>
>Generally speaking we have two approaches:
>* providers managed by the Apache Airflow community
>* providers managed by 3rd-parties
>
>I think my email here nicely summarizes what is in
>https://lists.apache.org/thread/6oomg5rlphxvc7xl0nccm3zdg18qv83n
>
>I tried to look for earlier devlist discussions about the subject
>(maybe someone can find it :). I think we have never formalized it nor
>written it down, but I do recall some (Slack??) discussions about it
>from the past.
>
>While we have no control/influence (and we do not want to have) for
>3rd-party providers, we definitely have both for the community-managed
>ones - and there should be some rules defined to decide when we are
>"ok" to accept a provider. Having "more" providers in the "community"
>area is not always better. More often than not, code is a liability
>rather than an asset.
>
>From those discussions I had I recall points such as:
>
>* likelihood of the provider being used by many users
>* possibility to test/support the providers by maintainers or
>dedicated "stakeholders"
>* quality of the code and following our expectations (docs/how-to
>guides, unit/system tests)
>* competing (?) with Airflow - there could be some providers of
>"competing" products maybe (I am not sure if this is a concern of
>ours) which we simply might decide to not maintain in the community
>
>I am happy to write it down and propose such rules revolving around
>those - but I would like to hear what people think first.
>
>What are your thoughts here?
>
>J