Technicalities (I am putting in a longer description so that you can set your expectations):
What you did is good. Starting a discussion on the devlist is a good start. Eventually - as with everything in an Apache project - while a small code change can simply be approved by one or two committers, any bigger change needs to be:

* discussed on the devlist (you already started that)
* brought to a point where consensus seems to be reached
* finalized either by lazy consensus (if we all seem to agree) or by voting: https://www.apache.org/foundation/voting.html

Note that (see the rules of voting) code modifications like this can simply be vetoed by a single committer during the voting process (one justified veto is enough to block it), so you should be rather convinced that you will not get anyone's veto. It is your job to guide the discussion towards consensus and to start and proceed with voting when you think the time is right. Note that there are certain communication rules to follow - especially about making sure that everyone can participate - so such discussions tend to take quite some time (weeks or months). More information and the general "contribution" guide can be found here: https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst

Your case: I think we do not have precise rules; however, what is important is that the community (as a whole) commits to maintaining the code. Code contribution is not a one-way street - code is actually more often a liability than an asset. We also need to make sure that someone in the community can test it, knows how to do it, and is able to validate that any fixes and changes there can be maintained. And the bigger the contribution, and the less "popular" a given provider is, the smaller the chance that we would like to make it part of the community.

I think - but this is my personal opinion and possibly others will chime in - we are moving away from the mode where we accept new providers "by default". We even have some discussions on whether we should give some providers back to the people who "own" the services so that they can maintain them. I personally think that before a provider gets accepted, we need to see that the service is used and popular. And certainly Airflow cannot (and should not) be used as a driver of that popularity.

And it is perfectly fine if you maintain your provider outside of the Airflow "Community Managed Providers". We have a dedicated chapter for that on our "ecosystem" page - https://airflow.apache.org/ecosystem/#third-party-airflow-plugins-and-providers. So anyone who develops a provider is free to publish it, release it, and even make a PR to link to it from the Ecosystem page.

There are various pros and cons of being a "Community Provider".

Pros:

* It is released with the "ASF Stamp of Approval" and guarantees it follows the rules.
* It comes as an "apache-airflow-providers-*" package.
* It comes as an "extra" of Airflow (though we might get rid of the extras in the future).
* It gets tested automatically in CI with the latest version of Airflow.

Cons:

* There is a certain burden and process that all community providers must follow (documentation, testing) - and we are also finally introducing automated system testing (see https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests). So I would say all future providers (if they use some external service) will have to have system tests implemented (a rough sketch of what such a system test looks like is shown after this list). That means a few things: first of all, there should be system tests; secondly, there should be some way to run them regularly in an automated fashion (credits, or free accounts with enough capacity, donated to Airflow so that the provider can be supported). We have not yet discussed the last point (credits and system tests being a condition), but this seems to be where we are heading - with the Amazon and Google providers leading the effort of implementing system tests (we already got credits and commitments from both).
* We release our community providers in a regular cadence (monthly) - we only very rarely break the cycle, so if you want better control over the release schedule, you will not be able to have it.
* Code committed to providers has to go through the regular process of review and approval, and for many services it might seem slow. It is not uncommon for a PR to take weeks to go through review and approval, and (apart from nagging people) you have zero influence on the review process. Airflow is maintained by committers - individuals (often volunteers) - so they review the code and approve it when they can / have time / do not have other, higher-priority tasks. Becoming a community provider means that you accept this and accept that you have zero control over it as an organisation. Only individuals can become committers, so even if someone from your company becomes one at some point, the committership goes with that person if he or she changes jobs. So you have to accept the fact that you, as an organization, have no real guaranteed influence on that. And if the service is not really popular, you risk that none of the committers is interested, or even has the capacity and knowledge, to review the code.
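To make the system-test expectation a bit more concrete, here is a rough, hypothetical sketch of what an AIP-47-style system test DAG tends to look like. The VdkRunJobOperator name is a placeholder I made up for whatever operator your provider would expose, and the tests.system.utils helper paths follow the pattern existing providers use, but the details may differ in the repo:

# Rough illustration only - not an official template. VdkRunJobOperator is a
# made-up placeholder for the operator your provider would expose, and the
# tests.system.utils helpers follow the pattern existing providers use
# (their exact location may differ).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_vdk_run_data_job",
    schedule_interval=None,
    start_date=datetime(2022, 1, 1),
    catchup=False,
    tags=["example", "vdk"],
) as dag:
    # In the real test this would be the provider's operator calling the live
    # service, e.g. run_job = VdkRunJobOperator(...); BashOperator stands in here.
    run_job = BashOperator(task_id="run_job", bash_command="echo 'run data job'")

    # The watcher task marks the whole DAG run as failed if any task fails,
    # which is how the system-test harness learns about the result.
    from tests.system.utils.watcher import watcher

    list(dag.tasks) >> watcher()

# Needed so the same file can also be collected and run via pytest in CI.
from tests.system.utils import get_test_run  # noqa: E402

test_run = get_test_run(dag)

Running it is then just a matter of triggering that DAG against a real VDK installation (or running the file via pytest), which is what the credits/accounts mentioned above would be used for.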
So you have to consider for yourself, first of all, whether you really want to become a community provider, or whether it is better for you to release and maintain it on your own.

I hope others will chime in. I personally do not know anything about VDK and I do not know if any of the committers do. If they don't, this is a rather strong indication that you should go the "on your own" route. But if you decide to try the "community" route, it will be your job to convince the committers that this is a good idea and to make sure you don't get strong "vetoes" after the discussions.

J.

On Thu, Mar 31, 2022 at 12:43 PM Andon Andonov <[email protected]> wrote:
>
> Hello,
>
> We are developing a data engineering framework, called Versatile Data Kit, which allows data engineers to develop, deploy and manage data processing workloads which we refer to as ``Data Jobs``. These jobs allow data engineers to implement automated pull ingestion (E in ELT) and batch data transformation (T in ELT) into a database.
>
> Currently, we are working on a Provider to integrate our project with Airflow and would like to contribute it upstream at some point in time. The architecture specification for the Provider can be seen here.
>
> According to this FAQ, sub-section "Can I contribute my own provider to Apache Airflow?", we need to check whether the community would accept the contribution. However, from what is written in the paragraph, it is a bit difficult (at least for me) to understand whether there is a specific process in place that needs to be followed, or whether a proposal should be put up for a vote.
>
> I have probably missed some piece of documentation explaining it, so I would appreciate any help or tips pointing me to where to look.
>
> Kind Regards,
>
> Andon
