Thanks for taking the time to give more details, Jarek. This puts things
in perspective.

On Fri, Dec 9, 2022 at 18:48, Collin McNulty <[email protected]> wrote:

> I concur with the concerns raised by Ash. Cloudera seems like an
> organization quite well suited to releasing its own provider. If such an
> organization is not expected to release outside the Apache process, who is?
> Maybe I'm misunderstanding, but I thought that the idea was that providers
> going forward would be mostly third party which allows for a larger and
> more vibrant ecosystem.
>
> Collin McNulty
>
> On Fri, Dec 9, 2022 at 6:04 AM Pierre Jeambrun <[email protected]>
> wrote:
>
>> Hello,
>>
>> I am really excited about a public, official Cloudera provider for
>> Airflow. This would be a great addition to the Airflow ecosystem.
>>
>> System tests would be an additional layer that would be great for the CI
>> and release process, but would individual contributors be able to run these
>> system tests locally? From what I understand, such credentials would be
>> stored in the CI, and only people with their own credentials would be able
>> to test the code locally and therefore realistically help in maintaining
>> the provider. (Iterating on CI failure wouldn't be great :p)
>>
>> Ash's point resonates with me: I remember working on a specific
>> provider where free accounts/quotas were not available. It was basically a
>> shot in the dark, making code changes based on documentation and API specs
>> without being able to actually test the code. Maybe this was the reason the
>> issues stayed open for more than a year without being picked up.
>>
>> Will the community really be able to contribute to and support the provider
>> while most of us don't have a paid account? Or will it be
>> 'stakeholder'-maintained and, at most, 'community'-released? (Even
>> reviewing code for release would be tricky without an account.)
>>
>> Maybe I misunderstood something; if so, I apologize in advance.
>>
>> Best regards,
>> Pierre
>>
>> On Fri, Dec 9, 2022 at 12:24, Jarek Potiuk <[email protected]> wrote:
>>
>>> > My concern about how we will actually test it works given we'd need a
>>> cloudera account/install/instance would be good to comment on though.
>>>
>>> This is a very good point, Ash, and I love that you've made it, as I think
>>> we have a very good solution at hand.
>>>
>>> This simply calls for Cloudera's commitment to work on AIP-47 style
>>> tests and to provide a test bed for that.
>>>
>>> This has yet to be published by Google and Amazon - I know they are
>>> making a lot of progress on the automation and on publishing regular
>>> results of the System Tests from main in a way that lets us verify that
>>> all tests pass - all that is done outside of the community resources and
>>> maintenance (i.e. it is entirely on the Amazon and Google teams to run and
>>> publish those tests).
>>>
>>> So I have a PROPOSAL (I can send a formal vote on that shortly)
>>>
>>> Going forward (starting with Cloudera) we should make it a requirement
>>> that any provider accepted by the community MUST have AIP-47 style System
>>> Tests, and the service provider in question MUST provide their own System
>>> Test environment with publicly accessible status for the community and
>>> commit to maintaining it for as long as the Provider is released by the
>>> community.
>>>
>>> I think this is a very reasonable ask for Cloudera (and anyone else in
>>> the future) and a very, very good compromise (win-win for both sides while
>>> also requiring both sides to commit to a long term cooperation). This way
>>> we make sure we have to cooperate with the service provider rather than
>>> letting the Service provider "throw the code over the fence" and put all
>>> the burden of maintenance on the community.
>>>
>>> * with AIP-47 we provided a very solid foundation for fully-automated
>>> system testing of precisely this kind of external service providers
>>> * we (community) take on our shoulders the burden of reviewing and
>>> releasing the code, and at the same time the service gets community
>>> recognition and becomes part of the "Airflow Community supported" providers
>>> * similarly the Service Provider takes on their shoulders the burden of
>>> running and keeping in check the System Test bed for their system tests
>>> submitted to the community, and makes sure they succeed before the release
>>> happens
>>> * whenever we release such a service provider - we hold the release
>>> for that provider until the system tests for that provider are
>>> green (and it's on the service provider to fix the problems with those
>>> before we release).
>>> * I know both Google and Amazon are committed to do so, I also know
>>> Databricks is looking into it and in the future we might decide to apply it
>>> to all "external service providers".
>>>
>>> Philippe - what do you think about such an arrangement? Is that
>>> something that Cloudera will be able to commit to?
>>>
>>> J.
>>>
>>>
>>> On Fri, Dec 9, 2022 at 12:00 PM Ash Berlin-Taylor <[email protected]>
>>> wrote:
>>>
>>>> As per the original vote email:
>>>>
>>>> Please note that this vote is about whether to add this new provider,
>>>> not about the code itself, which will be reviewed as part of the PR
>>>>
>>>>
>>>> So it's not a veto (as vetoes can only apply to code).
>>>>
>>>> My concern about how we will actually test it works given we'd need a
>>>> cloudera account/install/instance would be good to comment on though.
>>>>
>>>> -ash
>>>>
>>>> On Dec 7 2022, at 1:43 pm, Jarek Potiuk <[email protected]> wrote:
>>>>
>>>> Yeah. I would really want to understand that (and maybe others have an
>>>> opinion here):
>>>>
>>>> https://www.apache.org/foundation/voting.html
>>>>
>>>> * Is this a "code modification" - where -1 is veto
>>>> * or is it a "procedural issue" - where -1 is just a vote and majority
>>>> rules
>>>>
>>>> I personally think that "code modification" is really at the "PR review"
>>>> level - when we see that the code submitted is not good. But this case
>>>> seems to be more of a procedural issue than a code modification. For me
>>>> this is more "are we ok to accept a provider from Cloudera?" rather than
>>>> "do we accept this code".
>>>>
>>>> Ash - how do you treat your -1?
>>>>
>>>> And others - what do you think of that?
>>>>
>>>> I think the next course of action depends on whether we have consensus on
>>>> how we treat the issue of "adding a new provider".
>>>>
>>>> J.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Dec 7, 2022 at 1:45 PM Philippe Lanoe
>>>> <[email protected]> wrote:
>>>>
>>>> Hello Airflow community,
>>>>
>>>> Following up on this -1. I'm assuming that's a veto?
>>>>
>>>> If it is, would it be possible to decouple the provider sustainability
>>>> discussion from this proposal (Cloudera provider addition request)?
>>>>
>>>> I do think sustainability discussions make full sense but I feel that
>>>> this new provider is following the current rules that the community has
>>>> established so far. The original thread [1] in which we discussed Cloudera
>>>> provider addition (we were not ready with the PR at that time) led to the
>>>> new provider discussion [2] and finally the lazy consensus [3] on mixed
>>>> governance model. The outcome was a new mixed governance rule which was
>>>> introduced [4], with an aim to (a) reduce the maintenance burden for the
>>>> community and (b) allow more providers in since point (a) became 
>>>> acceptable.
>>>>
>>>> Let me know if it is acceptable to break up these two discussions and
>>>> have this vote move forward.
>>>>
>>>> Thank you,
>>>> Regards.
>>>> Philippe
>>>>
>>>> [1] https://lists.apache.org/thread/2z0lvgj466ksxxrbvofx41qvn03jrwwb
>>>> [2] https://lists.apache.org/thread/nvfc75kj2w1tywvvkw8ho5wkx1dcvgrn
>>>> [3] https://lists.apache.org/thread/gq9vym17x0o8j8s9clkbmdz2nt38nnbt
>>>> [4] https://github.com/apache/airflow/pull/24680
>>>>
>>>>
>>>>
>>>> On Mon, Dec 5, 2022 at 1:54 PM Ash Berlin-Taylor <[email protected]>
>>>> wrote:
>>>>
>>>> Just to break with the consensus: -1
>>>>
>>>> Not because I don't think the provider would be useful or popular
>>>> enough, precisely the opposite, and I'd like to see more companies maintain
>>>> and manage their own providers and see an ecosystem of providers start to
>>>> grow.
>>>>
>>>> Cloudera def has the means and resources to maintain their own
>>>> provider, and the communication channels to let their users/customers know
>>>> about its existence. And I have no problem with linking to the provider
>>>> from our docs index.
>>>>
>>>> In general I am slightly worried about the workload we as maintainers
>>>> are letting ourselves in for in the long run with an ever growing number of
>>>> providers. Particularly one that needs paid-for accounts that we don't have
>>>> access to!
>>>>
>>>> -ash
>>>>
>>>> On Dec 4 2022, at 11:59 pm, Kaxil Naik <[email protected]> wrote:
>>>>
>>>> +1 binding
>>>>
>>>> On Sat, 3 Dec 2022 at 15:14, Holden Karau <[email protected]> wrote:
>>>>
>>>> non-binding +1
>>>>
>>>> On Sat, Dec 3, 2022 at 3:55 AM Jarek Potiuk <[email protected]> wrote:
>>>>
>>>> I think Cloudera is an important player in our ecosystem and as long as
>>>> it passes all the bars (i.e. 2.3.0+ compatibility, good
>>>> non-conflicting dependencies, passing all the tests), I am +1.
>>>>
>>>> On Sat, Dec 3, 2022 at 12:51 PM Philippe Lanoe
>>>> <[email protected]> wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > Correction: since it is a vote on code modification, all committers'
>>>> votes count. I was mistaken in my previous email (which mentioned only PMC
>>>> votes are binding); I am quite new to this process.
>>>> > Please let me know if a discussion thread is preferred.
>>>> >
>>>> > Thanks,
>>>> > Regards,
>>>> > Philippe
>>>> >
>>>> > On Wed, Nov 30, 2022 at 5:34 PM Philippe Lanoe <[email protected]>
>>>> wrote:
>>>> >>
>>>> >> Hello Airflow community!
>>>> >>
>>>> >> As requested in our PR, I would like to start a vote for adding a
>>>> new provider (Cloudera). Please note that this vote is about whether to
>>>> add this new provider, not about the code itself, which will be reviewed as
>>>> part of the PR.
>>>> >>
>>>> >> We would like to contribute the Cloudera provider to allow data
>>>> practitioners out-of-the-box interactions with a multi-function analytics
>>>> and hybrid platform.
>>>> >>
>>>> >> Our first two Operators are CdeRunJobOperator, to run a CDE job
>>>> (Spark or Airflow within the Cloudera Data Engineering service) and
>>>> CdwExecuteQueryOperator, to execute a query on a managed CDW cluster (Hive
>>>> / Impala within the Cloudera Data Warehousing service). It also comes with
>>>> a Sensor for CDW, in order to wait on a Hive partition.
>>>> >> We are also planning to contribute more in the future, as we develop
>>>> operators for other Cloudera services in Cloudera Data Platform (CDP), like
>>>> Cloudera Machine Learning and others, to cover the various needs of data
>>>> practitioners across the entire data lifecycle.
>>>> >>
>>>> >> Our code has already been used for quite some time internally and we
>>>> would like to contribute it to Airflow, to give a better experience for the
>>>> users as it would be another system that users can reach seamlessly in
>>>> their pipelines.
>>>> >>
>>>> >> Another important note: Cloudera already filed a CCLA as mentioned
>>>> in this thread, so I think we are OK on the Legal side.
>>>> >>
>>>> >> You can find the PR here:
>>>> >> https://github.com/apache/airflow/pull/27866
>>>> >>
>>>> >> The voting will last for 6 days (until 6th of December 2022, 6pm
>>>> UTC), and until at least 3 binding votes have been cast. I am not sure
>>>> about the timeframe that is needed for providers, so please let me know if
>>>> it is adequate.
>>>> >>
>>>> >> Please vote accordingly:
>>>> >>
>>>> >> [ ] + 1 approve
>>>> >> [ ] + 0 no opinion
>>>> >> [ ] - 1 disapprove with the reason
>>>> >>
>>>> >> Only votes from PMC members and committers are binding, but other
>>>> members of the community are encouraged to check the AIP and vote with
>>>> "(non-binding)".
>>>> >>
>>>> >> Thanks!
>>>> >>
>>>> >> Regards,
>>>> >> Philippe
>>>> >>
>>>> >>
>>>>
>>>>
>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>>
