Thanks for taking the time to give more details, Jarek. This puts things in perspective.

On Fri, Dec 9, 2022 at 18:48, Collin McNulty wrote:

> I concur with the concerns raised by Ash. Cloudera seems like an organization quite well suited to releasing its own provider. If such an organization is not expected to release outside the Apache process, who is? Maybe I'm misunderstanding, but I thought that the idea was that providers going forward would be mostly third party, which allows for a larger and more vibrant ecosystem.
>
> Collin McNulty
>
> On Fri, Dec 9, 2022 at 6:04 AM Pierre Jeambrun wrote:
>
>> Hello,
>>
>> I am really excited about a public official Cloudera provider for Airflow. This would be a great addition to the Airflow ecosystem.
>>
>> System tests would be an additional layer that would be great for the CI and release process, but would individual contributors be able to run these system tests locally? From what I understand, such credentials would be stored in the CI, and only people with their own credentials would be able to test the code locally and therefore realistically help in maintaining the provider. (Iterating on CI failures wouldn't be great :p)
>>
>> Ash's point resonates with me, as I remember having to work on a specific provider where free accounts/quotas were not available. It was basically a shot in the dark, making code changes based on documentation and API specs without being able to actually test the code. Maybe this was the reason the issues stayed open for more than a year without being picked up.
>>
>> Will the community really be able to contribute and support the provider while most of us don't have a paid account? Or is it 'stakeholder' maintained and 'community' released at most? (Even reviewing code for release would be tricky without an account.)
>>
>> Maybe I misunderstood something, and I apologize in advance.
>>
>> Best regards,
>> Pierre
>>
>> On Fri, Dec 9, 2022 at 12:24, Jarek Potiuk wrote:
>>
>>> > My concern about how we will actually test it works given we'd need a cloudera account/install/instance would be good to comment on though.
>>>
>>> This is a very good point, Ash, and I love that you've made it, as I think we have a very good solution at hand.
>>>
>>> This simply calls for Cloudera's commitment to work on AIP-47 style tests and to provide a test bed for them.
>>>
>>> This has yet to be published by Google and Amazon - I know they are progressing a lot on automating and publishing regular results of the System Tests from main in a way that lets us verify that all tests pass - all of that is done outside of the community resources and maintenance (i.e. it is entirely on the Amazon and Google teams to run and publish those tests).
>>>
>>> So I have a PROPOSAL (I can send a formal vote on that shortly).
>>>
>>> For all future providers (starting from Cloudera) we should make it a requirement that any provider accepted by the community MUST have AIP-47 style System Tests, and the service provider in question MUST provide their own System Test environment with public access to its status for the community and commit to maintaining it for as long as the Provider is released by the community.
>>>
>>> I think this is a very reasonable ask for Cloudera (and anyone else in the future) and a very, very good compromise (win-win for both sides while also requiring both sides to commit to a long term cooperation). This way we make sure we have to cooperate with the service provider rather than letting the Service Provider "throw the code over the fence" and put all the burden of maintenance on the community.
>>>
>>> * with AIP-47 we provided a very solid foundation for fully-automated system testing of precisely this kind of external service provider
>>> * we (the community) take on our shoulders the burden of reviewing and releasing the code, and at the same time the service gets community recognition and becomes part of the "Airflow Community supported" set
>>> * similarly, the Service Provider takes on their shoulders the burden of running and keeping in check the System Test bed for the system tests they submitted to the community, and makes sure they succeed before the release happens
>>> * whenever we release such a provider, we hold the release for that provider until its system tests are green (and it's on the service provider to fix the problems with those before we release)
>>> * I know both Google and Amazon are committed to doing so, I also know Databricks is looking into it, and in the future we might decide to apply it to all "external service providers"
>>>
>>> Philippe - what do you think about such an arrangement? Is that something that Cloudera will be able to commit to?
>>>
>>> J.
>>>
>>> On Fri, Dec 9, 2022 at 12:00 PM Ash Berlin-Taylor wrote:
>>>
>>>> As per the original vote email:
>>>>
>>>> Please note that this vote is about the fact to add this new provider not about the code itself, which will be reviewed as part of the PR
>>>>
>>>> So it's not a veto (as vetoes can only apply to code).
>>>>
>>>> My concern about how we will actually test it works given we'd need a cloudera account/install/instance would be good to comment on though.
>>>>
>>>> -ash
>>>>
>>>> On Dec 7 2022, at 1:43 pm, Jarek Potiuk wrote:
>>>>
>>>> Yeah. I would really want to understand that (and maybe others have an opinion here):
>>>>
>>>> https://www.apache.org/foundation/voting.html
>>>>
>>>> * Is this a "code modification" - where -1 is a veto
>>>> * or is it a "procedural issue" - where -1 is just a vote and the majority rules
>>>>
>>>> I personally think that "code modification" is really at the "PR review" level - when we see that the code submitted is not good. But this case seems to be more of a procedural issue than a code modification. For me this is more "are we ok to accept a provider from Cloudera?" rather than "do we accept this code".
>>>>
>>>> Ash - how do you treat your -1?
>>>>
>>>> And others - what do you think of that?
>>>>
>>>> I think the next course of action depends on whether we have consensus on how we treat the issue of "adding a new provider".
>>>>
>>>> J.
>>>>
>>>> On Wed, Dec 7, 2022 at 1:45 PM Philippe Lanoe wrote:
>>>>
>>>> Hello Airflow community,
>>>>
>>>> Following up on this -1. I'm assuming that's a veto?
>>>>
>>>> If it is, would it be possible to decouple the provider sustainability discussion from this proposal (Cloudera provider addition request)?
>>>>
>>>> I do think sustainability discussions make full sense, but I feel that this new provider is following the current rules that the community has established so far. The original thread [1] in which we discussed the Cloudera provider addition (we were not ready with the PR at that time) led to the new provider discussion [2] and finally the lazy consensus [3] on a mixed governance model. The outcome was a new mixed governance rule which was introduced [4], with an aim to (a) reduce the maintenance burden for the community and (b) allow more providers in since point (a) became acceptable.
>>>>
>>>> Let me know if it is acceptable to break up these two discussions and have this vote move forward.
>>>>
>>>> Thank you,
>>>> Regards.
>>>> Philippe
>>>>
>>>> [1] https://lists.apache.org/thread/2z0lvgj466ksxxrbvofx41qvn03jrwwb
>>>> [2] https://lists.apache.org/thread/nvfc75kj2w1tywvvkw8ho5wkx1dcvgrn
>>>> [3] https://lists.apache.org/thread/gq9vym17x0o8j8s9clkbmdz2nt38nnbt
>>>> [4] https://github.com/apache/airflow/pull/24680
>>>>
>>>> On Mon, Dec 5, 2022 at 1:54 PM Ash Berlin-Taylor wrote:
>>>>
>>>> Just to break with the consensus: -1
>>>>
>>>> Not because I don't think the provider would be useful or popular enough, precisely the opposite, and I'd like to see more companies maintain and manage their own providers and see an ecosystem of providers start to grow.
>>>>
>>>> Cloudera def has the means and resources to maintain their own provider, and the communication channels to let their users/customers know about its existence. And I have no problem with linking to the provider from our docs index.
>>>>
>>>> In general I am slightly worried about the workload we as maintainers are letting ourselves in for in the long run with an ever growing number of providers. Particularly one that needs paid-for accounts that we don't have access to!
>>>>
>>>> -ash
>>>>
>>>> On Dec 4 2022, at 11:59 pm, Kaxil Naik wrote:
>>>>
>>>> +1 binding
>>>>
>>>> On Sat, 3 Dec 2022 at 15:14, Holden Karau wrote:
>>>>
>>>> non-binding +1
>>>>
>>>> On Sat, Dec 3, 2022 at 3:55 AM Jarek Potiuk wrote:
>>>>
>>>> I think Cloudera is an important player in our ecosystem and as long as it passes all the bars (i.e. 2.3.0+ compatibility, good non-conflicting dependencies, passing all the tests), I am +1.
>>>>
>>>> On Sat, Dec 3, 2022 at 12:51 PM Philippe Lanoe wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > Correction: since it is a vote on code modification, all committers' votes count. I was mistaken in my previous email (which mentioned only PMC votes are binding), as I am quite new to this process.
>>>> > Please let me know if a discussion thread is preferred.
>>>> >
>>>> > Thanks,
>>>> > Regards,
>>>> > Philippe
>>>> >
>>>> > On Wed, Nov 30, 2022 at 5:34 PM Philippe Lanoe wrote:
>>>> >>
>>>> >> Hello Airflow community!
>>>> >>
>>>> >> As requested in our PR, I would like to start a vote for adding a new provider (Cloudera). Please note that this vote is about the fact to add this new provider not about the code itself, which will be reviewed as part of the PR.
>>>> >>
>>>> >> We would like to contribute the Cloudera provider to allow data practitioners out-of-the-box interactions with a multi-function analytics and hybrid platform.
>>>> >>
>>>> >> Our first two Operators are CdeRunJobOperator, to run a CDE job (Spark or Airflow within the Cloudera Data Engineering service), and CdwExecuteQueryOperator, to execute a query on a managed CDW cluster (Hive / Impala within the Cloudera Data Warehousing service). It also comes with a Sensor for CDW, in order to wait on a Hive partition.
>>>> >> We are also planning to contribute more in the future, as we develop operators for other Cloudera services in Cloudera Data Platform (CDP), like Cloudera Machine Learning and others, to cover the various needs of data practitioners across the entire data lifecycle.
>>>> >>
>>>> >> Our code has already been used for quite some time internally and we would like to contribute it to Airflow, to give a better experience for the users as it would be another system that users can reach seamlessly in their pipelines.
>>>> >>
>>>> >> Another important note: Cloudera already filed a CCLA as mentioned in this thread, so I think we are OK on the legal side.
>>>> >>
>>>> >> You can find the PR here:
>>>> >> https://github.com/apache/airflow/pull/27866
>>>> >>
>>>> >> The voting will last for 6 days (until 6th of December 2022, 6pm UTC), and until at least 3 binding votes have been cast. I am not sure about the timeframe that is actually needed for providers, so please let me know if it is adequate.
>>>> >>
>>>> >> Please vote accordingly:
>>>> >>
>>>> >> [ ] +1 approve
>>>> >> [ ] +0 no opinion
>>>> >> [ ] -1 disapprove with the reason
>>>> >>
>>>> >> Only votes from PMC members and committers are binding, but other members of the community are encouraged to check the AIP and vote with "(non-binding)".
>>>> >>
>>>> >> Thanks!
>>>> >>
>>>> >> Regards,
>>>> >> Philippe
>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
