This has yet to be published by Google and Amazon - I know they are making a 
lot of progress on automating the System tests from main and regularly 
publishing their results, so that we can verify that all tests pass - all of 
that is done outside of the community resources and maintenance (i.e. it is 
entirely on the Amazon and Google teams to run and publish those tests).

Just as an update: we're still working hard on this, I promise :) It has taken 
MUCH longer than expected to get all the requisite internal approvals and 
agreement on how to share the results with the community. But we're zeroing in 
on an approach that everyone agrees on for publication. Please bear with us on 
this one!



In this case - anyone with any "Cloudera" account should be able to run it 
locally when contributing. But the idea of AIP-47 was to off-load the regular 
execution of those tests, and the publishing of their public "status", to the 
teams of the service providers that want to make sure their provider still works.

Agreed, the tests are written in a way that anyone can run them (with 
mechanisms to provide any pre-existing resources some tests require). But to 
expect the community to have the resources to regularly run all the system 
tests for all providers is unreasonable; collaboration is really required here.

Cheers,
Niko



________________________________
From: Pierre Jeambrun <pierrejb...@gmail.com>
Sent: Friday, December 9, 2022 11:43:21 AM
To: dev@airflow.apache.org
Subject: RE: [EXTERNAL][VOTE] New Provider: Cloudera



Thanks for taking time to give more details Jarek. This puts things in 
perspective.

On Fri, Dec 9, 2022 at 6:48 PM Collin McNulty <col...@astronomer.io.invalid> 
wrote:
I concur with the concerns raised by Ash. Cloudera seems like an organization 
quite well suited to releasing its own provider. If such an organization is not 
expected to release outside the Apache process, who is? Maybe I'm 
misunderstanding, but I thought that the idea was that providers going forward 
would be mostly third party which allows for a larger and more vibrant 
ecosystem.

Collin McNulty

On Fri, Dec 9, 2022 at 6:04 AM Pierre Jeambrun 
<pierrejb...@gmail.com<mailto:pierrejb...@gmail.com>> wrote:
Hello,

I am really excited about a public, official Cloudera provider for Airflow. This 
would be a great addition to the Airflow ecosystem.

System tests would be an additional layer that would be great for the CI and 
release process, but would individual contributors be able to run these system 
tests locally? From what I understand, such credentials would be stored in the 
CI, and only people with their own credentials would be able to test the code 
locally and therefore realistically help in maintaining the provider. 
(Iterating on CI failures wouldn't be great :p)

Ash's point resonates with me: I remember having to work on a specific 
provider where free accounts/quotas were not available. It was basically a shot 
in the dark, making code changes based on documentation and API specs without 
being able to actually test the code. Maybe this was the reason the issues 
stayed open for more than a year without being picked up.

Will the community really be able to contribute to and support the provider, 
while most of us don't have a paid account? Or is it stakeholder-maintained and, 
at most, community-released? (Even reviewing code for release would be tricky 
without an account.)

Maybe I misunderstood something; if so, I apologize in advance.

Best regards,
Pierre

On Fri, Dec 9, 2022 at 12:24 PM Jarek Potiuk 
<ja...@potiuk.com<mailto:ja...@potiuk.com>> wrote:
> My concern about how we will actually test it works given we'd need a 
> cloudera account/install/instance would be good to comment on though.

This is a very good point, Ash, and I love that you've made it, as I think we 
have a very good solution at hand.

This simply calls for Cloudera's commitment to work on AIP-47 style tests and 
to provide a test bed for them.

This has yet to be published by Google and Amazon - I know they are making a 
lot of progress on automating the System tests from main and regularly 
publishing their results, so that we can verify that all tests pass - all of 
that is done outside of the community resources and maintenance (i.e. it is 
entirely on the Amazon and Google teams to run and publish those tests).

So I have a PROPOSAL (I can send a formal vote on that shortly)

Going forward (starting with Cloudera), we should make it a requirement that 
any provider accepted by the community MUST have AIP-47 style System Tests, and 
the service provider in question MUST provide its own System Test environment, 
with the status publicly accessible to the community, and commit to maintaining 
it for as long as the Provider is released by the community.

I think this is a very reasonable ask for Cloudera (and anyone else in the 
future) and a very, very good compromise (win-win for both sides while also 
requiring both sides to commit to a long term cooperation). This way we make 
sure we have to cooperate with the service provider rather than letting the 
Service provider "throw the code over the fence" and put all the burden of 
maintenance on the community.

* with AIP-47 we provided a very solid foundation for fully-automated system 
testing of precisely this kind of external service provider
* we (the community) take on our shoulders the burden of reviewing and releasing 
the code, and at the same time the service gets community recognition and 
becomes part of the "Airflow Community supported" providers
* similarly, the Service Provider takes on its shoulders the burden of running 
and keeping in check the System Test bed for the system tests it submitted to 
the community, and of making sure they succeed before the release happens
* whenever we release such a provider, we hold the release for that provider 
until its system tests are green (and it's on the service provider to fix any 
problems with those before we release).
* I know both Google and Amazon are committed to doing so, I also know 
Databricks is looking into it, and in the future we might decide to apply it to 
all "external service providers".

Philippe - what do you think about such an arrangement? Is that something that 
Cloudera will be able to commit to?

J.


On Fri, Dec 9, 2022 at 12:00 PM Ash Berlin-Taylor 
<a...@apache.org<mailto:a...@apache.org>> wrote:
As per the original vote email:

Please note that this vote is about the fact to add this new provider not about 
the code itself, which will be reviewed as part of the PR

So it's not a veto (as vetoes can only apply to code).

My concern about how we will actually test it works given we'd need a cloudera 
account/install/instance would be good to comment on though.

-ash

On Dec 7 2022, at 1:43 pm, Jarek Potiuk 
<ja...@potiuk.com<mailto:ja...@potiuk.com>> wrote:
Yeah. I would really want to understand that (and maybe others have an opinion 
here):

https://www.apache.org/foundation/voting.html

* Is this a "code modification" - where -1 is veto
* or is it a "procedural issue" - where -1 is just a vote and majority rules

I personally think that "code modification" is really on "PR review" level - 
when we see that the code submitted is not good.  But this case seems to be 
more of a procedural issue than code modification. For me this is more "are we 
ok to accept a provider from cloudera?" rather than "do we accept this code".

Ash - how do you treat your -1 ?

And others - what do you think of that ?

I think the next course of action depends on whether we have consensus on how 
we treat the issue of "adding a new provider".

J.





On Wed, Dec 7, 2022 at 1:45 PM Philippe Lanoe <pla...@cloudera.com.invalid> 
wrote:
Hello Airflow community,

Following up on this -1. I'm assuming that's a veto?

If it is, would it be possible to decouple the provider sustainability 
discussion from this proposal (Cloudera provider addition request)?

I do think sustainability discussions make full sense but I feel that this new 
provider is following the current rules that the community has established so 
far. The original thread [1] in which we discussed Cloudera provider addition 
(we were not ready with the PR at that time) led to the new provider discussion 
[2] and finally the lazy consensus [3] on mixed governance model. The outcome 
was a new mixed governance rule which was introduced [4], with an aim to (a) 
reduce the maintenance burden for the community and (b) allow more providers in 
since point (a) became acceptable.

Let me know if it is acceptable to break up these two discussions and have this 
vote move forward.

Thank you,
Regards.
Philippe

[1] https://lists.apache.org/thread/2z0lvgj466ksxxrbvofx41qvn03jrwwb
[2] https://lists.apache.org/thread/nvfc75kj2w1tywvvkw8ho5wkx1dcvgrn
[3] https://lists.apache.org/thread/gq9vym17x0o8j8s9clkbmdz2nt38nnbt
[4] https://github.com/apache/airflow/pull/24680



On Mon, Dec 5, 2022 at 1:54 PM Ash Berlin-Taylor 
<a...@apache.org<mailto:a...@apache.org>> wrote:
Just to break with the consensus: -1

Not because I don't think the provider would be useful or popular enough, 
precisely the opposite, and I'd like to see more companies maintain and manage 
their own providers and see an ecosystem of providers start to grow.

Cloudera def has the means and resources to maintain their own provider, and 
the communication channels to let their users/customers know about its 
existence. And I have no problem with linking to the provider from our docs 
index.

In general, I am slightly worried about the workload we as maintainers are 
letting ourselves in for in the long run with an ever-growing number of 
providers. Particularly one that needs paid-for accounts that we don't have 
access to!

-ash

On Dec 4 2022, at 11:59 pm, Kaxil Naik 
<kaxiln...@gmail.com<mailto:kaxiln...@gmail.com>> wrote:
+1 binding

On Sat, 3 Dec 2022 at 15:14, Holden Karau 
<hol...@pigscanfly.ca<mailto:hol...@pigscanfly.ca>> wrote:
non-binding +1

On Sat, Dec 3, 2022 at 3:55 AM Jarek Potiuk 
<ja...@potiuk.com<mailto:ja...@potiuk.com>> wrote:
I think Cloudera is an important player in our ecosystem and as long as
it passes all the bars (i.e. 2.3.0+ compatibility, good
non-conflicting dependencies, passing all the tests), I am +1.

On Sat, Dec 3, 2022 at 12:51 PM Philippe Lanoe
<pla...@cloudera.com.invalid> wrote:
>
> Hello,
>
> Correction: since it is a vote on code modification, all committers' votes 
> count. I was mistaken in my previous email (which mentioned that only PMC 
> votes are binding); I am quite new to this process.
> Please let me know if a discussion thread is preferred.
>
> Thanks,
> Regards,
> Philippe
>
> On Wed, Nov 30, 2022 at 5:34 PM Philippe Lanoe 
> <pla...@cloudera.com<mailto:pla...@cloudera.com>> wrote:
>>
>> Hello Airflow community!
>>
>> As requested in our PR, I would like to start a vote for adding a new 
>> provider (Cloudera). Please note that this vote is about the fact to add 
>> this new provider not about the code itself, which will be reviewed as part 
>> of the PR.
>>
>> We would like to contribute the Cloudera provider to allow data 
>> practitioners out-of-the-box interactions with a multi-function analytics 
>> and hybrid platform.
>>
>> Our first two Operators are CdeRunJobOperator, to run a CDE job (Spark or 
>> Airflow within the Cloudera Data Engineering service) and 
>> CdwExecuteQueryOperator, to execute a query on a managed CDW cluster (Hive / 
>> Impala within the Cloudera Data Warehousing service). It also comes with a 
>> Sensor for CDW, in order to wait on a Hive partition.
>> We are also planning to contribute more in the future, as we develop 
>> operators for other Cloudera services in Cloudera Data Platform (CDP), like 
>> Cloudera Machine Learning and others, to cover the various needs of data 
>> practitioners across the entire data lifecycle.
>>
>> Our code has been already used for quite some time internally and we would 
>> like to contribute it to Airflow, to give a better experience for the users 
>> as it would be another system that users can reach seamlessly in their 
>> pipelines.
>>
>> Another important Note: Cloudera already filed a CCLA as mentioned in this 
>> thread, so I think we are OK on the Legal side.
>>
>> You can find the PR here:
>> https://github.com/apache/airflow/pull/27866
>>
>> The voting will last for 6 days (until 6th of December 2022, 6pm UTC), and 
>> until at least 3 binding votes have been cast. I am not sure about the 
>> timeframe needed for providers, so please let me know if it is adequate.
>>
>> Please vote accordingly:
>>
>> [ ] + 1 approve
>> [ ] + 0 no opinion
>> [ ] - 1 disapprove with the reason
>>
>> Only votes from PMC members and committers are binding, but other members of 
>> the community are encouraged to check the AIP and vote with "(non-binding)".
>>
>> Thanks!
>>
>> Regards,
>> Philippe
>>
>>


--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 
<https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
