Hello,

I am really excited about an official public Cloudera provider for Airflow.
This would be a great addition to the Airflow ecosystem.

System tests would be a great additional layer for the CI and release
process, but would individual contributors be able to run these system
tests locally? From what I understand, the credentials would be stored in
the CI, so only people with their own credentials would be able to test the
code locally and therefore realistically help maintain the provider.
(Iterating on CI failures wouldn't be great :p)

Ash's point resonates with me: I remember having to work on a specific
provider where free accounts/quotas were not available. It was basically a
shot in the dark, making code changes based on documentation and API specs
without being able to actually test the code. Maybe this was the reason the
issues stayed open for more than a year without being picked up.

Will the community really be able to contribute to and support the provider
when most of us don't have a paid account? Or will it be maintained by
'stakeholders' and, at most, released by the 'community'? (Even reviewing
code for a release would be tricky without an account.)

Maybe I misunderstood something; if so, I apologize in advance.

Best regards,
Pierre

On Fri, Dec 9, 2022 at 12:24, Jarek Potiuk <ja...@potiuk.com> wrote:

> > My concern about how we will actually test it works given we'd need a
> cloudera account/install/instance would be good to comment on though.
>
> This is a very good point, Ash, and I love that you've made it, as I think
> we have a very good solution at hand.
>
> This simply calls for Cloudera's commitment to work on AIP-47 style tests
> and to provide a test bed for them.
>
> This has yet to be published by Google and Amazon - I know they are
> making a lot of progress on automating and publishing regular results of
> the System Tests from main in a way that lets us verify that all tests
> pass - all of that is done outside of the community resources and maintenance
> (i.e. it is entirely on the Amazon and Google teams to run and publish
> those tests).
>
> So I have a PROPOSAL (I can send a formal vote on that shortly)
>
> Going forward (starting with Cloudera), we should make it a
> requirement that any provider accepted by the community MUST have
> AIP-47 style System Tests, and the service provider in question MUST provide
> their own System Test environment with publicly accessible status for the
> community and commit to maintaining it for as long as the Provider is
> released by the community.
>
> I think this is a very reasonable ask for Cloudera (and anyone else in the
> future) and a very, very good compromise (a win-win for both sides while also
> requiring both sides to commit to long-term cooperation). This way we
> make sure that we cooperate with the service provider rather than
> letting the service provider "throw the code over the fence" and put all
> the burden of maintenance on the community.
>
> * with AIP-47 we provided a very solid foundation for fully-automated
> system testing of precisely this kind of external service provider
> * we (community) take on our shoulders the burden of reviewing and
> releasing the code, and at the same time the service gets community
> recognition and becomes part of the "Airflow Community supported"
> * similarly the Service Provider takes on their shoulders the burden of
> running and keeping in check the System Test bed for the system tests
> submitted to the community, and makes sure they succeed before the release
> happens
> * whenever we release such a service provider - we hold off on the
> release for that provider until the system tests for that provider are
> green (and it's on the service provider to fix the problems with those
> before we release).
> * I know both Google and Amazon are committed to doing so, I also know
> Databricks is looking into it, and in the future we might decide to apply it
> to all "external service providers".
>
> Philippe - what do you think about such an arrangement? Is that something
> that Cloudera will be able to commit to?
>
> J.
>
>
> On Fri, Dec 9, 2022 at 12:00 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
>> As per the original vote email:
>>
>> Please note that this vote is about whether to add this new provider, not
>> about the code itself, which will be reviewed as part of the PR
>>
>>
>> So it's not a veto (as vetoes can only apply to code).
>>
>> My concern about how we will actually test it works given we'd need a
>> cloudera account/install/instance would be good to comment on though.
>>
>> -ash
>>
>> On Dec 7 2022, at 1:43 pm, Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> Yeah. I would really want to understand that (and maybe others have an
>> opinion here):
>>
>> https://www.apache.org/foundation/voting.html
>>
>> * Is this a "code modification" - where -1 is veto
>> * or is it a "procedural issue" - where -1 is just a vote and majority
>> rules
>>
>> I personally think that "code modification" is really at the "PR review"
>> level - when we see that the code submitted is not good. But this case
>> seems to be more of a procedural issue than a code modification. For me this
>> is more "are we ok to accept a provider from Cloudera?" rather than "do we
>> accept this code".
>>
>> Ash - how do you treat your -1?
>>
>> And others - what do you think of that?
>>
>> I think the next course of action depends on whether we have consensus on
>> how we treat the issue of "adding a new provider".
>>
>> J.
>>
>>
>>
>>
>>
>> On Wed, Dec 7, 2022 at 1:45 PM Philippe Lanoe <pla...@cloudera.com.invalid>
>> wrote:
>>
>> Hello Airflow community,
>>
>> Following up on this -1. I'm assuming that's a veto?
>>
>> If it is, would it be possible to decouple the provider sustainability
>> discussion from this proposal (Cloudera provider addition request)?
>>
>> I do think sustainability discussions make complete sense, but I feel that
>> this new provider follows the rules that the community has
>> established so far. The original thread [1] in which we discussed the Cloudera
>> provider addition (we were not ready with the PR at that time) led to the
>> new provider discussion [2] and finally the lazy consensus [3] on a mixed
>> governance model. The outcome was a new mixed governance rule, which was
>> introduced in [4] with the aim to (a) reduce the maintenance burden for the
>> community and (b) allow more providers in, since point (a) became acceptable.
>>
>> Let me know if it is acceptable to break up these two discussions and
>> have this vote move forward.
>>
>> Thank you,
>> Regards.
>> Philippe
>>
>> [1] https://lists.apache.org/thread/2z0lvgj466ksxxrbvofx41qvn03jrwwb
>> [2] https://lists.apache.org/thread/nvfc75kj2w1tywvvkw8ho5wkx1dcvgrn
>> [3] https://lists.apache.org/thread/gq9vym17x0o8j8s9clkbmdz2nt38nnbt
>> [4] https://github.com/apache/airflow/pull/24680
>>
>>
>>
>> On Mon, Dec 5, 2022 at 1:54 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>
>> Just to break with the consensus: -1
>>
>> Not because I don't think the provider would be useful or popular enough,
>> precisely the opposite, and I'd like to see more companies maintain and
>> manage their own providers and see an ecosystem of providers start to grow.
>>
>> Cloudera definitely has the means and resources to maintain their own provider,
>> and the communication channels to let their users/customers know about its
>> existence. And I have no problem with linking to the provider from our docs
>> index.
>>
>> In general, I am slightly worried about the workload we as maintainers
>> are letting ourselves in for in the long run with an ever-growing number of
>> providers. Particularly one that needs paid-for accounts that we don't have
>> access to!
>>
>> -ash
>>
>> On Dec 4 2022, at 11:59 pm, Kaxil Naik <kaxiln...@gmail.com> wrote:
>>
>> +1 binding
>>
>> On Sat, 3 Dec 2022 at 15:14, Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>> non-binding +1
>>
>> On Sat, Dec 3, 2022 at 3:55 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> I think Cloudera is an important player in our ecosystem and as long as
>> it passes all the bars (i.e. 2.3.0+ compatibility, good
>> non-conflicting dependencies, passing all the tests), I am +1.
>>
>> On Sat, Dec 3, 2022 at 12:51 PM Philippe Lanoe
>> <pla...@cloudera.com.invalid> wrote:
>> >
>> > Hello,
>> >
>> > Correction: since it is a vote on code modification, all committers'
>> votes count. I was mistaken in my previous email (which mentioned that only
>> PMC votes are binding), as I am quite new to this process.
>> > Please let me know if a discussion thread is preferred.
>> >
>> > Thanks,
>> > Regards,
>> > Philippe
>> >
>> > On Wed, Nov 30, 2022 at 5:34 PM Philippe Lanoe <pla...@cloudera.com>
>> wrote:
>> >>
>> >> Hello Airflow community!
>> >>
>> >> As requested in our PR, I would like to start a vote for adding a new
>> provider (Cloudera). Please note that this vote is about whether to add
>> this new provider, not about the code itself, which will be reviewed as part
>> of the PR.
>> >>
>> >> We would like to contribute the Cloudera provider to allow data
>> practitioners out-of-the-box interactions with a multi-function analytics
>> and hybrid platform.
>> >>
>> >> Our first two Operators are CdeRunJobOperator, to run a CDE job (Spark
>> or Airflow within the Cloudera Data Engineering service) and
>> CdwExecuteQueryOperator, to execute a query on a managed CDW cluster (Hive
>> / Impala within the Cloudera Data Warehousing service). It also comes with
>> a Sensor for CDW, in order to wait on a Hive partition.
>> >> We are also planning to contribute more in the future, as we develop
>> operators for other Cloudera services in Cloudera Data Platform (CDP), like
>> Cloudera Machine Learning and others, to cover the various needs of data
>> practitioners across the entire data lifecycle.
>> >>
>> >> Our code has already been used for quite some time internally and we
>> would like to contribute it to Airflow, to give a better experience for the
>> users as it would be another system that users can reach seamlessly in
>> their pipelines.
>> >>
>> >> Another important note: Cloudera already filed a CCLA as mentioned in
>> this thread, so I think we are OK on the Legal side.
>> >>
>> >> You can find the PR here:
>> >> https://github.com/apache/airflow/pull/27866
>> >>
>> >> The voting will last for 6 days (until 6th of December 2022, 6pm UTC),
>> and until at least 3 binding votes have been cast. I am not sure about the
>> timeframe that is actually needed for providers, so please let me know if it
>> is adequate.
>> >>
>> >> Please vote accordingly:
>> >>
>> >> [ ] + 1 approve
>> >> [ ] + 0 no opinion
>> >> [ ] - 1 disapprove with the reason
>> >>
>> >> Only votes from PMC members and committers are binding, but other
>> members of the community are encouraged to check the AIP and vote with
>> "(non-binding)".
>> >>
>> >> Thanks!
>> >>
>> >> Regards,
>> >> Philippe
>> >>
>> >>
>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
