Re: [VOTE] Airflow Providers prepared on January 22, 2024

2024-01-24 Thread Pankaj Singh
+1 (non-binding).

Ran some tests for the amazon, google, ftp, snowflake, databricks,
openlineage, and mysql providers.
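
For anyone who wants to reproduce this kind of smoke test, a minimal sketch
(the version pins are the RC versions from this thread; the exact package set
is up to the tester):

import subprocess
import sys

# Pull a few of the RC provider packages into the current environment,
# then run example DAGs against them. Pins come from this vote thread.
RC_PROVIDERS = [
    "apache-airflow-providers-amazon==8.17.0rc1",
    "apache-airflow-providers-google==10.14.0rc1",
    "apache-airflow-providers-ftp==3.8.0rc1",
    "apache-airflow-providers-snowflake==5.3.0rc1",
    "apache-airflow-providers-databricks==6.1.0rc1",
    "apache-airflow-providers-mysql==5.5.2rc1",
]
for pkg in RC_PROVIDERS:
    subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])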

On Thu, Jan 25, 2024 at 4:17 AM Jed Cunningham 
wrote:

> +1 (binding) checked binary reproduction, licences, signatures, and
> checksums.
>
> On my system the binary reproduction check for the source tarball did fail,
> but I spot verified the tarball contents are correct. I'm still
> investigating, but no reason to hold the release for this.
>


Re: [VOTE] January 2024 PR of the Month

2024-01-24 Thread Mehta, Shubham
+1 for #36537. 

Shubham


On 2024-01-23, 3:03 PM, "Scheffler Jens (XC-AS/EAE-ADA-T)"
<jens.scheff...@de.bosch.com.INVALID> wrote:








+1 for #22253, as I favor contributions from non-committers - but also high
respect for Jarek's PR #36537, which was a great step! Without putting 22253
on the plate, I would have voted for 36537.



From: Hussein Awala <huss...@awala.fr>
Sent: Tuesday, January 23, 2024 9:12 PM
To: dev@airflow.apache.org <dev@airflow.apache.org>
Subject: Re: [VOTE] January 2024 PR of the Month


+1 for #36537


On Tue, Jan 23, 2024 at 3:40 PM Andrey Anshin <andrey.ans...@taragol.is>
wrote:


> My vote goes to #36537.
> But I think we should also mention PR #22253 as a PR that finally got merged -
> "keep calm and add changes"
>
>
> On Tue, 23 Jan 2024 at 17:36, Wei Lee wrote:
>
> > My vote is for #36537. Love this packaging improvement!
> >
> > Best,
> > Wei
> >
> > > On Jan 23, 2024, at 10:30 PM, Ryan Hatter wrote:
> > >
> > > Gotta agree with Constance and go with 22253 -- how cool that the
> author
> > > stuck with it all this time!
> > >
> > > On Tue, Jan 23, 2024 at 12:25 AM Aritra Basu wrote:
> > >
> > >> My vote is for #36537 it's been a huge effort and it makes huge
> > >> improvements in our packaging. Great to see it make it into airflow.
> > >>
> > >> --
> > >> Regards,
> > >> Aritra Basu
> > >>
> > >> On Tue, Jan 23, 2024, 10:13 AM Amogh Desai wrote:
> > >>
> > >>> Is there a possibility to vote for more than one? I guess not :/
> > >>>
> > >>> My vote goes to #36537 for the enhancements that have come in with
> it.
> > I
> > >>> have followed the discussions
> > >>> at a higher level and it surely wasn't easy :)
> > >>> (If I could vote again, it would surely be #22253 for the endless
> > >>> perseverance and dedication of the author)
> > >>>
> > >>> Thanks & Regards,
> > >>> Amogh Desai
> > >>>
> > >>> On Tue, Jan 23, 2024 at 3:21 AM Jarek Potiuk wrote:
> > >>>
> >  Heck, why not. I will shamelessly vote on my #36537. While it took
> > >> just a
> >  few weeks to merge, it leapfrogged our legacy packaging setup to
> >  more-or-less bleeding edge from what was there since the beginning
> of
> >  Airflow (almost 10 years) and was already "old-ish" when I joined
> the
> >  project more than 4 years ago. And with hatch and cleanups in
> extras,
> > >> it
> >  has a positive impact on both - contributors and users (or so I
> hope).
> > 
> >  On Mon, Jan 22, 2024 at 7:48 PM Constance Martineau
> >  wrote:
> > 
> > > +1 #22253
> > >
> > > The PR was opened in March 2022, and was finally merged last week!
> I
> >  admire
> > > the author's persistence in getting this merged in, and think the
> > > simplifications to the interface make the Operator more
> user-friendly
> > >>> for
> > > our Data Science users.
> > >
> > > On Mon, Jan 22, 2024 at 1:29 PM Briana Okyere
> > > wrote:
> > >
> > >> Hey All,
> > >>
> > >> It’s once again time to vote for the PR of the Month.
> > >>
> > >> With the help of the `get_important_pr_candidates` script in
> > >>> dev/stats,
> > >> we've identified the following candidates:
> > >>
> > >> PR #36513: Include plugins in the architecture diagrams.

Re: [VOTE] Airflow Providers prepared on January 22, 2024

2024-01-24 Thread Jed Cunningham
+1 (binding) checked binary reproduction, licences, signatures, and
checksums.

On my system the binary reproduction check for the source tarball did fail,
but I spot verified the tarball contents are correct. I'm still
investigating, but no reason to hold the release for this.
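
(For anyone repeating the checksum half of that verification, a minimal
sketch in Python - the file name is illustrative; the full procedure,
including the gpg --verify step for signatures, is in
dev/README_RELEASE_PROVIDER_PACKAGES.md:)

import hashlib
from pathlib import Path

# Recompute SHA-512 for a downloaded artifact and compare it against the
# published .sha512 file from dist.apache.org ("<hash>  <filename>" format).
artifact = Path("apache_airflow_providers_amazon-8.17.0-py3-none-any.whl")  # example name
published = Path(artifact.name + ".sha512").read_text().split()[0]
actual = hashlib.sha512(artifact.read_bytes()).hexdigest()
assert actual == published, f"checksum mismatch for {artifact.name}"
print(f"checksum OK for {artifact.name}")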


Re: [VOTE] Airflow Providers prepared on January 22, 2024

2024-01-24 Thread Hussein Awala
+1 (binding) checked source code, licences, signatures, and checksums and
tested my changes; all look good.

On Wed, Jan 24, 2024 at 4:22 PM Aritra Basu 
wrote:

> +1 non-binding
>
> --
> Regards,
> Aritra Basu
>
> On Wed, Jan 24, 2024, 8:02 PM Utkarsh Sharma
>  wrote:
>
> > +1 Non-binding.
> >
> > Thanks,
> > Utkarsh Sharma
> >
> > On Wed, Jan 24, 2024 at 6:11 PM Pankaj Koti
> >  wrote:
> >
> > > +1 (non-binding)
> > >
> > >
> > > On Wed, 24 Jan 2024, 13:48 rom sharon,  wrote:
> > >
> > > > +1 non-binding
> > > >
> > > > On Wed, 24 Jan 2024 at 10:17, Rahul Vats <
> > > > rah.sharm...@gmail.com> wrote:
> > > >
> > > > > +1 non-binding
> > > > >
> > > > > Verified the providers below with our example DAGs.
> > > > >
> > > > >-  apache-airflow-providers-amazon==8.17.0rc1
> > > > >-  apache-airflow-providers-apache-hive==6.5.0rc1
> > > > >-  apache-airflow-providers-common-sql==1.11.0rc1
> > > > >-  apache-airflow-providers-databricks==6.1.0rc1
> > > > >-  apache-airflow-providers-dbt-cloud==3.6.0rc1
> > > > >-  apache-airflow-providers-elasticsearch==5.3.1
> > > > >-  apache-airflow-providers-google==10.14.0rc1
> > > > >-  apache-airflow-providers-http==4.9.0rc1
> > > > >-  apache-airflow-providers-snowflake==5.3.0rc1
> > > > >-  apache-airflow-providers-ftp==3.8.0rc1
> > > > >-  apache-airflow-providers-mysql==5.5.2rc1
> > > > >-  apache-airflow-providers-cohere==1.2.0rc1
> > > > >-  apache-airflow-providers-pinecone==1.2.0rc1
> > > > >-  apache-airflow-providers-weaviate==1.3.1rc1
> > > > >
> > > > >
> > > > > Regards,
> > > > > Rahul Vats
> > > > > 9953794332
> > > > >
> > > > >
> > > > > On Wed, 24 Jan 2024 at 09:56, Amogh Desai <
> amoghdesai@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1 non binding
> > > > > >
> > > > > > Tested some example dags in CNCF provider and Hive Provider.
> Looks
> > > > good!
> > > > > >
> > > > > > Thanks & Regards,
> > > > > > Amogh Desai
> > > > > >
> > > > > > On Tue, Jan 23, 2024 at 8:44 PM Jarek Potiuk 
> > > wrote:
> > > > > >
> > > > > > > +1 (binding) - tested my changes, checked binary
> reproducibility,
> > > > > > licences,
> > > > > > > signatures, checksums - all looks good.
> > > > > > >
> > > > > > > On Mon, Jan 22, 2024 at 1:11 PM Elad Kalif
> > > > wrote:
> > > > > > >
> > > > > > > > Hey all,
> > > > > > > >
> > > > > > > > I have just cut the new wave Airflow Providers packages. This
> > > email
> > > > > is
> > > > > > > > calling a vote on the release,
> > > > > > > > which will last for 72 hours - which means that it will end
> on
> > > > > January
> > > > > > > 25,
> > > > > > > > 2024 12:10 PM UTC and until 3 binding +1 votes have been
> > > received.
> > > > > > > >
> > > > > > > > Consider this my (binding) +1.
> > > > > > > >
> > > > > > > > Airflow Providers are available at:
> > > > > > > > https://dist.apache.org/repos/dist/dev/airflow/providers/
> > > > > > > >
> > > > > > > > *apache-airflow-providers--*.tar.gz* are the binary
> > > > > > > >  Python "sdist" release - they are also official "sources"
> for
> > > the
> > > > > > > provider
> > > > > > > > packages.
> > > > > > > >
> > > > > > > > *apache_airflow_providers_-*.whl are the binary
> > > > > > > >  Python "wheel" release.
> > > > > > > >
> > > > > > > > The test procedure for PMC members is described in
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-pmc-members
> > > > > > > >
> > > > > > > > The test procedure for Contributors who would like to
> test
> > > this
> > > > > RC
> > > > > > is
> > > > > > > > described in:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-contributors
> > > > > > > >
> > > > > > > >
> > > > > > > > Public keys are available at:
> > > > > > > > https://dist.apache.org/repos/dist/release/airflow/KEYS
> > > > > > > >
> > > > > > > > Please vote accordingly:
> > > > > > > >
> > > > > > > > [ ] +1 approve
> > > > > > > > [ ] +0 no opinion
> > > > > > > > [ ] -1 disapprove with the reason
> > > > > > > >
> > > > > > > > Only votes from PMC members are binding, but members of the
> > > > community
> > > > > > are
> > > > > > > > encouraged to test the release and vote with "(non-binding)".
> > > > > > > >
> > > > > > > > Please note that the version number excludes the 'rcX'
> string.
> > > > > > > > This will allow us to rename the artifact without modifying
> > > > > > > > the artifact checksums when we actually release.
> > > > > > > >
> > > > > > > > The status of testing the providers by the community is kept
> > > here:
> > > > > > > > https://github.com/apache/airflow/issues/36948
> > > > > > > >
> > > > > > > > The issue is also the easiest way to see important 

Re: [DISCUSSION] Enhanced Multi-Tenant Dataset Management in Airflow: Potential First Steps

2024-01-24 Thread Tornike Gurgenidze
What I meant by update/delete operations was referring to Dataset objects
themselves, not DatasetEvents. I also see no issue in allowing dataset
changes to be registered externally. I admit that deleting datasets is
probably irrelevant as even now they are not deleted, but instead orphaned
after reference counting, but U in CRUD is still very much relevant imho.
There's a field called `extra` in DatasetModel, for example, which has no use
inside Airflow itself, but it might still be used from user code in all sorts
of ways.
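
(As an illustration of that point, a minimal sketch - the URI and the extra
payload are made up. The `extra` dict is stored with the dataset but carries
no scheduler semantics, so user code can attach whatever it likes:)

from airflow.datasets import Dataset

# `extra` is opaque metadata: Airflow persists it on the DatasetModel row
# but attaches no semantics to it.
orders = Dataset(
    "s3://warehouse/orders",
    extra={"owner": "data-platform", "format": "parquet"},
)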

I'm not saying it's impossible for these interfaces to coexist if you
isolate them from one another, especially when multiple dag-processors
already do something similar for dags even now (isolating sets of objects
from one another using the processor_subdir value) - it just feels unnatural
to have both a declarative (dag code) and an imperative (API/UI) interface
for interacting with one type of object.

On Wed, Jan 24, 2024 at 11:35 PM Constance Martineau
 wrote:

> You're right. I didn't mean to say that the Connections and Datasets
> facilitate the same thing - they don't. I meant that Connections are also
> "useless" if no task is using that Connection - but we allow them to be
> created independently of dags. From that angle - I don't see how allowing
> Datasets to be created independently is any different.
>
> Also happy to hear from others about this.
>
> On Wed, Jan 24, 2024 at 1:55 PM Jarek Potiuk  wrote:
>
> > I'd love to hear what others - especially those who are involved in
> dataset
> > creation and discussion more than me. I personally believe that
> > conceptually connections and datasets are as far from each other as
> > possible (I have no idea where the perceived similarity between connections
> > - which are essentially static configuration of credentials - and datasets -
> > which are a dynamic reflection of data being passed live between tasks -
> > comes from). The
> > only similarity I see is that they are both stored by Airflow in some
> table
> > (and even not that if you use SecretsManager). So comparing those two is
> an
> > apple to pear comparison if you ask me.
> >
> > But (despite my 4 years experience of creating Airflow) my actual
> > experience with Datasets is limited, I've been mainly observing what was
> > going on, so I would love to hear what those who created (and continue to
> > think about future of) the datasets :).
> >
> > J,
> >
> > On Wed, Jan 24, 2024 at 7:27 PM Constance Martineau
> >  wrote:
> >
> > > Right. That is why I was trying to make a distinction in the PR and in
> > this
> > > discussion between CRUD-ing Dataset Objects/Definitions vs creating and
> > > deleting Dataset Events from the queue. Happy to standardize on
> whatever
> > > terminology to make sure things are understood and we can have a
> > productive
> > > conversation.
> > >
> > > For Dataset Events - creating, reading and deleting them via API is
> IMHO
> > > not controversial.
> > > - For creating: This has been discussed in various places, and that the
> > > endpoint could be used to trigger dependent dags
> > > - For deleting: It is easy for DAGs with multiple upstream dependencies
> > to
> > > go out of sync, and there is no way to recover from that without
> > > manipulating the DB directly. See here
> > >  and here
> > > <
> > >
> >
> https://forum.astronomer.io/t/airflow-datasets-can-they-be-cleared-or-reset/2801
> > > >
> > >
> > > For CRUD-ing Dataset Definitions via API:
> > >
> > > > IMHO Airflow should only manage its own entities and at most it
> should
> > > > emit events (dataset listeners, openlineage API) to inform others
> about
> > > > state changes of things that Airflow manages, but it should not be
> > abused
> > > > to store "other" datasets, that Airflow DAGs know nothing about.
> > >
> > >
> > > I disagree that it is an abuse. If I as an internal data producer
> > publish a
> > > dataset that I expect internal Airflow users to use, it is not abusing
> > > Airflow to create a dataset and make it visible in Airflow. At some
> point
> > > in the near future, users will start referencing them in their dags -
> > it's
> > > just a sequencing question. We don't enforce connections being tied to
> a
> > > dag - and conceptually - this is no different. It is also no different
> > than
> > > adding the definition as part of a dag file and having that dataset
> show
> > up
> > > in the dataset list, without forcing it to be a task output as part of
> a
> > > dag. The only valid reason to not allow it IMHO is because they were
> > > designed to be defined within a dag file, similar to a dag, and we
> don't
> > > want to deal with the impediment I laid out.
> > >
> > > On Wed, Jan 24, 2024 at 12:45 PM Jarek Potiuk 
> wrote:
> > >
> > > > On Wed, Jan 24, 2024 at 5:33 PM Constance Martineau
> > > >  wrote:
> > > >
> > > > > I also think it makes sense to allow people to create/update/delete
> > > > > Datasets via the API and eventually 

Re: [DISCUSSION] Enhanced Multi-Tenant Dataset Management in Airflow: Potential First Steps

2024-01-24 Thread Constance Martineau
You're right. I didn't mean to say that the Connections and Datasets
facilitate the same thing - they don't. I meant that Connections are also
"useless" if no task is using that Connection - but we allow them to be
created independently of dags. From that angle - I don't see how allowing
Datasets to be created independently is any different.

Also happy to hear from others about this.

On Wed, Jan 24, 2024 at 1:55 PM Jarek Potiuk  wrote:

> I'd love to hear what others - especially those who are involved in dataset
> creation and discussion more than me. I personally believe that
> conceptually connections and datasets are as far from each other as
> possible (I have no idea where the perceived similarity between connections -
> which are essentially static configuration of credentials - and datasets -
> which are a dynamic reflection of data being passed live between tasks -
> comes from). The
> only similarity I see is that they are both stored by Airflow in some table
> (and even not that if you use SecretsManager). So comparing those two is an
> apple to pear comparison if you ask me.
>
> But (despite my 4 years experience of creating Airflow) my actual
> experience with Datasets is limited, I've been mainly observing what was
> going on, so I would love to hear what those who created (and continue to
> think about future of) the datasets :).
>
> J,
>
> On Wed, Jan 24, 2024 at 7:27 PM Constance Martineau
>  wrote:
>
> > Right. That is why I was trying to make a distinction in the PR and in
> this
> > discussion between CRUD-ing Dataset Objects/Definitions vs creating and
> > deleting Dataset Events from the queue. Happy to standardize on whatever
> > terminology to make sure things are understood and we can have a
> productive
> > conversation.
> >
> > For Dataset Events - creating, reading and deleting them via API is IMHO
> > not controversial.
> > - For creating: This has been discussed in various places, and that the
> > endpoint could be used to trigger dependent dags
> > - For deleting: It is easy for DAGs with multiple upstream dependencies
> to
> > go out of sync, and there is no way to recover from that without
> > manipulating the DB directly. See here
> >  and here
> > <
> >
> https://forum.astronomer.io/t/airflow-datasets-can-they-be-cleared-or-reset/2801
> > >
> >
> > For CRUD-ing Dataset Definitions via API:
> >
> > > IMHO Airflow should only manage its own entities and at most it should
> > > emit events (dataset listeners, openlineage API) to inform others about
> > > state changes of things that Airflow manages, but it should not be
> abused
> > > to store "other" datasets, that Airflow DAGs know nothing about.
> >
> >
> > I disagree that it is an abuse. If I as an internal data producer
> publish a
> > dataset that I expect internal Airflow users to use, it is not abusing
> > Airflow to create a dataset and make it visible in Airflow. At some point
> > in the near future, users will start referencing them in their dags -
> it's
> > just a sequencing question. We don't enforce connections being tied to a
> > dag - and conceptually - this is no different. It is also no different
> than
> > adding the definition as part of a dag file and having that dataset show
> up
> > in the dataset list, without forcing it to be a task output as part of a
> > dag. The only valid reason to not allow it IMHO is because they were
> > designed to be defined within a dag file, similar to a dag, and we don't
> > want to deal with the impediment I laid out.
> >
> > On Wed, Jan 24, 2024 at 12:45 PM Jarek Potiuk  wrote:
> >
> > > On Wed, Jan 24, 2024 at 5:33 PM Constance Martineau
> > >  wrote:
> > >
> > > > I also think it makes sense to allow people to create/update/delete
> > > > Datasets via the API and eventually UI. Even if the dataset is not
> > > > initially connected to a DAG, it's nice to be able to see in one
> place
> > > all
> > > > the datasets and ML models that my dags can leverage. We allow people
> > to
> > > > create Connections and Variables via the API and UI without forcing
> > users
> > > > to use them as part of a task or dag. This isn't any different from
> > that
> > > > aspect.
> > > >
> > > > Airflow has some objects that can
> > > > > be created by a dag processor (Dags, Datasets) and others that can
> be
> > > > > created with API/UI (Connections, Variables)
> > > >
> > > >
> > > A comment from my side. I think there is a big conceptual difference
> here
> > > that you yourself noticed - DAG code - via DAGProcessor - creates DAG
> and
> > > DataSets, and UI/API can allow to create and modify
> Connections/Variables
> > > that are then USED (but never created) by DAG code. This is why while I
> > see
> > > no fundamental security blocker with "Creating" Datasets via API - it
> > > definitely feels out-of-place to be able to manage them via API.
> > >
> > > And following the discussion from the PR -  Yes, we should discuss
> > create,
> > > update 

Re: [DISCUSSION] Enhanced Multi-Tenant Dataset Management in Airflow: Potential First Steps

2024-01-24 Thread Jarek Potiuk
I'd love to hear what others - especially those who are involved in dataset
creation and discussion more than me. I personally believe that
conceptually connections and datasets are as far from each other as
possible (I have no idea where the perceived similarity between connections -
which are essentially static configuration of credentials - and datasets -
which are a dynamic reflection of data being passed live between tasks -
comes from). The
only similarity I see is that they are both stored by Airflow in some table
(and even not that if you use SecretsManager). So comparing those two is an
apple to pear comparison if you ask me.

But (despite my 4 years' experience of creating Airflow) my actual
experience with Datasets is limited, I've been mainly observing what was
going on, so I would love to hear from those who created (and continue to
think about the future of) the datasets :).

J,

On Wed, Jan 24, 2024 at 7:27 PM Constance Martineau
 wrote:

> Right. That is why I was trying to make a distinction in the PR and in this
> discussion between CRUD-ing Dataset Objects/Definitions vs creating and
> deleting Dataset Events from the queue. Happy to standardize on whatever
> terminology to make sure things are understood and we can have a productive
> conversation.
>
> For Dataset Events - creating, reading and deleting them via API is IMHO
> not controversial.
> - For creating: This has been discussed in various places, and that the
> endpoint could be used to trigger dependent dags
> - For deleting: It is easy for DAGs with multiple upstream dependencies to
> go out of sync, and there is no way to recover from that without
> manipulating the DB directly. See here
>  and here
> <
> https://forum.astronomer.io/t/airflow-datasets-can-they-be-cleared-or-reset/2801
> >
>
> For CRUD-ing Dataset Definitions via API:
>
> > IMHO Airflow should only manage its own entities and at most it should
> > emit events (dataset listeners, openlineage API) to inform others about
> > state changes of things that Airflow manages, but it should not be abused
> > to store "other" datasets, that Airflow DAGs know nothing about.
>
>
> I disagree that it is an abuse. If I as an internal data producer publish a
> dataset that I expect internal Airflow users to use, it is not abusing
> Airflow to create a dataset and make it visible in Airflow. At some point
> in the near future, users will start referencing them in their dags - it's
> just a sequencing question. We don't enforce connections being tied to a
> dag - and conceptually - this is no different. It is also no different than
> adding the definition as part of a dag file and having that dataset show up
> in the dataset list, without forcing it to be a task output as part of a
> dag. The only valid reason to not allow it IMHO is because they were
> designed to be defined within a dag file, similar to a dag, and we don't
> want to deal with the impediment I laid out.
>
> On Wed, Jan 24, 2024 at 12:45 PM Jarek Potiuk  wrote:
>
> > On Wed, Jan 24, 2024 at 5:33 PM Constance Martineau
> >  wrote:
> >
> > > I also think it makes sense to allow people to create/update/delete
> > > Datasets via the API and eventually UI. Even if the dataset is not
> > > initially connected to a DAG, it's nice to be able to see in one place
> > all
> > > the datasets and ML models that my dags can leverage. We allow people
> to
> > > create Connections and Variables via the API and UI without forcing
> users
> > > to use them as part of a task or dag. This isn't any different from
> that
> > > aspect.
> > >
> > > Airflow has some objects that can
> > > > be created by a dag processor (Dags, Datasets) and others that can be
> > > > created with API/UI (Connections, Variables)
> > >
> > >
> > A comment from my side. I think there is a big conceptual difference here
> > that you yourself noticed - DAG code - via DAGProcessor - creates DAG and
> > DataSets, and UI/API can allow to create and modify Connections/Variables
> > that are then USED (but never created) by DAG code. This is why while I
> see
> > no fundamental security blocker with "Creating" Datasets via API - it
> > definitely feels out-of-place to be able to manage them via API.
> >
> > And following the discussion from the PR -  Yes, we should discuss
> create,
> > update and delete differently. Because conceptually they are NOT typical
> > CRUD (which the Connection / Variables API UI is).
> > I think there is a huge difference between "Updating" and "Deleting"
> > datasets via the API and the `UD` in CRUD:
> >
> > * Updating dataset does not actually "update" its definition, it informs
> > those who listen on dataset that it has changed. No more, no less.
> > Typically when you have CRUD operation, you pass the same data in "C" and
> > "U" - but in our case those two operations are different and serve
> > different purposes
> > * Deleting the dataset is also not what "D" in CRUD is - in this case it
> is
> 

Re: [DISCUSSION] Enhanced Multi-Tenant Dataset Management in Airflow: Potential First Steps

2024-01-24 Thread Constance Martineau
Right. That is why I was trying to make a distinction in the PR and in this
discussion between CRUD-ing Dataset Objects/Definitions vs creating and
deleting Dataset Events from the queue. Happy to standardize on whatever
terminology to make sure things are understood and we can have a productive
conversation.

For Dataset Events - creating, reading and deleting them via API is IMHO
not controversial.
- For creating: This has been discussed in various places, and that the
endpoint could be used to trigger dependent dags
- For deleting: It is easy for DAGs with multiple upstream dependencies to
go out of sync, and there is no way to recover from that without
manipulating the DB directly. See here and here
<https://forum.astronomer.io/t/airflow-datasets-can-they-be-cleared-or-reset/2801>

For CRUD-ing Dataset Definitions via API:

> IMHO Airflow should only manage its own entities and at most it should
> emit events (dataset listeners, openlineage API) to inform others about
> state changes of things that Airflow manages, but it should not be abused
> to store "other" datasets, that Airflow DAGs know nothing about.


I disagree that it is an abuse. If I as an internal data producer publish a
dataset that I expect internal Airflow users to use, it is not abusing
Airflow to create a dataset and make it visible in Airflow. At some point
in the near future, users will start referencing them in their dags - it's
just a sequencing question. We don't enforce connections being tied to a
dag - and conceptually - this is no different. It is also no different than
adding the definition as part of a dag file and having that dataset show up
in the dataset list, without forcing it to be a task output as part of a
dag. The only valid reason to not allow it IMHO is because they were
designed to be defined within a dag file, similar to a dag, and we don't
want to deal with the impediment I laid out.

On Wed, Jan 24, 2024 at 12:45 PM Jarek Potiuk  wrote:

> On Wed, Jan 24, 2024 at 5:33 PM Constance Martineau
>  wrote:
>
> > I also think it makes sense to allow people to create/update/delete
> > Datasets via the API and eventually UI. Even if the dataset is not
> > initially connected to a DAG, it's nice to be able to see in one place
> all
> > the datasets and ML models that my dags can leverage. We allow people to
> > create Connections and Variables via the API and UI without forcing users
> > to use them as part of a task or dag. This isn't any different from that
> > aspect.
> >
> > Airflow has some objects that can
> > > be created by a dag processor (Dags, Datasets) and others that can be
> > > created with API/UI (Connections, Variables)
> >
> >
> A comment from my side. I think there is a big conceptual difference here
> that you yourself noticed - DAG code - via DAGProcessor - creates DAG and
> DataSets, and UI/API can allow to create and modify Connections/Variables
> that are then USED (but never created) by DAG code. This is why while I see
> no fundamental security blocker with "Creating" Datasets via API - it
> definitely feels out-of-place to be able to manage them via API.
>
> And following the discussion from the PR -  Yes, we should discuss create,
> update and delete differently. Because conceptually they are NOT typical
> CRUD (which the Connection / Variables API UI is).
> I think there is a huge difference between "Updating" and "Deleting"
> datasets via the API and the `UD` in CRUD:
>
> * Updating dataset does not actually "update" its definition, it informs
> those who listen on dataset that it has changed. No more, no less.
> Typically when you have CRUD operation, you pass the same data in "C" and
> "U" - but in our case those two operations are different and serve
> different purposes
> * Deleting the dataset is also not what "D" in CRUD is - in this case it is
> mostly a "retention". And there are some very specific things here. Should
> we delete a dataset that some of the DAGs still have as input/output ? IMHO
> - absolutely not. But  How do we know that? If we have only DAGs,
> implicitly creating Datasets by declaring whether they are used or not we
> can easily know that by reference counting. But when we allow the creation
> of the datasets via API - it's no longer that obvious and the number of
> cases to handle gets really big.
>
> After seeing the comments and discussion - I believe it's not a good idea
> to allow external Dataset creations, the use case does not justify it IMHO.
>
> Why ?
>
> We do not want Airflow to become a "dataset metadata storage" that you can
> query/update and find out what all kinds of datasets the whole 
> of yours has - this is not the purpose of Airflow, and will never be IMHO.
> It's a non-goal for Airflow to keep "other" datasets.
>
> IMHO Airflow should only manage its own entities and at most it should
> emit events (dataset listeners, openlineage API) to inform others about
> state changes of 

Re: [DISCUSSION] Enhanced Multi-Tenant Dataset Management in Airflow: Potential First Steps

2024-01-24 Thread Jarek Potiuk
On Wed, Jan 24, 2024 at 5:33 PM Constance Martineau
 wrote:

> I also think it makes sense to allow people to create/update/delete
> Datasets via the API and eventually UI. Even if the dataset is not
> initially connected to a DAG, it's nice to be able to see in one place all
> the datasets and ML models that my dags can leverage. We allow people to
> create Connections and Variables via the API and UI without forcing users
> to use them as part of a task or dag. This isn't any different from that
> aspect.
>
> Airflow has some objects that can
> > be created by a dag processor (Dags, Datasets) and others that can be
> > created with API/UI (Connections, Variables)
>
>
A comment from my side. I think there is a big conceptual difference here
that you yourself noticed - DAG code - via DAGProcessor - creates DAG and
DataSets, and UI/API can allow to create and modify Connections/Variables
that are then USED (but never created) by DAG code. This is why while I see
no fundamental security blocker with "Creating" Datasets via API - it
definitely feels out-of-place to be able to manage them via API.

And following the discussion from the PR -  Yes, we should discuss create,
update and delete differently. Because conceptually they are NOT typical
CRUD (which the Connection / Variables API UI is).
I think there is a huge difference between "Updating" and "Deleting"
datasets via the API and the `UD` in CRUD:

* Updating dataset does not actually "update" its definition, it informs
those who listen on dataset that it has changed. No more, no less.
Typically when you have a CRUD operation, you pass the same data in "C" and
"U" - but in our case those two operations are different and serve
different purposes.
* Deleting the dataset is also not what "D" in CRUD is - in this case it is
mostly a "retention". And there are some very specific things here. Should
we delete a dataset that some of the DAGs still have as input/output ? IMHO
- absolutely not. But how do we know that? If we have only DAGs,
implicitly creating Datasets by declaring whether they are used or not, we
can easily know that by reference counting. But when we allow the creation
of the datasets via API - it's no longer that obvious and the number of
cases to handle gets really big.

After seeing the comments and discussion - I believe it's not a good idea
to allow external Dataset creations, the use case does not justify it IMHO.

Why ?

We do not want Airflow to become a "dataset metadata storage" that you can
query/update and find out what all kinds of datasets the whole 
of yours has - this is not the purpose of Airflow, and will never be IMHO.
It's a non-goal for Airflow to keep "other" datasets.

IMHO Airflow should only manage its own entities and at most it should
emit events (dataset listeners, openlineage API) to inform others about
state changes of things that Airflow manages, but it should not be abused
to store "other" datasets, that Airflow DAGs know nothing about. This - in
a way contradicts the "Airflow as a Platform" approach of ours and the
whole concept of OpenLineage integration in Airflow. If you want to have a
single place where all the datasets you manage are stored, have all your
components emit OpenLineage events and use a dedicated solution (Marquez,
Amundsen, Google Data Catalog, etc.) - all of the serious ones now
consume Open Lineage events that pretty much all serious components already
emit - and there you can have it all. This is our strategic direction - and
this is why we accepted AIP-53 Open Lineage:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-53+OpenLineage+in+Airflow
At the moment we accepted it, we also accepted the fact that Airflow is
just a producer of lineage data, not a storage nor consumer of it - because
this is the scope of AIP-53.

I think the only way a dataset should be created in the Airflow DB is via the
DagFileProcessor - eventually with reference counting, and possibly with
removal of datasets that are no longer used by anyone, if we decide we do
not want to keep old datasets in the DB. That should be it IMHO.
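
(For context, the declarative path described above looks like this in dag
code - a minimal sketch; the URI and dag/task names are made up. The producer
declares the dataset as an outlet, the consumer schedules on it, and the dag
processor registers the dataset as a side effect of parsing the file:)

import pendulum

from airflow.datasets import Dataset
from airflow.decorators import dag, task

orders = Dataset("s3://warehouse/orders")

@dag(start_date=pendulum.datetime(2024, 1, 1), schedule="@daily", catchup=False)
def orders_producer():
    @task(outlets=[orders])
    def update_orders():
        ...  # a successful run emits a dataset event for `orders`
    update_orders()

@dag(start_date=pendulum.datetime(2024, 1, 1), schedule=[orders], catchup=False)
def orders_consumer():
    @task
    def read_orders():
        ...
    read_orders()

orders_producer()
orders_consumer()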



>
> --
>
> Constance Martineau
> Senior Product Manager
>
> Email: consta...@astronomer.io
> Time zone: US Eastern (EST UTC-5 / EDT UTC-4)
>
>
> 
>


Re: [DISCUSSION] Enhanced Multi-Tenant Dataset Management in Airflow: Potential First Steps

2024-01-24 Thread Constance Martineau
I also think it makes sense to allow people to create/update/delete
Datasets via the API and eventually UI. Even if the dataset is not
initially connected to a DAG, it's nice to be able to see in one place all
the datasets and ML models that my dags can leverage. We allow people to
create Connections and Variables via the API and UI without forcing users
to use them as part of a task or dag. This isn't any different from that
aspect.

Airflow has some objects that can
> be created by a dag processor (Dags, Datasets) and others that can be
> created with API/UI (Connections, Variables)


@Tornike Gurgenidze brings up a valid point
though: How would we handle changes coming from the API or UI for datasets
that are defined via a dag file? The difference afaik is that if I choose
to define a connection or variable via a dag file, I have to create a
session and explicitly save it to the DB versus instantiating a Connection
or Variable.
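
(To make that asymmetry concrete, a minimal sketch - the conn_id, host, and
URI are made up, and it assumes code running with access to the Airflow
metadata DB:)

from airflow.datasets import Dataset
from airflow.models import Connection
from airflow.utils.session import create_session

# Imperative: a Connection only exists once it is explicitly persisted.
with create_session() as session:
    session.add(
        Connection(conn_id="my_api", conn_type="http", host="api.example.com")
    )

# Declarative: merely referencing a Dataset in a dag file is enough -
# the dag processor registers it in the DB as a side effect of parsing.
orders = Dataset("s3://warehouse/orders")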

On Tue, Jan 23, 2024 at 8:13 AM Jarek Potiuk  wrote:

> Clarifying: There is no (and it has never been) a problem with opening up
> submitting "structured" DAGs.
>
> On Tue, Jan 23, 2024 at 2:12 PM Jarek Potiuk  wrote:
>
> > > I always assumed that this was the reason why it's impossible to create
> > dags from API, no one wanted to open this particular can of worms. I
> think
> > if you need to synchronize these objects, the cleaner way would be to
> > describe them in some sort of a shared config file and let respective
> > dag-processors create them independently of each other.
> >
> > Just to clarify this one: - creating DAGs via API has been resented
> mostly
> > because of security reasons - where you would want to submit Python DAG
> > code via API. There is (and it has never been) a problem with opening up
> > submitting "structured" DAGs. This has never been implemented, but if you
> > would like to limit to just modifying or creating resulting DAG
> structure,
> > that would be possible - for example there is no fundamental problem with
> > generating a DAG from (say) visual representation and submitting a
> > resulting DAG structure without creating a DAG python file (so
> essentially
> > playing the role of DAG file processor to serialize DAGs). It would have
> a
> > number of limitations (for example callbacks would not work., timetables
> > would be a challenge etc.), but other than that it's quite possible (and
> > possibly even in the future we might have something like that).
> >
> > Following that - there are no fundamental problems with submitting
> > datasets - because they are not Python code, they are pure "metadata"
> > objects.
> >
> > Still the questions remains how it plays with the DAG-created datasets is
> > an important aspect of the proposal.
> >
> > J.
> >
> >
> > On Tue, Jan 23, 2024 at 2:01 PM Tornike Gurgenidze <
> > togur...@freeuni.edu.ge> wrote:
> >
> >> Maybe I'm missing something, but I can't see how rest endpoints for
> >> datasets could work in practice. afaik, Airflow has some objects that
> can
> >> be created by a dag processor (Dags, Datasets) and others that can be
> >> created with API/UI (Connections, Variables), but never both at the same
> >> time. How would update/delete endpoints work if Dataset was initially
> >> created declaratively from a dag file? Would it throw an exception or
> make
> >> an update that will then be reverted in a little while by a
> dag-processor
> >> anyway?
> >>
> >> I always assumed that this was the reason why it's impossible to create
> >> dags from API, no one wanted to open this particular can of worms. I
> think
> >> if you need to synchronize these objects, the cleaner way would be to
> >> describe them in some sort of a shared config file and let respective
> >> dag-processors create them independently of each other.
> >>
> >> On Tue, Jan 23, 2024 at 4:02 PM Jarek Potiuk  wrote:
> >>
> >> > I am also pretty cool with adding/updating/deleting datasets externally,
> however
> >> I
> >> > know there are some ongoing discussions on how to improve/change
> >> datasets
> >> > and bind them together with multiple other features of Airflow - not
> >> sure
> >> > what the state of those is, but it would be great if those efforts are
> >> coordinated
> >> > so that we are not pulling stuff in multiple directions.
> >> >
> >> > From what I've heard/overheard, the things going on around Datasets are
> these:
> >> >
> >> > * AIP-60 -
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-60+Standard+URI+representation+for+Airflow+Datasets
> >> > - already almost passed
> >> > * Better coupling of datasets with OpenLineage
> >> > * Partial datasets - allowing to have datasets with data intervals
> >> > * Triggering dags on external dataset changes
> >> > * Objects Storage integration with datasets
> >> >
> >> > All of which sound very promising and are definitely important for
> >> Dataset
> >> > usage.
> >> >
> >> > So I think we should really make sure that when we are doing anything with
> datasets,
> >> > the 

Re: [VOTE] Airflow Providers prepared on January 22, 2024

2024-01-24 Thread Aritra Basu
+1 non-binding

--
Regards,
Aritra Basu

On Wed, Jan 24, 2024, 8:02 PM Utkarsh Sharma
 wrote:

> +1 Non-binding.
>
> Thanks,
> Utkarsh Sharma
>
> On Wed, Jan 24, 2024 at 6:11 PM Pankaj Koti
>  wrote:
>
> > +1 (non-binding)
> >
> >
> > On Wed, 24 Jan 2024, 13:48 rom sharon,  wrote:
> >
> > > +1 non-binding
> > >
> > > On Wed, 24 Jan 2024 at 10:17, Rahul Vats <
> > > rah.sharm...@gmail.com> wrote:
> > >
> > > > +1 non-binding
> > > >
> > > > Verified the providers below with our example DAGs.
> > > >
> > > >-  apache-airflow-providers-amazon==8.17.0rc1
> > > >-  apache-airflow-providers-apache-hive==6.5.0rc1
> > > >-  apache-airflow-providers-common-sql==1.11.0rc1
> > > >-  apache-airflow-providers-databricks==6.1.0rc1
> > > >-  apache-airflow-providers-dbt-cloud==3.6.0rc1
> > > >-  apache-airflow-providers-elasticsearch==5.3.1
> > > >-  apache-airflow-providers-google==10.14.0rc1
> > > >-  apache-airflow-providers-http==4.9.0rc1
> > > >-  apache-airflow-providers-snowflake==5.3.0rc1
> > > >-  apache-airflow-providers-ftp==3.8.0rc1
> > > >-  apache-airflow-providers-mysql==5.5.2rc1
> > > >-  apache-airflow-providers-cohere==1.2.0rc1
> > > >-  apache-airflow-providers-pinecone==1.2.0rc1
> > > >-  apache-airflow-providers-weaviate==1.3.1rc1
> > > >
> > > >
> > > > Regards,
> > > > Rahul Vats
> > > > 9953794332
> > > >
> > > >
> > > > On Wed, 24 Jan 2024 at 09:56, Amogh Desai 
> > > > wrote:
> > > >
> > > > > +1 non binding
> > > > >
> > > > > Tested some example dags in CNCF provider and Hive Provider. Looks
> > > good!
> > > > >
> > > > > Thanks & Regards,
> > > > > Amogh Desai
> > > > >
> > > > > On Tue, Jan 23, 2024 at 8:44 PM Jarek Potiuk 
> > wrote:
> > > > >
> > > > > > +1 (binding) - tested my changes, checked binary reproducibility,
> > > > > licences,
> > > > > > signatures, checksums - all looks good.
> > > > > >
> > > > > > On Mon, Jan 22, 2024 at 1:11 PM Elad Kalif 
> > > wrote:
> > > > > >
> > > > > > > Hey all,
> > > > > > >
> > > > > > > I have just cut the new wave Airflow Providers packages. This
> > email
> > > > is
> > > > > > > calling a vote on the release,
> > > > > > > which will last for 72 hours - which means that it will end on
> > > > January
> > > > > > 25,
> > > > > > > 2024 12:10 PM UTC and until 3 binding +1 votes have been
> > received.
> > > > > > >
> > > > > > > Consider this my (binding) +1.
> > > > > > >
> > > > > > > Airflow Providers are available at:
> > > > > > > https://dist.apache.org/repos/dist/dev/airflow/providers/
> > > > > > >
> > > > > > > *apache-airflow-providers--*.tar.gz* are the binary
> > > > > > >  Python "sdist" release - they are also official "sources" for
> > the
> > > > > > provider
> > > > > > > packages.
> > > > > > >
> > > > > > > *apache_airflow_providers_-*.whl are the binary
> > > > > > >  Python "wheel" release.
> > > > > > >
> > > > > > > The test procedure for PMC members is described in
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-pmc-members
> > > > > > >
> > > > > > > The test procedure for Contributors who would like to test
> > this
> > > > RC
> > > > > is
> > > > > > > described in:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-contributors
> > > > > > >
> > > > > > >
> > > > > > > Public keys are available at:
> > > > > > > https://dist.apache.org/repos/dist/release/airflow/KEYS
> > > > > > >
> > > > > > > Please vote accordingly:
> > > > > > >
> > > > > > > [ ] +1 approve
> > > > > > > [ ] +0 no opinion
> > > > > > > [ ] -1 disapprove with the reason
> > > > > > >
> > > > > > > Only votes from PMC members are binding, but members of the
> > > community
> > > > > are
> > > > > > > encouraged to test the release and vote with "(non-binding)".
> > > > > > >
> > > > > > > Please note that the version number excludes the 'rcX' string.
> > > > > > > This will allow us to rename the artifact without modifying
> > > > > > > the artifact checksums when we actually release.
> > > > > > >
> > > > > > > The status of testing the providers by the community is kept
> > here:
> > > > > > > https://github.com/apache/airflow/issues/36948
> > > > > > >
> > > > > > > The issue is also the easiest way to see important PRs included
> > in
> > > > the
> > > > > RC
> > > > > > > candidates.
> > > > > > > Detailed changelog for the providers will be published in the
> > > > > > documentation
> > > > > > > after the
> > > > > > > RC candidates are released.
> > > > > > >
> > > > > > > You can find the RC packages in PyPI following these links:
> > > > > > >
> > > > > > >
> > > https://pypi.org/project/apache-airflow-providers-airbyte/3.6.0rc1/
> > > > > > >
> > > 

Re: [VOTE] Airflow Providers prepared on January 22, 2024

2024-01-24 Thread Utkarsh Sharma
+1 Non-binding.

Thanks,
Utkarsh Sharma

On Wed, Jan 24, 2024 at 6:11 PM Pankaj Koti
 wrote:

> +1 (non-binding)
>
>
> On Wed, 24 Jan 2024, 13:48 rom sharon,  wrote:
>
> > +1 non-binding
> >
> > On Wed, 24 Jan 2024 at 10:17, Rahul Vats <
> > rah.sharm...@gmail.com> wrote:
> >
> > > +1 non-binding
> > >
> > > Verified the providers below with our example DAGs.
> > >
> > >-  apache-airflow-providers-amazon==8.17.0rc1
> > >-  apache-airflow-providers-apache-hive==6.5.0rc1
> > >-  apache-airflow-providers-common-sql==1.11.0rc1
> > >-  apache-airflow-providers-databricks==6.1.0rc1
> > >-  apache-airflow-providers-dbt-cloud==3.6.0rc1
> > >-  apache-airflow-providers-elasticsearch==5.3.1
> > >-  apache-airflow-providers-google==10.14.0rc1
> > >-  apache-airflow-providers-http==4.9.0rc1
> > >-  apache-airflow-providers-snowflake==5.3.0rc1
> > >-  apache-airflow-providers-ftp==3.8.0rc1
> > >-  apache-airflow-providers-mysql==5.5.2rc1
> > >-  apache-airflow-providers-cohere==1.2.0rc1
> > >-  apache-airflow-providers-pinecone==1.2.0rc1
> > >-  apache-airflow-providers-weaviate==1.3.1rc1
> > >
> > >
> > > Regards,
> > > Rahul Vats
> > > 9953794332
> > >
> > >
> > > On Wed, 24 Jan 2024 at 09:56, Amogh Desai 
> > > wrote:
> > >
> > > > +1 non binding
> > > >
> > > > Tested some example dags in CNCF provider and Hive Provider. Looks
> > good!
> > > >
> > > > Thanks & Regards,
> > > > Amogh Desai
> > > >
> > > > On Tue, Jan 23, 2024 at 8:44 PM Jarek Potiuk 
> wrote:
> > > >
> > > > > +1 (binding) - tested my changes, checked binary reproducibility,
> > > > licences,
> > > > > signatures, checksums - all looks good.
> > > > >
> > > > > On Mon, Jan 22, 2024 at 1:11 PM Elad Kalif 
> > wrote:
> > > > >
> > > > > > Hey all,
> > > > > >
> > > > > > I have just cut the new wave Airflow Providers packages. This
> email
> > > is
> > > > > > calling a vote on the release,
> > > > > > which will last for 72 hours - which means that it will end on
> > > January
> > > > > 25,
> > > > > > 2024 12:10 PM UTC and until 3 binding +1 votes have been
> received.
> > > > > >
> > > > > > Consider this my (binding) +1.
> > > > > >
> > > > > > Airflow Providers are available at:
> > > > > > https://dist.apache.org/repos/dist/dev/airflow/providers/
> > > > > >
> > > > > > *apache-airflow-providers--*.tar.gz* are the binary
> > > > > >  Python "sdist" release - they are also official "sources" for
> the
> > > > > provider
> > > > > > packages.
> > > > > >
> > > > > > *apache_airflow_providers_-*.whl are the binary
> > > > > >  Python "wheel" release.
> > > > > >
> > > > > > The test procedure for PMC members is described in
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-pmc-members
> > > > > >
> > > > > > The test procedure for Contributors who would like to test
> this
> > > RC
> > > > is
> > > > > > described in:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-contributors
> > > > > >
> > > > > >
> > > > > > Public keys are available at:
> > > > > > https://dist.apache.org/repos/dist/release/airflow/KEYS
> > > > > >
> > > > > > Please vote accordingly:
> > > > > >
> > > > > > [ ] +1 approve
> > > > > > [ ] +0 no opinion
> > > > > > [ ] -1 disapprove with the reason
> > > > > >
> > > > > > Only votes from PMC members are binding, but members of the
> > community
> > > > are
> > > > > > encouraged to test the release and vote with "(non-binding)".
> > > > > >
> > > > > > Please note that the version number excludes the 'rcX' string.
> > > > > > This will allow us to rename the artifact without modifying
> > > > > > the artifact checksums when we actually release.
> > > > > >
> > > > > > The status of testing the providers by the community is kept
> here:
> > > > > > https://github.com/apache/airflow/issues/36948
> > > > > >
> > > > > > The issue is also the easiest way to see important PRs included
> in
> > > the
> > > > RC
> > > > > > candidates.
> > > > > > Detailed changelog for the providers will be published in the
> > > > > documentation
> > > > > > after the
> > > > > > RC candidates are released.
> > > > > >
> > > > > > You can find the RC packages in PyPI following these links:
> > > > > >
> > > > > >
> > https://pypi.org/project/apache-airflow-providers-airbyte/3.6.0rc1/
> > > > > >
> > https://pypi.org/project/apache-airflow-providers-alibaba/2.7.2rc1/
> > > > > >
> > https://pypi.org/project/apache-airflow-providers-amazon/8.17.0rc1/
> > > > > >
> > > >
> > https://pypi.org/project/apache-airflow-providers-apache-beam/5.6.0rc1/
> > > > > >
> > > >
> > https://pypi.org/project/apache-airflow-providers-apache-druid/3.8.0rc1/
> > > > > >
> > > >
> > https://pypi.org/project/apache-airflow-providers-apache-hdfs/4.4.0rc1/
> > > > > 

Re: [VOTE] Airflow Providers prepared on January 22, 2024

2024-01-24 Thread Pankaj Koti
+1 (non-binding)


On Wed, 24 Jan 2024, 13:48 rom sharon,  wrote:

> +1 non-binding
>
> On Wed, 24 Jan 2024 at 10:17, Rahul Vats <
> rah.sharm...@gmail.com> wrote:
>
> > +1 non-binding
> >
> > Verified the providers below with our example DAGs.
> >
> >-  apache-airflow-providers-amazon==8.17.0rc1
> >-  apache-airflow-providers-apache-hive==6.5.0rc1
> >-  apache-airflow-providers-common-sql==1.11.0rc1
> >-  apache-airflow-providers-databricks==6.1.0rc1
> >-  apache-airflow-providers-dbt-cloud==3.6.0rc1
> >-  apache-airflow-providers-elasticsearch==5.3.1
> >-  apache-airflow-providers-google==10.14.0rc1
> >-  apache-airflow-providers-http==4.9.0rc1
> >-  apache-airflow-providers-snowflake==5.3.0rc1
> >-  apache-airflow-providers-ftp==3.8.0rc1
> >-  apache-airflow-providers-mysql==5.5.2rc1
> >-  apache-airflow-providers-cohere==1.2.0rc1
> >-  apache-airflow-providers-pinecone==1.2.0rc1
> >-  apache-airflow-providers-weaviate==1.3.1rc1
> >
> >
> > Regards,
> > Rahul Vats
> > 9953794332
> >
> >
> > On Wed, 24 Jan 2024 at 09:56, Amogh Desai 
> > wrote:
> >
> > > +1 non binding
> > >
> > > Tested some example dags in CNCF provider and Hive Provider. Looks
> good!
> > >
> > > Thanks & Regards,
> > > Amogh Desai
> > >
> > > On Tue, Jan 23, 2024 at 8:44 PM Jarek Potiuk  wrote:
> > >
> > > > +1 (binding) - tested my changes, checked binary reproducibility,
> > > licences,
> > > > signatures, checksums - all looks good.
> > > >
> > > > On Mon, Jan 22, 2024 at 1:11 PM Elad Kalif 
> wrote:
> > > >
> > > > > Hey all,
> > > > >
> > > > > I have just cut the new wave Airflow Providers packages. This email
> > is
> > > > > calling a vote on the release,
> > > > > which will last for 72 hours - which means that it will end on
> > January
> > > > 25,
> > > > > 2024 12:10 PM UTC and until 3 binding +1 votes have been received.
> > > > >
> > > > > Consider this my (binding) +1.
> > > > >
> > > > > Airflow Providers are available at:
> > > > > https://dist.apache.org/repos/dist/dev/airflow/providers/
> > > > >
> > > > > *apache-airflow-providers--*.tar.gz* are the binary
> > > > >  Python "sdist" release - they are also official "sources" for the
> > > > provider
> > > > > packages.
> > > > >
> > > > > *apache_airflow_providers_-*.whl are the binary
> > > > >  Python "wheel" release.
> > > > >
> > > > > The test procedure for PMC members is described in
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-pmc-members
> > > > >
> > > > > The test procedure for Contributors who would like to test this
> > RC
> > > is
> > > > > described in:
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-contributors
> > > > >
> > > > >
> > > > > Public keys are available at:
> > > > > https://dist.apache.org/repos/dist/release/airflow/KEYS
> > > > >
> > > > > Please vote accordingly:
> > > > >
> > > > > [ ] +1 approve
> > > > > [ ] +0 no opinion
> > > > > [ ] -1 disapprove with the reason
> > > > >
> > > > > Only votes from PMC members are binding, but members of the
> community
> > > are
> > > > > encouraged to test the release and vote with "(non-binding)".
> > > > >
> > > > > Please note that the version number excludes the 'rcX' string.
> > > > > This will allow us to rename the artifact without modifying
> > > > > the artifact checksums when we actually release.
> > > > >
> > > > > The status of testing the providers by the community is kept here:
> > > > > https://github.com/apache/airflow/issues/36948
> > > > >
> > > > > The issue is also the easiest way to see important PRs included in
> > the
> > > RC
> > > > > candidates.
> > > > > Detailed changelog for the providers will be published in the
> > > > documentation
> > > > > after the
> > > > > RC candidates are released.
> > > > >
> > > > > You can find the RC packages in PyPI following these links:
> > > > >
> > > > >
> https://pypi.org/project/apache-airflow-providers-airbyte/3.6.0rc1/
> > > > >
> https://pypi.org/project/apache-airflow-providers-alibaba/2.7.2rc1/
> > > > >
> https://pypi.org/project/apache-airflow-providers-amazon/8.17.0rc1/
> > > > >
> > >
> https://pypi.org/project/apache-airflow-providers-apache-beam/5.6.0rc1/
> > > > >
> > >
> https://pypi.org/project/apache-airflow-providers-apache-druid/3.8.0rc1/
> > > > >
> > >
> https://pypi.org/project/apache-airflow-providers-apache-hdfs/4.4.0rc1/
> > > > >
> > >
> https://pypi.org/project/apache-airflow-providers-apache-hive/6.5.0rc1/
> > > > >
> > >
> https://pypi.org/project/apache-airflow-providers-apache-kafka/1.4.0rc1/
> > > > >
> > >
> https://pypi.org/project/apache-airflow-providers-apache-kylin/3.6.0rc1/
> > > > >
> > https://pypi.org/project/apache-airflow-providers-apache-pig/4.4.0rc1/
> > > > >
> > >
> 

Re: [VOTE] Airflow Providers prepared on January 22, 2024

2024-01-24 Thread Wei Lee
+1 non-binding

Tested my changes and example DAGs without encountering errors.

Best,
Wei

> On Jan 24, 2024, at 4:18 PM, rom sharon  wrote:
> 
> +1 non-binding
> 
> On Wed, 24 Jan 2024 at 10:17, Rahul Vats <
> rah.sharm...@gmail.com> wrote:
> 
>> +1 non-binding
>> 
>> Verified the providers below with our example DAGs.
>> 
>>   -  apache-airflow-providers-amazon==8.17.0rc1
>>   -  apache-airflow-providers-apache-hive==6.5.0rc1
>>   -  apache-airflow-providers-common-sql==1.11.0rc1
>>   -  apache-airflow-providers-databricks==6.1.0rc1
>>   -  apache-airflow-providers-dbt-cloud==3.6.0rc1
>>   -  apache-airflow-providers-elasticsearch==5.3.1
>>   -  apache-airflow-providers-google==10.14.0rc1
>>   -  apache-airflow-providers-http==4.9.0rc1
>>   -  apache-airflow-providers-snowflake==5.3.0rc1
>>   -  apache-airflow-providers-ftp==3.8.0rc1
>>   -  apache-airflow-providers-mysql==5.5.2rc1
>>   -  apache-airflow-providers-cohere==1.2.0rc1
>>   -  apache-airflow-providers-pinecone==1.2.0rc1
>>   -  apache-airflow-providers-weaviate==1.3.1rc1
>> 
>> 
>> Regards,
>> Rahul Vats
>> 9953794332
>> 
>> 
>> On Wed, 24 Jan 2024 at 09:56, Amogh Desai 
>> wrote:
>> 
>>> +1 non binding
>>> 
>>> Tested some example dags in CNCF provider and Hive Provider. Looks good!
>>> 
>>> Thanks & Regards,
>>> Amogh Desai
>>> 
>>> On Tue, Jan 23, 2024 at 8:44 PM Jarek Potiuk  wrote:
>>> 
 +1 (binding) - tested my changes, checked binary reproducibility,
>>> licences,
 signatures, checksums - all looks good.
 
 On Mon, Jan 22, 2024 at 1:11 PM Elad Kalif  wrote:
 
>>>>> [...]

Re: [VOTE] Airflow Providers prepared on January 22, 2024

2024-01-24 Thread rom sharon
+1 non-binding

On Wed, Jan 24, 2024 at 10:17 Rahul Vats <rah.sharm...@gmail.com> wrote:

> +1 non-binding
>
> [...]
>
>
> On Wed, 24 Jan 2024 at 09:56, Amogh Desai 
> wrote:
>
> > +1 non-binding
> >
> > Tested some example DAGs in the CNCF provider and the Hive provider. Looks good!
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> > On Tue, Jan 23, 2024 at 8:44 PM Jarek Potiuk  wrote:
> >
> > > +1 (binding) - tested my changes, checked binary reproducibility,
> > > licences, signatures, checksums - all looks good.
> > >
> > > On Mon, Jan 22, 2024 at 1:11 PM Elad Kalif  wrote:
> > >
> > > > [...]
> 

Re: [VOTE] Airflow Providers prepared on January 22, 2024

2024-01-24 Thread Rahul Vats
+1 non-binding

Verified the providers below with our example DAGs (a pip install sketch
follows the list).

   -  apache-airflow-providers-amazon==8.17.0rc1
   -  apache-airflow-providers-apache-hive==6.5.0rc1
   -  apache-airflow-providers-common-sql==1.11.0rc1
   -  apache-airflow-providers-databricks==6.1.0rc1
   -  apache-airflow-providers-dbt-cloud==3.6.0rc1
   -  apache-airflow-providers-elasticsearch==5.3.1
   -  apache-airflow-providers-google==10.14.0rc1
   -  apache-airflow-providers-http==4.9.0rc1
   -  apache-airflow-providers-snowflake==5.3.0rc1
   -  apache-airflow-providers-ftp==3.8.0rc1
   -  apache-airflow-providers-mysql==5.5.2rc1
   -  apache-airflow-providers-cohere==1.2.0rc1
   -  apache-airflow-providers-pinecone==1.2.0rc1
   -  apache-airflow-providers-weaviate==1.3.1rc1
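
A minimal sketch of that setup, assuming a clean virtualenv: pinning the
exact ==...rc1 versions is what makes pip select the pre-releases, and a
quick metadata check confirms what was actually installed (two providers
shown; the rest follow the same pattern):

    # pip install "apache-airflow-providers-amazon==8.17.0rc1" \
    #             "apache-airflow-providers-apache-hive==6.5.0rc1"
    from importlib.metadata import version

    for dist in ("apache-airflow-providers-amazon",
                 "apache-airflow-providers-apache-hive"):
        # expect the rc1 versions from the list above before running DAGs
        print(dist, version(dist))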


Regards,
Rahul Vats
9953794332


On Wed, 24 Jan 2024 at 09:56, Amogh Desai  wrote:

> +1 non-binding
>
> Tested some example DAGs in the CNCF provider and the Hive provider. Looks good!
>
> Thanks & Regards,
> Amogh Desai
>
> On Tue, Jan 23, 2024 at 8:44 PM Jarek Potiuk  wrote:
>
> > +1 (binding) - tested my changes, checked binary reproducibility,
> > licences, signatures, checksums - all looks good.
> >
> > On Mon, Jan 22, 2024 at 1:11 PM Elad Kalif  wrote:
> >
> > > Hey all,
> > >
> > > I have just cut a new wave of Airflow Providers packages. This email is
> > > calling a vote on the release, which will last for 72 hours - meaning it
> > > will end on January 25, 2024 at 12:10 PM UTC, provided at least 3
> > > binding +1 votes have been received by then.
> > >
> > > Consider this my (binding) +1.
> > >
> > > Airflow Providers are available at:
> > > https://dist.apache.org/repos/dist/dev/airflow/providers/
> > >
> > > The apache-airflow-providers-<PROVIDER>-<VERSION>.tar.gz files are the
> > > Python "sdist" (source) release - they are also the official "sources"
> > > for the provider packages.
> > >
> > > The apache_airflow_providers_<PROVIDER>-<VERSION>.whl files are the
> > > binary Python "wheel" release.
> > >
> > > The test procedure for PMC members is described in
> > > https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-pmc-members
> > >
> > > The test procedure for Contributors who would like to test this RC is
> > > described in:
> > > https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-contributors
> > >
> > >
> > > Public keys are available at:
> > > https://dist.apache.org/repos/dist/release/airflow/KEYS
> > >
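A minimal sketch of the checksum half of that verification (signatures are
checked separately with gpg against the KEYS file above); the artifact name
is illustrative, and both files are assumed downloaded from the dist URL:

    import hashlib

    artifact = "apache-airflow-providers-amazon-8.17.0.tar.gz"  # illustrative
    published = open(artifact + ".sha512").read().split()[0].lower()

    h = hashlib.sha512()
    with open(artifact, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)

    assert h.hexdigest() == published, f"checksum mismatch for {artifact}"
    print("sha512 OK:", artifact)
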
> > > Please vote accordingly:
> > >
> > > [ ] +1 approve
> > > [ ] +0 no opinion
> > > [ ] -1 disapprove with the reason
> > >
> > > Only votes from PMC members are binding, but members of the community
> > > are encouraged to test the release and vote with "(non-binding)".
> > >
> > > Please note that the version number excludes the 'rcX' string.
> > > This will allow us to rename the artifact without modifying
> > > the artifact checksums when we actually release.
> > >
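A small illustration of the convention being described, assuming the
standard packaging library (which pip itself uses): PEP 440 treats
8.17.0rc1 as a pre-release of 8.17.0, so the artifacts can be built once
with the final version, and testers must pin ==...rc1 explicitly:

    from packaging.version import Version

    rc = Version("8.17.0rc1")
    print(rc.is_prerelease)   # True - pip skips it unless pinned or --pre
    print(rc.base_version)    # "8.17.0" - the version baked into the artifact
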
> > > The status of testing the providers by the community is kept here:
> > > https://github.com/apache/airflow/issues/36948
> > >
> > > The issue is also the easiest way to see important PRs included in the
> > > RC candidates.
> > > Detailed changelog for the providers will be published in the
> > > documentation after the RC candidates are released.
> > >
> > > You can find the RC packages on PyPI by following these links:
> > >
> > > https://pypi.org/project/apache-airflow-providers-airbyte/3.6.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-alibaba/2.7.2rc1/
> > > https://pypi.org/project/apache-airflow-providers-amazon/8.17.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-apache-beam/5.6.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-apache-druid/3.8.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-apache-hdfs/4.4.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-apache-hive/6.5.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-apache-kafka/1.4.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-apache-kylin/3.6.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-apache-pig/4.4.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-apache-pinot/4.4.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-apache-spark/4.8.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-apprise/1.3.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-atlassian-jira/2.6.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-celery/3.5.2rc1/
> > > https://pypi.org/project/apache-airflow-providers-cloudant/3.5.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-cncf-kubernetes/7.14.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-cohere/1.2.0rc1/
> > > https://pypi.org/project/apache-airflow-providers-common-sql/1.11.0rc1/
> > >