Re: [DISCUSS] AIP-63, AIP-64, and AIP-65: DAG Versioning

2024-05-28 Thread Bolke de Bruin
In the context of AIP-71 (slightly directing your attention there for
discussion purposes), I think it would be nice to be able to do:

dag = dag_load(ObjectStoragePath("dagfs://mydag?version=1"))

Having dag versioning as an fs implementation would open up additional 
interesting avenues for DAG manipulation.
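
A minimal sketch of how such a versioned URI might be interpreted. The
`dagfs://` scheme comes from the line above; the `DagRef` class and the
`parse_dag_uri` helper are hypothetical names used for illustration only and
are not existing Airflow API. Only the Python standard library is used.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class DagRef:
    """Hypothetical reference to a DAG, optionally pinned to a version."""

    dag_id: str
    version: Optional[int]  # None is taken to mean "latest"


def parse_dag_uri(uri: str) -> DagRef:
    """Split a hypothetical dagfs:// URI into a DAG id and a version."""
    parts = urlsplit(uri)
    if parts.scheme != "dagfs":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # The DAG id sits in the authority part: dagfs://<dag_id>?version=<n>
    dag_id = parts.netloc or parts.path.lstrip("/")
    if not dag_id:
        raise ValueError("missing DAG id")
    versions = parse_qs(parts.query).get("version")
    version = int(versions[0]) if versions else None
    return DagRef(dag_id=dag_id, version=version)


print(parse_dag_uri("dagfs://mydag?version=1"))
```

A loader such as the `dag_load` call above (itself only a proposal here)
could then resolve the returned reference against whatever storage backs the
versioned DAGs.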

BTW there is a data contract implementation that is gaining some traction: 
https://github.com/datacontract/datacontract-cli


Bolke



> On 28 May 2024, at 16:16, Constance Martineau wrote:
> 
> Agreed.  When Jed and team wrote the AIP, we intentionally limited the
> scope to DAGs since the AIPs were already really large, but the intention
> is to extend the concept to datasets.
> 
> Funny that you bring up point #2. A few of us met last week to talk about
> DAG Versioning, and that use-case came up. Not only should you be allowed
> to declare the state of each version, you should also be able to pick a
> version for normally scheduled runs that is not necessarily the most recent
> (for example the most recent version tagged as prod), while also running
> other versions adhoc, such as the draft version that may have just been
> deployed. Like Kaxil said, this will be covered by AIP-66.


Re: [DISCUSS] AIP-63, AIP-64, and AIP-65: DAG Versioning

2024-05-28 Thread Constance Martineau
Agreed.  When Jed and team wrote the AIP, we intentionally limited the
scope to DAGs since the AIPs were already really large, but the intention
is to extend the concept to datasets.

Funny that you bring up point #2. A few of us met last week to talk about
DAG Versioning, and that use-case came up. Not only should you be allowed
to declare the state of each version, you should also be able to pick a
version for normally scheduled runs that is not necessarily the most recent
(for example the most recent version tagged as prod), while also running
other versions adhoc, such as the draft version that may have just been
deployed. Like Kaxil said, this will be covered by AIP-66.
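
A minimal sketch of that selection rule, assuming a hypothetical
`DagVersion` record and `VersionState` enum (the state names are borrowed
from the Draft/Production/Deprecated/Deleted list in Elad's message). None
of these names are existing Airflow API, and AIP-66 may well define this
differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class VersionState(Enum):
    """Hypothetical lifecycle states for a DAG version."""

    DRAFT = "draft"
    PRODUCTION = "production"
    DEPRECATED = "deprecated"
    DELETED = "deleted"


@dataclass(frozen=True)
class DagVersion:
    """Hypothetical record: a version number plus its declared state."""

    number: int
    state: VersionState


def pick_scheduled_version(versions: Iterable[DagVersion]) -> Optional[DagVersion]:
    """Return the most recent version tagged production, or None."""
    prod = [v for v in versions if v.state is VersionState.PRODUCTION]
    return max(prod, key=lambda v: v.number, default=None)


versions = [
    DagVersion(1, VersionState.DEPRECATED),
    DagVersion(2, VersionState.PRODUCTION),
    DagVersion(3, VersionState.DRAFT),
]

# Scheduled runs use the latest production version, not the latest overall.
scheduled = pick_scheduled_version(versions)
# An ad hoc run may still target the freshly deployed draft.
adhoc = max(versions, key=lambda v: v.number)
print(scheduled.number, adhoc.number)
```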

On Tue, May 28, 2024 at 5:52 AM Kaxil Naik  wrote:

> Yes to both of the questions below, @Elad Kalif. The upcoming
> Data-Awareness AIPs should cover the first one, and the second should be
> covered by AIP-66 once it is out of draft.
>
> > 1. Should datasets be also versioned?
> > 2. Should we support executing more than 1 DAG version at a given time?


Re: [DISCUSS] AIP-63, AIP-64, and AIP-65: DAG Versioning

2024-05-28 Thread Kaxil Naik
Yes to both of the questions below, @Elad Kalif. The upcoming
Data-Awareness AIPs should cover the first one, and the second should be
covered by AIP-66 once it is out of draft.

> 1. Should datasets be also versioned?
> 2. Should we support executing more than 1 DAG version at a given time?


On Tue, 28 May 2024 at 10:07, Elad Kalif  wrote:

> I have a general question (maybe somewhat related to the DAG Bundle
> concept introduced in the AIPs).
> The way I see it, DAGs are tightly coupled with Datasets. Tasks take a
> dependency on a dataset and/or produce a dataset.
> We are focused on the versions of the code (DAG), but to make this play
> nicely we should also consider applying versions to datasets.
> Granted, not every change to DAG code means a change in dataset version, but
> we should consider whether we want to leave datasets versionless.
>
> I previously worked with some data products that allow versioning of tables
> and it was really nice! It enabled the concept of a Data Contract (treating
> tables much like you treat an API) and it made things much easier.
> I sometimes even had two versions of the same workflow running, one for the
> new version and one for the deprecated version, thus allowing my customers
> the flexibility to migrate between the table versions before the deprecated
> version is discontinued.
>
> I am raising two main questions here:
> 1. Should datasets be also versioned?
> 2. Should we support executing more than 1 DAG version at a given time?
> (allow the user to declare a Draft/Production/Deprecated/Deleted state for
> each version)


Re: [DISCUSS] AIP-63, AIP-64, and AIP-65: DAG Versioning

2024-05-28 Thread Elad Kalif
I have a general question (maybe somewhat related to the DAG Bundle
concept introduced in the AIPs).
The way I see it, DAGs are tightly coupled with Datasets. Tasks take a
dependency on a dataset and/or produce a dataset.
We are focused on the versions of the code (DAG), but to make this play
nicely we should also consider applying versions to datasets.
Granted, not every change to DAG code means a change in dataset version, but
we should consider whether we want to leave datasets versionless.

I previously worked with some data products that allow versioning of tables
and it was really nice! It enabled the concept of a Data Contract (treating
tables much like you treat an API) and it made things much easier.
I sometimes even had two versions of the same workflow running, one for the
new version and one for the deprecated version, thus allowing my customers
the flexibility to migrate between the table versions before the deprecated
version is discontinued.

I am raising two main questions here:
1. Should datasets be also versioned?
2. Should we support executing more than 1 DAG version at a given time?
(allow the user to declare a Draft/Production/Deprecated/Deleted state for
each version)

On Wed, Mar 6, 2024 at 1:58 AM Jed Cunningham wrote:

> Hello everyone!
>
> I'm excited to start a discussion around DAG Versioning in Airflow. It's
> been the most requested feature in the last 3 community surveys!
>
> AIP-63: DAG Versioning
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-63%3A+DAG+Versioning>
>
> As this topic quickly becomes rather large, I've made AIP-63 an umbrella
> AIP and split the specifics into separate AIPs:
>
> AIP-64: Keep TaskInstance try history
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-64%3A+Keep+TaskInstance+try+history>
> AIP-65: Improve DAG history in UI
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-65%3A+Improve+DAG+history+in+UI>
> [WIP] AIP-66: Execution of specific DAG code versions
> <https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-66%3A+Execution+of+specific+DAG+versions>
>
> AIP-64 and AIP-65 are ready to be discussed in depth, while AIP-66 is there
> to provide an intentionally high level vision of what we may want to tackle
> before Airflow's "DAG versioning" story is complete.
>
> Thanks,
> Jed
>


[DISCUSS] AIP-63, AIP-64, and AIP-65: DAG Versioning

2024-03-05 Thread Jed Cunningham
Hello everyone!

I'm excited to start a discussion around DAG Versioning in Airflow. It's
been the most requested feature in the last 3 community surveys!

AIP-63: DAG Versioning
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-63%3A+DAG+Versioning>

As this topic quickly becomes rather large, I've made AIP-63 an umbrella
AIP and split the specifics into separate AIPs:

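AIP-64: Keep TaskInstance try history
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-64%3A+Keep+TaskInstance+try+history>
AIP-65: Improve DAG history in UI
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-65%3A+Improve+DAG+history+in+UI>
[WIP] AIP-66: Execution of specific DAG code versions
<https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-66%3A+Execution+of+specific+DAG+versions>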

AIP-64 and AIP-65 are ready to be discussed in depth, while AIP-66 is there
to provide an intentionally high level vision of what we may want to tackle
before Airflow's "DAG versioning" story is complete.

Thanks,
Jed