Agreed. When Jed and team wrote the AIP, we deliberately limited the
scope to DAGs because the AIPs were already very large, but the plan
is to extend the concept to datasets.

Funny that you bring up point #2. A few of us met last week to talk about
DAG Versioning, and that use case came up. Not only should you be allowed
to declare the state of each version, you should also be able to pick a
version for normally scheduled runs that is not necessarily the most recent
(for example, the most recent version tagged as prod), while also running
other versions ad hoc, such as a draft version that may have just been
deployed. As Kaxil said, this will be covered by AIP-66.
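
To make the selection rule concrete, here is a rough sketch. It is purely
illustrative: none of these names (VersionState, DagVersion, the two helper
functions) come from AIP-66 or from Airflow, and the rule that a deleted
version cannot be run ad hoc is my own assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class VersionState(Enum):
    # Hypothetical states, mirroring the ones suggested in this thread.
    DRAFT = "draft"
    PRODUCTION = "production"
    DEPRECATED = "deprecated"
    DELETED = "deleted"


@dataclass(frozen=True)
class DagVersion:
    number: int  # assumed to increase with each deployment
    state: VersionState


def version_for_scheduled_run(
    versions: Sequence[DagVersion],
) -> Optional[DagVersion]:
    """Pick the most recent version tagged as production, if there is one."""
    prod = [v for v in versions if v.state is VersionState.PRODUCTION]
    return max(prod, key=lambda v: v.number, default=None)


def version_for_adhoc_run(
    versions: Sequence[DagVersion], number: int
) -> Optional[DagVersion]:
    """Pick an explicitly requested version (assumption: deleted ones are refused)."""
    for v in versions:
        if v.number == number and v.state is not VersionState.DELETED:
            return v
    return None


versions = [
    DagVersion(1, VersionState.DEPRECATED),
    DagVersion(2, VersionState.PRODUCTION),
    DagVersion(3, VersionState.DRAFT),  # just deployed, not yet prod
]
scheduled = version_for_scheduled_run(versions)
adhoc = version_for_adhoc_run(versions, 3)
```

Here the scheduled run would use version 2 (the latest one tagged prod)
even though version 3 is newer, while version 3 can still be triggered
ad hoc.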

On Tue, May 28, 2024 at 5:52 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Yes to both of the questions below, @Elad Kalif <elad...@apache.org>. The
> upcoming Data-Awareness AIPs should cover the first one, and the second
> should be covered by AIP-66 once it is out of draft.
>
> > 1. Should datasets be also versioned?
> > 2. Should we support executing more than 1 DAG version at a given time?
>
>
> On Tue, 28 May 2024 at 10:07, Elad Kalif <elad...@apache.org> wrote:
>
> > I have a general question (maybe somehow related to the DAG Bundle
> > concept introduced in the AIPs).
> > The way I see it, DAGs are tightly coupled with Datasets. Tasks depend
> > on a dataset and/or produce a dataset.
> > We are focused on the versions of the code (DAG), but to make this play
> > nicely we should also consider applying versions to datasets.
> > Granted, not every change to DAG code means a change in dataset version,
> > but we should consider whether we want to leave datasets versionless.
> >
> > I previously worked with some data products that allow versioning of
> > tables, and it was really nice! It enabled the concept of a Data Contract
> > (treating tables much like you treat an API) and it made things much
> > easier.
> > I sometimes even had two versions of the same workflow running, one for
> > the new version and one for the deprecated version, giving my customers
> > the flexibility to migrate between the table versions before the
> > deprecated version was discontinued.
> >
> > I am raising two main questions here:
> > 1. Should datasets be also versioned?
> > 2. Should we support executing more than 1 DAG version at a given time?
> > (allow the user to declare a Draft/Production/Deprecated/Deleted state
> > for each version)
> >
> > On Wed, Mar 6, 2024 at 1:58 AM Jed Cunningham <jedcunning...@apache.org> wrote:
> >
> > > Hello everyone!
> > >
> > > I'm excited to start a discussion around DAG Versioning in Airflow.
> > > It's been the most requested feature in the last 3 community surveys!
> > >
> > > AIP-63: DAG Versioning
> > > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-63%3A+DAG+Versioning>
> > >
> > > As this topic quickly becomes rather large, I've made AIP-63 an
> > > umbrella AIP and split the specifics into separate AIPs:
> > >
> > > AIP-64: Keep TaskInstance try history
> > > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-64%3A+Keep+TaskInstance+try+history>
> > > AIP-65: Improve DAG history in UI
> > > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-65%3A+Improve+DAG+history+in+UI>
> > > [WIP] AIP-66: Execution of specific DAG code versions
> > > <https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-66%3A+Execution+of+specific+DAG+versions>
> > >
> > > AIP-64 and AIP-65 are ready to be discussed in depth, while AIP-66 is
> > > there to provide an intentionally high-level vision of what we may want
> > > to tackle before Airflow's "DAG versioning" story is complete.
> > >
> > > Thanks,
> > > Jed
> > >
> >
>
