Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-23 Thread Elad Kalif
Hi everyone,
Great discussion.

I talked with several community members and I think that we should have a
non-technical discussion thread before we proceed.
To give an example:
We have several actors: Cluster Admin, Dag Author, UI user. From the Airflow 2
perspective, all these actors are the same user.
For example: if a user wants advanced scheduling, we tell them that
Airflow has a solution for this: Timetables. However, to deploy a timetable,
the Dag Author must work with the Cluster Admin to deploy the timetable's
plugin. These two actors can be two different people with
different agendas/road maps/priorities, etc.
This is an example of something that we consider as solved - but is it
really solved?

I think before we dive deep into technical details we should first have a
vision of who the Airflow users are and what their needs/requirements are.
It should give us better direction, clarity and confidence that we are
developing the right features.
If others think this is a good idea, we could form a small team that
works on this doc and shares it with the community for review (happy to
participate in this).

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-19 Thread Amogh Desai
I agree with Andrey too on this.
Thanks & Regards,
Amogh Desai


Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-17 Thread Kaxil Naik
Agreed on your points @andrey.ans...@taragol.is 

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-17 Thread Andrey Anshin
IMHO, if we decide to keep only Postgres support, we need really strong
arguments for providing an interface that helps integrate with other DBs.

In that case, we must clearly understand what the community is responsible
for and how it can be sure that nothing is broken.

Especially if we take into account that Airflow has very tight integrations
with specific databases, that supporting additional ones requires a lot of
effort (the MS SQL case), and that the DB part is not part of the Public
Interface of Airflow [1].

So I think these should be two separate decisions:
1. Keep only Postgres (vanilla, not forks) as the supported/tested backend in
production. SQLite remains the development DB.
2. Provide a public interface for third parties to integrate other DBs with
Airflow.

[1]:
https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html


Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-14 Thread Kaxil Naik
Yeah, that works for me

> Can we have it possible to have two (or maybe three -
> like a sub-committee) co-owners of topics?


Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-13 Thread Vikram Koka
Definitely a fast-moving thread on the mailing list. I haven’t been able to
respond for a few days and already feel very far behind.

A few comments on topics discussed over the last few days:
- Jarek, regarding your comments about being more aggressive than in
Airflow 2 about deprecating and dropping functionality, I am very
supportive of that stance. I completely agree that we could have been more
aggressive as part of Airflow 2.
However, I would like to ask that, as we go forward, we make sure that we
have clean interfaces for adding support, even if we choose a single
implementation. Take dropping MySQL support as an example. I can
understand the project's view that it should be deprecated in
Airflow OSS. However, even if the only OSS-supported DB
is Postgres, I would like to ensure that a clean interface exists for
interaction with the DB, so that other databases such as MySQL
CAN be supported by a third party or at a later date.
I realize that this may seem onerous, but I believe that it makes us
more flexible in the long run, rather than locking us into a single DB
implementation.

- Bolke, Daniel Standish, Ash, et al., on the task execution contract:
definitely looking forward to this.

- To those to whom I proposed a couple of more detailed write-ups: I still
plan to do that by early next week at the latest.

Vikram



Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-13 Thread Jarek Potiuk
Super-excited about that.

Question/Proposal: Can we make it possible to have two (or maybe three,
like a sub-committee) co-owners of topics? I think it's a lot to put on
one person's head to "own" a topic, and given circumstances, people's
volunteer time, interruptions (and life intervening), it might be a bit
risky to put it on one person's shoulders only.

I know it's against the rule ("if it is owned by many, it's not owned by
anyone"), but I think in our case there are at least some topics that
could benefit from having more than one owner, especially when we know and
trust that we can work together on topics that we are passionate
about. It might also encourage people to get out of their comfort zones.

For example, I'd absolutely love to volunteer to co-own "streamline
the development" with Andrey, if he is willing, of course :D (sorry
Andrey for "volunteering you" on that one :D), and maybe we could get
someone else to join us.

That might have the added benefit of letting us break with the way
we've been doing things. If I owned it alone, I'd likely gravitate
towards past choices, but with others joining me and making decisions (and
taking responsibility for making sure we implement them) together, we could
make better decisions and reduce the bus-factor risk for dev tooling/CI in
the future.

BTW, shameless promotion: tomorrow I am giving a talk on that very
topic (in the context of the last few years, not yet Airflow 3.0) at the NY
meetup hosted at Astronomer's NY headquarters:
https://www.meetup.com/nyc-apache-airflow-meetup/events/300017228/ - so if
you are in or around NY, I think you can still sign up :D. I am also
going to PyCon US in Pittsburgh next week, so don't expect too much from
me. I will be gearing up for streamlining the development by talking to the
right people and listening to the latest ideas and best practices of the
larger Python community :).

J.

On Tue, May 14, 2024 at 12:03 AM Kaxil Naik  wrote:

> Thank you all, I am very happy about the discussions.
>
> The mailing list moves fast :). The main reason I recommended starting the
> dev calls in early June was to have some of these discussions on the
> mailing list.
>
> Since Michal already scheduled a call, let's start there to discuss
> various ideas. For the week after that, I have created Airflow 2-style
> recurring open dev calls for anyone to join; info below:
>
> *Date & Time:* Recurring every 2 weeks on Thursday at *4pm BST* (3 PM
> GMT/UTC | 11 AM EDT | 8 AM PDT), starting *May 30, 2024 04:00 PM BST*
> *One-time registration Link*:
> https://astronomer.zoom.us/meeting/register/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG
> *Add to your calendar*:
> https://astronomer.zoom.us/meeting/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG/calendar/google/add
>
> I will post the meeting notes on the dev mailing list as well as Confluence
> for archival purposes (example
> ).
>
> Once we discuss various proposals next week, I recommend that for each
> "workstream", we have an owner who would want to lead that workstream. For
> items that do not have an owner, we can put them into the Airflow 3 Meta
> issue  or cross-link them over
> there so someone in the community can take them on. If we don't have an owner
> who will commit to working on it, we park that item until we find an
> owner.
>
> At the end of each call, I would solicit ideas for the agenda for the next
> call and propose it to the broader group on the mailing list.
>
> Some of the items that should be discussed in the upcoming calls IMO:
>
>   - Agreeing on Principles
>
>     Based on the discussions, some potential items (all up for debate):
>      - Considering Airflow 3.0 for early adopters and *breaking (and
>        removing) things for AF 3.0*. Things can be re-added as needed in
>        upcoming minor releases
>      - Optimize to get *foundational pieces in* and not "let perfect be
>        the enemy of good"
>      - Working on features that solidify Airflow as the *modern
>        Orchestrator* that also has state-of-the-art *support for Data, AI &
>        ML workloads*. This includes scalability & performance discussion
>      - Set up the codebase for the next 5 years. This encompasses all the
>        things we are discussing, e.g. removing MySQL to reduce the test
>        matrix, simplifying things architecturally, consolidating
>        serialization methods, etc.
>   - Workstream & Stream Owners
>   - Airflow 2 support policy, including scope (features vs. bug fixes +
>     security only) and support period
>   - Separate discussions for each big workstream, including one for items
>     to remove & refactor (e.g. dropping MySQL)
>   - Discussion to streamline the development of Airflow 3
>      - Separating dev for Providers & Airflow (something Jarek already
>        kick-started)
>      - Separate branch for Airflow 2
>      - CI changes for 

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-13 Thread Kaxil Naik
Thank you all, I am very happy about the discussions.

The mailing list moves fast :). The main reason I recommended starting the
dev calls in early June was to have some of these discussions on the
mailing list.

Since Michal already scheduled a call, let's start there to discuss
various ideas. For the week after that, I have created Airflow 2-style
recurring open dev calls for anyone to join; info below:

*Date & Time:* Recurring every 2 weeks on Thursday at *4pm BST* (3 PM
GMT/UTC | 11 AM EDT | 8 AM PDT), starting *May 30, 2024 04:00 PM BST*
*One-time registration Link*:
https://astronomer.zoom.us/meeting/register/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG
*Add to your calendar*:
https://astronomer.zoom.us/meeting/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG/calendar/google/add
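For anyone converting the call time, here is a small sketch that lists the first few biweekly Thursdays from May 30, 2024 and renders 16:00 BST in the other zones. It assumes the daylight-saving offsets in force in late May through July 2024 (BST = UTC+1, EDT = UTC-4, PDT = UTC-7), hard-coded for illustration only; a real calendar tool would use `zoneinfo.ZoneInfo("Europe/London")` and similar so that DST changes are handled.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets, valid only while both the UK and the US are on
# daylight-saving time (late May through July 2024 in this sketch).
ZONES = {
    "BST": timezone(timedelta(hours=1)),
    "UTC": timezone.utc,
    "EDT": timezone(timedelta(hours=-4)),
    "PDT": timezone(timedelta(hours=-7)),
}

# First call: Thursday, May 30, 2024 at 16:00 BST.
FIRST_CALL = datetime(2024, 5, 30, 16, 0, tzinfo=ZONES["BST"])


def call_dates(first: datetime, count: int, every_days: int = 14) -> list[datetime]:
    """Return the first `count` occurrences of a call repeating every `every_days`."""
    return [first + timedelta(days=every_days * i) for i in range(count)]


def local_times(moment: datetime) -> dict[str, str]:
    """Render a single instant as wall-clock HH:MM in each configured zone."""
    return {name: moment.astimezone(tz).strftime("%H:%M") for name, tz in ZONES.items()}


if __name__ == "__main__":
    for call in call_dates(FIRST_CALL, 4):
        print(call.strftime("%a %Y-%m-%d"), local_times(call))
```

Because the offsets are fixed, the sketch gives wrong wall-clock times for any call after the autumn DST transitions; that is the trade-off for keeping it dependency-free.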

I will post the meeting notes on the dev mailing list as well as Confluence
for archival purposes (example
).

Once we discuss various proposals next week, I recommend that for each
"workstream", we have an owner who would want to lead that workstream. For
items that do not have an owner, we can put them into the Airflow 3 Meta
issue  or cross-link them over
there so someone in the community can take them on. If we don't have an owner
who will commit to working on it, we park that item until we find an owner.

At the end of each call, I would solicit ideas for the agenda for the next
call and propose it to the broader group on the mailing list.

Some of the items that should be discussed in the upcoming calls IMO:

   - Agreeing on Principles

     Based on the discussions, some potential items (all up for debate):
      - Considering Airflow 3.0 for early adopters and *breaking (and
        removing) things for AF 3.0*. Things can be re-added as needed in
        upcoming minor releases
      - Optimize to get *foundational pieces in* and not "let perfect be
        the enemy of good"
      - Working on features that solidify Airflow as the *modern
        Orchestrator* that also has state-of-the-art *support for Data, AI &
        ML workloads*. This includes scalability & performance discussion
      - Set up the codebase for the next 5 years. This encompasses all the
        things we are discussing, e.g. removing MySQL to reduce the test
        matrix, simplifying things architecturally, consolidating
        serialization methods, etc.
   - Workstream & Stream Owners
   - Airflow 2 support policy, including scope (features vs. bug fixes +
     security only) and support period
   - Separate discussions for each big workstream, including one for items
     to remove & refactor (e.g. dropping MySQL)
   - Discussion to streamline the development of Airflow 3
      - Separating dev for Providers & Airflow (something Jarek already
        kick-started)
      - Separate branch for Airflow 2
      - CI changes for the above
   - Finalize Scope + Timelines
   - Migration Utilities
   - Progress check-ins

Looking forward to the exciting months ahead.

Regards,
Kaxil

On Mon, 13 May 2024 at 21:40, Bolke de Bruin  wrote:

> Declaring connections prior to task execution was already proposed in AIP-1
> :-). At that time, I had in mind communicating the required settings to the
> task over IPC. Registration could then happen with a manifest. Maybe
> during DAG serialization this could be obtained unobtrusively? The benefit
> is that tasks become truly atomic, independent of Airflow, as long as
> they communicate their exit codes (success, failed, and I think Ash had a
> couple of others in mind - the fewer the better).
>
> If you want two-way communication, maybe for variables as they can change
> during scheduling, this can happen with AIP-44. However, I'd prefer it to
> happen through the *executor* rather than some centralized service. If the
> executor is used, IPC is the logical choice. The benefit of this is that
> you have better resiliency and you can start to think about zero-downtime
> upgrades.
>
> So I hope Ash takes this to 2024 :-).
>
> B.
>
>
> On Mon, 13 May 2024 at 19:27, Ash Berlin-Taylor  wrote:
>
> > > That would require some mechanism of declaring prior to task execution
> > > what connections would be used
> >
> > That’s exactly what I’m proposing in the proposal doc I’m working on (it’s
> > also part of overhauling and re-designing the “Task Execution interface”,
> > which also gives us the ability to nicely support running tasks in
> > other languages - much more than just BashOperator)
> >
> > This is a bit of a fundamental shift in thinking about task execution in
> > Airflow, but I think it gives us some really nice properties that the
> > project is currently missing.
> >
> > TL;DR: let's discuss this in my doc when it comes out (next week, most
> > likely), please :)
> >
> > -ash
> >
> > > On 13 May 2024, at 18:15, Daniel Standish
> >  wrote:
> > >
> > > re
> > >
> > > As tasks require connection access, I assume connection 

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-13 Thread Bolke de Bruin
Declaring connections prior to task execution was already proposed in AIP-1
:-). At that time, I had in mind communicating the required settings to the
task over IPC. Registration could then happen with a manifest. Maybe
during DAG serialization this could be obtained unobtrusively? The benefit
is that tasks become truly atomic, independent of Airflow, as long as
they communicate their exit codes (success, failed, and I think Ash had a
couple of others in mind - the fewer the better).
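The manifest idea above can be sketched as follows: the task declares its connection IDs up front, the runtime resolves only those, hands them to a separate task process over a pipe (stdin), and learns the outcome purely from the exit code. Everything here is hypothetical and not an existing Airflow API: the manifest shape, `KNOWN_CONNECTIONS`, the two-entry exit-code table, and the inline child script are illustrative assumptions.

```python
import json
import subprocess
import sys

# Hypothetical manifest: the task declares which connection IDs it will use.
MANIFEST = {"task_id": "load_orders", "connections": ["warehouse_db"]}

# Stand-in for whatever secrets store the runtime can read (assumed data).
KNOWN_CONNECTIONS = {
    "warehouse_db": {"conn_type": "postgres", "host": "db.internal", "port": 5432},
    "unused_api": {"conn_type": "http", "host": "api.internal"},
}

# Hypothetical exit-code contract, kept deliberately small.
EXIT_CODES = {0: "success", 1: "failed"}

# The "task": a separate process that reads its settings from stdin and
# reports back only through its exit code; it never touches a database.
TASK_SOURCE = """
import json, sys
settings = json.load(sys.stdin)
conn = settings["connections"].get("warehouse_db")
sys.exit(0 if conn and conn["host"] else 1)
"""


def resolve(manifest: dict) -> dict:
    """Resolve only the connections the manifest declares; fail on unknown IDs."""
    missing = [c for c in manifest["connections"] if c not in KNOWN_CONNECTIONS]
    if missing:
        raise KeyError(f"unknown connections in manifest: {missing}")
    return {"connections": {c: KNOWN_CONNECTIONS[c] for c in manifest["connections"]}}


def run_task(manifest: dict) -> str:
    """Start the task process, pass its settings over stdin, map the exit code."""
    settings = resolve(manifest)
    proc = subprocess.run(
        [sys.executable, "-c", TASK_SOURCE],
        input=json.dumps(settings),
        text=True,
    )
    return EXIT_CODES.get(proc.returncode, f"unknown({proc.returncode})")
```

With this shape, `run_task(MANIFEST)` returns `"success"`, and a task whose manifest omits `warehouse_db` receives no connection at all and exits as `"failed"`. The point of the design is that the task process sees only what it declared.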

If you want two-way communication, maybe for variables as they can change
during scheduling, this can happen with AIP-44. However, I'd prefer it to
happen through the *executor* rather than some centralized service. If the
executor is used, IPC is the logical choice. The benefit of this is that
you have better resiliency and you can start to think about zero-downtime
upgrades.
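The executor-side, two-way variant can be sketched with the task asking its parent (standing in for the executor) for a variable at runtime over the process's stdin/stdout, instead of calling a central service. The line-delimited JSON messages (`get_variable`, `done`), the `VARIABLES` dict, and the inline task script are assumptions made for this sketch, not part of AIP-44 or any Airflow interface.

```python
import json
import subprocess
import sys

# Variables as currently known to the hypothetical executor. They may change
# between scheduling and execution, so the task asks for them at runtime.
VARIABLES = {"batch_size": "500"}

# Child "task": writes one JSON request per line to stdout and reads one
# JSON reply per line from stdin.
TASK_SOURCE = """
import json, sys

def get_variable(key):
    print(json.dumps({"op": "get_variable", "key": key}), flush=True)
    return json.loads(sys.stdin.readline())["value"]

size = int(get_variable("batch_size"))
print(json.dumps({"op": "done", "result": size * 2}), flush=True)
"""


def serve_task():
    """Run the task and answer its requests until its stdout is closed."""
    result = None
    with subprocess.Popen(
        [sys.executable, "-c", TASK_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        for line in proc.stdout:
            msg = json.loads(line)
            if msg["op"] == "get_variable":
                reply = {"value": VARIABLES.get(msg["key"])}
                proc.stdin.write(json.dumps(reply) + "\n")
                proc.stdin.flush()
            elif msg["op"] == "done":
                result = msg["result"]
    return result
```

Because the parent looks the value up only when asked, changing `VARIABLES["batch_size"]` before a run changes what the task sees, and no component other than the parent process needs to be reachable.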

So I hope Ash takes this to 2024 :-).

B.


On Mon, 13 May 2024 at 19:27, Ash Berlin-Taylor  wrote:

> > That would require some mechanism of declaring prior to task execution
> > what connections would be used
>
> That’s exactly what I’m proposing in the proposal doc I’m working on (it’s
> also part of overhauling and re-designing the “Task Execution interface”,
> which also gives us the ability to nicely support running tasks in
> other languages - much more than just BashOperator)
>
> This is a bit of a fundamental shift in thinking about task execution in
> Airflow, but I think it gives us some really nice properties that the
> project is currently missing.
>
> TL;DR: let's discuss this in my doc when it comes out (next week, most
> likely), please :)
>
> -ash
>
> > On 13 May 2024, at 18:15, Daniel Standish
>  wrote:
> >
> > re
> >
> >> As tasks require connection access, I assume connection data will somehow
> >> be passed as part of the metadata to task execution - whether it's part
> >> of the executor protocol or in some other way (I'm not an expert on that
> >> part of Airflow). Then, provided it's accessible as part of some
> >> execution context, and not only passed to the task's execute method,
> >> OpenLineage could utilize it.
> >>
> >
> > It's not strictly necessary that connection info be passed "as part of
> > task metadata". That would require some mechanism of declaring prior to
> > task execution what connections would be used. This is a thought that has
> > come up when thinking about execution of non-Python tasks. But it's not
> > required from a technical perspective by AIP-44, because the
> > `get_connection` function can be made an RPC call, so a task could
> > continue to retrieve connections at runtime.
>
>

--
Bolke de Bruin
bdbr...@gmail.com


Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-13 Thread Ash Berlin-Taylor
> That would require some mechanism of declaring prior to task execution what 
> connections would be used

That’s exactly what I’m proposing in the proposal doc I’m working on (it’s also
part of overhauling and re-designing the “Task Execution interface”, which also
gives us the ability to nicely support running tasks in other
languages - much more than just BashOperator)

This is a bit of a fundamental shift in thinking about task execution in 
Airflow, but I think it gives us some really nice properties that the project 
is currently missing.

TL;DR: let's discuss this in my doc when it comes out (next week, most likely),
please :)

-ash

> On 13 May 2024, at 18:15, Daniel Standish 
>  wrote:
> 
> re
> 
> As tasks require connection access, I assume connection data will somehow
>> be passed as part of the
>> metadata to task execution - whether it's part of the executor protocol or
>> in some other way (I'm
>> not an expert on that part of Airflow). Then, provided it's accessible as
>> part of some execution
>> context, and not only passed to the task's execute method, OpenLineage
>> could utilize it.
>> 
> 
> It's not strictly necessary that connection info be passed "as part of task
> metadata". That would require some mechanism of declaring prior to task
> execution what connections would be used.  This is a thought that has come
> up when thinking about execution of non-python tasks.  But it's not
> required from a technical perspective by AIP-44 because the
> `get_connection` function can be made to be an RPC call so a task could
> continue to retrieve connections at runtime.



Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-13 Thread Daniel Standish
re

> As tasks require connection access, I assume connection data will somehow
> be passed as part of the metadata to task execution - whether it's part of
> the executor protocol or in some other way (I'm not an expert on that part
> of Airflow). Then, provided it's accessible as part of some execution
> context, and not only passed to the task's execute method, OpenLineage
> could utilize it.
>

It's not strictly necessary that connection info be passed "as part of task
metadata". That would require some mechanism of declaring prior to task
execution what connections would be used. This is a thought that has come
up when thinking about execution of non-Python tasks. But it's not
required from a technical perspective by AIP-44, because the
`get_connection` function can be made an RPC call, so a task could
continue to retrieve connections at runtime.
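The point that `get_connection` can stay the same call while its implementation becomes an RPC can be sketched like this: task code calls `get_connection(conn_id)`, and only a server-side handler touches the metadata store. This is not Airflow's AIP-44 implementation. The `Connection` fields, `handle_rpc`, the JSON envelope, and the in-process `transport` (standing in for an HTTP call) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class Connection:
    conn_id: str
    conn_type: str
    host: Optional[str] = None
    port: Optional[int] = None


# Server side: the only component with access to the (fake) metadata store.
_DB = {"warehouse_db": Connection("warehouse_db", "postgres", "db.internal", 5432)}


def handle_rpc(request: str) -> str:
    """Hypothetical internal-API endpoint: JSON request in, JSON reply out."""
    call = json.loads(request)
    if call["method"] != "get_connection":
        return json.dumps({"error": f"unknown method {call['method']}"})
    conn = _DB.get(call["params"]["conn_id"])
    if conn is None:
        return json.dumps({"error": "not found"})
    return json.dumps({"result": asdict(conn)})


# A transport sends a serialized request and returns the serialized reply.
Transport = Callable[[str], str]


def make_get_connection(transport: Transport) -> Callable[[str], Connection]:
    """Build a get_connection() whose lookups go over the given transport."""

    def get_connection(conn_id: str) -> Connection:
        request = {"method": "get_connection", "params": {"conn_id": conn_id}}
        reply = json.loads(transport(json.dumps(request)))
        if "error" in reply:
            raise LookupError(reply["error"])
        return Connection(**reply["result"])

    return get_connection


# Task code only ever sees this callable, whatever sits behind it.
get_connection = make_get_connection(handle_rpc)
```

Serializing through JSON even in-process keeps the sketch honest about the network boundary: only plain data crosses it, never a live database session.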


Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-13 Thread Michał Modras
> > > > > >> > making use of the current test harness for Airflow 2 + Providers and
> > > > > >> > extend it with Airflow 3 future-compatibility tests). That means
> > > > > >> > Breeze would only stay in the Airflow 2 + Providers repo, as we
> > > > > >> > should be able to achieve most of what we have there with local
> > > > > >> > venv/tooling (especially with uv as the underlying tooling).
> > > > > >> >
> > > > > >> > 2) *I think we should add only very few new "important" features.*
> > > > > >> > The absolute minimum to make Airflow 3 appealing, and add them only
> > > > > >> > in Airflow 3: versioning, multi-team, and pluggable UI should be
> > > > > >> > Airflow 3 only - it makes no sense to invest in Airflow 2 if we
> > > > > >> > already know Airflow 3 is coming, as that generally triples the
> > > > > >> > effort needed to get them out. We should drop new feature
> > > > > >> > development in Airflow 2. This will give users an incentive to
> > > > > >> > move to 3 if the new features are worth it, even paying the
> > > > > >> > compatibility/migration price.
> > > > > >> >
> > > > > >> > Versioning, for example: I believe if we decide to go only with
> > > > > >> > Airflow 3 and cut some of the above (Postgres only, single
> > > > > >> > versioning DAG storage), we can make bolder decisions in
> > > > > >> > versioning and support simpler models from the get-go (and deliver
> > > > > >> > it faster). And we should add only a few - but important - features
> > > > > >> > that our users clearly asked for, and focus on delivering Airflow 3
> > > > > >> > as soon as possible (instead of Airflow 2.10 or 2.11). Similarly,
> > > > > >> > multi-team can be simplified if we cut things from the list above
> > > > > >> > and have Task isolation as a first-class citizen in Airflow (and
> > > > > >> > the only option).
> > > > > >> >
> > > > > >> > My candidates very much concur with the list shared by Kaxil in the
> > > > > >> > doc + I'd add multi-team (but simplified thanks to the cuts). But
> > > > > >> > here I would also mostly defer to the Astronomer, Google, and AWS
> > > > > >> > teams to define collectively the absolute minimum set of features
> > > > > >> > that would make the "target" part of their customers happy. And
> > > > > >> > ONLY do that.
> > > > > >> >
> > > > > >> > So in short - I think a big part of our discussion should be what
> > > > > >> > we are ready to drop when we start Airflow 3, and we should be very
> > > > > >> > bold. Once we know that, we should figure out the absolute minimum
> > > > > >> > of things we can add that will benefit a significant part of our
> > > > > >> > users (and make use of the increased speed because we dropped
> > > > > >> > things).
> > > > > >> >
> > > > > >> > J.
> > > > > >> >
> > > > > >> >
> > > > > >> > On Mon, May 6, 2024 at 8:40 PM Constance Martineau

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-13 Thread Maciej Obuchowski
> > > > >> > local venv/tooling (especially with uv as the underlying tooling).
> > > > >> >
> > > > >> > 2) *I think we should add only very few new "important" features.*
> > > > >> > The absolute minimum to make Airflow 3 appealing, and add them only
> > > > >> > in Airflow 3: versioning, multi-team, and pluggable UI should be
> > > > >> > Airflow 3 only - it makes no sense to invest in Airflow 2 if we
> > > > >> > already know Airflow 3 is coming, as that generally triples the
> > > > >> > effort needed to get them out. We should drop new feature
> > > > >> > development in Airflow 2. This will give users an incentive to
> > > > >> > move to 3 if the new features are worth it, even paying the
> > > > >> > compatibility/migration price.
> > > > >> >
> > > > >> > Versioning, for example: I believe if we decide to go only with
> > > > >> > Airflow 3 and cut some of the above (Postgres only, single
> > > > >> > versioning DAG storage), we can make bolder decisions in
> > > > >> > versioning and support simpler models from the get-go (and deliver
> > > > >> > it faster). And we should add only a few - but important - features
> > > > >> > that our users clearly asked for, and focus on delivering Airflow 3
> > > > >> > as soon as possible (instead of Airflow 2.10 or 2.11). Similarly,
> > > > >> > multi-team can be simplified if we cut things from the list above
> > > > >> > and have Task isolation as a first-class citizen in Airflow (and
> > > > >> > the only option).
> > > > >> >
> > > > >> > My candidates very much concur with the list shared by Kaxil in the
> > > > >> > doc + I'd add multi-team (but simplified thanks to the cuts). But
> > > > >> > here I would also mostly defer to the Astronomer, Google, and AWS
> > > > >> > teams to define collectively the absolute minimum set of features
> > > > >> > that would make the "target" part of their customers happy. And
> > > > >> > ONLY do that.
> > > > >> >
> > > > >> > So in short - I think a big part of our discussion should be what
> > > > >> > we are ready to drop when we start Airflow 3, and we should be very
> > > > >> > bold. Once we know that, we should figure out the absolute minimum
> > > > >> > of things we can add that will benefit a significant part of our
> > > > >> > users (and make use of the increased speed because we dropped
> > > > >> > things).
> > > > >> >
> > > > >> > J.
> > > > >> >
> > > > >> >
> > > > >> > On Mon, May 6, 2024 at 8:40 PM Constance Martineau wrote:
> > > > >> >
> > > > >> > > Hi Michal,
> > > > >> > >
> > > > >> > > Thanks for your thoughts on the Airflow 3 proposal. I appreciate
> > > > >> > > your concerns about the migration overhead for our users with a
> > > > >> > > major new version and see the appeal in your suggestion to
> > > > >> > > integrate many of the proposed changes into Airflow 2 through
> > > > >> > > separate AIPs. It’s a valid point an

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-13 Thread Michał Modras
> > > >> > and cut some of the above (Postgres only, single versioning DAG
> > > >> > storage), we can make bolder decisions in versioning and support
> > > >> > simpler models from the get-go (and deliver it faster). And we should
> > > >> > add only a few - but important - features that our users clearly
> > > >> > asked for, and focus on delivering Airflow 3 as soon as possible
> > > >> > (instead of Airflow 2.10 or 2.11). Similarly, multi-team can be
> > > >> > simplified if we cut things from the list above and have Task
> > > >> > isolation as a first-class citizen in Airflow (and the only option).
> > > >> >
> > > >> > My candidates very much concur with the list shared by Kaxil in the
> > > >> > doc + I'd add multi-team (but simplified thanks to the cuts). But
> > > >> > here I would also mostly defer to the Astronomer, Google, and AWS
> > > >> > teams to define collectively the absolute minimum set of features
> > > >> > that would make the "target" part of their customers happy. And
> > > >> > ONLY do that.
> > > >> >
> > > >> > So in short - I think a big part of our discussion should be what
> > > >> > we are ready to drop when we start Airflow 3, and we should be very
> > > >> > bold. Once we know that, we should figure out the absolute minimum
> > > >> > of things we can add that will benefit a significant part of our
> > > >> > users (and make use of the increased speed because we dropped
> > > >> > things).
> > > >> >
> > > >> > J.
> > > >> >
> > > >> >
> > > >> > On Mon, May 6, 2024 at 8:40 PM Constance Martineau wrote:
> > > >> >
> > > >> > > Hi Michal,
> > > >> > >
> > > >> > > Thanks for your thoughts on the Airflow 3 proposal. I appreciate
> > > >> > > your concerns about the migration overhead for our users with a
> > > >> > > major new version and see the appeal in your suggestion to
> > > >> > > integrate many of the proposed changes into Airflow 2 through
> > > >> > > separate AIPs. It’s a valid point and certainly aligns with the
> > > >> > > value of making incremental improvements.
> > > >> > >
> > > >> > > However, after looking closely at the enhancements outlined for
> > > >> > > Airflow 3, I'm convinced they warrant a new major release. Here’s
> > > >> > > why:
> > > >> > >
> > > >> > > 1. *Core Architectural Changes:* We’re looking at foundational
> > > >> > > changes with Airflow 3, like redefining task priorities, separating
> > > >> > > task definition and task execution, and new AIPs like DAG versioning,
> > > >> > > remote execution and restricting database access from workers. These
> > > >> > > aren’t just incremental improvements but major shifts that will set
> > > >> > > the stage for the next decade of Airflow’s architecture. Grouping
> > > >> > > these changes into a major release will help us make these
> > > >> > > transitions more cleanly and with fewer constraints from past
> > > >> > > decisions.
> > > >> > > 2. *Code Clean-Up*: Our main branch has accumulated over 140
> > > >> > > deprecated issues, and this will only grow if we c

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-11 Thread Mehta, Shubham
> > >> > > incremental improvements but major shifts that will set the stage for
> > >> > > the next decade of Airflow’s architecture. Grouping these changes into
> > >> > > a major release will help us make these transitions more cleanly and
> > >> > > with fewer constraints from past decisions.
> > >> > > 2. *Code Clean-Up*: Our main branch has accumulated over 140
> > >> > > deprecated issues, and this will only grow if we continue without a
> > >> > > major cleanup. This makes it increasingly difficult to implement new
> > >> > > features effectively while maintaining backward compatibility. A major
> > >> > > release allows us to address these issues head-on, reducing technical
> > >> > > debt and paving the way for a more robust platform.
> > >> > > 3. *Managing Breaking Changes:* Let’s take the example of restricting
> > >> > > database access from workers. It’s a necessary move for better
> > >> > > security and potentially also for scalability (it reduces DB load).
> > >> > > Many users have workflows that interact with the DB, either by using
> > >> > > raw SQL or by leveraging a session object. We could implement this
> > >> > > feature in Airflow 2 and avoid breaking existing workflows by keeping
> > >> > > the old standard mode as the default - much of the work is already
> > >> > > done - but that would mean supporting both the new secure mode and the
> > >> > > old standard mode indefinitely, and designing new features with the
> > >> > > assumption that most users will continue using the old standard mode.
> > >> > > With Airflow 3, we can make secure mode the default or even the only
> > >> > > option, simplifying implementation and future development. This is
> > >> > > just one example of something that is feasible to implement in
> > >> > > Airflow 2 but is better released in the context of Airflow 3.
> > >> > > 4. *Future-Proofing for New Features:* Airflow 3 will open up
> > >> > > possibilities for handling workflows beyond batch processing. Features
> > >> > > like real-time DAG execution through the API and multi-language task
> > >> > > support are big steps forward, significantly expanding Airflow’s
> > >> > > utility.
> > >> > >
> > >> > >
> > >> > > While integrating these updates into Airflow 2 might look less
> > >> > > disruptive initially, the scale and nature of the required changes
> > >> > > really support a move to Airflow 3. It’s not just about adding new
> > >> > > features; it’s about setting up Airflow so that it remains relevant
> > >> > > for the next ten years.
> > >> > >
> > >> > > Constance
> > >> > >
> > >> > > On Mon, May 6, 2024 at 2:10 PM Ash Berlin-Taylor wrote:
> > >> > >
> > >> > > > There's a lot of technical debt hiding in Airflow, especially in the
> > >> > > > scheduler, that makes it harder and harder to efficiently add new
> > >> > > > features.
> > >> > > >
> > >> > > > At some point, very soon, we are going to have to remove some very
> > >> > > > infrequently used back-compat shims that negatively affect
> > >> > > > performance.
> > >> > > > Without doi
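Constance's third point, on the cost of keeping a "secure mode" and a "standard mode" for worker database access side by side, can be made concrete with a small sketch. It shows a hypothetical guard around direct metadata-DB sessions: every code path that opens one must branch on the mode, which is the dual maintenance burden described above. `WorkerMode`, `metadata_session`, and `DirectDbAccessError` are illustrative names, not Airflow APIs, and an in-memory SQLite database stands in for the real metadata database.

```python
import sqlite3
from contextlib import contextmanager
from enum import Enum


class WorkerMode(Enum):
    STANDARD = "standard"  # task code may open metadata-DB sessions directly
    SECURE = "secure"      # task code must go through an API instead


class DirectDbAccessError(RuntimeError):
    """Raised when task code tries to open a DB session in secure mode."""


@contextmanager
def metadata_session(mode: WorkerMode, dsn: str = ":memory:"):
    """Hypothetical guard: yields a DB connection only in standard mode."""
    if mode is WorkerMode.SECURE:
        raise DirectDbAccessError("direct metadata-DB access is disabled on workers")
    conn = sqlite3.connect(dsn)  # SQLite stands in for the metadata database
    try:
        yield conn
    finally:
        conn.close()


def legacy_task(mode: WorkerMode) -> int:
    """A task written against the old model: it queries the DB itself."""
    with metadata_session(mode) as conn:
        return conn.execute("SELECT 1").fetchone()[0]
```

In this sketch `legacy_task(WorkerMode.STANDARD)` returns `1`, while the same task fails with `DirectDbAccessError` in secure mode. Making secure mode the only option would let the `STANDARD` branch, and every feature designed around it, be deleted.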

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-10 Thread Pierre Jeambrun
> > >> > > from past decisions.
> > >> > > 2. *Code Clean-Up*: Our main branch has accumulated over 140
> > >> > > deprecated issues, and this will only grow if we continue without a
> > >> > > major cleanup. This makes it increasingly difficult to implement new
> > >> > > features effectively while maintaining backward compatibility. A major
> > >> > > release allows us to address these issues head-on, reducing technical
> > >> > > debt and paving the way for a more robust platform.
> > >> > > 3. *Managing Breaking Changes:* Let’s take the example of restricting
> > >> > > database access from workers. It’s a necessary move for better
> > >> > > security and potentially also for scalability (it reduces DB load).
> > >> > > Many users have workflows that interact with the DB, either by using
> > >> > > raw SQL or by leveraging a session object. We could implement this
> > >> > > feature in Airflow 2 and avoid breaking existing workflows by keeping
> > >> > > the old standard mode as the default - much of the work is already
> > >> > > done - but that would mean supporting both the new secure mode and the
> > >> > > old standard mode indefinitely, and designing new features with the
> > >> > > assumption that most users will continue using the old standard mode.
> > >> > > With Airflow 3, we can make secure mode the default or even the only
> > >> > > option, simplifying implementation and future development. This is
> > >> > > just one example of something that is feasible to implement in
> > >> > > Airflow 2 but is better released in the context of Airflow 3.
> > >> > > 4. *Future-Proofing for New Features:* Airflow 3 will open up
> > >> > > possibilities for handling workflows beyond batch processing. Features
> > >> > > like real-time DAG execution through the API and multi-language task
> > >> > > support are big steps forward, significantly expanding Airflow’s
> > >> > > utility.
> > >> > >
> > >> > >
> > >> > > While integrating these updates into Airflow 2 might look less
> > >> > > disruptive initially, the scale and nature of the required changes
> > >> > > really support a move to Airflow 3. It’s not just about adding new
> > >> > > features; it’s about setting up Airflow so that it remains relevant
> > >> > > for the next ten years.
> > >> > >
> > >> > > Constance
> > >> > >
> > >> > > On Mon, May 6, 2024 at 2:10 PM Ash Berlin-Taylor wrote:
> > >> > >
> > >> > > > There's a lot of technical debt hiding in Airflow, especially in the
> > >> > > > scheduler, that makes it harder and harder to efficiently add new
> > >> > > > features.
> > >> > > >
> > >> > > > At some point, very soon, we are going to have to remove some very
> > >> > > > infrequently used back-compat shims that negatively affect
> > >> > > > performance. Without doing that, the pace at which we can
> > >> > > > realistically add some of the more exciting features tends towards
> > >> > > > zero. Developer speed of contributors is a factor here too!
> > >> > > >
> > >> > > > So while we are still using SemVer, that necessitates v3.
> > >> > > >
> > >> > > > Ash
> > >> > > >
> > >> > > > On 6 May 2024 15:30:49 BST, "Michał Modras" <
>
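Ash's argument in the quoted message follows directly from the Semantic Versioning rules: removing back-compat shims is a breaking change, and any breaking change forces a MAJOR bump. The sketch below encodes only the basic SemVer 2.0.0 bump rules (it ignores pre-release tags and the special 0.y.z range); the `Change` categories are a simplification chosen for illustration, and the starting version "2.9.1" in the usage note is just an example.

```python
from enum import Enum


class Change(Enum):
    FIX = "fix"            # backwards-compatible bug fix      -> PATCH
    FEATURE = "feature"    # backwards-compatible addition     -> MINOR
    BREAKING = "breaking"  # removed/changed public behaviour  -> MAJOR


def next_version(current: str, changes: list[Change]) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a set of changes."""
    major, minor, patch = (int(part) for part in current.split("."))
    if Change.BREAKING in changes:
        return f"{major + 1}.0.0"
    if Change.FEATURE in changes:
        return f"{major}.{minor + 1}.0"
    if Change.FIX in changes:
        return f"{major}.{minor}.{patch + 1}"
    return current
```

For example, `next_version("2.9.1", [Change.FEATURE])` gives `"2.10.0"`, but adding a single `Change.BREAKING` (such as dropping a shim) to the same release yields `"3.0.0"`, regardless of what else it contains.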

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-09 Thread Amogh Desai
incremental
> >> > > improvements but major shifts that will set the stage for the next
> >> > > decade of Airflow’s architecture. Grouping these changes into a major
> >> > > release will help us make these transitions more cleanly and with
> >> > > fewer constraints from past decisions.
> >> > > 2. *Code Clean-Up*: Our main branch has accumulated over 140
> >> > > deprecated issues, and this will only grow if we continue without a
> >> > > major cleanup. This makes it increasingly difficult to implement new
> >> > > features effectively while maintaining backward compatibility. A major
> >> > > release allows us to address these issues head-on, reducing technical
> >> > > debt and paving the way for a more robust platform.
> >> > > 3. *Managing Breaking Changes:* Let’s take the example of restricting
> >> > > database access from workers. It’s a necessary move for better
> >> > > security, and potentially for scalability too (it reduces DB load).
> >> > > Many users have workflows that interact with the DB, either by using
> >> > > raw SQL or by leveraging a session object. We could implement this
> >> > > feature in Airflow 2 and avoid breaking existing workflows by keeping
> >> > > the old standard mode as the default (much of the work is already
> >> > > done), but that would mean supporting both the new secure mode and the
> >> > > old standard mode indefinitely, and designing new features on the
> >> > > assumption that most users will continue with the old standard mode.
> >> > > With Airflow 3, we can make secure mode the default or even the only
> >> > > option, simplifying implementation and future development. This is
> >> > > just one example of a change that is feasible in Airflow 2 but is
> >> > > better released under the context of Airflow 3.
> >> > > 4. *Future-Proofing for New Features:* Airflow 3 will open up
> >> > > possibilities for handling workflows beyond batch processing.
> >> > > Features like real-time DAG execution through the API and
> >> > > multi-language task support are big steps forward, significantly
> >> > > expanding Airflow’s utility.
> >> > >
> >> > >
> >> > > While integrating these updates into Airflow 2 might look less
> >> > > disruptive initially, the scale and nature of the required changes
> >> > > really support a move to Airflow 3. It’s not just about adding new
> >> > > features; it’s about setting up Airflow so that it remains relevant
> >> > > for the next ten years.
> >> > >
> >> > > Constance
> >> > >
> >> > > On Mon, May 6, 2024 at 2:10 PM Ash Berlin-Taylor wrote:
> >> > >
> >> > > > There's a lot of technical debt hiding in Airflow, especially in the
> >> > > > scheduler, that makes it harder and harder to add new features
> >> > > > efficiently.
> >> > > >
> >> > > > At some point, very soon, we are going to have to remove some very
> >> > > > infrequently used back-compat shims that negatively affect
> >> > > > performance. Without doing that, the pace at which we can
> >> > > > realistically add some of the more exciting features tends towards
> >> > > > zero. Developer speed of contributors is a factor here too!
> >> > > >
> >> > > > So while we are still using SemVer, that necessitates v3.
> >> > > >
> >> > > > Ash
> >> > > >
> >> > > > On 6 May 2024 15:30:49 BST, "Michał Modras"

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-09 Thread Ash Berlin-Taylor
. Once we know, we
>> > should figure out the absolute minimum of things that we can add that
>> > will benefit a significant part of our users (and make use of the
>> > increased speed because we dropped things).
>> >
>> > J.
>> >
>> >
>> > On Mon, May 6, 2024 at 8:40 PM Constance Martineau
>> > wrote:
>> >
>> > > Hi Michal,
>> > >
>> > > Thanks for your thoughts on the Airflow 3 proposal. I appreciate your
>> > > concerns about the migration overhead for our users with a major new
>> > > version and see the appeal in your suggestion to integrate many of the
>> > > proposed changes into Airflow 2 through separate AIPs. It’s a valid
>> > > point and certainly aligns with the value of making incremental
>> > > improvements.
>> > >
>> > > However, after looking closely at the enhancements outlined for
>> > > Airflow 3, I'm convinced they warrant a new major release. Here’s why:
>> > >
>> > > 1. *Core Architectural Changes:* We’re looking at foundational changes
>> > > with Airflow 3, such as redefining task priorities, separating task
>> > > definition from task execution, and new AIPs like DAG versioning,
>> > > remote execution and restricting database access from workers. These
>> > > aren’t just incremental improvements but major shifts that will set
>> > > the stage for the next decade of Airflow’s architecture. Grouping these
>> > > changes into a major release will help us make these transitions more
>> > > cleanly and with fewer constraints from past decisions.
>> > > 2. *Code Clean-Up*: Our main branch has accumulated over 140
>> > > deprecated issues, and this will only grow if we continue without a
>> > > major cleanup. This makes it increasingly difficult to implement new
>> > > features effectively while maintaining backward compatibility. A major
>> > > release allows us to address these issues head-on, reducing technical
>> > > debt and paving the way for a more robust platform.
>> > > 3. *Managing Breaking Changes:* Let’s take the example of restricting
>> > > database access from workers. It’s a necessary move for better
>> > > security, and potentially for scalability too (it reduces DB load).
>> > > Many users have workflows that interact with the DB, either by using
>> > > raw SQL or by leveraging a session object. We could implement this
>> > > feature in Airflow 2 and avoid breaking existing workflows by keeping
>> > > the old standard mode as the default (much of the work is already
>> > > done), but that would mean supporting both the new secure mode and the
>> > > old standard mode indefinitely, and designing new features on the
>> > > assumption that most users will continue with the old standard mode.
>> > > With Airflow 3, we can make secure mode the default or even the only
>> > > option, simplifying implementation and future development. This is
>> > > just one example of a change that is feasible in Airflow 2 but is
>> > > better released under the context of Airflow 3.
>> > > 4. *Future-Proofing for New Features:* Airflow 3 will open up
>> > > possibilities for handling workflows beyond batch processing.
>> > > Features like real-time DAG execution through the API and
>> > > multi-language task support are big steps forward, significantly
>> > > expanding Airflow’s utility.
>> > >
>> > >
>> > > While integrating these updates into Airflow 2 might look less
>> > > disruptive initially, the scale and nature of the required changes
>> > > really support a move to Airflow 3. It’s not just about adding new
>> > > features; it’s about setting up Airflow so that it remains relevant
>> > > for the next ten years.
>> > >
>> > > Constance
>> > >
>> > > On Mon, May 6, 2024 at 2:10 PM Ash Ber

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-08 Thread Jarek Potiuk
> > > deprecated issues, and this will only grow if we continue without a
> > > major cleanup. This makes it increasingly difficult to implement new
> > > features effectively while maintaining backward compatibility. A major
> > > release allows us to address these issues head-on, reducing technical
> > > debt and paving the way for a more robust platform.
> > > 3. *Managing Breaking Changes:* Let’s take the example of restricting
> > > database access from workers. It’s a necessary move for better
> > > security, and potentially for scalability too (it reduces DB load).
> > > Many users have workflows that interact with the DB, either by using
> > > raw SQL or by leveraging a session object. We could implement this
> > > feature in Airflow 2 and avoid breaking existing workflows by keeping
> > > the old standard mode as the default (much of the work is already
> > > done), but that would mean supporting both the new secure mode and the
> > > old standard mode indefinitely, and designing new features on the
> > > assumption that most users will continue with the old standard mode.
> > > With Airflow 3, we can make secure mode the default or even the only
> > > option, simplifying implementation and future development. This is
> > > just one example of a change that is feasible in Airflow 2 but is
> > > better released under the context of Airflow 3.
> > > 4. *Future-Proofing for New Features:* Airflow 3 will open up
> > > possibilities for handling workflows beyond batch processing.
> > > Features like real-time DAG execution through the API and
> > > multi-language task support are big steps forward, significantly
> > > expanding Airflow’s utility.
> > >
> > >
> > > While integrating these updates into Airflow 2 might look less
> > > disruptive initially, the scale and nature of the required changes
> > > really support a move to Airflow 3. It’s not just about adding new
> > > features; it’s about setting up Airflow so that it remains relevant
> > > for the next ten years.
> > >
> > > Constance
> > >
> > > On Mon, May 6, 2024 at 2:10 PM Ash Berlin-Taylor wrote:
> > >
> > > > There's a lot of technical debt hiding in Airflow, especially in the
> > > > scheduler, that makes it harder and harder to add new features
> > > > efficiently.
> > > >
> > > > At some point, very soon, we are going to have to remove some very
> > > > infrequently used back-compat shims that negatively affect
> > > > performance. Without doing that, the pace at which we can
> > > > realistically add some of the more exciting features tends towards
> > > > zero. Developer speed of contributors is a factor here too!
> > > >
> > > > So while we are still using SemVer, that necessitates v3.
> > > >
> > > > Ash
> > > >
> > > > On 6 May 2024 15:30:49 BST, "Michał Modras"
> > > > wrote:
> > > > >+1 to Jens's & Bolke's points here and in the doc
> > > > >
> > > > >I agree we should work on clarifying the direction in which we would
> > > > >like Airflow to go. Introducing a new major Airflow version is a
> > > > >massive overhead for users, who would need to plan for migrations,
> > > > >onboarding the new Airflow (with a slightly different architecture),
> > > > >etc., and Airflow 2 would effectively live in parallel for a long
> > > > >time.
> > > > >
> > > > >Personally, I think most of the points in Kaxil's/Vikram's doc are
> > > > >valuable projects of their own, and I could imagine all of them being
> > > > >delivered as separate AIPs within Airflow 2 (surely as new minor
> > > > >versions of Airflow 2). I am not sure if the scope of changes and the
> > > > >goal we want to achieve are (a) clear enough and (b) broad enough to
> > > > >call for a new major

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-08 Thread Bishundeo, Rajeshwar
oth the new secure mode and the old standard
> > mode indefinitely, and designing new features on the assumption that
> > most users will continue with the old standard mode. With Airflow 3, we
> > can make secure mode the default or even the only option, simplifying
> > implementation and future development. This is just one example of a
> > change that is feasible in Airflow 2 but is better released under the
> > context of Airflow 3.
> > 4. *Future-Proofing for New Features:* Airflow 3 will open up
> > possibilities for handling workflows beyond batch processing.
> > Features like real-time DAG execution through the API and
> > multi-language task support are big steps forward, significantly
> > expanding Airflow’s utility.
> >
> >
> > While integrating these updates into Airflow 2 might look less
> > disruptive initially, the scale and nature of the required changes
> > really support a move to Airflow 3. It’s not just about adding new
> > features; it’s about setting up Airflow so that it remains relevant
> > for the next ten years.
> >
> > Constance
> >
> > On Mon, May 6, 2024 at 2:10 PM Ash Berlin-Taylor wrote:
> >
> > > There's a lot of technical debt hiding in Airflow, especially in the
> > > scheduler, that makes it harder and harder to add new features
> > > efficiently.
> > >
> > > At some point, very soon, we are going to have to remove some very
> > > infrequently used back-compat shims that negatively affect
> > > performance. Without doing that, the pace at which we can
> > > realistically add some of the more exciting features tends towards
> > > zero. Developer speed of contributors is a factor here too!
> > >
> > > So while we are still using SemVer, that necessitates v3.
> > >
> > > Ash
> > >
> > > On 6 May 2024 15:30:49 BST, "Michał Modras"
> > > wrote:
> > > >+1 to Jens's & Bolke's points here and in the doc
> > > >
> > > >I agree we should work on clarifying the direction in which we would
> > > >like Airflow to go. Introducing a new major Airflow version is a
> > > >massive overhead for users, who would need to plan for migrations,
> > > >onboarding the new Airflow (with a slightly different architecture),
> > > >etc., and Airflow 2 would effectively live in parallel for a long
> > > >time.
> > > >
> > > >Personally, I think most of the points in Kaxil's/Vikram's doc are
> > > >valuable projects of their own, and I could imagine all of them being
> > > >delivered as separate AIPs within Airflow 2 (surely as new minor
> > > >versions of Airflow 2). I am not sure if the scope of changes and the
> > > >goal we want to achieve are (a) clear enough and (b) broad enough to
> > > >call for a new major version.
> > > >
> > > >Best,
> > > >Michal
> > > >
> > > >On Sun, May 5, 2024 at 10:10 AM Scheffler Jens (XC-AS/EAE-ADA-T)
> > > >wrote:
> > > >
> > > >> Thanks for the document write-up, Kaxil. I assume this is mostly a
> > > >> vision statement.
> > > >>
> > > >> Looking forward to a larger addendum where we can collect things that
> > > >> we can all vote and agree on as targets.
> > > >>
> > > >> I started earlier with a Confluence page, and as it seems it is not
> > > >> accessible to everyone, shall we convert this to a Google Doc for
> > > >> better collaboration and item collection?
> > > >>
> > > >> Sent from Outlook for iOS<https://aka.ms/o0ukef>
> > > >> 
> > > >> From: Vikram Koka

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-07 Thread Constance Martineau
> > > >Airflow (with a slightly different architecture), etc., and Airflow 2
> > > >would effectively live in parallel for a long time.
> > > >
> > > >Personally, I think most of the points in Kaxil's/Vikram's doc are
> > > >valuable projects of their own, and I could imagine all of them being
> > > >delivered as separate AIPs within Airflow 2 (surely as new minor
> > > >versions of Airflow 2). I am not sure if the scope of changes and the
> > > >goal we want to achieve are (a) clear enough and (b) broad enough to
> > > >call for a new major version.
> > > >
> > > >Best,
> > > >Michal
> > > >
> > > >On Sun, May 5, 2024 at 10:10 AM Scheffler Jens (XC-AS/EAE-ADA-T)
> > > > wrote:
> > > >
> > > >> Thanks for the document write-up, Kaxil. I assume this is mostly a
> > > >> vision statement.
> > > >>
> > > >> Looking forward to a larger addendum where we can collect things that
> > > >> we can all vote and agree on as targets.
> > > >>
> > > >> I started earlier with a Confluence page, and as it seems it is not
> > > >> accessible to everyone, shall we convert this to a Google Doc for
> > > >> better collaboration and item collection?
> > > >>
> > > >> 
> > > >> From: Vikram Koka 
> > > >> Sent: Sunday, May 5, 2024 3:34:33 AM
> > > >> To: dev@airflow.apache.org 
> > > >> Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs
> > > >> strategic (Airflow 3) approach
> > > >>
> > > >> Thank you for your feedback, Bolke and Andrey!
> > > >>
> > > >> Bolke,
> > > >> I have replied to some of your comments in the doc.
> > > >> I will provide a detailed write-up on the "Interactive DAG run" (or
> > > >> synchronous DAG run) capability, which has generated some early
> > > >> questions.
> > > >> I had intended to publish an AIP for that as a follow-up, but I
> > > >> believe that a simpler write-up would be useful ahead of the AIP.
> > > >>
> > > >> Andrey,
> > > >> You raise an interesting point.
> > > >>
> > > >> As part of the Airflow 2.0 release, we as a community decided to
> > > >> strictly adhere to SemVer, as detailed in the document you
> > > >> referenced. We also consciously split the "Core Airflow" releases
> > > >> from the "Provider" releases at that time. We had a clear expectation
> > > >> then for the cadence of both minor and patch releases, which we have
> > > >> generally adhered to since then.
> > > >>
> > > >> Personally, I am more concerned about our Provider releases right now
> > > >> than about the cadence of our major releases. I believe that one of
> > > >> the proposed changes in the Airflow 3 document, i.e. the clear
> > > >> separation of Task Execution, will help here, but more may be needed.
> > > >>
> > > >> Definitely interested in more feedback on this as well.
> > > >>
> > > >> Vikram
> > > >>
> > > >>
> > > >> On Sat, May 4, 2024 at 10:57 AM Andrey Anshin
> > > >> wrote:
> > > >>
> > > >> > I would like to propose changing (or at least discussing) the
> > > >> > release policy for major versions of Airflow.
> > > >> >
> > > >> > Right now it is described as "These releases do not happen with any
> > > >> > regular interval or on any predictable schedule.":
> > > >> >
> > > >> > https://airflow.apache.org/docs/apache-airflow/stable/release-process.html#term-Major-release

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-06 Thread Jarek Potiuk
. Many users have
>workflows that interact with the DB, either by using raw SQL or by
>leveraging a session object. We could implement this feature in Airflow 2
>and avoid breaking existing workflows by keeping the old standard mode as
>the default (much of the work is already done), but that would mean
>supporting both the new secure mode and the old standard mode
>indefinitely, and designing new features on the assumption that most
>users will continue with the old standard mode. With Airflow 3, we can
>make secure mode the default or even the only option, simplifying
>implementation and future development. This is just one example of a
>change that is feasible in Airflow 2 but is better released under the
>context of Airflow 3.
>4. *Future-Proofing for New Features:* Airflow 3 will open up
>possibilities for handling workflows beyond batch processing. Features
>like real-time DAG execution through the API and multi-language task
>support are big steps forward, significantly expanding Airflow’s utility.
>
>
> While integrating these updates into Airflow 2 might look less disruptive
> initially, the scale and nature of the required changes really support a
> move to Airflow 3. It’s not just about adding new features; it’s about
> setting up Airflow so that it remains relevant for the next ten years.
>
> Constance
>
> On Mon, May 6, 2024 at 2:10 PM Ash Berlin-Taylor  wrote:
>
> > There's a lot of technical debt hiding in Airflow, especially in the
> > scheduler, that makes it harder and harder to add new features
> > efficiently.
> >
> > At some point, very soon, we are going to have to remove some very
> > infrequently used back-compat shims that negatively affect performance.
> > Without doing that, the pace at which we can realistically add some of
> > the more exciting features tends towards zero. Developer speed of
> > contributors is a factor here too!
> >
> > So while we are still using SemVer, that necessitates v3.
> >
> > Ash
> >
> > On 6 May 2024 15:30:49 BST, "Michał Modras"
> > wrote:
> > >+1 to Jens's & Bolke's points here and in the doc
> > >
> > >I agree we should work on clarifying the direction in which we would
> > >like Airflow to go. Introducing a new major Airflow version is a massive
> > >overhead for users, who would need to plan for migrations, onboarding
> > >the new Airflow (with a slightly different architecture), etc., and
> > >Airflow 2 would effectively live in parallel for a long time.
> > >
> > >Personally, I think most of the points in Kaxil's/Vikram's doc are
> > >valuable projects of their own, and I could imagine all of them being
> > >delivered as separate AIPs within Airflow 2 (surely as new minor
> > >versions of Airflow 2). I am not sure if the scope of changes and the
> > >goal we want to achieve are (a) clear enough and (b) broad enough to
> > >call for a new major version.
> > >
> > >Best,
> > >Michal
> > >
> > >On Sun, May 5, 2024 at 10:10 AM Scheffler Jens (XC-AS/EAE-ADA-T)
> > > wrote:
> > >
> > >> Thanks for the document write-up, Kaxil. I assume this is mostly a
> > >> vision statement.
> > >>
> > >> Looking forward to a larger addendum where we can collect things that
> > >> we can all vote and agree on as targets.
> > >>
> > >> I started earlier with a Confluence page, and as it seems it is not
> > >> accessible to everyone, shall we convert this to a Google Doc for
> > >> better collaboration and item collection?
> > >>
> > >> 
> > >> From: Vikram Koka 
> > >> Sent: Sunday, May 5, 2024 3:34:33 AM
> > >> To: dev@airflow.apache.org 
> > >> Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs
> > >> strategic (Airflow 3) approach
> > >>
> > >> Thank you for your feedback, Bolke and Andrey!
> > >>
> > >> Bolke,
> > >> I have replied to some of your comments in the doc.
> > >> I will provide a detailed write-up on the "Interactive DAG run" (or
> > >> synchronous DAG run) capability, which has generated some early
> > >> questions.
> > >> I had intended to publish an AIP for that as a follow-up, but I
> > >> believe that a sim

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-06 Thread Constance Martineau
n collect things that
> >> we can all vote and agree on as targets.
> >>
> >> I started earlier with a Confluence page, and as it seems it is not
> >> accessible to everyone, shall we convert this to a Google Doc for
> >> better collaboration and item collection?
> >>
> >> 
> >> From: Vikram Koka 
> >> Sent: Sunday, May 5, 2024 3:34:33 AM
> >> To: dev@airflow.apache.org 
> >> Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs
> >> strategic (Airflow 3) approach
> >>
> >> Thank you for your feedback, Bolke and Andrey!
> >>
> >> Bolke,
> >> I have replied to some of your comments in the doc.
> >> I will provide a detailed write-up on the "Interactive DAG run" (or
> >> synchronous DAG run) capability, which has generated some early
> >> questions.
> >> I had intended to publish an AIP for that as a follow-up, but I
> >> believe that a simpler write-up would be useful ahead of the AIP.
> >>
> >> Andrey,
> >> You raise an interesting point.
> >>
> >> As part of the Airflow 2.0 release, we as a community decided to
> >> strictly adhere to SemVer, as detailed in the document you referenced.
> >> We also consciously split the "Core Airflow" releases from the
> >> "Provider" releases at that time. We had a clear expectation then for
> >> the cadence of both minor and patch releases, which we have generally
> >> adhered to since then.
> >>
> >> Personally, I am more concerned about our Provider releases right now
> >> than about the cadence of our major releases. I believe that one of the
> >> proposed changes in the Airflow 3 document, i.e. the clear separation of
> >> Task Execution, will help here, but more may be needed.
> >>
> >> Definitely interested in more feedback on this as well.
> >>
> >> Vikram
> >>
> >>
> >> On Sat, May 4, 2024 at 10:57 AM Andrey Anshin
> >> wrote:
> >>
> >> > I would like to propose changing (or at least discussing) the release
> >> > policy for major versions of Airflow.
> >> >
> >> > Right now it is described as "These releases do not happen with any
> >> > regular interval or on any predictable schedule.":
> >> >
> >> > https://airflow.apache.org/docs/apache-airflow/stable/release-process.html#term-Major-release
> >> >
> >> > So maybe it is time to put major releases on a schedule, e.g. one every
> >> > two years or so. This could help us avoid discussions like this one in
> >> > the future ("We don't know when Airflow 4 is coming."). Once a new
> >> > major version is released, no new features would be added to the old
> >> > major version, but we would still provide bug and security fixes for a
> >> > while, e.g. 1 year of bug fixes and 3 years of security fixes, for a
> >> > total lifecycle of 5 years per major version. These are just
> >> > approximate durations to define the current period, the bugfix period
> >> > and the security-fix period.
> >> >
> >> > From the contributors' perspective, this helps with dropping deprecated
> >> > features, which resolves a long-standing problem: we have to support
> >> > everything, including deprecated features, and without a scheduled
> >> > lifecycle for them, they can become a showstopper for new features,
> >> > because it is sometimes hard to support two different approaches for a
> >> > long period of time with no hope that the old one will be removed soon.
> >> > For some fundamental features that do not take much effort to support,
> >> > we could postpone removal until the one after the next
> >> > release, e.g. deprecate in A
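
The support windows in the release-policy proposal quoted above can be made concrete with a small calculation. The sketch below is one possible reading of that proposal, not an adopted Airflow policy: it assumes a new major version every two years and measures both the 1-year bugfix window and the 3-year security window from the day the successor major ships, which yields the 2 + 3 = 5 year total mentioned in the proposal. The constants and function names are hypothetical.

```python
from datetime import date

# Hypothetical parameters based on the example figures in the proposal;
# they are illustrative only and not an adopted Airflow policy.
CADENCE_YEARS = 2   # a new major version roughly every two years
BUGFIX_YEARS = 1    # bug fixes for 1 year after the successor ships
SECURITY_YEARS = 3  # security fixes for 3 years after the successor ships


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def support_phase(major_release: date, today: date) -> str:
    """Return the lifecycle phase of a major version on a given day.

    Assumes the successor major ships CADENCE_YEARS after this one and
    that both the bugfix and the security windows start on that day.
    """
    if today < major_release:
        return "unreleased"
    successor = add_years(major_release, CADENCE_YEARS)
    if today < successor:
        return "current"      # still receives new features
    if today < add_years(successor, BUGFIX_YEARS):
        return "bugfix"       # bug and security fixes only
    if today < add_years(successor, SECURITY_YEARS):
        return "security"     # security fixes only
    return "end-of-life"


if __name__ == "__main__":
    release = date(2025, 1, 1)  # hypothetical release date
    checkpoints = (date(2026, 6, 1), date(2027, 6, 1),
                   date(2029, 6, 1), date(2030, 6, 1))
    for day in checkpoints:
        print(day, support_phase(release, day))
```

Under these assumptions, a major version released on 2025-01-01 would be current until 2027-01-01, receive bug fixes until 2028-01-01, and receive security fixes until 2030-01-01. Reading the two windows as consecutive instead would give 2 + 1 + 3 = 6 years, so a written policy would need to pin down which reading is intended.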

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-06 Thread Ash Berlin-Taylor
There's a lot of technical debt hiding in Airflow, especially in the scheduler,
that makes it harder and harder to add new features efficiently.

At some point, very soon, we are going to have to remove some very infrequently
used back-compat shims that negatively affect performance. Without doing that,
the pace at which we can realistically add some of the more exciting features
tends towards zero. Developer speed of contributors is a factor here too!

So while we are still using SemVer, that necessitates v3.

Ash 

On 6 May 2024 15:30:49 BST, "Michał Modras"  
wrote:
>+1 to Jens's & Bolke's points here and in the doc
>
>I agree we should work on clarifying the direction in which we would like
>Airflow to go. Introducing a new major Airflow version is a massive overhead
>for users, who would need to plan for migrations, onboarding the new Airflow
>(with a slightly different architecture), etc., and Airflow 2 would
>effectively live in parallel for a long time.
>
>Personally, I think most of the points in Kaxil's/Vikram's doc are valuable
>projects of their own, and I could imagine all of them being delivered as
>separate AIPs within Airflow 2 (surely as new minor versions of Airflow 2).
>I am not sure if the scope of changes and the goal we want to achieve are
>(a) clear enough and (b) broad enough to call for a new major version.
>
>Best,
>Michal
>
>On Sun, May 5, 2024 at 10:10 AM Scheffler Jens (XC-AS/EAE-ADA-T)
> wrote:
>
>> Thanks for the document write-up, Kaxil. I assume this is mostly a vision
>> statement.
>>
>> Looking forward to a larger addendum where we can collect things that we
>> can all vote and agree on as targets.
>>
>> I started earlier with a Confluence page, and as it seems it is not
>> accessible to everyone, shall we convert this to a Google Doc for better
>> collaboration and item collection?
>>
>> ________
>> From: Vikram Koka 
>> Sent: Sunday, May 5, 2024 3:34:33 AM
>> To: dev@airflow.apache.org 
>> Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs
>> strategic (Airflow 3) approach
>>
>> Thank you for your feedback, Bolke and Andrey!
>>
>> Bolke,
>> I have replied to some of your comments in the doc.
>> I will provide a detailed write-up on the "Interactive DAG run" (or
>> synchronous DAG run) capability, which has generated some early questions.
>> I had intended to publish an AIP for that as a follow-up, but I
>> believe that a simpler write-up would be useful ahead of the AIP.
>>
>> Andrey,
>> You raise an interesting point.
>>
>> As part of the Airflow 2.0 release, we as a community decided to
>> strictly adhere to SemVer, as detailed in the document you referenced. We
>> also consciously split the "Core Airflow" releases from the "Provider"
>> releases at that time. We had a clear expectation then for the cadence of
>> both minor and patch releases, which we have generally adhered to since
>> then.
>>
>> Personally, I am more concerned about our Provider releases right now
>> than about the cadence of our major releases. I believe that one of the
>> proposed changes in the Airflow 3 document, i.e. the clear separation of
>> Task Execution, will help here, but more may be needed.
>>
>> Definitely interested in more feedback on this as well.
>>
>> Vikram
>>
>>
>> On Sat, May 4, 2024 at 10:57 AM Andrey Anshin 
>> wrote:
>>
>> > I would like to propose changing (or at least discussing) the release
>> > policy for major versions of Airflow.
>> >
>> > Right now it is described as "These releases do not happen with any
>> > regular interval or on any predictable schedule.":
>> >
>> > https://airflow.apache.org/docs/apache-airflow/stable/release-process.html#term-Major-release
>> >
>> > So maybe it is time to put major releases on a schedule, e.g. one every
>> > two years or so. This could help us avoid discussions like this one in
>> > the future ("We don't know when Airflow 4 is

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-06 Thread Michał Modras
+1 to Jens's & Bolke's points here and in the doc

I agree we should work on clarifying the directions we would like Airflow
to go in. Introducing a new major Airflow version is a massive overhead for
users, who would need to plan for migrations, onboarding the new Airflow
(with a slightly different architecture), etc., and effectively Airflow 2
would live in parallel for a long time.

Personally, I think most of the points in Kaxil's/Vikram's doc are valuable
projects of their own, and I could imagine all of them being delivered as
separate AIPs within Airflow 2 (presumably as new minor versions of Airflow 2).
I am not sure whether the scope of the changes and the goal we want to
achieve are a) clear enough and b) broad enough to call for a new major
version.

Best,
Michal


Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-05 Thread Scheffler Jens (XC-AS/EAE-ADA-T)
Thanks for the document write-up, Kaxil. I assume this is mostly a vision
statement.

Looking forward to a larger addendum where we can collect things that we all
can vote and agree on as targets.

As the Confluence page I started earlier seems not to be accessible to all,
shall we convert this to a Google Doc for better collaboration and item
collection?


Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-04 Thread Vikram Koka
Thank you for your feedback, Bolke and Andrey!

Bolke,
I have replied to some of your comments in the doc.
I will provide a detailed write-up on the "Interactive DAG run" (or
synchronous DAG run) capability, which has generated some early questions.
I had intended to publish an AIP for that as a follow-up, but I
believe that a simpler write-up would be useful ahead of the AIP.

Andrey,
You raise an interesting point.

As part of the Airflow 2.0 release, we as a community had decided to
strictly adhere to Semver as detailed in the document you referenced. We
also consciously split out the "Core Airflow" releases from the "Provider"
releases at that time. We had a clear expectation then for the cadence of
both minor and patch releases, which we have generally adhered to since
then.

Personally, I am more concerned about our Provider releases right now than
about the cadence of our major releases. I believe that one of the
proposed changes in the Airflow 3 document, i.e., the clear separation of
Task Execution, will help here, but more may be needed.

Definitely interested in more feedback on this as well.

Vikram




Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-04 Thread Andrey Anshin
I would like to propose changing (or at least discussing) the release policy
around major versions of Airflow.

Right now it is described as "These releases do not happen with any regular
interval or on any predictable schedule.":
https://airflow.apache.org/docs/apache-airflow/stable/release-process.html#term-Major-release

So maybe it is time to put major releases on a schedule, e.g. one every two
years or so. This could help us avoid discussions in the future like "We
don't know when Airflow 4 is coming.". Once a new major version is
released, no new features would be added to the old major version, but we
would still provide bug and security fixes for a while, e.g. 1 year of bug
fixes and 3 years of security fixes, for a total 5-year lifecycle per major
version. These are just approximate time periods, meant to define the
current period, the bug-fix period and the security-fix period.

From the contributors' perspective, it helps with dropping deprecated
features, which resolves a long-standing problem: we have to support
everything, including deprecated features, and without a scheduled lifecycle
for them they can become a showstopper for new features, because it is
sometimes hard to support two different approaches for a long period of time
with no hope that the removal will happen soon. For some fundamental features
that do not take much time to support, we could postpone removal until the
major release after the next one, e.g. deprecate in Airflow 3 but remove in
Airflow 5.

From the user perspective, they have at least bug-fix support for a while.
If someone wants to use a legacy version, it is their choice, but there will
be no new features and no new provider versions (after one year).
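To make the proposed lifecycle concrete, here is a minimal Python sketch. It assumes one reading of the numbers above: a major version stays "current" until the next major ships (about two years later), then keeps bug fixes for 1 more year and security fixes for 3 more years, which adds up to the 5-year total. The constants, the function names and the release date in the example are hypothetical, not an agreed policy.

```python
from dataclasses import dataclass
from datetime import date

# Assumed, approximate periods taken from the proposal (hypothetical policy).
CADENCE_YEARS = 2    # a new major release roughly every two years
BUGFIX_YEARS = 1     # bug fixes continue 1 year after the next major ships
SECURITY_YEARS = 3   # security fixes continue 3 years after the next major ships


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


@dataclass(frozen=True)
class SupportWindow:
    major: int
    released: date
    current_until: date   # new features land only until the next major
    bugfix_until: date
    security_until: date  # end of life


def support_window(major: int, released: date) -> SupportWindow:
    """Compute the assumed lifecycle boundaries for one major version."""
    next_major = add_years(released, CADENCE_YEARS)
    return SupportWindow(
        major=major,
        released=released,
        current_until=next_major,
        bugfix_until=add_years(next_major, BUGFIX_YEARS),
        security_until=add_years(next_major, SECURITY_YEARS),
    )


def phase(window: SupportWindow, today: date) -> str:
    """Return which support phase a major version is in on a given day."""
    if today < window.released:
        return "unreleased"
    if today < window.current_until:
        return "current"
    if today < window.bugfix_until:
        return "bugfix"
    if today < window.security_until:
        return "security"
    return "end-of-life"
```

Under these assumptions, `support_window(3, date(2025, 1, 1))` (a made-up release date) yields a current period until 2027-01-01, bug fixes until 2028-01-01, and security fixes until end of life on 2030-01-01. Keeping the periods as named constants makes it easy to discuss alternative numbers without touching the phase logic.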



Best Wishes
*Andrey Anshin*





Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-04 Thread Bolke de Bruin
I have left several comments :-). And on interactive DAG runs, even after
Vikram's explanation, I still don't have a clue what we want to accomplish
there :-P.

I would like to see a mantra or team for Airflow 3. That helps nudge people
in the same direction. Suggestions in the comments.

Bolke


-
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org



Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-03 Thread Vikram Koka
Good point, Jed.
I responded to your comment in the doc as well and am very open to
changing the term in the doc.

I used the term "interactive DAG run" to mean the ability to invoke or
trigger a DAG run through the API, with the expectation of getting a result
back immediately. An alternative term could be "synchronous DAG run".

Regardless, this is a significant change, so a good term to indicate the
expansion beyond "batch runs only" is warranted. Very open to different
terms here.
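One way to picture what an "interactive" (synchronous) DAG run means from the caller's side is a minimal Python sketch that approximates it on Airflow 2: trigger a run through the stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`), then poll `GET /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}` until the run succeeds or fails. This only emulates the idea by blocking on the client side; it is not the proposed capability. The helper names, the basic-auth setup (which assumes a basic-auth API backend is enabled) and the polling intervals are assumptions for illustration.

```python
import base64
import json
import time
import urllib.request
from typing import Callable, Optional
from urllib.parse import quote

# A transport takes (HTTP method, API path, JSON body or None) and returns decoded JSON.
Transport = Callable[[str, str, Optional[dict]], dict]

# DAG run states after which polling can stop.
TERMINAL_STATES = {"success", "failed"}


def basic_auth_transport(base_url: str, user: str, password: str) -> Transport:
    """Call the Airflow 2 stable REST API (/api/v1) with HTTP basic auth.

    Assumes the webserver has a basic-auth API backend enabled.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def send(method: str, path: str, body: Optional[dict]) -> dict:
        data = json.dumps(body).encode() if body is not None else None
        request = urllib.request.Request(
            f"{base_url.rstrip('/')}/api/v1{path}",
            data=data,
            method=method,
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Basic {token}",
            },
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())

    return send


def trigger_and_wait(
    send: Transport,
    dag_id: str,
    conf: dict,
    poll_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Trigger a DAG run, then block until it succeeds or fails (or time out)."""
    dag_path = f"/dags/{quote(dag_id, safe='')}/dagRuns"
    run = send("POST", dag_path, {"conf": conf})
    # Run ids such as "manual__2024-05-03T..." contain ':' and '+', so encode them.
    run_path = f"{dag_path}/{quote(run['dag_run_id'], safe='')}"
    waited = 0.0
    while run.get("state") not in TERMINAL_STATES:
        if waited >= timeout_seconds:
            raise TimeoutError(f"{run_path} still {run.get('state')!r} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
        run = send("GET", run_path, None)
    return run
```

Because the transport is passed in, the polling logic can be exercised without a running webserver by supplying a fake `send` function. A real call would look like `trigger_and_wait(basic_auth_transport("http://localhost:8080", "admin", "admin"), "example_dag", {})`, where the URL, credentials and DAG id are placeholders.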



Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-03 Thread Jed Cunningham
Very exciting! Looks like we will have a busy period of time ahead of us.
Overall I like the plan so far, especially using this year's Airflow Summit
as an opportunity to announce and gather feedback, and the 2025 version to
pitch upgrading.

I left a comment in the doc, but we might want to iterate on the
terminology we use for high priority or "synchronous" DAG runs to serve LLM
responses - I find "interactive DAG runs" a bit confusing.


Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-03 Thread Kaxil Naik
> > > > points etc. on this page, we can have a rather short (2h) call with
> > > > contributors soon to pitch and discuss the points and define
> > > > follow-up steps toward a plan, vote and conclusion.
> > > >
> > > > Proposed Confluence discussion page:
> > > > https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+3.0+Discussion+and+Planning
> > > >
> > > > As a starting point, I tried to import both emails I saw in the thread
> > > > into the page. As it is a call to collaborate, please start editing
> > > > and drop your points as well.
> > > >
> > > > Regarding the trigger points Jarek mentioned:
> > > > Actually, the dropped AIP-68 and AIP-69 are something that in my view
> > > > do NOT require Airflow to get to 3.0. I would see them as either
> > > > "Tactical" or "just functional enhancements". AIP-68 is "just" a bit of
> > > > sugar for the UI and extensions to the Plugin interface in my view.
> > > > AIP-69 is basically building something on top, based on the concept of
> > > > Hybrid Executors, as long as we can assume AIP-69 does not need drastic
> > > > changes, maybe only small adjustments in the core (but the concept is
> > > > not elaborated yet). I see this mainly as "just another Executor" that
> > > > should not need breaking changes. I did not want to drop these two AIPs
> > > > to start a fundamental discussion but rather to bring in a new feature
> > > > each.
> > > > The factors that are hard to achieve in the Airflow 2.x world are
> > > > rather "Multi Tenancy/Team" and "DAG Versioning", which in my eyes
> > > > might be able to move faster with a 3.0.
> > > >
> > > > P.S.: I do not get the point (yet?) of why GenAI is a trigger point that
> > > > forces structural breaking changes.
> > > >
> > > > Mit freundlichen Grüßen / Best regards
> > > >
> > > > Jens Scheffler
> > > >
> > > > Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
> > > > Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
> > > > GERMANY | www.bosch.com
> > > > Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
> > > > jens.scheff...@de.bosch.com
> > > >
> > > > Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
> > > > Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
> > > > Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. Markus
> > > > Forschner,
> > > > Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
> > > >
> > > > -Original Message-
> > > > From: Vikram Koka
> > > > Sent: Saturday, April 20, 2024 6:23 PM
> > > > To: dev@airflow.apache.org
> > > > Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs
> > > > strategic (Airflow 3) approach
> > > >
> > > > A wonderful and exciting Saturday morning discussion!
> > > > Thank you, Jarek, for bringing the offline conversations to the mailing
> > > > list.
> > > >
> > > > I completely agree on the necessity of Airflow 3.
> > > > I also agree that Gen AI is the trigger, i.e. the answer to "Why now?".
> > > >
> > > > Having been thinking about this for a while from a strategic
> > > > perspective, as opposed to the tactical perspective of the bi-weekly and
> > > > monthly releases, I believe that our thinking, as you articulated,
> > > > should have a clear understanding of strategic vs. tactical, but I don't
> > > > believe our execution necessarily needs to be either/or; it can
> > > > actually be blended.
> > > >
> > > > With that said, I believe that there are the following four buckets
> > > > that we should use as a framework for Airflow 3:
> > > >
> > > > 1. Gen AI / LLM support
> > > > 2. Airflow User Improvements
> > > > 3. Easy adoption of Airflow by new users
> > > > 4. Integration improvements / Provider maintainability
> > > >
> > > > Describing them in more detail below:
> >

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-04-28 Thread Amogh Desai
> > > consider.
> > >
> > > I believe the discussion will take time and focus, and a Reply-to-all
> > > chain will not be a good path, as we will lose a lot of detail and focus,
> > > and the emails will create a lot of noise that is hard to follow. In a
> > > perfect non-distributed world, I'd call you to a half-day visioning
> > > workshop in a room and focus on the whiteboard. That is not possible with
> > > this level of distribution. The next option would be a (large) ~4h
> > > conference call, which is hard to schedule in a time zone matching
> > > everyone's sleep cycle. It would be perfect if the Summit were close by
> > > and we could plan a half-day or full-day breakout for contributors on
> > > Day 4 or so. But September is far, far away.
> > >
> > > Therefore, to reduce the amount of email, I propose to start collecting
> > > points, ideas, pain points etc. on a Confluence page first. I tried to
> > > start one page as a starting point (contrary ideas welcome!) to have a
> > > place to collaborate and sketch. A virtual whiteboard would also be OK,
> > > but I had none at hand to share (like Miro, Mural etc.). If we collect
> > > ideas, points etc. on this page, we can have a rather short (2h) call
> > > with contributors soon to pitch and discuss the points and define
> > > follow-up steps toward a plan, vote and conclusion.

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-04-22 Thread Jarek Potiuk
I tried to import the both emails I saw in the thread
> > into the page as starter. As it is a call to collaborate, please start
> > editing and drop your points as well.
> >
> > Towards Jarek's mentioned trigger points:
> > Actually the dropped AIP-68 and AIP-69 are something that in my view do
> > NOT require Airflow to get to 3.0. I would see them either "Tactical" or
> > "just functional enhancements". AIP-68 is "just" a bit of sugar to UI and
> > extensions to Plugin interface in my view. AIP-69 is basically building
> > something on-top, based on the concept of Hybrid Executors. As long as we
> > would assume AIP-69 does not need drastical changes, maybe only small
> > adjustments in the core (but concept not elaborated yet). I see this
> mainly
> > as "just another Executor" that should not need breaking changes. I did
> not
> > want to drop these two AIP's to start a fundamental discussion but rather
> > to bring-in a new feature each.
> > The points as factors that are hard to achieve in Airflow 2.x world are
> > rather the "Multi Tenancy/Team" and "Dag Versioning" which in my eyes
> might
> > be able to move faster with a 3.0.
> >
> > P.S.: I do not get the point (yet?) Why GenAI is a trigger point that
> > forced structural breaking changes?
> >
> > Mit freundlichen Grüßen / Best regards
> >
> > Jens Scheffler
> >
> > Alliance: Enabler - Tech Lead (XC-AS/EAE-ADA-T)
> > Robert Bosch GmbH | Hessbruehlstraße 21 | 70565 Stuttgart-Vaihingen |
> > GERMANY | www.bosch.com
> > Tel. +49 711 811-91508 | Mobil +49 160 90417410 |
> > jens.scheff...@de.bosch.com
> >
> > Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
> > Aufsichtsratsvorsitzender: Prof. Dr. Stefan Asenkerschbaumer;
> > Geschäftsführung: Dr. Stefan Hartung, Dr. Christian Fischer, Dr. Markus
> > Forschner,
> > Stefan Grosch, Dr. Markus Heyn, Dr. Frank Meyer, Dr. Tanja Rückert
> >
> > -Original Message-
> > From: Vikram Koka 
> > Sent: Saturday, April 20, 2024 6:23 PM
> > To: dev@airflow.apache.org
> > Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs
> > strategic (Airflow 3) approach
> >
> > A wonderful and exciting Saturday morning discussion!
> > Thank you Jarek for bringing the offline conversations into the mailing
> > list.
> >
> > I completely agree on the necessity of Airflow 3.
> > I also agree that Gen AI is the trigger, i.e. the answer to "Why now?"
> >
> > Having been thinking about this for a while from a strategic perspective,
> > as opposed to the tactical perspective of the bi-weekly and monthly
> > releases, I believe that our thinking, as you articulated, should have a
> > clear understanding of strategic vs. tactical, but I don't believe our
> > execution necessarily needs to be either/or; it can actually be blended.
> >
> > With that said, I believe that there are the following four buckets that
> > we should use as a framework for Airflow 3.
> >
> > 1. Gen AI / LLM support
> > 2. Airflow User Improvements
> > 3. Easy adoption of Airflow by new users
> > 4. Integration improvements / Provider maintainability
> >
> > Describing them in more detail below:
> > 1. Gen AI / LLM support
> > Reiterating the fact that this needs more work, I do believe this can be
> > incremental to Airflow. At Astronomer, we have worked on the LLM
> > Providers, which we contributed to Airflow late last year. But clearly,
> > there is so much more to do, both in building awareness of the patterns /
> > templates to use and in patterns to support in Airflow to make these
> > easier to use and adopt.
> >
> > 2. Airflow User Improvements
> > Clearly, features and improvements desired by the Community are important
> > to continue working on to make Airflow more approachable. The top features
> > which leap to mind for me here are:
> > 2.1 DAG Versioning - the most requested feature in the Airflow User Survey
> > 2.2 Modern UI - also comes up a lot
> > 2.3 Different DAG distribution processes
> > 2.4 Different execution mechanisms
> > I know there are many more which I don't currently recall.
> >
> > 3. Airflow adoption
> > We have discussed this many times, but we absolutely need to make the
> > individual first-time adoption of Airflow better.
> > I think the most common term I recall here is the notion of "Airflow
> > Standalone", but whatever the term may be, an ultra quick, simpl

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-04-21 Thread Kaxil Naik
, knowing what the next steps are. Naturally I was a bit shocked, so I
> decided to sleep on it. (It might be a bit shocking because Jarek posted
> it, especially as I had heard his position on a 3.0 before, and that always
> sounded to me like a strong position... haha)
> After sleeping on it, I think it is valid to raise the discussion,
> especially as we are approaching a 10th feature release, which was also the
> point of the cut-over from 1.x to 2.x. At some point every software product
> needs refactoring and cleanup. Structures are never perfect. But a lot of
> emotion and work is involved in such a step, and there is a risk of
> failing, losing a lot of users, and forcing them to migrate (or having them
> run away). So my current conclusion is: we should consider it carefully.
> But we do need to consider it.
>
> I believe the discussion will take time and focus, and a Reply-to-all
> chain will not be a good path, as we will lose a lot of detail and focus
> and the emails will create a lot of noise that is hard to follow. In a
> perfect non-distributed world I'd invite you to a half-day visioning
> workshop in a room and focus on the whiteboard. That is not possible with
> this level of distribution. The next option would be a (large) ~4h
> conference call, which is hard to schedule in a time slot matching
> everyone's sleep cycle. Ideally, the Summit would be close by and we could
> plan a half-day or full-day breakout for contributors on Day 4 or so. But
> September is far, far away.
>
> Therefore, to reduce the number of emails, I propose to first collect
> points, ideas, pain points, etc. on a Confluence page. I have started one
> page as a starting point (contrary ideas welcome!) to have a place to
> collaborate and sketch. A virtual whiteboard (like Miro, Mural, etc.) would
> also be OK, but I had none at hand to share. If we collect ideas, points,
> etc. on this page, we can have a rather short (2h) call with contributors
> soon to pitch and discuss the points and define follow-up steps towards a
> plan, vote and conclusion.
>
> Proposed Confluence discussion page:
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+3.0+Discussion+and+Planning
>
> As a starting point I have imported both emails I saw in the thread into
> the page. As it is a call to collaborate, please start editing and add
> your points as well.
>
> Regarding the trigger points Jarek mentioned:
> Actually, the AIP-68 and AIP-69 I posted are, in my view, things that do
> NOT require Airflow to get to 3.0. I would see them either as "Tactical" or
> as "just functional enhancements". In my view, AIP-68 is "just" a bit of
> sugar for the UI and extensions to the Plugin interface. AIP-69 basically
> builds something on top, based on the concept of Hybrid Executors. As long
> as we assume AIP-69 does not need drastic changes, maybe only small
> adjustments in the core (but the concept is not elaborated yet), I see it
> mainly as "just another Executor" that should not need breaking changes. I
> did not post these two AIPs to start a fundamental discussion but rather
> to bring in a new feature each.
> The factors that are hard to achieve in the Airflow 2.x world are rather
> "Multi Tenancy/Team" and "Dag Versioning", which in my eyes might be able
> to move faster with a 3.0.
>
> P.S.: I do not (yet?) get the point: why is GenAI a trigger point that
> forces structural breaking changes?
>
> Best regards
>
> Jens Scheffler
>
> -Original Message-
> From: Vikram Koka 
> Sent: Saturday, April 20, 2024 6:23 PM
> To: dev@airflow.apache.org
> Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs
> strategic (Airflow 3) approach
>
> A wonderful and exciting Saturday morning discussion!
> Thank you Jarek for bringing the offline conversations into the mailing
> list.
>
> I completely agree on the necessity of Airflow 3.
> I also agree that Gen AI is the trigger, i.e. the answer to "Why now?"
>
> Having been thinking about this for a while from a strategic perspective,
> as opposed to the tactical perspective of the bi-weekly and monthly
>

RE: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-04-21 Thread Scheffler Jens (XC-AS/EAE-ADA-T)
Hi Developers,

TL;DR Summary: I propose to move the discussion from an email Reply-to-all
chain to a discussion collection in https://cwiki.apache.org/confluence/x/hQv9EQ

When I first saw this email from Jarek I was a bit surprised; the email
actually pulled me out of a kind of comfort zone of knowing what the next
steps are. Naturally I was a bit shocked, so I decided to sleep on it. (It
might be a bit shocking because Jarek posted it, especially as I had heard
his position on a 3.0 before, and that always sounded to me like a strong
position... haha)
After sleeping on it, I think it is valid to raise the discussion, especially
as we are approaching a 10th feature release, which was also the point of
the cut-over from 1.x to 2.x. At some point every software product needs
refactoring and cleanup. Structures are never perfect. But a lot of emotion
and work is involved in such a step, and there is a risk of failing, losing a
lot of users, and forcing them to migrate (or having them run away). So my
current conclusion is: we should consider it carefully. But we do need to
consider it.

I believe the discussion will take time and focus, and a Reply-to-all chain
will not be a good path, as we will lose a lot of detail and focus and the
emails will create a lot of noise that is hard to follow. In a perfect
non-distributed world I'd invite you to a half-day visioning workshop in a
room and focus on the whiteboard. That is not possible with this level of
distribution. The next option would be a (large) ~4h conference call, which
is hard to schedule in a time slot matching everyone's sleep cycle. Ideally,
the Summit would be close by and we could plan a half-day or full-day
breakout for contributors on Day 4 or so. But September is far, far away.

Therefore, to reduce the number of emails, I propose to first collect points,
ideas, pain points, etc. on a Confluence page. I have started one page as a
starting point (contrary ideas welcome!) to have a place to collaborate and
sketch. A virtual whiteboard (like Miro, Mural, etc.) would also be OK, but I
had none at hand to share. If we collect ideas, points, etc. on this page, we
can have a rather short (2h) call with contributors soon to pitch and discuss
the points and define follow-up steps towards a plan, vote and conclusion.

Proposed Confluence discussion page:
https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+3.0+Discussion+and+Planning

As a starting point I have imported both emails I saw in the thread into the
page. As it is a call to collaborate, please start editing and add your points
as well.

Regarding the trigger points Jarek mentioned:
Actually, the AIP-68 and AIP-69 I posted are, in my view, things that do NOT
require Airflow to get to 3.0. I would see them either as "Tactical" or as
"just functional enhancements". In my view, AIP-68 is "just" a bit of sugar
for the UI and extensions to the Plugin interface. AIP-69 basically builds
something on top, based on the concept of Hybrid Executors. As long as we
assume AIP-69 does not need drastic changes, maybe only small adjustments in
the core (but the concept is not elaborated yet), I see it mainly as "just
another Executor" that should not need breaking changes. I did not post these
two AIPs to start a fundamental discussion but rather to bring in a new
feature each.
The factors that are hard to achieve in the Airflow 2.x world are rather
"Multi Tenancy/Team" and "Dag Versioning", which in my eyes might be able to
move faster with a 3.0.

P.S.: I do not (yet?) get the point: why is GenAI a trigger point that forces
structural breaking changes?

Best regards

Jens Scheffler

-Original Message-
From: Vikram Koka  
Sent: Saturday, April 20, 2024 6:23 PM
To: dev@airflow.apache.org
Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic 
(Airflow 3) approach

A wonderful and exciting Saturday morning discussion!
Thank you Jarek for bringing the offline conversations into the mailing list.

I completely agree on the necessity of Airflow 3.
I also agree that Gen AI is the trigger, i.e. the answer to "Why now?"

Having been thinking about this for a while from a strategic perspective, as 
opposed to the tactical perspective of the bi-weekly and monthly releases, I 
believe that our thinking as you articulated should have a clear under

Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-04-20 Thread Vikram Koka
A wonderful and exciting Saturday morning discussion!
Thank you Jarek for bringing the offline conversations into the mailing
list.

I completely agree on the necessity of Airflow 3.
I also agree that Gen AI is the trigger, i.e. the answer to "Why now?"

Having been thinking about this for a while from a strategic perspective,
as opposed to the tactical perspective of the bi-weekly and monthly releases,
I believe that our thinking, as you articulated, should have a clear
understanding of strategic vs. tactical, but I don't believe our execution
necessarily needs to be either/or; it can actually be blended.

With that said, I believe that there are the following four buckets that
we should use as a framework for Airflow 3.

1. Gen AI / LLM support
2. Airflow User Improvements
3. Easy adoption of Airflow by new users
4. Integration improvements / Provider maintainability

Describing them in more detail below:
1. Gen AI / LLM support
Reiterating the fact that this needs more work, I do believe this can be
incremental to Airflow. At Astronomer, we have worked on the LLM Providers,
which we contributed to Airflow late last year. But clearly, there is so much
more to do, both in building awareness of the patterns / templates to use and
in patterns to support in Airflow to make these easier to use and adopt.

2. Airflow User Improvements
Clearly, features and improvements desired by the Community are important to
continue working on to make Airflow more approachable. The top features which
leap to mind for me here are:
2.1 DAG Versioning - the most requested feature in the Airflow User Survey
2.2 Modern UI - also comes up a lot
2.3 Different DAG distribution processes
2.4 Different execution mechanisms
I know there are many more which I don't currently recall.

3. Airflow adoption
We have discussed this many times, but we absolutely need to make the
individual first-time adoption of Airflow better.
I think the most common term I recall here is the notion of "Airflow
Standalone", but whatever the term may be, an ultra-quick, simple install of
Airflow and getting-started experience is something we owe our community.

4. Integration / Providers
The changes we made as part of Airflow 2.0 to split the Core Airflow
releases from the Provider releases were clearly a good choice and made a
huge impact. However, integration maintainability balanced with growth still
seems like it could use a significant set of improvements. Elad and I spoke
about this a couple of days ago as well, and I don't have a clear set of next
steps here, but it is definitely worth exploring.

Some of us at Astronomer have been discussing this quite a bit and planning
on bringing a more polished draft to the community, but an initial
discussion on a Saturday is fun as well :). We will definitely share our
Airflow 3 proposal as a document with the community within the next week,
as a request for comment.



On Sat, Apr 20, 2024 at 1:50 AM Jarek Potiuk  wrote:

> Hello here,
>
> I have been thinking a lot recently and discussing with some people, and I
> am more and more convinced it's about time we - as a community - started
> making changes with "Airflow 2" as the current and "Airflow 3" as the future
> in mind.
>
>
> *TL;DR: I think we should seriously start work on Airflow 3 and decide what
> it means for our AIPs - treating some of them as more "tactical" (things
> that should go into Airflow 2) and some as "strategic" (foundational for
> Airflow 3), with different goals and criteria.*
>
> A lot of us already think that way and a lot of us have already talked
> about it for quite some time, so you should treat my mail mostly as a
> little trigger: "let's start publicly discussing what it might mean for us
> and our community, and let's make the target of the initiatives we do
> clear".
>
> Some might be surprised it comes from me, as I've often been saying "no
> Airflow 3 without a good reason" or "possibly we will have no Airflow 3",
> but I think (and a number of people I spoke to have a similar opinion) we
> have plenty of reasons to make some bold moves now.
>
> Over the last 4 years since Airflow 2 came out, a lot has changed, and we
> have a number of different needs that the current Airflow 2 cannot **really**
> address well:
>
> - LLM/Gen-AI mainly as the important trigger
> - Cloud Native is the "way to go"
> - need to submit DAGs in other ways than dropping them into a shared DAG
> folder.
> - local testing and fast iteration on developing pipelines.
> - ability to run tasks within a workflow with "affinity" so that they can
> share inputs/outputs in shared CPU/GPU memory
> - ability to integrate seamlessly with other workflow engines - making
> Airflow a "workflow of workflows"
> - probably way more
> - all that while keeping a lot of the strengths of Airflow 2 - such as
> continuing to have the option of using the many thousands of operators with
> 90+ providers.
>
> We could implement all of the above better if we got rid of a lot of
> 

[HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-04-20 Thread Jarek Potiuk
Hello here,

I have been thinking a lot recently and discussing with some people, and I
am more and more convinced it's about time we - as a community - started
making changes with "Airflow 2" as the current and "Airflow 3" as the future
in mind.


*TL;DR: I think we should seriously start work on Airflow 3 and decide what
it means for our AIPs - treating some of them as more "tactical" (things
that should go into Airflow 2) and some as "strategic" (foundational for
Airflow 3), with different goals and criteria.*

A lot of us already think that way and a lot of us have already talked
about it for quite some time, so you should treat my mail mostly as a
little trigger: "let's start publicly discussing what it might mean for us
and our community, and let's make the target of the initiatives we do
clear".

Some might be surprised it comes from me, as I've often been saying "no
Airflow 3 without a good reason" or "possibly we will have no Airflow 3",
but I think (and a number of people I spoke to have a similar opinion) we
have plenty of reasons to make some bold moves now.

Over the last 4 years since Airflow 2 came out, a lot has changed, and we
have a number of different needs that the current Airflow 2 cannot **really**
address well:

- LLM/Gen-AI mainly as the important trigger
- Cloud Native is the "way to go"
- need to submit DAGs in other ways than dropping them into a shared DAG
folder.
- local testing and fast iteration on developing pipelines.
- ability to run tasks within a workflow with "affinity" so that they can
share inputs/outputs in shared CPU/GPU memory
- ability to integrate seamlessly with other workflow engines - making
Airflow a "workflow of workflows"
- probably way more
- all that while keeping a lot of the strengths of Airflow 2 - such as
continuing to have the option of using the many thousands of operators with
90+ providers.

We could implement all of the above better if we got rid of a lot of the
implicit or explicit luggage we have in Airflow 2. I think the last two
proposals from Jens, AIP-68 and AIP-69, reflect that very much - both would
have been much easier and more straightforward if we had re-designed Airflow 3
basically at a drawing board, boldly dropping some Airflow 2 assumptions, and
implemented the core of Airflow 3 by taking the best parts of what we have
now in Airflow 2, but generally dropping the luggage, in a new framework.

And it won't be possible without breaking some fundamental assumptions and
making Airflow 3 quite heavily incompatible with Airflow 2.

From "my" camp: dropping the need to have the 700+ dependencies for
Airflow + all providers in a single Python interpreter, and dropping the
dependency on Flask/Plugins/FAB, would be a huge win on its own. Not to
mention being able to split providers' development and contribution from
Airflow core (while keeping the development of and contributions to
providers) - this has been highly requested.

And I think we have a lot of people in our community who would be able (and
would love) to do it - I think a number of us (including myself) are a bit
burned out and tired of just maintaining things in Airflow in a
backwards-compatible way and would jump on the opportunity to rebuild
Airflow.

But - of course we cannot forget about Airflow 2 users. We do not want to
"stop the world" for them. We want to keep fixing things and adding
incremental changes - and those things do not necessarily need to be super
"future-proof". They should help to "keep the lights on" for a while - which
means that in a number of cases they could be a "band-aid". AIP-44
(internal-API) and AIP-67 (multi-team) are more like those.

So - what I think we might want to do as a community:

* start working on Airflow 3 foundations (and decide what it means for our
users and developer community). Decide what to keep, what to drop, what to
redesign, and which assumptions to recreate.

* explicitly split the initiatives/AIPs we have to target Airflow 2 and
Airflow 3 and treat them a bit differently in terms of future-proofness

I would love to hear your thoughts on that (bracing for the storm of those).

J.