Thank you, Jarek, for the detailed input. I've taken some time to digest your
points before responding.

You've outlined a bold vision for Airflow 3, and I agree that being
decisive about the features and architectures will be the key to success.
However, before we make final decisions on what features to cut or retain,
it would be beneficial to have a more comprehensive understanding of how
the current features are utilized by the open-source community.

@Kaxil Naik <ka...@astronomer.io> recently initiated a discussion on
collecting telemetry from open-source deployments:
https://lists.apache.org/thread/7f6qyr8w2n8w34g63s7ybhzphgt8h43m. This data
could be critical in ensuring our decisions are well-informed and reflect
the real-world usage patterns of our users, not just those from managed
environments like Astro, MWAA or GCC.

It's essential that we challenge our assumptions and base our decisions on
a holistic view of feature usage. Identifying potential cuts is a critical
step, but let's ensure our strategy aligns with the needs and preferences
of the broader Airflow community.

On Mon, May 6, 2024 at 6:50 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> I am currently on sick leave, and still recovering - hoping to be able to
> travel next week to the US as planned, so I just wanted to break out of it
> to make one comment here.
>
> My head is a bit clearer now, with the medication hopefully working. I am
> still taking it, which should help me get over the current state, and I
> wanted to take a look at this discussion unraveling first. Over the last
> week I disconnected from "day-to-day" Airflow and gave it some thought (as
> much as I could in my current state). The whole subject of this thread
> started from that - how the current discussions on AIP-67 and others change
> if we consider that Airflow 3 is "starting".
>
> The price for back-compat is speed of development and quality. More
> combinations to test, more unexpected issues uncovered, necessity to keep
> parallel paths (old/new) while adding new features. All of what Constance
> wrote about and what Ash explained. We have already tripped over our own
> feet multiple times in the last few releases. Have we tested all combinations
> of deployment in Airflow 2.8 and 2.9? Not really. I think we already see
> that in a number of "combos" of features things are not working in as
> stable a way as they did before.
>
> Airflow 3 is a bold move. We risk users will stay on Airflow 2 for a long
> time (or even move out) as they will not want to move to Airflow 3. A lot
> of the work implemented in AIP-44 and design of AIP-67 was done around
> back-compatibility. But yes - it would have been way easier if designed
> anew without back-compatibility
> in mind. And if we implement it and release it in Airflow 2 it will make
> new Airflow feature development even harder. That's why I wanted to treat
> it as "tactical" solution - hoping that in Airflow 3 we can make it
> "properly" - and that's why I started the discussion here when I sensed
> that we are "close" to Airflow 3 discussion, because I wanted to see what
> options we have there. This is why I have not yet concluded voting on
> AIP-67 waiting for the result of this discussion here.
>
> But if we are ready to go for Airflow 3 then I'd say there are two
> important things that should be part of the vision.
>
> 1)  *We should be far more opinionated and have far fewer options of
> running things in Airflow 3*. Even an order of magnitude more opinionated.
> Make choices, stick to it, perfect those opinionated choices to suit 80/20
> (or even 70/30 or maybe even 60/40) rule if you will. Risking not fitting
> the 20% that might choose to stay at Airflow 2. We can choose now which
> ~20% of cases we do not want to handle deliberately. And we should be very,
> very strict about it. Default should be "no choice". This will radically
> simplify deployment and should make it easier to simplify Airflow
> development and DAG authoring experience because we will have fewer cases to
> support. Even if we plan to add more options in the future, the first
> version of Airflow 3 should support one deployment approach only. This is
> the only way we can deliver it fast. And we should be very bold there.
> Choose one option and go for it in pretty much every place we have choices
> now. We should aim for Airflow 3.0 to support only a subset of current
> users - but those who are most likely to migrate first and those with the
> biggest need for the new features. We can think 3.x to support more cases,
> but 3.0 should be as opinionated as humanly possible.
>
> And this deployment option should be also something ALL our stakeholders
> will feel OK with as a way forward in their offering.
>
> My candidates (and yes, some are bold):
>
> * *Drop MySQL*. If we have a single thing that makes us avoid our schema
> and DB migration - this is the case. Let's choose Postgres 15+ and use some
> of the great features there. This will also enable much faster async SQL
> implementation and a number of other optimisations - not to mention cutting
> every single change in development and testing time by literally half. And
> we should not look back to adding MySQL.
> * *Drop Celery/Sequential Executor* and start with Local + K8S only (and
> AWS/Google others can continue developing theirs of course in parallel and
> continue Hybrid executor work). Later - we figure out a better solution to
> support "small" tasks using some new K8S features and possibly non-k8s
> solutions (Ray-based?)
> * *Cut Connection and Variable Management from DB/UI*. Leave only Secrets
> Management. Later when we have a 100% extensible React UI, we can add a
> "local DB secrets manager" add-on
> * *Choose a single way for DAG storage that will support versioning from
> day one*. Bear in mind we can add others later. Bolke's idea of using
> FSspec is an interesting one, we should see if it is feasible.
> * *Drop FAB completely (including custom plugins) and invest in
> implementing Auth Manager based on a dedicated, external solution*
> (KeyCloak, which we've discussed before, is a likely candidate)
> * *Leave Providers with Airflow 2 and add tests to make sure they are
> Airflow 3 future-compatible *- develop a way where we continue development
> and contributions for Providers with Airflow 2 and add complete tests to
> run them with Airflow 3. This way we can continue developing Provider
> features independently, and make them work for Airflow 2 (and continue
> adding features for Airflow 2 users alongside Airflow 2 bugfixes), while
> also gradually fixing any Airflow 3 incompatibilities. Instead of
> "back-compatibility" tests, we make provider "forward-compatibility" tests so
> that future Providers are tested and work on Airflow 3. It will also make
> it easiest to continue Airflow 2 (bugfixes) + Providers tested without
> investing in changing the current CI / test harness.
> * *Simplify Test Harness for Airflow 3 from the start *- without providers
> and 790+ dependencies, we could vastly simplify Airflow 3 testing (basically
> make CI jobs from scratch) using mostly standard Python tooling (while we
> can continue making use of the current test harness for Airflow 2 +
> Providers and extend it with Airflow 3 future-compatibility tests). That
> means Breeze would be only staying in Airflow 2 + Providers repo as we
> should be able to achieve most of what we have there with local venv/
> tooling (especially with uv as underlying tooling).
>
> 2) *I think we only add very few new "important" features. *Absolute
> minimum to make Airflow 3 appealing and add them only in Airflow 3:
> versioning, multi-team, pluggable UI should only be Airflow 3 - it makes no
> sense to invest into Airflow 2 if we already know Airflow 3 is coming -
> that generally triples effort needed to get them out. We should drop new
> features development in Airflow 2. This will give users incentive to move
> to 3 if the new features will be worth it. Even paying
> compatibility/migration price.
>
> Versioning, for example: I believe if we decide to go only with Airflow 3
> and cut some of the above (Postgres only, Single versioning DAG storage) we
> can make bolder decisions in versioning and support simpler models from the
> get go (and deliver it faster). And we should add only a few - but
> important - features that our users clearly asked for and focus on
> delivering Airflow 3 as soon as possible (instead of Airflow 2.10 or 2.11).
> Similarly - multi-team can be simplified if we cut things from the list
> above and have Task isolation as first-class citizens in Airflow (and the
> only option).
>
> My candidates very much concur with the list shared by Kaxil in the doc +
> I'd add multi-team (but simplified thanks to the cuts). But here I would
> also mostly defer to the Astronomer, Google, and AWS teams to define
> collectively what is the absolute minimum set of features that would make
> the "target" part of their customers happy. And ONLY do that.
>
> So in short - I think the big part of our discussion should be what we are
> ready to drop when we start Airflow 3 and be very bold. Once we know, we
> should figure out the absolute minimum of things that we can add that will
> benefit a significant part of our users (and make use of increased speed
> because we dropped things).
>
> J.
>
>
> On Mon, May 6, 2024 at 8:40 PM Constance Martineau
> <consta...@astronomer.io.invalid> wrote:
>
> > Hi Michal,
> >
> > Thanks for your thoughts on the Airflow 3 proposal. I appreciate your
> > concerns about the migration overhead for our users with a major new
> > version and see the appeal in your suggestion to integrate many of the
> > proposed changes into Airflow 2 through separate AIPs. It’s a valid point
> > and certainly aligns with the value of making incremental improvements.
> >
> > However, after looking closely at the enhancements outlined for Airflow 3,
> > I'm convinced they warrant a new major release. Here’s why:
> >
> >    1. *Core Architectural Changes:* We’re looking at foundational changes
> >    with Airflow 3, like redefining task priorities, separating task
> >    definition and task execution, and new AIPs like DAG versioning, remote
> >    execution, and restricting database access from workers. These aren’t
> >    just incremental improvements but major shifts that will set the stage
> >    for the next decade of Airflow’s architecture. Grouping these changes
> >    into a major release will help us make these transitions more cleanly
> >    and with fewer constraints from past decisions.
> >    2. *Code Clean-Up*: Our main branch has accumulated over 140 deprecated
> >    issues, and this will only grow if we continue without a major cleanup.
> >    This makes it increasingly difficult to implement new features
> >    effectively while maintaining backward compatibility. A major release
> >    allows us to address these issues head-on, reducing technical debt and
> >    paving the way for a more robust platform.
> >    3. *Managing Breaking Changes:* Let’s take the example of restricting
> >    database access from workers. It’s a necessary move for better security
> >    and potentially also for scalability (it reduces DB load). Many users
> >    have workflows that interact with the DB, either by using raw SQL or by
> >    leveraging a session object. We could implement this feature in Airflow
> >    2 and avoid breaking existing workflows by continuing to have the old
> >    standard mode as default - much of the work is already done - but that
> >    would mean supporting both the new secure mode and the old standard mode
> >    indefinitely and designing new features with the assumption that most
> >    will continue using the old standard mode. With Airflow 3, we can make
> >    secure mode the default or even the only option, simplifying
> >    implementation and future development. This is just one example of a
> >    change that is feasible to implement in Airflow 2 but is better released
> >    in the context of Airflow 3.
> >    4. *Future-Proofing for New Features:* Airflow 3 will open up
> >    possibilities for handling workflows beyond batch processing. Features
> >    like real-time DAG execution through API and multi-language task support
> >    are big steps forward, significantly expanding Airflow’s utility.
> >
> >
> > While integrating these updates into Airflow 2 might look less disruptive
> > initially, the scale and nature of the required changes really support a
> > move to Airflow 3. It’s not just about adding new features; it’s about
> > setting up Airflow so that it continues to remain relevant for the next
> > ten years.
> >
> > Constance
> >
> > On Mon, May 6, 2024 at 2:10 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > > There's a lot of technical debt hiding in Airflow, especially in the
> > > scheduler, that makes it harder and harder to efficiently add new
> > > features.
> > >
> > > At some point, very soon, we are going to have to remove some very
> > > infrequently used back compat shims that negatively affect performance.
> > > Without doing that the pace at which we can realistically add some of
> > > the more exciting features tends towards zero. Developer speed of
> > > contributors is a factor here too!
> > >
> > > So while we are still using SemVer, that necessitates v3.
> > >
> > > Ash
> > >
> > > On 6 May 2024 15:30:49 BST, "Michał Modras"
> > > <michalmod...@google.com.INVALID> wrote:
> > > >+1 to Jens's & Bolke's points here and in the doc
> > > >
> > > >I agree we should work on clarifying the directions we would like
> > > >Airflow to go. Introducing a new major Airflow version is a massive
> > > >overhead for users, who would need to plan for migrations, onboarding
> > > >the new Airflow (with a slightly different architecture), etc., and
> > > >effectively Airflow 2 would live in parallel for a long time.
> > > >
> > > >Personally, I think most of the points in Kaxil's/Vikram's doc are
> > > >valuable projects of their own, and I could imagine all of them being
> > > >delivered as separate AIPs within Airflow 2 (surely new minor versions
> > > >of Airflow 2). I am not sure if the scope of changes and the goal we
> > > >want to achieve is a) clear enough b) broad enough to call for a new
> > > >major version.
> > > >
> > > >Best,
> > > >Michal
> > > >
> > > >On Sun, May 5, 2024 at 10:10 AM Scheffler Jens (XC-AS/EAE-ADA-T)
> > > ><jens.scheff...@de.bosch.com.invalid> wrote:
> > > >
> > > >> Thanks for the document write-up, Kaxil. I assume this is mostly a
> > > >> vision statement.
> > > >>
> > > >> Looking forward to a larger addendum where we can collect things that
> > > >> we all can vote and agree on as targets.
> > > >>
> > > >> As I started earlier with a Confluence page and it seems this is not
> > > >> accessible to all, shall we convert this to a Google Doc for better
> > > >> collaboration and item collection?
> > > >>
> > > >> ________________________________
> > > >> From: Vikram Koka <vik...@astronomer.io.INVALID>
> > > >> Sent: Sunday, May 5, 2024 3:34:33 AM
> > > >> To: dev@airflow.apache.org <dev@airflow.apache.org>
> > > >> Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs
> > > >> strategic (Airflow 3) approach
> > > >>
> > > >> Thank you for your feedback, Bolke and Andrey!
> > > >>
> > > >> Bolke,
> > > >> I have replied to some of your comments in the doc.
> > > >> I will provide a detailed write up on the "Interactive DAG run" (or
> > > >> synchronous DAG run) capability, which has generated some early
> > > >> questions. I had intended to get an AIP published for that as a
> > > >> follow-up, but I believe that a simpler write up would be useful ahead
> > > >> of the AIP.
> > > >>
> > > >> Andrey,
> > > >> You raise an interesting point.
> > > >>
> > > >> As part of the Airflow 2.0 release, we as a community had decided to
> > > >> strictly adhere to SemVer as detailed in the document you referenced.
> > > >> We also consciously split out the "Core Airflow" releases from the
> > > >> "Provider" releases at that time. We had a clear expectation then for
> > > >> the cadence of both minor and patch releases, which we have generally
> > > >> adhered to since then.
> > > >>
> > > >> Personally, I am more concerned about our Provider releases right now,
> > > >> as compared to the cadence of our major releases. I believe that one of
> > > >> the proposed changes in the Airflow 3 document, i.e. the clear
> > > >> separation for Task Execution, will help here, but more may be needed.
> > > >>
> > > >> Definitely interested in more feedback on this as well.
> > > >>
> > > >> Vikram
> > > >>
> > > >>
> > > >> On Sat, May 4, 2024 at 10:57 AM Andrey Anshin <andrey.ans...@taragol.is>
> > > >> wrote:
> > > >>
> > > >> > I would like to propose changing (or at least discussing) the release
> > > >> > policy around the Major version of Airflow.
> > > >> >
> > > >> > Right now it is described as "These releases do not happen with any
> > > >> > regular interval or on any predictable schedule.":
> > > >> >
> > > >> > https://airflow.apache.org/docs/apache-airflow/stable/release-process.html#term-Major-release
> > > >> >
> > > >> > So maybe it is time to put it on a schedule, e.g. once every two years
> > > >> > or so. This could help us avoid discussions like "We don't know when
> > > >> > Airflow 4 is coming" in the future. Once the new major version is
> > > >> > released, new features would no longer be added to the old major
> > > >> > version; however, we would keep supporting it with bug and security
> > > >> > fixes for a while, e.g. 1 year for bug fixes and 3 years for security
> > > >> > fixes, with a total 5-year lifecycle per major version. These are just
> > > >> > approximate time periods to define the current period, the bugfix
> > > >> > period, and the security fix period.
> > > >> >
> > > >> > From the contributors' perspective, it helps with dropping deprecated
> > > >> > stuff, which resolves an old problem: we have to support everything,
> > > >> > including deprecated stuff, and without a scheduled lifecycle for the
> > > >> > deprecated stuff it could be a showstopper for new features, because
> > > >> > sometimes it is hard to support two different approaches for a long
> > > >> > period of time with no hope that removal will happen soon. For some
> > > >> > fundamental stuff that does not take a lot of time to support, we
> > > >> > could postpone removal until the release after next, e.g. deprecate
> > > >> > in Airflow 3 but remove in Airflow 5.
> > > >> >
> > > >> > From the user perspective, they have at least bug fix support for a
> > > >> > while. If someone wants to use a legacy version, it is their choice;
> > > >> > however, there would be no new features and no new versions of
> > > >> > providers (after one year).
> > > >> >
> > > >> >
> > > >> > ----
> > > >> > Best Wishes
> > > >> > *Andrey Anshin*
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Sat, 4 May 2024 at 19:17, Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > >> >
> > > >> > > I have left several comments :-). And on interactive DAG runs, even
> > > >> > > after the explanation of Vikram I still don't have a clue what we
> > > >> > > want to accomplish there :-P.
> > > >> > >
> > > >> > > I would like to see a mantra or theme for Airflow 3. That helps
> > > >> > > nudge people in the same direction. Suggestions in the comments.
> > > >> > >
> > > >> > > Bolke
> > > >> > >
> > > >> > > > On 4 May 2024, at 01:14, Vikram Koka
> > > >> > > > <vik...@astronomer.io.invalid> wrote:
> > > >> > > >
> > > >> > > > Good point Jed.
> > > >> > > > I responded back to your comment in the doc as well and am very
> > > >> > > > open to changing the term in the doc.
> > > >> > > >
> > > >> > > > I used the term "interactive DAG run" for the ability to invoke or
> > > >> > > > trigger a DAG run through the API, with the expectation of getting
> > > >> > > > back a result immediately. An alternate term could be a
> > > >> > > > "synchronous DAG run".
> > > >> > > >
> > > >> > > > Regardless, this is a significant change so a good term to indicate
> > > >> > > > the expansion from "batch runs only" is warranted. Very open to
> > > >> > > > different terms here.
> > > >> > > >
> > > >> > > >> On Fri, May 3, 2024 at 4:05 PM Jed Cunningham
> > > >> > > >> <jedcunning...@apache.org> wrote:
> > > >> > > >>
> > > >> > > >> Very exciting! Looks like we will have a busy period of time ahead
> > > >> > > >> of us. Overall I like the plan so far, especially using this
> > > >> > > >> year's Airflow Summit as an opportunity to announce and gather
> > > >> > > >> feedback, and the 2025 version to pitch upgrading.
> > > >> > > >>
> > > >> > > >> I left a comment in the doc, but we might want to iterate on the
> > > >> > > >> terminology we use for high priority or "synchronous" DAG runs to
> > > >> > > >> serve LLM responses - I find "interactive DAG runs" a bit
> > > >> > > >> confusing.
> > > >> > > >>
