Hi Michal,

Thanks for your thoughts on the Airflow 3 proposal. I appreciate your concerns about the migration overhead for our users with a major new version and see the appeal in your suggestion to integrate many of the proposed changes into Airflow 2 through separate AIPs. It’s a valid point and certainly aligns with the value of making incremental improvements.
However, after looking closely at the enhancements outlined for Airflow 3, I'm convinced they warrant a new major release. Here’s why:

1. *Core Architectural Changes:* We’re looking at foundational changes with Airflow 3, like redefining task priorities, separating task definition and task execution, and new AIPs like DAG versioning, remote execution, and restricting database access from workers. These aren’t just incremental improvements but major shifts that will set the stage for the next decade of Airflow’s architecture. Grouping these changes into a major release will help us make these transitions more cleanly and with fewer constraints from past decisions.

2. *Code Clean-Up:* Our main branch has accumulated over 140 deprecated issues, and this will only grow if we continue without a major cleanup. This makes it increasingly difficult to implement new features effectively while maintaining backward compatibility. A major release allows us to address these issues head-on, reducing technical debt and paving the way for a more robust platform.

3. *Managing Breaking Changes:* Let’s take the example of restricting database access from workers. It’s a necessary move for better security and potentially also for scalability (it reduces DB load). Many users have workflows that interact with the DB, either by using raw SQL or by leveraging a session object. We could implement this feature in Airflow 2 and avoid breaking existing workflows by keeping the old standard mode as the default (much of the work is already done), but that would mean supporting both the new secure mode and the old standard mode indefinitely, and designing new features with the assumption that most users will continue using the old standard mode. With Airflow 3, we can make secure mode the default or even the only option, simplifying implementation and future development.
This is just one example of a change that is feasible to implement in Airflow 2 but is better released as part of Airflow 3.

4. *Future-Proofing for New Features:* Airflow 3 will open up possibilities for handling workflows beyond batch processing. Features like real-time DAG execution through the API and multi-language task support are big steps forward, significantly expanding Airflow’s utility.

While integrating these updates into Airflow 2 might look less disruptive initially, the scale and nature of the required changes really support a move to Airflow 3. It’s not just about adding new features; it’s about setting up Airflow so that it remains relevant for the next ten years.

Constance

On Mon, May 6, 2024 at 2:10 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> There's a lot of technical debt hiding in Airflow, especially the
> scheduler that makes it harder and harder to efficiently add new features.
>
> At some point, very soon, we are going to have to remove some very
> infrequently used back compat shims that negatively affect performance.
> Without doing that the pace at which we can realistically add some of the
> more exciting features tends towards zero. Developer speed of contributors
> is a factor here too!
>
> So while we are still using SemVer, that necessitates v3.
>
> Ash
>
> On 6 May 2024 15:30:49 BST, "Michał Modras" <michalmod...@google.com.INVALID> wrote:
> >+1 to Jens's & Bolke's points here and in the doc
> >
> >I agree we should work on clarifying the directions we would like Airflow
> >to go. Introducing a new major Airflow version is a massive overhead for
> >users, who would need to plan for migrations, onboarding the new Airflow
> >(with a slightly different architecture), etc., and effectively Airflow 2
> >would live in parallel for a long time.
> >
> >Personally, I think most of the points in Kaxil's/Vikram's doc are valuable
> >projects of their own, and I could imagine all of them being delivered as
> >separate AIPs within Airflow 2 (surely new minor versions of Airflow 2). I
> >am not sure if the scope of changes and the goal we want to achieve is a)
> >clear enough b) broad enough to call for a new major version.
> >
> >Best,
> >Michal
> >
> >On Sun, May 5, 2024 at 10:10 AM Scheffler Jens (XC-AS/EAE-ADA-T)
> ><jens.scheff...@de.bosch.com.invalid> wrote:
> >
> >> Thanks for the document write-up, Kaxil. I assume this is mostly a vision
> >> statement.
> >>
> >> Looking forward for a larger addendum where we can collect things that we
> >> all can vote and agree on as targets.
> >>
> >> As I started earlier with a confluence page and it seems this is not
> >> accessible to all, shall we convert this to a Google Doc for better
> >> collaboration and item collection?
> >>
> >> ________________________________
> >> From: Vikram Koka <vik...@astronomer.io.INVALID>
> >> Sent: Sunday, May 5, 2024 3:34:33 AM
> >> To: dev@airflow.apache.org <dev@airflow.apache.org>
> >> Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs
> >> strategic (Airflow 3) approach
> >>
> >> Thank you for your feedback, Bolke and Andrey!
> >>
> >> Bolke,
> >> I have replied to some of your comments in the doc.
> >> I will provide a detailed write up on the "Interactive DAG run" (or
> >> synchronous DAG run) capability, which has generated some early questions.
> >> I had intended to get an AIP published for that as a follow-up, but I
> >> believe that a simpler write up would be useful ahead of the AIP.
> >>
> >> Andrey,
> >> You raise an interesting point.
> >>
> >> As part of the Airflow 2.0 release, we as a community had decided to
> >> strictly adhere to Semver as detailed in the document you referenced.
> >> We also consciously split out the "Core Airflow" releases from the "Provider"
> >> releases at that time. We had a clear expectation then for the cadence of
> >> both minor and patch releases, which we have generally adhered to since
> >> then.
> >>
> >> Personally, I am more concerned about our Provider releases right now, as
> >> compared to the cadence of our major releases. I believe that one of the
> >> proposed changes in the Airflow 3 document i.e. the clear separation for
> >> Task Execution will help here, but more may be needed.
> >>
> >> Definitely interested in more feedback on this as well.
> >>
> >> Vikram
> >>
> >>
> >> On Sat, May 4, 2024 at 10:57 AM Andrey Anshin <andrey.ans...@taragol.is> wrote:
> >>
> >> > I would like to propose to change (at least discuss) release policy around
> >> > the Major version of Airflow.
> >> >
> >> > Right now it is described as "These releases do not happen with any regular
> >> > interval or on any predictable schedule." :
> >> >
> >> > https://airflow.apache.org/docs/apache-airflow/stable/release-process.html#term-Major-release
> >> >
> >> > So maybe it is time to make it schedulable, e.g. one per two years or so.
> >> > This one could help us to avoid such a discussion in the future, like "We
> >> > don't know when Airflow 4 is coming.".
> >> > At the moment when the new major
> >> > version will be released new features wouldn't be added in the old major
> >> > version, however we would support bug / security for a while, e.g. 1 year
> >> > for bug fixes, 3 years for security fixes with a total 5 year lifecycle per
> >> > a major version. These just are approximate time periods for a definition
> >> > of current period, bugfix period and security fix period.
> >> >
> >> > In contributors' perspective it helps with dropping the deprecated stuff
> >> > which resolves some old problem: we have to support everything including
> >> > deprecated stuff and without schedulable lifecycle for the deprecated stuff
> >> > it could be showstopper for the new feature, because sometimes it hard to
> >> > support two different approaches for long period of time with no hope that
> >> > it will happen soon. For some fundamental stuff which do not require a lot
> >> > things time to support we could postponed removal for next after the next
> >> > release, e.g. deprecate in Airflow 3, but remove it in Airflow 5
> >> >
> >> > In the user perspective, they have at least bug fix support for a while, if
> >> > someone want to use legacy version it their choice, however no new
> >> > features, no new version of providers (after one year)
> >> >
> >> >
> >> > ----
> >> > Best Wishes
> >> > *Andrey Anshin*
> >> >
> >> >
> >> >
> >> > On Sat, 4 May 2024 at 19:17, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >> >
> >> > > I have left several comments :-). And on interactive dag runs even after
> >> > > the explanation of Vikram I still don't have a clue what we want to
> >> > > accomplish there :-P.
> >> > >
> >> > > I would like to see a mantra or team for Airflow 3. That helps nudging
> >> > > people in the same direction. Suggestions in the comments.
> >> > >
> >> > > Bolke
> >> > >
> >> > > > On 4 May 2024, at 01:14, Vikram Koka <vik...@astronomer.io.invalid> wrote:
> >> > > >
> >> > > > Good point Jed.
> >> > > > I responded back to your comment in the doc as well and very open to
> >> > > > changing the term in the doc.
> >> > > >
> >> > > > Used the term "interactive DAG run" as the ability to invoke or trigger a
> >> > > > DAG run through the API, with the expectation of getting back a result
> >> > > > immediately. An alternate term could be a "synchronous DAG run".
> >> > > >
> >> > > > Regardless, this is a significant change so a good term to indicate the
> >> > > > expansion from "batch runs only" is warranted. Very open to different terms
> >> > > > here.
> >> > > >
> >> > > >> On Fri, May 3, 2024 at 4:05 PM Jed Cunningham <jedcunning...@apache.org> wrote:
> >> > > >>
> >> > > >> Very exciting! Looks like we will have a busy period of time ahead of us.
> >> > > >> Overall I like the plan so far, especially using this year's Airflow Summit
> >> > > >> as an opportunity to announce and gather feedback, and the 2025 version to
> >> > > >> pitch upgrading.
> >> > > >>
> >> > > >> I left a comment in the doc, but we might want to iterate on the
> >> > > >> terminology we use for high priority or "synchronous" DAG runs to serve LLM
> >> > > >> responses - I find "interactive DAG runs" a bit confusing.