Just one comment here: while it may be "shocking" for some, yes, this one has clearly been coming. Actually, thinking about "what's next" has taken a lot of my brain cycles recently. Too many, to the point that I started this thread. I thought it might be quite a valuable opening coming from someone who always said "well, we have to have a **really** good reason to do Airflow 3" and "maybe there will not be an Airflow 3".

And I quite agree with Kaxil that it is a bit too early to try to organise our thoughts on what to do, and on our approach to Airflow 3, based on just this thread. I do not think this one thread will lead to us deciding what to do - if we try to do it now in a discussion thread or even a Confluence doc, we might fail to achieve the goal.

My main point here was really to get the feel and open thoughts of those who are actively involved in Airflow on what we should do next, and to see if this is the right time to start thinking in "two" modes: Airflow 2 and Airflow 3 (even if we do not know yet what Airflow 3 will be). I'd rather let a free stream of thoughts on what people think should happen continue here, merely opening our minds to the possibility of Airflow 3. And I would love to keep it flowing for others - without the goal of organizing it or achieving consensus.

And I think all that Kaxil writes about - starting a series of calls, organizing our discussions, getting "product manager(s)" working on organizing those discussions - is the **right** thing to do. How exactly to do that, and how to make sure everyone is involved while we are not tied up in endless discussions and bike-shedding, should materialize from our discussion.

But I would propose (and encourage) others' thoughts here as well - just a free stream of those - as it might provide valuable feedback on what's next.

J.

On Mon, Apr 22, 2024 at 4:14 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Hello all,
>
> I didn't anticipate reading an Airflow 3 email from a sunny beach in Nice, France <https://en.wikipedia.org/wiki/Nice> -- I had a great time there over the weekend, highly recommended :D
>
> I say that because, as Vikram pointed out, some of us at Astronomer have been polishing up the doc to propose Airflow 3 to the community in the coming week.
> Such is the beauty of the open-source project that multiple people (in the form of developers, committers, PMC members and various stakeholders) think the same. From the Astronomer front, Constance had been championing a doc with Vikram & myself, with inputs from various other committers & users, to have a good blend of different perspectives -- Product, PMC member & Industry leaders that cover several areas from User-facing pain points, Industry trends in the Orchestration space, Innovation in AI & ML space & opportunities as well as maintainability of the current codebase. We would love to share it this week; it goes into some of the details of our perspective on Why Airflow 3.0 & Why now etc.
>
> I would like to reiterate my statement from last year's panel session at the Airflow Summit with Marc, Jarek & Pierre <https://airflowsummit.org/sessions/2023/panels/panel-faces-airflow/>: "We, as the Airflow project, have maintained a great balance of Innovation & Stability". I truly believe in that, and it is clearly visible in the number of downloads and Airflow's popularity as the leader in the Workflow Orchestration space. Our industry is rapidly evolving, especially in the Data, AI & ML space. The role and expectations of the Data Orchestrator (more specialized than a generic Workflow Orchestrator) are also evolving as it is more and more utilized for Business critical applications than just powering dashboards. So, IMO, we must continue catering to those use-cases and innovate to aid the new use cases. Some of us from Astronomer would like to create AIPs around these new use-cases (LLM/Gen-AI, Data Awareness, and some of the other things discussed above) in the coming weeks to receive feedback from everyone.
>
> Apart from the new use cases around Data, AI & ML, balancing them with resolving user pain points with things like DAG Versioning, a more modern and extensible UI, lack of permissions with Airflow CLI, simplifying the first-user Learning Curve etc., while cleaning up Tech-debt (for example, we have 100+ deprecations in our code-base), a more performant scheduler like the async SqlAlchemy & Scheduler discussions on the mailing list, dropping dependency on FAB, rethinking provider/core separation -- both for users & developers -- will make for a powerful 3.0 release that users will want to update to and will provide a cleaner code-base for the contributors to build new foundational pieces.
>
> Needless to say, similar to Airflow 2.0, we will have to provide our users with utilities like the Airflow Upgrade check <https://airflow.apache.org/docs/apache-airflow/stable/howto/upgrading-from-1-10/upgrade-check.html> script & other tools and docs to ease the migration.
>
> Regarding the proposal to move the discussion to Confluence: In my opinion, Confluence is a good place once things become more concrete and defined, and we are looking for feedback. For a thing as big as Airflow 3, I would humbly suggest the same route as what we did for Airflow 2 -- to have a few recurring Dev calls <https://cwiki.apache.org/confluence/display/AIRFLOW/Meeting+Notes> to gather areas of interest and various upcoming AIPs from different stakeholders and get aligned on Why Airflow 3? and Defining the high-level scope <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158869992>.
> This way, we can iterate much faster, and since all of the calls will be recorded and summary notes will be added on Confluence (example <https://cwiki.apache.org/confluence/display/AIRFLOW/Meeting+Notes>) and posted on the mailing list <https://lists.apache.org/list?dev@airflow.apache.org:2020-9:dev%20call>, we will have a written record of the important things. For any decision points, we will bring things to the mailing list for Vote or Lazy consensus. We could also re-use the Town Hall calls if needed. Once we have good enough alignment and scope, utilizing Confluence and mailing list on the individual items & AIPs would be more valuable, in my opinion.
>
> PS: It makes me very excited that we are discussing a new major phase of Airflow. Looking forward to it.
>
> Regards,
> Kaxil
>
> On Sun, 21 Apr 2024 at 17:21, Scheffler Jens (XC-AS/EAE-ADA-T) <jens.scheff...@de.bosch.com.invalid> wrote:
>
> > Hi Developers,
> >
> > TLDR Summary: I propose to move the discussion from an email Reply-to-all chain to a discussion collection at https://cwiki.apache.org/confluence/x/hQv9EQ
> >
> > When I first saw this email from Jarek I was a bit surprised; the email actually pulled me out of a kind of comfort zone of knowing what the next steps are. Naturally I was a bit shocked, so I decided to sleep on it. (It might be a bit shocking because Jarek dropped it 😃 especially as I had heard his position on a 3.0 before, and that always sounded to me like a strong position... haha)
> > After sleeping on the post, I think it is valid to raise the discussion, especially as we are heading towards a 10th feature release, which was also the cut-over point from 1.x to 2.x. At some point every software product needs refactoring and cleanup. Structures are never perfect. But a lot of emotion and work comes with such a step.
> > And there is a risk of failing, of losing a lot of users and of forcing them to migrate (or having them run away). So my current conclusion is: we should consider this carefully. But we do need to consider it.
> >
> > I believe the discussion will take some time and focus - and a Reply-to-all chain will not be a good path, as we will lose a lot of detail and focus, and emails will create a lot of noise which is hard to follow. In a perfect non-distributed world I'd call you to a half-day visioning workshop in a room and focus on the whiteboard. That is not possible with this level of distribution. The next option would be a (large) ~4h conference call, which is hard to fit into a time zone matching the sleep cycle for all. It would be perfect if the Summit were close, so we could plan a half-day or full-day breakout for contributors on Day 4 or so. But September is far, far away.
> >
> > Therefore - to reduce the amount of emails - I propose to first collect points, ideas, pain points etc. on a Confluence page. I started one page as a starting point (contrary ideas welcome!) to have a place to collaborate and sketch. A virtual whiteboard (like Miro, Mural etc.) would also be OK but I had none at hand to share... If we collect ideas, points etc. on this page we can have a rather short (2h) call with contributors soon to pitch and discuss the points and define follow-up steps towards a plan, vote and conclusion.
> >
> > Proposed Confluence discussion page:
> > https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+3.0+Discussion+and+Planning
> >
> > As a starting point I tried to import both emails I saw in the thread into the page. As it is a call to collaborate, please start editing and drop your points as well.
> >
> > Regarding the trigger points Jarek mentioned:
> > Actually, the AIP-68 and AIP-69 that I dropped are, in my view, something that does NOT require Airflow to get to 3.0.
> > I would see them as either "Tactical" or "just functional enhancements". AIP-68 is "just" a bit of sugar for the UI and extensions to the Plugin interface in my view. AIP-69 is basically building something on top, based on the concept of Hybrid Executors. I would assume AIP-69 does not need drastic changes, maybe only small adjustments in the core (but the concept is not elaborated yet). I see this mainly as "just another Executor" that should not need breaking changes. I did not drop these two AIPs to start a fundamental discussion, but rather to bring in a new feature each.
> > The factors that are hard to achieve in the Airflow 2.x world are rather "Multi Tenancy/Team" and "Dag Versioning", which in my eyes might be able to move faster with a 3.0.
> >
> > P.S.: I do not (yet?) get why GenAI is a trigger point that forces structural breaking changes.
> >
> > Best regards
> >
> > Jens Scheffler
> >
> > -----Original Message-----
> > From: Vikram Koka <vik...@astronomer.io.INVALID>
> > Sent: Saturday, April 20, 2024 6:23 PM
> > To: dev@airflow.apache.org
> > Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach
> >
> > A wonderful and exciting Saturday morning discussion!
> > Thank you Jarek for bringing the offline conversations into the mailing list.
> >
> > I completely agree on the necessity of Airflow 3.
> > I also agree that Gen AI is the trigger, i.e. the answer to "Why now?"
> >
> > Having been thinking about this for a while from a strategic perspective, as opposed to the tactical perspective of the bi-weekly and monthly releases, I believe that our thinking, as you articulated, should have a clear understanding of strategic vs. tactical, but I don't believe our execution necessarily needs to be either/or; it can actually be blended.
> >
> > With that said, I believe that there are the following four buckets that we should use as a framework for Airflow 3.
> >
> > 1. Gen AI / LLM support
> > 2. Airflow User Improvements
> > 3. Easy adoption of Airflow by new users
> > 4. Integration improvements / Provider maintainability
> >
> > Describing them in more detail below:
> >
> > 1. Gen AI / LLM support
> > Reiterating the fact that this needs more work, I do believe this can be incremental to Airflow. As Astronomer, we have worked on the LLM Providers which we contributed to Airflow late last year. But clearly, there is so much more to do, both in building awareness of the patterns / templates to use, and in the patterns to support in Airflow to make these easier to use and adopt.
> >
> > 2. Airflow User Improvements
> > Clearly, features and improvements desired by the Community are important to continue to work on to make Airflow more approachable. The top features which leap to mind for me here are:
> > 2.1 DAG Versioning - the most requested feature in the Airflow User Survey
> > 2.2 Modern UI - also comes up a lot
> > 2.3 Different DAG distribution processes
> > 2.4 Different execution mechanisms
> > I know there are many more which I don't currently recall.
> >
> > 3. Airflow adoption
> > We have discussed this many times, but we absolutely need to make the individual first-time adoption of Airflow better.
> > I think the most common term I recall here is the notion of "Airflow Standalone", but whatever the term may be, an ultra quick, simple install of Airflow and the getting started experience is something we owe our community.
> >
> > 4. Integration / Providers
> > The change we made as part of Airflow 2.0 to split the Core Airflow releases from the Provider releases was clearly a good choice and made a huge impact. However, the integration maintainability balanced with growth still seems like it could use a significant set of improvements. Elad and I spoke about this a couple of days ago as well and I don't have a clear set of next steps here, but it is definitely worth exploring.
> >
> > Some of us at Astronomer have been discussing this quite a bit and planning on bringing a more polished draft to the community, but an initial discussion on a Saturday is fun as well :). We will definitely share our Airflow 3 proposal as a document with the community within the next week, as a request for comment.
> >
> > On Sat, Apr 20, 2024 at 1:50 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > Hello here,
> > >
> > > I have been thinking a lot recently and discussing with some people and I am more and more convinced it's about time we - as a community - start making changes with both the "Airflow 2" present and the "Airflow 3" future in mind.
> > >
> > > *TL;DR: I think we should seriously start work on Airflow 3 and decide what it means for our AIPs - to treat some of them as more "tactical" - things that should go into Airflow 2 and some "strategic" ones - being foundational for Airflow 3 - with different goals and criteria.*
> > >
> > > A lot of us already think that way and a lot of us have already talked about it for quite some time, so you should treat my mail mostly as a little trigger: "let's start publicly discussing what it might mean for us and our community and let's make it clear about the target of the initiatives we do".
> > >
> > > Some might be surprised it comes from me as I've been often saying "no Airflow 3 without a good reason" or "possibly we will have no Airflow 3", but I think (and a number of people I spoke to have a similar opinion) we have plenty of reasons to make some bold moves now.
> > >
> > > Over the last 4 years since Airflow 2 was out, a lot has changed and we have a number of different needs that current Airflow 2 cannot **really** do well:
> > >
> > > - LLM/Gen-AI mainly as the important trigger
> > > - Cloud Native is the "way to go"
> > > - need to submit DAGs in other ways than dropping them to a shared DAG folder.
> > > - local testing and fast iteration on developing pipelines.
> > > - ability to run tasks within a workflow with "affinity" so that they can share inputs/outputs in shared CPU/GPU memory
> > > - ability to integrate seamlessly with other workflow engines - making Airflow a "workflow of workflows"
> > > - probably way more
> > > - all that while keeping a lot of the strengths of Airflow 2 - such as continuing to have the option of using the many thousands of operators with 90+ providers.
> > >
> > > All those above - we could implement better if we get rid of some of the implicit or explicit luggage we have in Airflow 2.
> > > I think the last two proposals from Jens, AIP-68 and AIP-69, reflect that very much - both would have been much easier and more straightforward if we redesigned Airflow 3 basically at the drawing board, boldly dropping some Airflow 2 assumptions, and if we implemented the Airflow 3 core by taking the best parts of what we have now in Airflow 2 but generally dropping the luggage in a new framework.
> > >
> > > And it won't be possible without breaking some fundamental assumptions and making Airflow 3 quite heavily incompatible with Airflow 2.
> > >
> > > From "my" camp - dropping the need to have the 700+ dependencies for Airflow + all providers in a single Python interpreter and dropping the dependency on Flask/Plugins/FAB would be a huge win on its own. Not to mention being able to split providers' development and contribution from Airflow core (while keeping the development of, and contributions to, providers going) - this has been highly requested.
> > >
> > > And I think we have a lot of people in our community who would be able (and would love) to do it - I think a number of us (including myself) are a bit burned out and tired of just maintaining things in Airflow in a backwards-compatible way and would jump on the opportunity of rebuilding Airflow.
> > >
> > > But - we of course cannot forget about Airflow 2 users. We do not want to "stop the world" for them. We want to keep fixing things and adding incremental changes - and those things do not necessarily need to be super "future-proof". They should help to "keep the lights on" for a while - which means that in a number of cases they could be a "band-aid". AIP-44 (internal-API) and AIP-67 (multi-team) are more of those.
> > >
> > > So - what I think we might want to do as a community:
> > >
> > > * start working on Airflow 3 foundations (and decide what it means for our users and developer community). Decide what to keep, what to drop, what to redesign, assumptions to recreate.
> > >
> > > * explicitly split the initiatives/AIPs we have to target Airflow 2 and Airflow 3 and treat them a bit differently in terms of future-proofness
> > >
> > > I would love to hear your thoughts on that (bracing for the storm of those).
> > >
> > > J.