Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

Vikram Koka Sat, 20 Apr 2024 09:23:18 -0700

A wonderful and exciting Saturday morning discussion!
Thank you Jarek for bringing the offline conversations into the mailing
list.

I completely agree on the necessity of Airflow 3.
I also agree that Gen AI is the trigger, i.e. the answer to "Why now?"

Having thought about this for a while from a strategic perspective, as
opposed to the tactical perspective of the bi-weekly and monthly releases,
I believe that our thinking, as you articulated, should clearly distinguish
strategic from tactical. However, I don't believe our execution needs to be
either/or; it can actually be blended.

With that said, I believe that there are the following four buckets that
we should use as a framework for Airflow 3.

1. Gen AI / LLM support
2. Airflow User Improvements
3. Easy adoption of Airflow by new users
4. Integration improvements / Provider maintainability

Describing them in more detail below:
1. Gen AI / LLM support
Reiterating the fact that this needs more work, I do believe this can be
incremental to Airflow. As Astronomer,
we have worked on the LLM Providers which we contributed to Airflow late
last year. But clearly, there is so much more to do,
both from building awareness of the patterns / templates to use, as well as
patterns to support in Airflow to make these
easier to use and adopt.

2. Airflow User Improvements
Clearly features and improvements desired by the Community are important to
continue to work on to make Airflow more approachable. The top features
which leap to mind for me here are:
2.1 DAG Versioning - the most requested feature in the Airflow User Survey
2.2 Modern UI - also comes up a lot
2.3 Different DAG distribution processes
2.4 Different execution mechanisms
I know there are many more which I don't currently recall.

3. Airflow adoption
We have discussed this many times, but we absolutely need to make the
individual first-time adoption of Airflow better.
I think the most common term I recall here is the notion of "Airflow
Standalone", but whatever the term may be, an
ultra quick, simple install of Airflow and the getting started experience
is something we owe our community.

4. Integration / Providers
The change we made as part of Airflow 2.0 to split the Core Airflow
releases from the Provider releases was clearly
a good choice and made a huge impact. However, the integration
maintainability balanced with growth still seems like it could
use a significant set of improvements. Elad and I spoke about this a couple
of days ago as well and I don't have a clear
set of next steps here, but it is definitely worth exploring.

Some of us at Astronomer have been discussing this quite a bit and are planning
to bring a more polished draft to the community, but an initial
discussion on a Saturday is fun as well :). We will definitely share our
Airflow 3 proposal as a document with the community within the next week,
as a request for comment.

On Sat, Apr 20, 2024 at 1:50 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Hello here,
>
> I have been thinking a lot recently and discussing with some people and I
> am more and more convinced it's about time we - as a community - start
> making changes that consider "Airflow 2" current and "Airflow 3" future.
>
>
> *TL;DR: I think we should seriously start work on Airflow 3 and decide what
> it means for our AIPs - to treat some of them as more "tactical" - things
> that should go into Airflow 2 and some "strategic" ones - being
> foundational for Airflow 3 - with different goals and criteria.*
>
> A lot of us already think that way and a lot of us have already talked
> about it for quite some time, so you should treat my mail mostly as a
> little trigger "let's start publicly discussing what it might mean for us
> and our community and let's make it clear about the target of the
> initiatives we do".
>
> Some might be surprised it comes from me as I've been often saying "no
> Airflow 3 without a good reason" or "possibly we will have no Airflow 3",
> but I think (and a number of people I spoke to have similar opinion) we
> have plenty of reasons to make some bold moves now.
>
> Over the last 4 years since Airflow 2 was out, a lot has changed and we
> have a number of different needs that current Airflow 2 cannot **really**
> do well:
>
> - LLM/Gen-AI mainly as the important trigger
> - Cloud Native is the "way to go"
> - need to submit DAGs in other ways than dropping them to a shared DAG
> folder.
> - local testing and fast iteration on developing pipelines.
> - ability to run tasks within a workflow with "affinity" so that they can share
> inputs/outputs in shared CPU/GPU memory
> - ability to integrate seamlessly with other workflow engines - making
> Airflow a "workflow of workflows"
> - probably way more
> - all that while keeping a lot of the strengths of Airflow 2 - such as
> continuing to have the option of using the many thousands of operators with
> 90+ providers.
>
> All those above - we could implement better if we get rid of a number of
> the implicit or explicit luggage we have in Airflow 2. I think the last two
> proposals from Jens: AIP-68 and AIP-69 reflect very much that - both would
> have been much easier and more straightforward if we got Airflow 3 re-designed
> basically at a drawing board, boldly dropping some Airflow 2
> assumptions.
> And if we implemented core airflow 3 - taking the best part of what we have
> now in Airflow 2, but generally dropping the luggage in a new framework.
>
> And it won't be possible without breaking some fundamental assumptions and
> making Airflow 3 quite heavily incompatible with Airflow 2.
>
> From "my" camp - dropping the need of having the 700+ dependencies for
> Airflow + all providers in a single Python interpreter, dropping
> dependency on Flask/Plugins/FAB would be a huge win on its own. Not
> mentioning being able to split provider's development and contribution from
> airflow core (while keeping the development of providers as well and
> contributions) - this has been highly requested.
>
> And I think we have a lot of people in our community who would be able (and
> would love) to do it - I think a number of us (including myself) are a bit
> burned out and tired of just maintaining things in Airflow in a
> backwards-compatible way and would jump on the opportunity to
> rebuild Airflow.
>
> But - we of course cannot forget about Airflow 2 users. We do not want to
> "stop the world" for them. We want to keep fixing things and adding
> incremental changes - and those things do not necessarily need to be super
> "future-proof". They should help to "keep the lights on" for a while -
> which means that in a number of cases it could be "band-aid". AIP-44
> (internal-API), AIP-67 (multi-team) are more of those.
>
> So - what I think we might want to do as a community:
>
> * start working on Airflow 3 foundations (and decide what it means for our
> users and developer community). Decide what to keep, what to drop, what to
> redesign, assumptions to recreate.
>
> * explicitly split the initiatives/AIPs we have to target Airflow 2 and
> Airflow 3 and treat them a bit differently in terms of future-proofness
>
> I would love to hear your thoughts on that (bracing for the storm of
> those).
>
> J.
>
