Yes. Long awaited, and indeed some implementation details will be
needed to turn it into an AIP. I also think there is one important
decision to consider: should it target Airflow 2?

On Sun, May 26, 2024 at 12:26 PM Elad Kalif <elad...@apache.org> wrote:

> > In order for this to become a reality, Backfills need to be handled by the
> > Airflow Scheduler as a normal DAG execution
>
> I think it's a good idea.
> It should natively solve problems like
> https://github.com/apache/airflow/issues/11302
>
> On Fri, May 24, 2024 at 10:58 PM Vikram Koka <vik...@astronomer.io.invalid>
> wrote:
>
> > Fellow Airflowers,
> >
> > I am following up on some of the proposed changes in the Airflow 3 proposal
> > <https://docs.google.com/document/d/1MTr53101EISZaYidCUKcR6mRKshXGzW6DZFXGzetG3E/>,
> > where more information was requested by the community.
> >
> > One specific topic was "Running Backfills at scale". This is not yet a
> > full-fledged AIP, but a starting point for the discussion leading towards an
> > AIP with fully defined technical details.
> >
> > Backfills at scale
> >
> > Backfills in Airflow 2.x are treated as an exception and executed by an
> > incarnation of the BackfillJob, rather than the regular Airflow Scheduler
> > itself. This results in unexpected interactions with the other DAGs being
> > run by the main Airflow Scheduler at the same time, including resource
> > contention and possibly unexpected delays, because established scalability
> > configuration settings such as Concurrency are not consistently applied.
> > It also adds code-level complexity by having two somewhat-similar
> > implementations of scheduling logic.
> >
> >
> > However, with ML model training, backfills are a common operation and need
> > to be treated as a regular Airflow DAG / Task execution operation and not
> > treated as an exception. It is also not possible to run a backfill unless
> > you have direct access to the Airflow database/SSH access to the Airflow
> > server, which is not possible for many/most data engineers.
> >
> >
> > In order for this to become a reality, Backfills need to be handled by the
> > Airflow Scheduler as a normal DAG execution, building on the Dynamic Task
> > Mapping execution pattern, rather than an exception. Additionally, Backfill
> > tasks will now ONLY be executed by the Airflow Workers, for obvious reasons
> > including scalability. A less obvious, but important reason is Security,
> > since it is ideal to have data connections to Enterprise data only happen
> > through Airflow Workers, rather than any Airflow system components.
> >
> >
> > As part of making Backfill support cleaner in Airflow, Backfill DAG
> > execution will also be supported in the Airflow REST API.
> >
> >
> > This proposal is purposefully light on exact implementation details but
> > will include at least:
> >
> >    - Making the Airflow Scheduler responsible for scheduling decisions on
> >      all DagRuns (instead of the current behavior, where it purposefully
> >      ignores backfill runs)
> >    - A new API endpoint to submit a "backfill request".
> >
> >
> > --
> >
> >
> > Best regards,
> > Vikram Koka, Ash Berlin-Taylor, Kaxil Naik, and Constance Martineau
> >
>
