Re: [DISCUSS] Task Group Retries

Daniel Standish via dev Wed, 18 Feb 2026 14:22:04 -0800

Yeah, I think making the case for it with concrete use cases is a good
idea.



On Wed, Feb 18, 2026 at 12:02 PM Natanel <[email protected]> wrote:

> Hello, I have skimmed over the PR, and overall it looks good.
> I have yet to think of a use case where I would find the feature useful,
> and I would appreciate it if you could give an example. The PR introduces
> quite a lot of changes (including a new table and new dependency types) for
> a feature that allows task groups to be retried.
>
> I would love to hear what the use case for the feature is. It might be
> simpler to implement with something like a composite task instance, but I
> do not want to propose anything before I hear more about the use case, as I
> am most likely just missing something.
>
> Best regards,
> Natanel.
>
> On Wed, 18 Feb 2026 at 17:49, Jorge Rocamora García <
> [email protected]> wrote:
>
> > Hi all,
> >
> > I’d like to start a discussion around Task Group retries.
> >
> > Issue: https://github.com/apache/airflow/issues/21867
> > PR: https://github.com/apache/airflow/pull/61809
> >
> > This PR introduces a proof of concept for TaskGroup retries, allowing a
> > whole TaskGroup to be retried as a unit rather than relying only on
> > individual task retries.
> >
> > In addition to standard retry parameters (retries, retry_delay,
> > exponential backoff, etc.), this proposal introduces TaskGroup-specific
> > retry semantics, including:
> >
> >   * retry_condition: allows defining when a group should be retried
> >     (e.g., based on aggregated task states), enabling more flexible
> >     policies than simple failure-based retries.
> >   * retry_fast_fail: enables fail-fast behavior within the group, so that
> >     once a retry-triggering condition is met, the group can short-circuit
> >     remaining tasks and move directly to retry handling.
> >
> > The implementation adds retry configuration to TaskGroup, introduces a
> > task_group_instance model to persist retry state per DagRun, and includes
> > scheduler logic to evaluate retry conditions, enforce delay/backoff, and
> > clear group tasks for subsequent attempts. The feature is opt-in and does
> > not affect existing DAGs unless configured.
> >
> > I’d appreciate feedback on:
> >
> >   * The proposed API.
> >   * The scheduler and state-management approach.
> >   * The new model/migration.
> >   * Whether the retry semantics feel intuitive and consistent with existing
> >     task-level retries.
> >   * ..
> >
> > If there is general agreement on the direction, I’m happy to continue
> > refining the implementation.
> >
> > Best,
> > Jorge
> >
> >
>
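
For readers following the thread, the group-level retry semantics described in the quoted proposal (retries, retry_delay, exponential backoff, retry_condition, retry_fast_fail) could be sketched roughly as below. This is only an illustrative sketch in plain Python, not the code from the PR: GroupRetryPolicy, any_failed, retry_delay_for and decide are hypothetical names, the string task states are a simplification, and the doubling backoff is an assumption about what "exponential backoff" means here.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, Sequence

# Simplified task states for this sketch; not Airflow's real state enum.
TERMINAL = {"success", "failed", "skipped"}


def any_failed(states: Sequence[str]) -> bool:
    """Default condition (assumed): retry the group if any task failed."""
    return "failed" in states


@dataclass
class GroupRetryPolicy:
    # Hypothetical container. Field names echo the proposal, but the actual
    # TaskGroup signature in the PR may differ.
    retries: int = 0
    retry_delay: timedelta = timedelta(minutes=5)
    exponential_backoff: bool = False
    retry_condition: Callable[[Sequence[str]], bool] = any_failed
    retry_fast_fail: bool = False


def retry_delay_for(policy: GroupRetryPolicy, attempt: int) -> timedelta:
    """Delay before retry number `attempt` (1-based), doubling if backoff is on."""
    if policy.exponential_backoff:
        return policy.retry_delay * (2 ** (attempt - 1))
    return policy.retry_delay


def decide(policy: GroupRetryPolicy, states: Sequence[str], attempts_used: int) -> str:
    """Return 'running', 'retry', 'failed' or 'success' for the whole group."""
    all_done = all(s in TERMINAL for s in states)
    wants_retry = policy.retry_condition(states)
    # Without fast-fail, wait for every task to finish before judging the group.
    if not all_done and not (policy.retry_fast_fail and wants_retry):
        return "running"
    if wants_retry:
        # Retry only while the group still has attempts left.
        return "retry" if attempts_used < policy.retries else "failed"
    return "failed" if "failed" in states else "success"
```

With retry_fast_fail enabled, decide(policy, ["failed", "running"], 0) asks for a retry right away instead of waiting for the running task, which is the short-circuit behavior the proposal describes. With it disabled, the same states leave the group in "running" until every task reaches a terminal state.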
