Yeah, I think making the case for it with concrete use cases is a good idea.
On Wed, Feb 18, 2026 at 12:02 PM Natanel <[email protected]> wrote:

> Hello, I have skimmed over the PR, and overall I have to say that it looks
> good.
> I have yet to find a use case where I would find this feature useful, and I
> would appreciate it if you could give an example use case for it, as quite a
> lot of changes have been introduced (including a new table and new
> dependency types) for a feature that allows task groups to be retried.
>
> I would love to hear what the use case for the feature is, as I just can't
> think of one. I think it might be simpler to implement with something like
> a composite task instance, yet I do not want to propose anything before I
> hear more about the use case, as I am most likely just missing something.
>
> Best regards,
> Natanel.
>
> On Wed, 18 Feb 2026 at 17:49, Jorge Rocamora García <[email protected]> wrote:
>
> > Hi all,
> >
> > I’d like to start a discussion around Task Group retries.
> >
> > Issue: https://github.com/apache/airflow/issues/21867
> > PR: https://github.com/apache/airflow/pull/61809
> >
> > This PR introduces a proof of concept for TaskGroup retries, allowing a
> > whole TaskGroup to be retried as a unit rather than relying only on
> > individual task retries.
> >
> > In addition to standard retry parameters (retries, retry_delay,
> > exponential backoff, etc.), this proposal introduces TaskGroup-specific
> > retry semantics, including:
> >
> > * retry_condition: allows defining when a group should be retried (e.g.,
> >   based on aggregated task states), enabling more flexible policies than
> >   simple failure-based retries.
> > * retry_fast_fail: enables fail-fast behavior within the group, so that
> >   once a retry-triggering condition is met, the group can short-circuit
> >   the remaining tasks and move directly to retry handling.
> >
> > The implementation adds retry configuration to TaskGroup, introduces a
> > task_group_instance model to persist retry state per DagRun, and includes
> > scheduler logic to evaluate retry conditions, enforce delay/backoff, and
> > clear group tasks for subsequent attempts. The feature is opt-in and does
> > not affect existing DAGs unless configured.
> >
> > I’d appreciate feedback on:
> >
> > * The proposed API.
> > * The scheduler and state-management approach.
> > * The new model/migration.
> > * Whether the retry semantics feel intuitive and consistent with existing
> >   task-level retries.
> > * ..
> >
> > If there is general agreement on the direction, I’m happy to continue
> > refining the implementation.
> >
> > Best,
> > Jorge
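To make the proposed group-level retry semantics concrete, here is a minimal, hypothetical sketch of the per-attempt decision in plain Python, with no Airflow imports. It is not the code from PR #61809. TaskState, GroupRetryPolicy, any_failed, evaluate_group and the "wait"/"retry"/"fail"/"complete" outcomes are illustrative names only. retry_exponential_backoff and max_retry_delay are assumed to mirror the task-level parameter names, and the simple doubling schedule is an assumption, not Airflow's actual task-level backoff formula.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum
from typing import Callable, Mapping, Optional, Tuple


class TaskState(str, Enum):
    # Hypothetical subset of task states, for illustration only.
    SUCCESS = "success"
    FAILED = "failed"
    UPSTREAM_FAILED = "upstream_failed"
    SKIPPED = "skipped"
    RUNNING = "running"


FINISHED = {TaskState.SUCCESS, TaskState.FAILED, TaskState.UPSTREAM_FAILED, TaskState.SKIPPED}

# Hypothetical retry_condition signature: it sees the states of all tasks in the group.
RetryCondition = Callable[[Mapping[str, TaskState]], bool]


def any_failed(states: Mapping[str, TaskState]) -> bool:
    """Assumed default condition: retry the group if any of its tasks failed."""
    return any(s in (TaskState.FAILED, TaskState.UPSTREAM_FAILED) for s in states.values())


@dataclass
class GroupRetryPolicy:
    # Hypothetical policy object; field names are assumptions, not the PR's API.
    retries: int = 0
    retry_delay: timedelta = timedelta(minutes=5)
    retry_exponential_backoff: bool = False
    max_retry_delay: Optional[timedelta] = None
    retry_condition: RetryCondition = any_failed
    retry_fast_fail: bool = False

    def delay_for(self, try_number: int) -> timedelta:
        """Delay before the next attempt; try_number is the finished attempt (1-based)."""
        if not self.retry_exponential_backoff:
            return self.retry_delay
        # Assumed doubling schedule: delay, 2*delay, 4*delay, ...
        delay = self.retry_delay * (2 ** (try_number - 1))
        if self.max_retry_delay is not None:
            delay = min(delay, self.max_retry_delay)
        return delay


def evaluate_group(
    policy: GroupRetryPolicy, states: Mapping[str, TaskState], try_number: int
) -> Tuple[str, Optional[timedelta]]:
    """Decide what to do with one group attempt.

    Returns ("wait", None), ("retry", delay), ("fail", None) or ("complete", None).
    """
    all_finished = all(s in FINISHED for s in states.values())
    # Without fail-fast, the condition is only evaluated once every task is done.
    if not all_finished and not policy.retry_fast_fail:
        return "wait", None
    if policy.retry_condition(states):
        # retries=N allows N + 1 attempts in total.
        if try_number <= policy.retries:
            return "retry", policy.delay_for(try_number)
        return "fail", None
    return ("complete" if all_finished else "wait"), None
```

The main design question the sketch surfaces is when the condition is evaluated. Without retry_fast_fail the group waits for every task to finish, while with it a single failed task can trigger a retry even though siblings are still running, which in a real scheduler would also mean clearing or terminating those running tasks.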
