Hi all,

I’d like to clarify that several concrete use cases were already described in the original issue:
https://github.com/apache/airflow/issues/21867

One important aspect is that with the deprecation of SubDAGs in favor of TaskGroups, some retry semantics were lost. In my specific case, I’m using the KubernetesPodOperator, where different steps must run in separate pods because they depend on different software. However, conceptually, the entire block needs to behave as a single logical unit. For example:

- A: Create a PersistentVolumeClaim (PVC) to share data
- B: Retrieve and prepare inputs
- C: Run the analysis
- D: Remove the PVC

This pattern was previously achievable with SubDAGs, but there is currently no straightforward mechanism that preserves this grouped execution and retry behavior.

Best regards,
Jorge

On 2026/02/18 22:20:10 Daniel Standish via dev wrote:
> Yeah I think arguing that there’s a need for it with use cases is a good
> idea.
>
>
> On Wed, Feb 18, 2026 at 12:02 PM Natanel <[email protected]> wrote:
>
> > Hello, I have skimmed over the PR, overall I have to say that it looks
> > good.
> > I have yet to find a use case for this (as I just can't think of one) where
> > I find the feature useful, and I will appreciate it if you could give an
> > example use case for the feature, as it looks like quite a bit of changes
> > have been introduced (including a new table and new dependency types) for a
> > feature which allows for task groups to be retried.
> >
> > I would love to hear about what the use case of the feature is, as I just
> > can't think of one, I think that it might be simpler to implement if we do
> > something like a composite task instance, yet I do not want to propose
> > anything before I hear more about the use case, as I am most likely just
> > missing something.
> >
> > Best regards,
> > Natanel.
> >
> > On Wed, 18 Feb 2026 at 17:49, Jorge Rocamora García <
> > [email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I’d like to start a discussion around Task Group retries.
> > >
> > > Issue: https://github.com/apache/airflow/issues/21867
> > > PR: https://github.com/apache/airflow/pull/61809
> > >
> > > This PR introduces a proof of concept for TaskGroup retries, allowing a
> > > whole TaskGroup to be retried as a unit rather than relying only on
> > > individual task retries.
> > >
> > > In addition to standard retry parameters (retries, retry_delay,
> > > exponential backoff, etc.), this proposal introduces TaskGroup-specific
> > > retry semantics, including:
> > >
> > > * retry_condition: allows defining when a group should be retried (e.g.,
> > >   based on aggregated task states), enabling more flexible policies than
> > >   simple failure-based retries.
> > > * retry_fast_fail: enables fail-fast behavior within the group, so that
> > >   once a retry-triggering condition is met, the group can short-circuit
> > >   remaining tasks and move directly to retry handling.
> > >
> > > The implementation adds retry configuration to TaskGroup, introduces a
> > > task_group_instance model to persist retry state per DagRun, and includes
> > > scheduler logic to evaluate retry conditions, enforce delay/backoff, and
> > > clear group tasks for subsequent attempts. The feature is opt-in and does
> > > not affect existing DAGs unless configured.
> > >
> > > I’d appreciate feedback on:
> > >
> > > * The proposed API.
> > > * The scheduler and state-management approach.
> > > * The new model/migration.
> > > * Whether the retry semantics feel intuitive and consistent with existing
> > >   task-level retries.
> > > * ..
> > >
> > > If there is general agreement on the direction, I’m happy to continue
> > > refining the implementation.
> > >
> > > Best,
> > > Jorge

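For reference, the grouped retry behavior discussed in this thread (steps A to D acting as one logical unit) can be sketched in plain Python. This is only an illustration of the semantics, not Airflow code and not the implementation in the PR: `Step` and `run_group` are hypothetical names, and treating step D (PVC removal) as a teardown that runs after every attempt is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """Hypothetical stand-in for one task in the group (e.g. one pod)."""
    name: str
    run: Callable[[], None]


def run_group(steps: List[Step], retries: int,
              teardown: Optional[Callable[[], None]] = None) -> int:
    """Run all steps in order; if any step fails, rerun the whole group.

    Returns the number of attempts used. The teardown (assumed here to be
    the PVC cleanup) runs after every attempt, successful or not.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            for step in steps:
                step.run()
            return attempt
        except Exception:
            # Out of retries: surface the failure of the last attempt.
            if attempt > retries:
                raise
        finally:
            if teardown is not None:
                teardown()


# Usage: step C fails once, so the whole group (A, B, C) runs twice.
log: List[str] = []
failures = {"C": 1}  # remaining forced failures per step


def make_step(name: str) -> Step:
    def run() -> None:
        log.append(name)
        if failures.get(name, 0) > 0:
            failures[name] -= 1
            raise RuntimeError(f"step {name} failed")
    return Step(name, run)


steps = [make_step(n) for n in ("A", "B", "C")]
attempts = run_group(steps, retries=2, teardown=lambda: log.append("D"))
```

The point of the sketch is that a failure in C restarts from A with a fresh volume, which task-level retries on C alone would not do.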