Hi Kevin,

One piece of input from our recent experience with TaskGroup: we often have 
cross-DAG dependencies that require waiting for every task in a group to 
complete. At the moment, you basically have two options:


  1.  Create a sensor task in the downstream DAG for every task in the group
  2.  Add a dummy task (e.g. a DummyOperator) after the group and have a single
      sensor wait on it

So I would certainly like TaskGroups to have some notion of run status, to 
better enable downstream decision making.
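
To make "some notion of run status" a bit more concrete, here is a minimal 
sketch in plain Python. It is not an Airflow API: the group_state helper and 
its aggregation rule are assumptions for illustration only, and the state 
strings simply mirror the usual task state names.

```python
from typing import Dict

# Assumed grouping of task states; the names mirror common Airflow task states.
FAILED_STATES = {"failed", "upstream_failed"}
DONE_STATES = {"success", "skipped"}


def group_state(task_states: Dict[str, str]) -> str:
    """Hypothetical helper: collapse per-task states into one group-level state."""
    states = set(task_states.values())
    # Any failure in the group marks the whole group as failed.
    if states & FAILED_STATES:
        return "failed"
    # The group counts as successful only once every task has finished cleanly
    # (an empty group is treated as successful here, which is a design choice).
    if states <= DONE_STATES:
        return "success"
    # Otherwise at least one task is still queued, running, or not yet started.
    return "running"
```

With something along these lines, a single sensor could wait on one group 
state (for example group_state({"extract": "success", "load": "running"})) 
rather than on each task individually.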

I’ve already opened a feature request for some kind of TaskGroup sensor, but 
perhaps this can also form part of the wider discussion here.

https://github.com/apache/airflow/issues/14563

Cheers,

Nathan

From: Kevin Yang <yrql...@gmail.com>
Date: Thursday, 4 March 2021 at 05:21
To: dev@airflow.apache.org <dev@airflow.apache.org>
Subject: [DISCUSS] TaskGroup in Tree View

Hi team,

We are very glad to see the introduction of TaskGroup in Airflow 2.0 and really 
like it. Thanks to Yu Qian and everyone who contributed to it. To continue 
moving towards the goal of replacing SubDagOperator with TaskGroup, I'd like to 
kick off a discussion on bringing TaskGroup into Tree View.

Why do we need TaskGroup in Tree View?
For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the preferred 
view for its loading speed and simpler representation. SubDagOperator is often 
used to provide an isolated view into a subset of tasks in such large DAGs. To 
replace such SubDag use cases, TaskGroup will need to support Tree View.

What should TaskGroup look like in Tree View?
We didn't reach a conclusion during the first iteration of TaskGroup. At 
Airbnb, we use SubDag mostly to provide a zoomed-in view of a small set of 
tasks, and the SubDag zoom-in feature has worked well for us. We'd like to see 
TaskGroup provide a zoom-in option for both Graph View and Tree View, but we'd 
also like to hear everyone's thoughts.

What needs to be in TaskGroup and what doesn't?
TaskGroup started off as a pure UI concept, while SubDag is something more: 
e.g. it has its own DagRun and thus isolated scheduling decisions, it can serve 
as a logical isolation layer that holds different sets of DAG-level params, 
etc. While we only use SubDag as a UI feature, I think this is a good 
opportunity for us to discuss what should be part of TaskGroup and what 
shouldn't.

Please don't hesitate to share your thoughts.


Cheers,
Kevin Y
