I personally think that TaskGroup should go beyond being “just” a UI concept. 
There are a lot of use cases where people might want to perform a single 
operation across an entire group of tasks, and Bin points out a few really good 
examples (default arguments and group delete). I also have a proposal coming 
out, hopefully later this week, that will add some more functionality to 
TaskGroup objects.
I don’t personally see the benefit of keeping them “UI only.” If we want to be 
able to group delete or add external sensors to a group of tasks, we’d basically 
need to create another concept centered on “a grouping of tasks,” which I think 
would create confusion.
On Mon, Mar 8, 2021 at 7:19 AM, Yu Qian <yuqian1...@gmail.com> wrote:
Hi, all, it's really exciting to see the great discussions about TaskGroup.
There are some interesting ideas here.
- Tree View support for TaskGroup: I think this can mostly be achieved at the 
  web layer? Changes would probably involve tree.html and www/views.py. Should 
  we change Tree View to organize tasks based on the TaskGroup hierarchy (no 
  need to duplicate tasks in Tree View)? Currently the Tree View is organized 
  into a flattened graph hierarchy, which means the same task can appear 
  multiple times in Tree View.
- Clear an entire TaskGroup: We should be able to do this in graph.html and 
  www/views.py too. E.g. the UI passes the group_id of the TaskGroup to the web 
  server, which then clears the tasks in that TaskGroup. A TaskGroup is already 
  an iterable of its child tasks, so this should be possible. In fact, I've 
  heard from several users that they sometimes want to select multiple tasks on 
  Graph View with the mouse and then clear all of them at once. This is 
  actually a very similar problem to clearing a TaskGroup.
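To make the "clear an entire TaskGroup" idea concrete, here is a minimal sketch 
in plain Python. It does not use Airflow's real classes: `Group`, `find_group` 
and `task_ids_to_clear` are hypothetical stand-ins, and the only property they 
rely on is the one mentioned above, that a group can be walked down to its 
child tasks.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Union


@dataclass
class Group:
    """Hypothetical stand-in for a task group: children are task ids or nested groups."""
    group_id: str
    children: List[Union[str, "Group"]] = field(default_factory=list)

    def iter_task_ids(self) -> Iterator[str]:
        """Yield every leaf task id, descending into nested groups."""
        for child in self.children:
            if isinstance(child, Group):
                yield from child.iter_task_ids()
            else:
                yield child


def find_group(root: Group, group_id: str) -> Optional[Group]:
    """Depth-first search for the group with the given id."""
    if root.group_id == group_id:
        return root
    for child in root.children:
        if isinstance(child, Group):
            found = find_group(child, group_id)
            if found is not None:
                return found
    return None


def task_ids_to_clear(root: Group, group_id: str) -> List[str]:
    """Resolve a group_id sent by the UI into the task ids the server would clear."""
    group = find_group(root, group_id)
    if group is None:
        raise KeyError(f"unknown group_id: {group_id}")
    return list(group.iter_task_ids())


# Example layout (made up): a root group with one task and a nested "load" group.
root = Group("root", [
    "extract",
    Group("load", [
        "load.stage",
        Group("load.checks", ["load.checks.validate"]),
        "load.swap",
    ]),
])
```

The multi-select case would look almost the same: the UI would send a list of 
task ids directly instead of a group_id to resolve.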
Some other ideas such as default_args and ExternalTaskSensor support sound good 
too. We can probably continue the discussion on those individual issues/PRs.
On Sun, Mar 7, 2021 at 3:55 AM Xinbin Huang <bin.huan...@gmail.com> wrote:
Hi Kaxil,
One use case I have is to reuse TaskGroup across different DAGs as a predefined 
sub-workflow. For example, my team is currently building out a data platform 
that will allow a certain level of self-serve ability. Users of the platform 
(mostly analysts and scientists) should focus on the business logic, i.e. the 
transformation part, and shouldn't need to pay much attention to standard 
operations (e.g. from S3 to Redshift staging table - validate data - swap to 
production table), as these types of tasks are boring and repetitive. Reusing 
these sub-workflows also enables us to load data to a different 
destination/warehouse without users needing to change their code. We can also 
have a notification sub-workflow that allows us to swap Slack/PagerDuty/etc. in 
and out over time without impacting the user.
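A rough sketch of this reuse pattern, in plain Python rather than real Airflow 
code: a factory function builds the same stage, validate, swap sub-workflow for 
whichever DAG asks for it. `build_load_group`, `build_dag`, the `destination` 
parameter, the "snowflake" value and the task names are all hypothetical; only 
the group_id prefix on the task ids mirrors how TaskGroup names its children by 
default.

```python
from typing import Dict, List


def build_load_group(group_id: str, destination: str = "redshift") -> Dict[str, List[str]]:
    """Return a hypothetical sub-workflow as {task_id: [upstream task_ids]}.

    The steps are fixed by the platform team; callers only choose the
    group_id and, optionally, the destination warehouse.
    """
    steps = [f"load_to_{destination}_staging", "validate_data", "swap_to_production"]
    graph: Dict[str, List[str]] = {}
    upstream: List[str] = []
    for step in steps:
        task_id = f"{group_id}.{step}"
        graph[task_id] = list(upstream)  # each step depends on the previous one
        upstream = [task_id]
    return graph


def build_dag(transform_task: str, destination: str = "redshift") -> Dict[str, List[str]]:
    """Hypothetical user DAG: one user-written transform, then the shared group."""
    graph: Dict[str, List[str]] = {transform_task: []}
    load = build_load_group("load", destination)
    first = next(iter(load))
    load[first] = [transform_task]  # wire the transform into the group's first task
    graph.update(load)
    return graph


# Two DAGs share the same sub-workflow; only the user's transform differs.
sales_dag = build_dag("transform_sales")
events_dag = build_dag("transform_events", destination="snowflake")
```

Changing the destination touches only the factory call, which is the point of 
the use case: the user's transform task stays as it is.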
Other use cases:
- Allow default_args at the TaskGroup level, as in this issue: 
  https://github.com/apache/airflow/issues/13911
- ExternalTaskSensor on TaskGroup, as mentioned by Nathan: 
  https://github.com/apache/airflow/issues/14563
- Delete an entire TaskGroup: 
  https://github.com/apache/airflow/issues/14529
All these use cases go beyond the pure UI level and require operations 
(viewing/triggering/deleting/waiting/etc.) on a group of tasks. I think we can 
easily implement and formalize this with the current API without changing the 
backend too much (this PR shows a small example: 
https://github.com/apache/airflow/pull/14640).
What do other people think?
Best, Bin
On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
Hi all, interesting discussion. I would love to hear about some more use cases 
where TaskGroup needs to be something more than a UI concept.
All of Kevin's use cases can be achieved while keeping it as a UI concept. 
Xinbin, can you please expand a bit on your use case?
Regards, Kaxil
On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bin.huan...@gmail.com> wrote:
Hi Kevin, Vikram, and Nathan,
I don't think we need to restrict TaskGroup to being only a UI concept. We are 
already using TaskGroup to author DAGs and create dependencies, which already 
lies a bit outside the UI. To fully replace SubDagOperator, I think it's 
necessary to expand TaskGroup into a container for tasks rather than just a UI 
concept.
As for TaskGroupSensor specifically, I landed on the same approach as Kevin, 
and I have created a draft PR here: 
https://github.com/apache/airflow/pull/14640
Cheers, Bin
On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yrql...@gmail.com> wrote:
Hi Vikram,
Good point. What I had in mind was getting the TaskGroup definition in a 
sensor, e.g. extracting the _task_group field from the serialized DAG and 
querying the DB for the states of the TIs within it.
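A minimal sketch of what such a sensor's check might look like, in plain 
Python. The DB query is replaced by a hypothetical `ti_states` mapping of 
task_id to state, and `group_state`, `poke` and the reduction rule (any failure 
fails the group, all successes complete it) are assumptions for illustration, 
not the approach taken in any PR.

```python
from typing import Dict, Iterable

# Assumed set of states that should fail the whole group.
FAILED_STATES = {"failed", "upstream_failed"}


def group_state(group_task_ids: Iterable[str], ti_states: Dict[str, str]) -> str:
    """Reduce the states of a group's task instances to one group-level state.

    ti_states stands in for the result of querying the metadata DB; a task
    with no entry is treated as not yet run.
    """
    states = [ti_states.get(task_id, "none") for task_id in group_task_ids]
    if any(state in FAILED_STATES for state in states):
        return "failed"
    if states and all(state == "success" for state in states):
        return "success"
    return "running"


def poke(group_task_ids: Iterable[str], ti_states: Dict[str, str]) -> bool:
    """Sensor-style check: True once the group is done, error if it failed."""
    state = group_state(group_task_ids, ti_states)
    if state == "failed":
        raise RuntimeError("a task in the group failed")
    return state == "success"
```

A sensor built this way needs only the list of task ids in the group, which is 
why the group definition has to be readable outside the UI.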
You are right that it might not be clean, and it does not keep TaskGroup as 
purely a UI concept.

Cheers, Kevin Y
On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vik...@astronomer.io.invalid> wrote:
Kevin,
I am not sure I understand your response to Nathan.
I agree that it is also a valid use case, but I don't see how it can be cleanly 
done while keeping TaskGroup only as a UI concept. Would this require extending 
the TaskGroup concept to the backend?
Best regards, Vikram
On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yrql...@gmail.com> wrote:
Hi Nathan,
Thanks a lot for your input; it is indeed a valid use case. This can be done 
either by keeping TaskGroup as a UI concept or by bringing it into the backend. 
I'm curious to hear what others think.

Cheers, Kevin Y
On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <nathan.hadfi...@king.com> wrote:
Hi Kevin,

A quick piece of input from our recent experiences of working with TaskGroup is 
that we often have dependencies across DAGs that require waiting upon the 
completion of all the tasks in a group. At the moment, you basically have two 
options:

 1. Create a sensor task in a DAG for every task in the group
 2. Create a Dummy task after the group that a sensor waits on

So, I would certainly like TaskGroups to have some notion of run status so as 
to better enable downstream decision making.

I’ve already created a feature ticket to try to add some kind of TaskGroup 
Sensor but perhaps this can also form part of the wider discussions here.

https://github.com/apache/airflow/issues/14563

Cheers,

Nathan

From: Kevin Yang <yrql...@gmail.com>
Date: Thursday, 4 March 2021 at 05:21
To: dev@airflow.apache.org <dev@airflow.apache.org>
Subject: [DISCUSS] TaskGroup in Tree View

Hi team,

We are very glad to see the introduction of TaskGroup in Airflow 2.0 and really 
like it. Thanks to Yu Qian and everyone who contributed to it. To continue 
moving towards the goal of replacing SubDagOperator with TaskGroup, I'd like to 
kick off a discussion on bringing TaskGroup into Tree View.

Why do we need TaskGroup in Tree View?

For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the preferred 
view for its loading speed and simpler representation. SubDagOperator is often 
used to provide an isolated view into a subset of tasks in such large DAGs. To 
replace such SubDag use cases, TaskGroup will need to support Tree View.

What should TaskGroup look like in Tree View?

We didn't reach a conclusion during the first iteration of TaskGroup. At 
Airbnb, we use SubDag mostly to provide a zoom-in view on a small set of tasks, 
and the SubDag zoom-in feature has worked well for us. We'd like to see 
TaskGroup provide a zoom-in option for both Graph View and Tree View, but we'd 
also like to hear everyone's thoughts.

What needs to be in TaskGroup and what doesn't?

TaskGroup started off as a pure UI concept, while SubDag is something more: 
e.g. it has its own DagRun and thus isolated scheduling decisions, it can serve 
as a logical isolation layer that holds different sets of DAG-level params, 
etc. While we only use SubDag as a UI feature, I think this is a good 
opportunity for us to discuss what should be in TaskGroup and what shouldn't.

Please don't hesitate to share your thoughts.

Cheers,

Kevin Y