Hi all, it's really exciting to see the great discussions about TaskGroup.

There are some interesting ideas here.
- Tree View support for TaskGroup: I think this can mostly be achieved at
the web layer. Changes would probably involve tree.html and www/views.py.
Should we change Tree View to organize tasks based on the TaskGroup
hierarchy (no need to duplicate tasks in Tree View)? Currently the Tree
View is organized as a flattened graph hierarchy, which means the same
task can appear multiple times in Tree View.
- Clear an entire TaskGroup: We should be able to do this in graph.html and
www/views.py too. E.g. the UI passes the group_id of the TaskGroup to the
web server, which then clears the tasks in that TaskGroup. A TaskGroup is
already an iterable of its child tasks, so this should be possible. In fact,
I've heard from several users that they sometimes want to select multiple
tasks in Graph View with the mouse and then clear all of them at once. This
is actually a very similar problem to clearing a TaskGroup.
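
To make the two ideas above a bit more concrete, here is a rough sketch in
plain Python. It does not use Airflow's own classes: `Task`, `Group`,
`task_ids_to_clear` and `render_tree` are hypothetical stand-ins I made up
for illustration, and the real change would live in the web layer. It only
shows that a nested group can yield its tasks without duplicates, which is
what both "clear a whole group" and "Tree View by hierarchy" need.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Union


@dataclass
class Task:
    """Hypothetical stand-in for a task; only the id matters here."""
    task_id: str


@dataclass
class Group:
    """Hypothetical stand-in for a TaskGroup: children keyed by their id."""
    group_id: str
    children: Dict[str, Union["Group", Task]] = field(default_factory=dict)

    def add(self, child):
        key = child.group_id if isinstance(child, Group) else child.task_id
        self.children[key] = child
        return child

    def iter_tasks(self) -> Iterator[Task]:
        # Walk nested groups depth-first; each task is yielded exactly once.
        for child in self.children.values():
            if isinstance(child, Group):
                yield from child.iter_tasks()
            else:
                yield child

    def find_group(self, group_id: str) -> Optional["Group"]:
        if self.group_id == group_id:
            return self
        for child in self.children.values():
            if isinstance(child, Group):
                found = child.find_group(group_id)
                if found is not None:
                    return found
        return None


def task_ids_to_clear(root: Group, group_id: str) -> List[str]:
    """What a web handler could compute from the group_id sent by the UI."""
    group = root.find_group(group_id)
    if group is None:
        raise KeyError(group_id)
    return [task.task_id for task in group.iter_tasks()]


def render_tree(group: Group, indent: int = 0) -> List[str]:
    """Lay out tasks by group hierarchy, so no task appears twice."""
    lines = ["  " * indent + f"[{group.group_id}]"]
    for child in group.children.values():
        if isinstance(child, Group):
            lines.extend(render_tree(child, indent + 1))
        else:
            lines.append("  " * (indent + 1) + child.task_id)
    return lines


# Example DAG layout (ids are made up).
root = Group("root")
root.add(Task("start"))
load = root.add(Group("load"))
load.add(Task("load.stage"))
validate = load.add(Group("load.validate"))
validate.add(Task("load.validate.check"))
root.add(Task("end"))

ids = task_ids_to_clear(root, "load")
tree = render_tree(root)
```

Clearing a mouse selection of tasks in Graph View would reduce to the same
final step: the UI sends a list of task ids instead of a group_id, and the
server clears that list.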

Some other ideas such as default_args and ExternalTaskSensor support
sound good too. We can probably continue the discussion on those individual
issues/PRs.

On Sun, Mar 7, 2021 at 3:55 AM Xinbin Huang <bin.huan...@gmail.com> wrote:

> Hi Kaxil,
>
> One use case I have is to reuse TaskGroup across different DAGs as a
> predefined sub-workflow. For example, my team is currently building out a
> data platform that will allow a certain level of self-serve ability. Users
> of the platform (mostly analysts and scientists) should focus on business
> logic - the transformation part - without needing to pay much attention to
> some standard operations (e.g. from S3 to Redshift staging table - validate
> data - swap to production table), as these types of tasks are boring and
> repetitive. Reusing these sub-workflows also enables us to load data to a
> different destination/warehouse without users needing to change their code.
> We can also have a notification sub-workflow that allows us to swap in and
> out Slack/PagerDuty/etc. over time without impacting the user.
>
> Other use cases
> - allow default_args at TaskGroup level as in this issue:
> https://github.com/apache/airflow/issues/13911
> - ExternalTaskSensor on TaskGroup as mentioned by Nathan:
> https://github.com/apache/airflow/issues/14563
> - delete an entire TaskGroup:
> https://github.com/apache/airflow/issues/14529
>
> All these use cases go beyond the pure UI level and require operations
> (viewing/triggering/deleting/waiting/etc.) on *a group of tasks*. I think
> we can easily implement/formalize this with the current API without
> changing the backend too much (this PR
> https://github.com/apache/airflow/pull/14640 shows a small example).
>
> What do other people think?
>
> Best
> Bin
>
> On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
>> Hi all, interesting discussion. I would love to hear about some more
>> use-cases where TaskGroup needs to be something more than the UI concept.
>>
>> All of Kevin's use-cases can be achieved while keeping it as a UI
>> concept. Xinbin, can you please expand a bit on your use case?
>>
>> Regards,
>> Kaxil
>>
>> On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bin.huan...@gmail.com> wrote:
>>
>>> Hi Kevin, Vikram, and Nathan,
>>>
>>> I don't think we need to insist on keeping TaskGroup only as a UI
>>> concept. We are already using TaskGroup to author DAGs and create
>>> dependencies, which already lies a bit outside the UI.
>>> To fully replace SubDagOperator, I think it's necessary to expand
>>> TaskGroup into a *container for tasks* rather than just a UI concept.
>>>
>>> As for TaskGroupSensor specifically, I landed on the same approach as
>>> Kevin, and I have created a draft PR here:
>>> https://github.com/apache/airflow/pull/14640
>>>
>>> Cheers
>>> Bin
>>>
>>> On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yrql...@gmail.com> wrote:
>>>
>>>> Hi Vikram,
>>>>
>>>> Good point. What I had in mind was getting the TaskGroup definition in
>>>> a sensor, e.g. extract the _task_group field from serialized DAG, and query
>>>> the DB for the TI states within.
>>>>
>>>> You are right that it might not be clean, nor does it keep TaskGroup as
>>>> a UI concept.
>>>>
>>>>
>>>> Cheers,
>>>> Kevin Y
>>>>
>>>> On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vik...@astronomer.io.invalid>
>>>> wrote:
>>>>
>>>>> Kevin,
>>>>>
>>>>> I am not sure I understand your response to Nathan.
>>>>>
>>>>> I agree that it is also a valid use case, but I don't see how it can
>>>>> be cleanly done while keeping TaskGroup only as a UI concept.
>>>>> Would this require extending the TaskGroup concept to the backend?
>>>>>
>>>>> Best regards,
>>>>> Vikram
>>>>>
>>>>> On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yrql...@gmail.com> wrote:
>>>>>
>>>>>> Hi Nathan,
>>>>>>
>>>>>> Thanks a lot for your input; it is indeed a valid use case. This
>>>>>> can be done either by keeping TaskGroup as a UI concept or by bringing
>>>>>> it into the backend. I'm curious to hear what others think.
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Kevin Y
>>>>>>
>>>>>> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <
>>>>>> nathan.hadfi...@king.com> wrote:
>>>>>>
>>>>>>> Hi Kevin,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> A quick piece of input from our recent experiences of working with
>>>>>>> TaskGroup is that we often have dependencies across DAGs that require
>>>>>>> waiting upon the completion of all the tasks in a group.  At the moment,
>>>>>>> you basically have two options:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>    1. Create a sensor task in a DAG for every task in the group
>>>>>>>    2. Create a Dummy task after the group that a sensor waits on
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> So, I would certainly like TaskGroups to have some notion of run
>>>>>>> status, so as to better enable downstream decision-making.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I’ve already created a feature ticket to try to add some kind of
>>>>>>> TaskGroup Sensor but perhaps this can also form part of the wider
>>>>>>> discussions here.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/apache/airflow/issues/14563
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Nathan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From: *Kevin Yang <yrql...@gmail.com>
>>>>>>> *Date: *Thursday, 4 March 2021 at 05:21
>>>>>>> *To: *dev@airflow.apache.org <dev@airflow.apache.org>
>>>>>>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>>>>>>
>>>>>>> Hi team,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> We are very glad to see the introduction of TaskGroup in Airflow 2.0
>>>>>>> and really like it. Thanks to Yu Qian and everyone who contributed to
>>>>>>> it. To continue moving towards the goal of replacing SubDagOperator
>>>>>>> with TaskGroup, I'd like to kick off a discussion on bringing TaskGroup
>>>>>>> into Tree View.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Why do we need TaskGroup in Tree View?*
>>>>>>>
>>>>>>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is
>>>>>>> the preferred view for its loading speed and simpler representation.
>>>>>>> SubDagOperator is often used to provide an isolated view into a subset
>>>>>>> of tasks in such large DAGs. To replace such SubDag use cases,
>>>>>>> TaskGroup will need to support Tree View.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *What should TaskGroup look like in Tree View?*
>>>>>>>
>>>>>>> We didn't reach a conclusion during the 1st iteration of TaskGroup.
>>>>>>> At Airbnb, we use SubDag mostly for providing a zoom-in view on a small
>>>>>>> set of tasks, and the SubDag zoom-in feature worked well for us. We'd
>>>>>>> like to see TaskGroup provide a zoom-in option for both Graph View and
>>>>>>> Tree View, but we'd also like to hear everyone's thoughts.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *What needs to be in TaskGroup and what doesn't?*
>>>>>>>
>>>>>>> TaskGroup started off as a pure UI concept while SubDag is something
>>>>>>> more, e.g. it has its own DagRun and thus isolated scheduling
>>>>>>> decisions, and it can serve as a logical isolation layer that holds
>>>>>>> different sets of DAG-level params, etc. While we only use SubDag as a
>>>>>>> UI feature, I think this would be a good opportunity for us to discuss
>>>>>>> what should be in TaskGroup and what shouldn't.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Please don't hesitate to share your thoughts.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Kevin Y
>>>>>>>
>>>>>>
