Hi Kaxil,

One use case I have is to reuse a TaskGroup across different DAGs as a
predefined sub-workflow. For example, my team is currently building out a
data platform that will allow a certain level of self-service. Users of
the platform (mostly analysts and scientists) should focus on the business
logic, i.e. the transformation part, without needing to pay much attention
to standard operations (e.g. from S3 to a Redshift staging table - validate
data - swap to the production table), as these types of tasks are boring
and repetitive. Reusing these sub-workflows also enables us to load data to
a different destination/warehouse without users needing to change their
code. We can also have a notification sub-workflow that allows us to swap
Slack/PagerDuty/etc. in and out over time without impacting the user.
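
To make the reuse idea concrete, here is a rough sketch of a shared factory
that attaches the standard load sub-workflow to whatever DAG a user passes
in. The names build_load_group, expected_task_ids and STEPS are made up for
illustration, and DummyOperator only stands in for the real S3/Redshift
operators; this is not code from our platform or from Airflow itself.

```python
# Hypothetical shared module, e.g. platform/task_groups.py (name is an assumption).
STEPS = ("s3_to_staging", "validate_data", "swap_to_production")


def expected_task_ids(group_id):
    """Task ids as they show up in the DAG.

    By default a TaskGroup prefixes its children's task ids with its group_id.
    """
    return [f"{group_id}.{step}" for step in STEPS]


def build_load_group(dag, group_id="load_to_warehouse"):
    """Hypothetical factory: attach the standard load sub-workflow to `dag`."""
    # Imported lazily so this module can be imported without Airflow installed.
    from airflow.operators.dummy import DummyOperator  # Airflow 2.0-era import path
    from airflow.utils.task_group import TaskGroup

    with TaskGroup(group_id=group_id, dag=dag) as group:
        # Placeholders for the real operators (S3 copy, validation, table swap).
        tasks = [DummyOperator(task_id=step, dag=dag) for step in STEPS]
        # Chain the steps in order: staging -> validate -> swap.
        for upstream, downstream in zip(tasks, tasks[1:]):
            upstream >> downstream
    return group
```

In a user's DAG file this would be used as something like
`transform >> build_load_group(dag)`, so swapping the destination only means
changing the factory, not the user's code.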

Other use cases:
- allow default_args at TaskGroup level as in this issue:
https://github.com/apache/airflow/issues/13911
- ExternalTaskSensor on TaskGroup as mentioned by Nathan:
https://github.com/apache/airflow/issues/14563
- delete an entire TaskGroup: https://github.com/apache/airflow/issues/14529

All these use cases go beyond the pure UI level and require operations
(viewing/triggering/deleting/waiting/etc.) on *a group of tasks*. I think
we can implement/formalize this with the current API without changing the
backend too much (this PR https://github.com/apache/airflow/pull/14640
shows a small example).
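
As an illustration of what "waiting on a group of tasks" could mean, here is
a small pure-Python sketch that folds per-task states into one group-level
state. The function name group_state, the roll-up rules, and the reliance on
the "group_id." task-id prefix are all assumptions for the sake of the
example; this is not what the PR above implements.

```python
def group_state(task_states, group_id):
    """Roll up the states of all tasks in a group into a single state.

    `task_states` maps task_id -> state string (e.g. "success", "failed").
    Tasks are assumed to belong to the group when their id starts with
    "<group_id>.", which matches the default TaskGroup id prefixing.
    """
    prefix = f"{group_id}."
    states = [
        state for task_id, state in task_states.items()
        if task_id.startswith(prefix)
    ]
    if not states:
        raise ValueError(f"no tasks found for group {group_id!r}")
    # Any failure marks the whole group as failed (an assumed policy).
    if any(state in ("failed", "upstream_failed") for state in states):
        return "failed"
    # The group is done only when every task in it has succeeded.
    if all(state == "success" for state in states):
        return "success"
    # Otherwise a sensor would keep waiting.
    return "running"
```

A sensor built on this idea would keep poking until the result is "success"
and fail fast on "failed".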

What do other people think?

Best
Bin

On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Hi all, interesting discussion. I would love to hear about some more
> use-cases where TaskGroup needs to be something more than the UI concept.
>
> All of Kevin's use-cases can be achieved while keeping it as a UI
> concept. Xinbin, can you please expand a bit on your use case?
>
> Regards,
> Kaxil
>
> On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bin.huan...@gmail.com> wrote:
>
>> Hi Kevin, Vikram, and Nathan,
>>
>> I don't think we need to restrict TaskGroup to being only a UI concept.
>> We are already using TaskGroup to author DAGs and create dependencies,
>> which already lies a bit outside the UI.
>> To fully replace SubDagOperator, I think it's necessary to expand
>> TaskGroup into a *container for tasks* rather than just a UI concept.
>>
>> As for TaskGroupSensor specifically, I landed on the same approach as
>> Kevin, and I have created a draft PR here:
>> https://github.com/apache/airflow/pull/14640
>>
>> Cheers
>> Bin
>>
>> On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yrql...@gmail.com> wrote:
>>
>>> Hi Vikram,
>>>
>>> Good point. What I had in mind was getting the TaskGroup definition in a
>>> sensor, e.g. extract the _task_group field from serialized DAG, and query
>>> the DB for the TI states within.
>>>
>>> You are right that it might not be clean, nor does it keep TaskGroup
>>> as a UI concept.
>>>
>>>
>>> Cheers,
>>> Kevin Y
>>>
>>> On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vik...@astronomer.io.invalid>
>>> wrote:
>>>
>>>> Kevin,
>>>>
>>>> I am not sure I understand your response to Nathan.
>>>>
>>>> I agree that it is also a valid use case, but I don't see how it can be
>>>> cleanly done while keeping TaskGroup only as a UI concept.
>>>> Would this require extending the TaskGroup concept to the backend?
>>>>
>>>> Best regards,
>>>> Vikram
>>>>
>>>> On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yrql...@gmail.com> wrote:
>>>>
>>>>> Hi Nathan,
>>>>>
>>>>> Thanks a lot for your input and it is indeed a valid use case. This
>>>>> can be done either keeping TaskGroup as a UI concept or bringing it into
>>>>> the backend. I'm curious to hear what others think.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Kevin Y
>>>>>
>>>>> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <
>>>>> nathan.hadfi...@king.com> wrote:
>>>>>
>>>>>> Hi Kevin,
>>>>>>
>>>>>>
>>>>>>
>>>>>> A quick piece of input from our recent experiences of working with
>>>>>> TaskGroup is that we often have dependencies across DAGs that require
>>>>>> waiting upon the completion of all the tasks in a group.  At the moment,
>>>>>> you basically have two options:
>>>>>>
>>>>>>
>>>>>>
>>>>>>    1. Create a sensor task in a DAG for every task in the group
>>>>>>    2. Create a Dummy task after the group that a sensor waits on
>>>>>>
>>>>>>
>>>>>>
>>>>>> So, I would certainly like TaskGroups to have some notion of run
>>>>>> status so as to better enable downstream decision making.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I’ve already created a feature ticket to try to add some kind of
>>>>>> TaskGroup Sensor but perhaps this can also form part of the wider
>>>>>> discussions here.
>>>>>>
>>>>>>
>>>>>>
>>>>>> https://github.com/apache/airflow/issues/14563
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Nathan
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *Kevin Yang <yrql...@gmail.com>
>>>>>> *Date: *Thursday, 4 March 2021 at 05:21
>>>>>> *To: *dev@airflow.apache.org <dev@airflow.apache.org>
>>>>>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>>>>>
>>>>>> Hi team,
>>>>>>
>>>>>>
>>>>>>
>>>>>> We are very glad to see the introduction of TaskGroup in Airflow 2.0
>>>>>> and really like it. Thanks to Yu Qian and everyone that contributed to
>>>>>> it. To continue moving towards the goal of replacing SubDagOperator
>>>>>> with TaskGroup, I'd like to kick off a discussion on bringing
>>>>>> TaskGroup into Tree View.
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Why do we need TaskGroup in Tree View?*
>>>>>>
>>>>>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
>>>>>> preferred view for its loading speed and simpler representation.
>>>>>> SubDagOperator is often used to provide an isolated view into a subset
>>>>>> of tasks in such large DAGs. To replace such SubDag use cases,
>>>>>> TaskGroup will need to support Tree View.
>>>>>>
>>>>>>
>>>>>>
>>>>>> *What should TaskGroup look like in Tree View?*
>>>>>>
>>>>>> We didn't reach a conclusion during the 1st iteration of TaskGroup. At
>>>>>> Airbnb, we use SubDag mostly for providing a zoom-in view on a small
>>>>>> set of tasks, and the SubDag zoom-in feature worked well for us. We'd
>>>>>> like to see TaskGroup provide a zoom-in option for both Graph View and
>>>>>> Tree View, but we'd also like to hear everyone's thoughts.
>>>>>>
>>>>>>
>>>>>>
>>>>>> *What needs to be in TaskGroup and what doesn't?*
>>>>>>
>>>>>> TaskGroup started off as a pure UI concept while SubDag is something
>>>>>> more, e.g. it has its own DagRun and thus isolated scheduling
>>>>>> decisions, it can serve as a logical isolation layer that holds
>>>>>> different sets of DAG-level params, etc. While we only use SubDag as a
>>>>>> UI feature, I think it would be a good opportunity for us to discuss
>>>>>> what should be in TaskGroup and what shouldn't.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Please don't hesitate to share your thoughts.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Kevin Y
>>>>>>
>>>>>
