Hi Kaxil,

One use case I have is reusing a TaskGroup across different DAGs as a predefined sub-workflow. For example, my team is currently building a data platform that will allow a certain level of self-service. Users of the platform (mostly analysts and scientists) should focus on the business logic, i.e. the transformation part, and should not need to pay much attention to standard operations (e.g. from S3 to a Redshift staging table, then validate the data, then swap to the production table), as these tasks are boring and repetitive. Reusing these sub-workflows also enables us to load data to a different destination/warehouse without users needing to change their code. We can also have a notification sub-workflow that lets us swap Slack/PagerDuty/etc. in and out over time without impacting users.

Other use cases:

- allow default_args at the TaskGroup level, as in this issue: https://github.com/apache/airflow/issues/13911
- ExternalTaskSensor on a TaskGroup, as mentioned by Nathan: https://github.com/apache/airflow/issues/14563
- delete an entire TaskGroup: https://github.com/apache/airflow/issues/14529

All these use cases go beyond the pure UI level and require operations (viewing/triggering/deleting/waiting/etc.) on *a group of tasks*. I think we can easily implement/formalize this with the current API without changing the backend too much (this PR https://github.com/apache/airflow/pull/14640 shows a small example).

What do other people think?

Best
Bin

On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Hi all, interesting discussion. I would love to hear about some more
> use-cases where TaskGroup needs to be something more than the UI concept.
>
> All of Kevin's use-cases can be achieved while keeping it as a UI
> concept. Xinbin can you please expand a bit on your use case.
>
> Regards,
> Kaxil
>
> On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bin.huan...@gmail.com> wrote:
>
>> Hi Kevin, Vikram, and Nathan,
>>
>> I think we don't need to restrict too much on keeping TaskGroup only as a
>> UI concept. We are already using TaskGroup to author DAGs and create
>> dependencies, which already lies a bit outside the UI.
>> To fully replace SubDagOperator, I think it's necessary to expand
>> TaskGroup as a *container for tasks* than just UI concept.
>>
>> As for TaskGroupSensor specifically, I land with the same approach as
>> Kevin, and I have created a draft PR here:
>> https://github.com/apache/airflow/pull/14640
>>
>> Cheers
>> Bin
>>
>> On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yrql...@gmail.com> wrote:
>>
>>> Hi Vikram,
>>>
>>> Good point. What I had in mind was getting the TaskGroup definition in a
>>> sensor, e.g. extract the _task_group field from serialized DAG, and query
>>> the DB for the TI states within.
>>>
>>> You are right that it might not be clean nor does it keep TaskGroup as a
>>> UI concept.
>>>
>>> Cheers,
>>> Kevin Y
>>>
>>> On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vik...@astronomer.io.invalid>
>>> wrote:
>>>
>>>> Kevin,
>>>>
>>>> I am not sure I understand your response to Nathan.
>>>>
>>>> I agree that it is also a valid use case, but I don't see how it can be
>>>> cleanly done while keeping TaskGroup only as a UI concept.
>>>> Would this require extending the TaskGroup concept to the backend?
>>>>
>>>> Best regards,
>>>> Vikram
>>>>
>>>> On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yrql...@gmail.com> wrote:
>>>>
>>>>> Hi Nathan,
>>>>>
>>>>> Thanks a lot for your input and it is indeed a valid use case. This
>>>>> can be done either keeping TaskGroup as a UI concept or bringing it into
>>>>> the backend. I'm curious to hear what others think.
>>>>>
>>>>> Cheers,
>>>>> Kevin Y
>>>>>
>>>>> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <
>>>>> nathan.hadfi...@king.com> wrote:
>>>>>
>>>>>> Hi Kevin,
>>>>>>
>>>>>> A quick piece of input from our recent experiences of working with
>>>>>> TaskGroup is that we often have dependencies across DAGs that require
>>>>>> waiting upon the completion of all the tasks in a group. At the moment,
>>>>>> you basically have two options:
>>>>>>
>>>>>> 1. Create a sensor task in a DAG for every task in the group
>>>>>> 2. Create a Dummy task after the group that a sensor waits on
>>>>>>
>>>>>> So, I would certainly like TaskGroups to have some notion of run
>>>>>> status as to better enable downstream decision making.
>>>>>>
>>>>>> I’ve already created a feature ticket to try to add some kind of
>>>>>> TaskGroup Sensor but perhaps this can also form part of the wider
>>>>>> discussions here.
>>>>>>
>>>>>> https://github.com/apache/airflow/issues/14563
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Nathan
>>>>>>
>>>>>> *From: *Kevin Yang <yrql...@gmail.com>
>>>>>> *Date: *Thursday, 4 March 2021 at 05:21
>>>>>> *To: *dev@airflow.apache.org <dev@airflow.apache.org>
>>>>>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>>>>>
>>>>>> Hi team,
>>>>>>
>>>>>> We are very glad to see the introduction of TaskGroup in Airflow 2.0
>>>>>> and really like it. Thanks to Yu Qian and everyone that contributed to
>>>>>> it. To continue moving towards the goal of replacing SubDagOperator with
>>>>>> TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
>>>>>> Tree View.
>>>>>>
>>>>>> *Why do we need TaskGroup in Tree View?*
>>>>>>
>>>>>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
>>>>>> preferred view for its loading speed and simpler representation.
>>>>>> SubDagOperator is often used to provide an isolated view into a subset of
>>>>>> tasks in such large DAGs. To replace such SubDag use cases, TaskGroup
>>>>>> will need to support Tree View.
>>>>>>
>>>>>> *What should TaskGroup look like in Tree View?*
>>>>>>
>>>>>> We didn't have a conclusion during the 1st iteration of TaskGroup. In
>>>>>> Airbnb, we use SubDag mostly for providing a zoom in view on a small set
>>>>>> of tasks and the SubDag zoom in feature worked well for us. We'd like to
>>>>>> see TaskGroup provide a zoom in option for both Graph View and Tree View
>>>>>> but also like to hear everyone's thoughts.
>>>>>>
>>>>>> *What needs to be in TaskGroup and what doesn't?*
>>>>>>
>>>>>> TaskGroup started off as a pure UI concept while SubDag is something
>>>>>> more, e.g. it has its own DagRun thus isolated scheduling decisions, it
>>>>>> can serve as a logical isolation layer that holds different sets of DAG
>>>>>> level params, etc.
>>>>>> While we only use SubDag as a UI feature, I think it would be
>>>>>> a good opportunity for us to discuss what should be TaskGroup and what
>>>>>> shouldn't.
>>>>>>
>>>>>> Please don't hesitate to share your thoughts.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Kevin Y
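
As a rough illustration of the TaskGroup sensor idea discussed in this thread (deriving one status for a whole group from the states of its task instances), here is a minimal Python sketch. It deliberately does not use any Airflow API: the nested dict layout, the function names collect_task_ids and group_state, and the sets of state strings are all assumptions made for illustration. They are not Airflow's serialized format and not necessarily the approach taken in the draft PR.

```python
from typing import Dict, List, Optional

# Assumed state names, for illustration only.
FAILED_STATES = {"failed", "upstream_failed"}
FINISHED_OK_STATES = {"success", "skipped"}


def collect_task_ids(group: dict) -> List[str]:
    """Return all task ids in a group, including those in nested groups.

    `group` uses a hypothetical layout:
    {"tasks": [task_id, ...], "groups": [nested_group, ...]}
    """
    task_ids = list(group.get("tasks", []))
    for child in group.get("groups", []):
        task_ids.extend(collect_task_ids(child))
    return task_ids


def group_state(group: dict, ti_states: Dict[str, Optional[str]]) -> str:
    """Reduce per-task states to one group-level state.

    - "failed" if any task in the group is in a failed state
    - "success" if every task finished OK (an empty group counts as success)
    - "running" otherwise, including tasks with no recorded state yet
    """
    states = [ti_states.get(task_id) for task_id in collect_task_ids(group)]
    if any(state in FAILED_STATES for state in states):
        return "failed"
    if all(state in FINISHED_OK_STATES for state in states):
        return "success"
    return "running"


if __name__ == "__main__":
    load_group = {
        "tasks": ["s3_to_staging"],
        "groups": [{"tasks": ["validate", "swap_to_production"]}],
    }
    # In a real sensor these states would come from the metadata DB.
    states = {"s3_to_staging": "success", "validate": "success"}
    print(group_state(load_group, states))
```

A sensor built on this idea would keep poking while group_state returns "running" and stop on "success" or "failed"; how the task-instance states are fetched is left out here.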