I agree, but we should see which of those we can implement purely on the parsing side, i.e. can we keep the scheduler from having to care about Task Groups?
If so, then things like the default args example are small enough changes that
they don't need an AIP (IMO).

-ash

On 8 March 2021 17:12:07 GMT, Daniel Imberman <daniel.imber...@gmail.com> wrote:
>I personally think that TaskGroup should go beyond being “just” a UI concept.
>I think that there are a lot of use-cases where people might want to perform a
>single operation across an entire group of tasks. I think that Bin points out
>a few really good examples (default arguments and group delete are good
>examples). I also have a proposal coming out hopefully later this week that
>will offer some more functionality to TaskGroup objects as well.
>
>I don’t personally see the benefit of keeping them “UI only.” If we want to be
>able to group delete or add external sensors to a group of tasks we’d
>basically need to create another concept that centers around “a grouping of
>tasks” which I think might create confusion.
>
>On Mon, Mar 8, 2021 at 7:19 AM, Yu Qian <yuqian1...@gmail.com> wrote:
>Hi, all, it's really exciting to see the great discussions about TaskGroup.
>There are some interesting ideas here.
>
>- Tree View support for TaskGroup: I think this can mostly be achieved at
>  the web layer? Changes probably involve tree.html and www/views.py. Should
>  we change Tree View to organize tasks based on the TaskGroup hierarchy (no
>  need to duplicate tasks in Tree View)? Currently the Tree View is organized
>  into a flattened graph hierarchy, which means the same task can appear
>  multiple times in Tree View.
>- Clear an entire TaskGroup. We should be able to do this in graph.html and
>  www/views.py too. E.g. the UI passes the group_id of the TaskGroup to the
>  web server, which then clears the list of tasks in the TaskGroup. The
>  TaskGroup is already an iterable of its child tasks, so this should be
>  possible. In fact, I've heard from several users that they sometimes want
>  to select multiple tasks on Graph View with the mouse and then clear all of
>  them at once.
>  This is actually a very similar problem to clearing a TaskGroup.
>
>Some other ideas such as default_args and ExternalTaskSensor support sound
>good too. We can probably continue the discussion on those individual
>issues/PRs.
>
>On Sun, Mar 7, 2021 at 3:55 AM Xinbin Huang <bin.huan...@gmail.com> wrote:
>Hi Kaxil,
>
>One use case I have is to reuse TaskGroup across different DAGs as a
>predefined sub-workflow. For example, my team is currently building out a data
>platform that will allow a certain level of self-serve ability. Users of the
>platform (mostly analysts and scientists) should focus on business logic (the
>transformation part) and not need to pay much attention to standard
>operations (e.g. S3 to Redshift staging table, validate data, swap to
>production table), as these types of tasks are boring and repetitive.
>Reusing these sub-workflows also enables us to load data to a different
>destination/warehouse without users needing to change their code. We can also
>have a notification sub-workflow that allows us to swap Slack/PagerDuty/etc.
>in and out over time without impacting the user.
>
>Other use cases:
>- allow default_args at TaskGroup level, as in this issue:
>  https://github.com/apache/airflow/issues/13911
>- ExternalTaskSensor on TaskGroup, as mentioned by Nathan:
>  https://github.com/apache/airflow/issues/14563
>- delete an entire TaskGroup:
>  https://github.com/apache/airflow/issues/14529
>
>All these use cases go beyond the pure UI level and require operations
>(viewing/triggering/deleting/waiting/etc.) on a group of tasks. I think we can
>easily implement/formalize this with the current API without changing the
>backend too much (this PR https://github.com/apache/airflow/pull/14640
>shows a small example).
>What do other people think?
>
>Best,
>Bin
>
>On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>Hi all, interesting discussion. I would love to hear about some more use-cases
>where TaskGroup needs to be something more than a UI concept.
>
>All of Kevin's use-cases can be achieved while keeping it as a UI
>concept. Xinbin, can you please expand a bit on your use case?
>
>Regards,
>Kaxil
>
>On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bin.huan...@gmail.com> wrote:
>Hi Kevin, Vikram, and Nathan,
>
>I think we don't need to be too strict about keeping TaskGroup only as a UI
>concept. We are already using TaskGroup to author DAGs and create
>dependencies, which already lies a bit outside the UI. To fully replace
>SubDagOperator, I think it's necessary to expand TaskGroup into a container
>for tasks rather than just a UI concept.
>
>As for TaskGroupSensor specifically, I landed on the same approach as Kevin,
>and I have created a draft PR here:
>https://github.com/apache/airflow/pull/14640
>
>Cheers,
>Bin
>
>On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yrql...@gmail.com> wrote:
>Hi Vikram,
>
>Good point. What I had in mind was getting the TaskGroup definition in a
>sensor, e.g. extract the _task_group field from the serialized DAG, and query
>the DB for the TI states within.
>
>You are right that it might not be clean, nor does it keep TaskGroup as a UI
>concept.
>
>Cheers,
>Kevin Y
>
>On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vik...@astronomer.io.invalid>
>wrote:
>Kevin,
>
>I am not sure I understand your response to Nathan.
>
>I agree that it is also a valid use case, but I don't see how it can be
>cleanly done while keeping TaskGroup only as a UI concept. Would this require
>extending the TaskGroup concept to the backend?
>
>Best regards,
>Vikram
>
>On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yrql...@gmail.com> wrote:
>Hi Nathan,
>
>Thanks a lot for your input and it is indeed a valid use case. This can be
>done either keeping TaskGroup as a UI concept or bringing it into the backend.
>I'm curious to hear what others think.
>
>Cheers,
>Kevin Y
>
>On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <nathan.hadfi...@king.com>
>wrote:
>Hi Kevin,
>
>A quick piece of input from our recent experiences of working with TaskGroup
>is that we often have dependencies across DAGs that require waiting upon the
>completion of all the tasks in a group. At the moment, you basically have two
>options:
>
>  1. Create a sensor task in a DAG for every task in the group
>  2. Create a Dummy task after the group that a sensor waits on
>
>So, I would certainly like TaskGroups to have some notion of run status so as
>to better enable downstream decision making.
>
>I’ve already created a feature ticket to try to add some kind of TaskGroup
>Sensor but perhaps this can also form part of the wider discussions here.
>
>https://github.com/apache/airflow/issues/14563
>
>Cheers,
>
>Nathan
>
>From: Kevin Yang <yrql...@gmail.com>
>Date: Thursday, 4 March 2021 at 05:21
>To: dev@airflow.apache.org <dev@airflow.apache.org>
>Subject: [DISCUSS] TaskGroup in Tree View
>
>Hi team,
>
>We are very glad to see the introduction of TaskGroup in Airflow 2.0 and
>really like it. Thanks to Yu Qian and everyone that contributed to it. To
>continue moving towards the goal of replacing SubDagOperator with TaskGroup,
>I'd like to kick off a discussion on bringing TaskGroup into Tree View.
>
>Why do we need TaskGroup in Tree View?
>
>For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
>preferred view for its loading speed and simpler representation.
>SubDagOperator is often used to provide an isolated view into a subset of
>tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
>need to support Tree View.
>
>What should TaskGroup look like in Tree View?
>
>We didn't reach a conclusion during the 1st iteration of TaskGroup. At Airbnb,
>we use SubDag mostly for providing a zoom-in view on a small set of tasks, and
>the SubDag zoom-in feature worked well for us. We'd like to see TaskGroup
>provide a zoom-in option for both Graph View and Tree View, but would also
>like to hear everyone's thoughts.
>
>What needs to be in TaskGroup and what doesn't?
>
>TaskGroup started off as a pure UI concept while SubDag is something more,
>e.g. it has its own DagRun and thus isolated scheduling decisions, it can
>serve as a logical isolation layer that holds different sets of DAG level
>params, etc. While we only use SubDag as a UI feature, I think it would be a
>good opportunity for us to discuss what should be in TaskGroup and what
>shouldn't.
>
>Please don't hesitate to share your thoughts.
>
>Cheers,
>Kevin Y
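
Two ideas from the thread, clearing every task in a group (Yu Qian) and giving a group some notion of run status (Nathan, Kevin), can be illustrated with a minimal sketch. It is plain Python and does not use the Airflow API. The `Group` class, `group_state` function, state strings, and aggregation rules are all hypothetical names and assumptions made for illustration; they are not taken from the linked issues or PRs.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Union


@dataclass
class Group:
    """Hypothetical stand-in for a TaskGroup: task ids plus nested groups."""

    group_id: str
    children: List[Union[str, "Group"]] = field(default_factory=list)

    def task_ids(self) -> Iterator[str]:
        """Yield every task id in this group, descending into nested groups."""
        for child in self.children:
            if isinstance(child, Group):
                yield from child.task_ids()
            else:
                yield child


def group_state(group: Group, ti_states: Dict[str, str]) -> str:
    """Collapse per-task states into one group-level state.

    Assumed rules (not Airflow's): any failure fails the group, the group
    succeeds only when every task succeeded or was skipped, and anything
    else (including an empty group) is still pending.
    """
    states = [ti_states.get(task_id, "none") for task_id in group.task_ids()]
    if any(s in ("failed", "upstream_failed") for s in states):
        return "failed"
    if states and all(s in ("success", "skipped") for s in states):
        return "success"
    return "pending"


# Example: a load sub-workflow with a nested validation group.
load = Group("load", ["stage", Group("validate", ["check_rows", "check_schema"]), "swap"])
tasks_to_clear = list(load.task_ids())
```

Under these assumptions, a group-level "clear" or a group sensor only needs the flattened list of task ids and their states, so the grouping could in principle be resolved outside the scheduler, which is the question raised at the top of the thread.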