Hi all, I'm interested in contributing TaskGroup support to Tree View. Here's a prototype <https://yuqian90.github.io/task_group_tree/> of what it could look like. Suggestions are welcome.
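
As a rough illustration of the grouping behind such a view (not the prototype's actual code), the sketch below nests task ids by their TaskGroup. It assumes the default TaskGroup behaviour in which a child's task_id is prefixed with its group ids, separated by dots. `build_group_tree`, `render_tree`, and the sample task ids are hypothetical names made up for this example.

```python
from typing import Any, Dict, Iterable, List


def build_group_tree(task_ids: Iterable[str]) -> Dict[str, Any]:
    """Nest dotted task ids into a dict of dicts.

    Groups become nested dicts; each leaf maps a short task name to its
    full task id. Assumes a task name never collides with a group name
    at the same level.
    """
    tree: Dict[str, Any] = {}
    for task_id in task_ids:
        # Everything before the last dot is the chain of enclosing groups.
        *groups, name = task_id.split(".")
        node = tree
        for group in groups:
            node = node.setdefault(group, {})
        node[name] = task_id
    return tree


def render_tree(tree: Dict[str, Any], indent: int = 0) -> List[str]:
    """Return one text line per group or task, indented by nesting depth."""
    lines: List[str] = []
    for label, child in tree.items():
        if isinstance(child, dict):
            lines.append(" " * indent + label + "/")
            lines.extend(render_tree(child, indent + 2))
        else:
            lines.append(" " * indent + label)
    return lines


# Hypothetical task ids, as they might appear in a DAG with nested groups.
sample_ids = [
    "start",
    "load.extract",
    "load.validate.check_nulls",
    "load.validate.check_dupes",
    "end",
]
print("\n".join(render_tree(build_group_tree(sample_ids))))
```

Each task appears exactly once in this layout, under the group that contains it, which is the main difference from a tree built by walking graph dependencies.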

I understand AIP-38 <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-38+Modern+Web+Application> plans to use modern web technology such as React to revamp the Airflow UI. I'm not sure where we are on that front.

With AIP-38 in mind, if we make improvements to pages such as Tree View, it makes a lot of sense to contribute them as a reusable TaskInstanceTree component that can be used in more than just the Tree View page itself. Another place where a TaskInstanceTree component could be useful is the confirmation page shown when a user clears task instances or marks them as success/failed. Right now the confirmation page just concatenates the text representation of every TaskInstance, which is very difficult to read when a lot of task instances are cleared or marked. If we put the task instances on the confirmation page into a TaskInstanceTree component with TaskGroup support, it would be much easier to read.

If you have ideas on how to contribute to Tree View so that it's easy to incorporate into AIP-38, definitely let me know.

On Tue, Mar 9, 2021 at 1:19 AM Ash Berlin-Taylor <a...@apache.org> wrote:

> I agree, but we should see what of those we can implement just on the
> parsing side - i.e. can we continue to make the scheduler not have to care
> about Task Groups?
>
> If so, then things like the default args example are a small enough change
> that they don't need an AIP (IMO)
>
> -ash
>
> On 8 March 2021 17:12:07 GMT, Daniel Imberman <daniel.imber...@gmail.com>
> wrote:
>>
>> I personally think that TaskGroup should go beyond being “just” a UI
>> concept. I think that there are a lot of use-cases where people might want
>> to perform a single operation across an entire group of tasks. I think that
>> Bin points out a few really good examples (default arguments and group
>> delete are good examples).
>> I also have a proposal coming out hopefully later this week that will
>> offer some more functionality to TaskGroup objects as well.
>>
>> I don’t personally see the benefit of keeping them “UI only.” If we want
>> to be able to group delete or add external sensors to a group of tasks we’d
>> basically need to create another concept that centers around “a grouping of
>> tasks” which I think might create confusion.
>>
>> On Mon, Mar 8, 2021 at 7:19 AM, Yu Qian <yuqian1...@gmail.com> wrote:
>>
>> Hi, all, it's really exciting to see the great discussions about
>> TaskGroup.
>>
>> There are some interesting ideas here.
>> - Tree View support for TaskGroup: I think this can mostly be achieved at
>> the web layer? Changes probably involve tree.html and www/views.py.
>> Should we change Tree View to organize tasks based on the TaskGroup
>> hierarchy (no need to duplicate tasks in Tree View)? Currently the Tree
>> View is organized into a flattened graph hierarchy, which means the same
>> task can appear multiple times in Tree View.
>> - Clear an entire TaskGroup. We should be able to do this in graph.html
>> and www/views.py too. E.g. the UI passes the group_id of the TaskGroup to
>> the web server which then clears the list of tasks in the TaskGroup, which
>> is already an iterable of its child tasks so this should be possible. In
>> fact, I've heard from several users that they sometimes want to select
>> multiple tasks on Graph View with the mouse and then clear all of them at
>> once. This is actually a very similar problem to clearing a TaskGroup.
>>
>> Some other ideas such as default_args and ExternalTaskSensor support
>> sound good too. We can probably continue the discussion on those individual
>> issues/PRs.
>>
>> On Sun, Mar 7, 2021 at 3:55 AM Xinbin Huang <bin.huan...@gmail.com>
>> wrote:
>>
>>> Hi Kaxil,
>>>
>>> One use case I have is to reuse TaskGroup across different DAGs as a
>>> predefined sub-workflow.
>>> For example, my team is currently building out a data platform that
>>> will allow a certain level of self-serve ability. Users of the platform
>>> (mostly analysts and scientists) should focus on business logic - the
>>> transformation part - and shouldn't need to pay much attention to
>>> some standard operations (e.g. from S3 to Redshift staging table - validate
>>> data - swap to production table), as these types of tasks are boring and
>>> repetitive. Reusing these sub-workflows also enables us to load data to a
>>> different destination/warehouse without users needing to change their code.
>>> We can also have a notification sub-workflow that allows us to swap in and
>>> out Slack/PagerDuty/etc over time without impacting the user.
>>>
>>> Other use cases
>>> - allow default_args at TaskGroup level as in this issue:
>>> https://github.com/apache/airflow/issues/13911
>>> - ExternalTaskSensor on TaskGroup as mentioned by Nathan:
>>> https://github.com/apache/airflow/issues/14563
>>> - delete an entire TaskGroup:
>>> https://github.com/apache/airflow/issues/14529
>>>
>>> All these use cases go beyond the pure UI level and require operations
>>> (viewing/triggering/deleting/waiting/etc) on *a group of tasks*. I
>>> think we can easily implement/formalize this with the current API without
>>> changing the backend too much (this PR
>>> https://github.com/apache/airflow/pull/14640 shows a small example).
>>>
>>> What do other people think?
>>>
>>> Best
>>> Bin
>>>
>>> On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>
>>>> Hi all, interesting discussion. I would love to hear about some more
>>>> use-cases where TaskGroup needs to be something more than the UI concept.
>>>>
>>>> All of Kevin's use-cases can be achieved while keeping it as a UI
>>>> concept. Xinbin, can you please expand a bit on your use case?
>>>>
>>>> Regards,
>>>> Kaxil
>>>>
>>>> On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bin.huan...@gmail.com> wrote:
>>>>
>>>>> Hi Kevin, Vikram, and Nathan,
>>>>>
>>>>> I think we don't need to restrict too much on keeping TaskGroup only
>>>>> as a UI concept. We are already using TaskGroup to author DAGs and create
>>>>> dependencies, which already lies a bit outside the UI.
>>>>> To fully replace SubDagOperator, I think it's necessary to expand
>>>>> TaskGroup as a *container for tasks* rather than just a UI concept.
>>>>>
>>>>> As for TaskGroupSensor specifically, I landed on the same approach as
>>>>> Kevin, and I have created a draft PR here:
>>>>> https://github.com/apache/airflow/pull/14640
>>>>>
>>>>> Cheers
>>>>> Bin
>>>>>
>>>>> On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yrql...@gmail.com> wrote:
>>>>>
>>>>>> Hi Vikram,
>>>>>>
>>>>>> Good point. What I had in mind was getting the TaskGroup definition
>>>>>> in a sensor, e.g. extract the _task_group field from serialized DAG, and
>>>>>> query the DB for the TI states within.
>>>>>>
>>>>>> You are right that it might not be clean, nor does it keep TaskGroup
>>>>>> as a UI concept.
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Kevin Y
>>>>>>
>>>>>> On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka
>>>>>> <vik...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> Kevin,
>>>>>>>
>>>>>>> I am not sure I understand your response to Nathan.
>>>>>>>
>>>>>>> I agree that it is also a valid use case, but I don't see how it can
>>>>>>> be cleanly done while keeping TaskGroup only as a UI concept.
>>>>>>> Would this require extending the TaskGroup concept to the backend?
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Vikram
>>>>>>>
>>>>>>> On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yrql...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Nathan,
>>>>>>>>
>>>>>>>> Thanks a lot for your input and it is indeed a valid use case. This
>>>>>>>> can be done either keeping TaskGroup as a UI concept or bringing it
>>>>>>>> into the backend.
>>>>>>>> I'm curious to hear what others think.
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Kevin Y
>>>>>>>>
>>>>>>>> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <
>>>>>>>> nathan.hadfi...@king.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Kevin,
>>>>>>>>>
>>>>>>>>> A quick piece of input from our recent experiences of working with
>>>>>>>>> TaskGroup is that we often have dependencies across DAGs that require
>>>>>>>>> waiting upon the completion of all the tasks in a group. At the
>>>>>>>>> moment, you basically have two options:
>>>>>>>>>
>>>>>>>>> 1. Create a sensor task in a DAG for every task in the group
>>>>>>>>> 2. Create a Dummy task after the group that a sensor waits on
>>>>>>>>>
>>>>>>>>> So, I would certainly like TaskGroups to have some notion of run
>>>>>>>>> status so as to better enable downstream decision making.
>>>>>>>>>
>>>>>>>>> I’ve already created a feature ticket to try to add some kind of
>>>>>>>>> TaskGroup Sensor but perhaps this can also form part of the wider
>>>>>>>>> discussions here.
>>>>>>>>>
>>>>>>>>> https://github.com/apache/airflow/issues/14563
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Nathan
>>>>>>>>>
>>>>>>>>> *From: *Kevin Yang <yrql...@gmail.com>
>>>>>>>>> *Date: *Thursday, 4 March 2021 at 05:21
>>>>>>>>> *To: *dev@airflow.apache.org <dev@airflow.apache.org>
>>>>>>>>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>>>>>>>>
>>>>>>>>> Hi team,
>>>>>>>>>
>>>>>>>>> We are very glad to see the introduction of TaskGroup in Airflow
>>>>>>>>> 2.0 and really like it. Thanks to Yu Qian and everyone who
>>>>>>>>> contributed to it. To continue moving towards the goal of replacing
>>>>>>>>> SubDagOperator with TaskGroup, I'd like to kick off a discussion on
>>>>>>>>> bringing TaskGroup into Tree View.
>>>>>>>>>
>>>>>>>>> *Why do we need TaskGroup in Tree View?*
>>>>>>>>>
>>>>>>>>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is
>>>>>>>>> the preferred view for its loading speed and simpler representation.
>>>>>>>>> SubDagOperator is often used to provide an isolated view into a
>>>>>>>>> subset of tasks in such large DAGs. To replace such SubDag use cases,
>>>>>>>>> TaskGroup will need to support Tree View.
>>>>>>>>>
>>>>>>>>> *What should TaskGroup look like in Tree View?*
>>>>>>>>>
>>>>>>>>> We didn't have a conclusion during the 1st iteration of TaskGroup.
>>>>>>>>> At Airbnb, we use SubDag mostly for providing a zoom-in view on a
>>>>>>>>> small set of tasks and the SubDag zoom-in feature worked well for us.
>>>>>>>>> We'd like to see TaskGroup provide a zoom-in option for both Graph
>>>>>>>>> View and Tree View but also like to hear everyone's thoughts.
>>>>>>>>>
>>>>>>>>> *What needs to be in TaskGroup and what doesn't?*
>>>>>>>>>
>>>>>>>>> TaskGroup started off as a pure UI concept while SubDag is
>>>>>>>>> something more, e.g. it has its own DagRun thus isolated scheduling
>>>>>>>>> decisions, it can serve as a logical isolation layer that holds
>>>>>>>>> different sets of DAG level params, etc. While we only use SubDag as
>>>>>>>>> a UI feature, I think it would be a good opportunity for us to
>>>>>>>>> discuss what should be in TaskGroup and what shouldn't.
>>>>>>>>>
>>>>>>>>> Please don't hesitate to share your thoughts.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Kevin Y
>>>>>>>>>