Hi, all,

I'm interested in contributing TaskGroup support to Tree View. Here's a
prototype <https://yuqian90.github.io/task_group_tree/> of how it could
look. Suggestions are welcome.

I understand that AIP-38
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-38+Modern+Web+Application>
plans to use modern web technology such as React to revamp the Airflow UI.
I'm not sure where we are on that front. With AIP-38 in mind, if we improve
pages such as Tree View, it makes a lot of sense to contribute the work as a
reusable TaskInstanceTree component so that it can be used in more than just
the Tree View page itself. Another place where a TaskInstanceTree component
would be useful is the confirmation page shown when a user clears, marks
success, or marks failed a set of task instances. Right now the confirmation
page just concatenates the text representations of all the TaskInstances,
which is very difficult to read when a lot of task instances are cleared or
marked. If we put the task instances on the confirmation page into a
TaskInstanceTree component with TaskGroup support, they would be much easier
to read. If you have ideas about how to contribute to Tree View so that it's
easy to incorporate into AIP-38, definitely let me know.
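
To make the grouping idea concrete, here is a rough sketch in plain Python
(not Airflow or React code) of how a flat list of task instances could be
folded into a TaskGroup hierarchy and printed as an indented tree. It assumes
each task_id carries its group ids as a dot-separated prefix, e.g.
"section_1.inner.task_b". GroupNode, build_tree and render are hypothetical
names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class GroupNode:
    """One TaskGroup level: nested groups plus the tasks directly inside it."""

    label: str
    groups: Dict[str, "GroupNode"] = field(default_factory=dict)
    tasks: List[Tuple[str, str]] = field(default_factory=list)  # (label, state)


def build_tree(task_instances: Iterable[Tuple[str, str]]) -> GroupNode:
    """Fold (task_id, state) pairs into a tree keyed by TaskGroup path.

    Assumption: group ids are the dot-separated prefix of the task_id.
    """
    root = GroupNode(label="")
    for task_id, state in task_instances:
        *group_path, leaf = task_id.split(".")
        node = root
        for name in group_path:
            # Create the group node the first time we see it.
            node = node.groups.setdefault(name, GroupNode(label=name))
        node.tasks.append((leaf, state))
    return root


def render(node: GroupNode, depth: int = 0) -> List[str]:
    """Return indented text lines: groups first, then the node's own tasks."""
    indent = "  " * depth
    lines: List[str] = []
    for name, child in node.groups.items():
        lines.append(f"{indent}[{name}]")
        lines.extend(render(child, depth + 1))
    for leaf, state in node.tasks:
        lines.append(f"{indent}{leaf} ({state})")
    return lines
```

Calling render(build_tree(pairs)) on the task instances about to be cleared
gives one line per group or task, and each task appears exactly once. A real
component would presumably render the same structure as collapsible nodes
rather than text.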

One

On Tue, Mar 9, 2021 at 1:19 AM Ash Berlin-Taylor <a...@apache.org> wrote:

> I agree, but we should see what of those we can implement just on the
> parsing side - i.e. can we continue to make the scheduler not have to care
> about Task Groups?
>
> If so, then things like the default args example is a small enough change
> that it doesn't need an AIP (IMO)
>
> -ash
>
> On 8 March 2021 17:12:07 GMT, Daniel Imberman <daniel.imber...@gmail.com>
> wrote:
>>
>> I personally think that TaskGroup should go beyond being “just” a UI
>> concept. I think that there are a lot of use-cases where people might want
>> to perform a single operation across an entire group of tasks. I think that
>> Bin points out a few really good examples (default arguments and group
>> delete are good examples). I also have a proposal coming out hopefully
>> later this week that will offer some more functionality to TaskGroup
>> objects as well.
>>
>> I don’t personally see the benefit of keeping them “UI only.” If we want
>> to be able to group delete or add external sensors to a group of tasks we’d
>> basically need to create another concept that centers around “a grouping of
>> tasks” which I think might create confusion.
>>
>> On Mon, Mar 8, 2021 at 7:19 AM, Yu Qian <yuqian1...@gmail.com> wrote:
>>
>> Hi, all, it's really exciting to see the great discussions about
>> TaskGroup.
>>
>> There are some interesting ideas here.
>> - Tree View support for TaskGroup: I think this can mostly be achieved at
>> the web layer? Changes probably involve tree.html and www/views.py.
>> Should we change Tree View to organize tasks based on the TaskGroup
>> hierarchy (no need to duplicate tasks in Tree View)? Currently the Tree
>> View is organized into a flattened graph hierarchy, which means the same
>> task can appear multiple times in Tree View.
>> - Clear an entire TaskGroup. We should be able to do this in graph.html
>> and www/views.py too. E.g. the UI passes the group_id of the TaskGroup to
>> the web server, which then clears the tasks in that TaskGroup. A TaskGroup
>> is already an iterable of its child tasks, so this should be possible. In
>> fact, I've heard from several users that they sometimes want to select
>> multiple tasks on Graph View with the mouse and then clear all of them at
>> once. This is actually a very similar problem to clearing a TaskGroup.
>>
>> Some other ideas such as default_args and ExternalTaskSensor support
>> sound good too. We can probably continue the discussion on those individual
>> issues/PRs.
>>
>> On Sun, Mar 7, 2021 at 3:55 AM Xinbin Huang <bin.huan...@gmail.com>
>> wrote:
>>
>>> Hi Kaxil,
>>>
>>> One use case I have is to reuse TaskGroup across different DAGs as a
>>> predefined sub-workflow. For example, my team is currently building out a
>>> data platform that will allow a certain level of self-serve ability. Users
>>> of the platform (mostly analysts and scientists) should focus on business
>>> logic (the transformation part) without needing to pay much attention to
>>> standard operations (e.g. from S3 to Redshift staging table - validate
>>> data - swap to production table), as these types of tasks are boring and
>>> repetitive. Reusing these sub-workflows also enables us to load data to a
>>> different destination/warehouse without users needing to change their code.
>>> We can also have a notification sub-workflow that allows us to swap
>>> Slack/PagerDuty/etc. in and out over time without impacting the user.
>>>
>>> Other use cases
>>> - allow default_args at TaskGroup level as in this issue:
>>> https://github.com/apache/airflow/issues/13911
>>> - ExternalTaskSensor on TaskGroup as mentioned by Nathan:
>>> https://github.com/apache/airflow/issues/14563
>>> - delete an entire TaskGroup:
>>> https://github.com/apache/airflow/issues/14529
>>>
>>> All these use cases go beyond the pure UI level and require operations
>>> (viewing/triggering/deleting/waiting/etc.) on *a group of tasks*. I
>>> think we can easily implement/formalize this with the current API without
>>> changing the backend too much (this PR
>>> https://github.com/apache/airflow/pull/14640 shows a small example).
>>>
>>> What do other people think?
>>>
>>> Best
>>> Bin
>>>
>>> On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>
>>>> Hi all, interesting discussion. I would love to hear about some more
>>>> use-cases where TaskGroup needs to be something more than the UI concept.
>>>>
>>>> All of Kevin's use-cases can be achieved while keeping it as a UI
>>>> concept. Xinbin, can you please expand a bit on your use case?
>>>>
>>>> Regards,
>>>> Kaxil
>>>>
>>>> On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bin.huan...@gmail.com> wrote:
>>>>
>>>>> Hi Kevin, Vikram, and Nathan,
>>>>>
>>>>> I don't think we need to restrict TaskGroup to being only a UI
>>>>> concept. We are already using TaskGroup to author DAGs and create
>>>>> dependencies, which already lies a bit outside the UI.
>>>>> To fully replace SubDagOperator, I think it's necessary to expand
>>>>> TaskGroup into a *container for tasks* rather than just a UI concept.
>>>>>
>>>>> As for TaskGroupSensor specifically, I landed on the same approach as
>>>>> Kevin, and I have created a draft PR here:
>>>>> https://github.com/apache/airflow/pull/14640
>>>>>
>>>>> Cheers
>>>>> Bin
>>>>>
>>>>> On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yrql...@gmail.com> wrote:
>>>>>
>>>>>> Hi Vikram,
>>>>>>
>>>>>> Good point. What I had in mind was getting the TaskGroup definition
>>>>>> in a sensor, e.g. extract the _task_group field from serialized DAG, and
>>>>>> query the DB for the TI states within.
>>>>>>
>>>>>> You are right that it might not be clean nor does it keep TaskGroup
>>>>>> as a UI concept.
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Kevin Y
>>>>>>
>>>>>> On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka
>>>>>> <vik...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> Kevin,
>>>>>>>
>>>>>>> I am not sure I understand your response to Nathan.
>>>>>>>
>>>>>>> I agree that it is also a valid use case, but I don't see how it can
>>>>>>> be cleanly done while keeping TaskGroup only as a UI concept.
>>>>>>> Would this require extending the TaskGroup concept to the backend?
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Vikram
>>>>>>>
>>>>>>> On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yrql...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Nathan,
>>>>>>>>
>>>>>>>> Thanks a lot for your input and it is indeed a valid use case. This
>>>>>>>> can be done either keeping TaskGroup as a UI concept or bringing it
>>>>>>>> into the backend. I'm curious to hear what others think.
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Kevin Y
>>>>>>>>
>>>>>>>> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <
>>>>>>>> nathan.hadfi...@king.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Kevin,
>>>>>>>>>
>>>>>>>>> A quick piece of input from our recent experiences of working with
>>>>>>>>> TaskGroup is that we often have dependencies across DAGs that require
>>>>>>>>> waiting upon the completion of all the tasks in a group. At the
>>>>>>>>> moment, you basically have two options:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    1. Create a sensor task in a DAG for every task in the group
>>>>>>>>>    2. Create a Dummy task after the group that a sensor waits on
>>>>>>>>>
>>>>>>>>> So, I would certainly like TaskGroups to have some notion of run
>>>>>>>>> status to better enable downstream decision making.
>>>>>>>>>
>>>>>>>>> I’ve already created a feature ticket to try to add some kind of
>>>>>>>>> TaskGroup Sensor but perhaps this can also form part of the wider
>>>>>>>>> discussions here.
>>>>>>>>>
>>>>>>>>> https://github.com/apache/airflow/issues/14563
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Nathan
>>>>>>>>>
>>>>>>>>> *From: *Kevin Yang <yrql...@gmail.com>
>>>>>>>>> *Date: *Thursday, 4 March 2021 at 05:21
>>>>>>>>> *To: *dev@airflow.apache.org <dev@airflow.apache.org>
>>>>>>>>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>>>>>>>>
>>>>>>>>> Hi team,
>>>>>>>>>
>>>>>>>>> We are very glad to see the introduction of TaskGroup in Airflow
>>>>>>>>> 2.0 and really like it. Thanks to Yu Qian and everyone that
>>>>>>>>> contributed to it. To continue moving towards the goal of replacing
>>>>>>>>> SubDagOperator with TaskGroup, I'd like to kick off a discussion on
>>>>>>>>> bringing TaskGroup into Tree View.
>>>>>>>>>
>>>>>>>>> *Why do we need TaskGroup in Tree View?*
>>>>>>>>>
>>>>>>>>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is
>>>>>>>>> the preferred view for its loading speed and simpler representation.
>>>>>>>>> SubDagOperator is often used to provide an isolated view into a
>>>>>>>>> subset of tasks in such large DAGs. To replace such SubDag use cases,
>>>>>>>>> TaskGroup will need to support Tree View.
>>>>>>>>>
>>>>>>>>> *What should TaskGroup look like in Tree View?*
>>>>>>>>>
>>>>>>>>> We didn't reach a conclusion during the 1st iteration of TaskGroup.
>>>>>>>>> At Airbnb, we use SubDag mostly for providing a zoom-in view on a
>>>>>>>>> small set of tasks, and the SubDag zoom-in feature has worked well
>>>>>>>>> for us. We'd like to see TaskGroup provide a zoom-in option for both
>>>>>>>>> Graph View and Tree View, but would also like to hear everyone's
>>>>>>>>> thoughts.
>>>>>>>>>
>>>>>>>>> *What needs to be in TaskGroup and what doesn't?*
>>>>>>>>>
>>>>>>>>> TaskGroup started off as a pure UI concept while SubDag is
>>>>>>>>> something more, e.g. it has its own DagRun and thus isolated
>>>>>>>>> scheduling decisions, it can serve as a logical isolation layer that
>>>>>>>>> holds different sets of DAG level params, etc. While we only use
>>>>>>>>> SubDag as a UI feature, I think it would be a good opportunity for us
>>>>>>>>> to discuss what should be in TaskGroup and what shouldn't.
>>>>>>>>>
>>>>>>>>> Please don't hesitate to share your thoughts.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Kevin Y
>>>>>>>>>
>>>>>>>>
