What's your ID? If you haven't created an account yet, please create
one at https://cwiki.apache.org/confluence/signup.action and send us your
ID, and we will add permissions.

> Thanks. I'll edit the AIP. May I request permission to edit it?
> My wiki user email is yuqian1...@gmail.com.


On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yuqian1...@gmail.com> wrote:

> Re, Xinbin. Thanks. I'll edit the AIP. May I request permission to edit it?
> My wiki user email is yuqian1...@gmail.com.
>
> Re Gerard: yes the UI loads all the nodes as json from the web server at
> once. However, it only adds the top level nodes and edges to the graph when
> the Graph View page is first opened. And then adds the expanded nodes to
> the graph as the user expands them. From what I've experienced with DAGs
> containing around 400 tasks (not using TaskGroup or SubDagOperator),
> opening the whole dag in Graph View usually takes 5 seconds. Less than 60ms
> of that is taken by loading the data from webserver. The remaining 4.9s+ is
> taken by javascript functions in dagre-d3.min.js such as createNodes,
> createEdgeLabels, etc and by rendering the graph. With TaskGroup being used
> to group tasks into a smaller number of top-level nodes, the amount of data
> loaded from webserver will remain about the same compared to a flat dag of
> the same size, but the number of nodes and edges that need to be plotted on the
> graph can be reduced significantly. So in theory this should speed up the
> time it takes to open Graph View even without lazy-loading the data (I'll
> experiment to find out). That said, if it comes to a point where lazy-loading
> helps, we can still implement it as an improvement.
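
As a rough illustration of why grouping reduces what dagre-d3 has to lay out, the sketch below collapses a flat task graph into the nodes and edges that are visible for a given set of expanded groups. It is not the code from the PR: `visible_graph`, the `group_of` mapping and the task names are assumptions made up for this example, and groups are not nested here.

```python
def visible_graph(tasks, edges, group_of, expanded):
    """Return (nodes, edges) to draw, collapsing every non-expanded group.

    tasks:    iterable of task ids
    edges:    iterable of (upstream_task, downstream_task)
    group_of: dict task_id -> group_id (tasks absent from it are ungrouped)
    expanded: set of group ids the user has expanded
    """
    def shown_as(task):
        group = group_of.get(task)
        if group is None or group in expanded:
            return task   # drawn as an individual node
        return group      # hidden behind its collapsed group node

    nodes = {shown_as(t) for t in tasks}
    # Edges between two tasks of the same collapsed group are not drawn.
    drawn = {(shown_as(u), shown_as(d)) for u, d in edges if shown_as(u) != shown_as(d)}
    return nodes, drawn


tasks = ["start", "dl_a", "dl_b", "merge", "end"]
edges = [("start", "dl_a"), ("start", "dl_b"),
         ("dl_a", "merge"), ("dl_b", "merge"), ("merge", "end")]
group_of = {"dl_a": "download", "dl_b": "download", "merge": "download"}

collapsed_nodes, collapsed_edges = visible_graph(tasks, edges, group_of, expanded=set())
expanded_nodes, expanded_edges = visible_graph(tasks, edges, group_of, expanded={"download"})
```

With the group collapsed, five tasks and five edges shrink to three nodes and two edges; the same data is loaded either way, only the amount handed to the layout step changes.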
>
> Re James: the Tree View looks as if all the groups are fully expanded
> (because under the hood all the tasks are in a single DAG). I'm less
> worried about Tree View at the moment because it already has a mechanism
> for collapsing tasks by the dependency tree. That said, the Tree View can
> definitely be improved too with TaskGroup. (e.g. collapse tasks in the same
> TaskGroup when Tree View is first opened).
>
> For both suggestions, implementing them doesn't require fundamental changes
> to the idea. I think we can have a basic working TaskGroup first, and then
> improve it incrementally in several PRs as we get more feedback from the
> community. What do you think?
>
> Qian
>
>
> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jcode...@gmail.com> wrote:
>
> > I agree this looks great, one question, how does the tree view look?
> >
> > James Coder
> >
> > > On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gcasass...@twitter.com
> .invalid>
> > wrote:
> > >
> > > First of all, this is awesome!!
> > >
> > > Secondly, checking your UI code, seems you are loading all operators at
> > > once. Wondering if we can load them as needed (aka load whenever we
> click
> > > the TaskGroup). Some of our DAGs are so large that they take forever to load
> > on
> > > the Graph view, so worried about this still being an issue here. It may
> > be
> > > easily solvable by implementing lazy loading of the graph. Not sure how
> > > easy to implement/add to the UI extension (and don't want to push for early
> > > optimization as it's the root of all evil).
> > > Gerard Casas Saez
> > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >
> > >
> > >> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bin.huan...@gmail.com>
> > wrote:
> > >>
> > >> Hi Yu,
> > >>
> > >> Thank you so much for taking on this. I was fairly distracted
> previously
> > >> and I didn't have the time to update the proposal. In fact, after
> > >> discussing with Ash, Kaxil and Daniel, the direction of this AIP has
> > been
> > >> changed to favor the concept of TaskGroup instead of rewriting
> > >> SubDagOperator (though it may make sense to deprecate SubDag at a future
> > >> date).
> > >>
> > >> Your PR is amazing and it has implemented the desired features. I think
> > we
> > >> can focus on your new PR instead. Do you mind updating the AIP based
> on
> > >> what you have done in your PR?
> > >>
> > >> Best,
> > >> Bin
> > >>
> > >>
> > >>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yuqian1...@gmail.com>
> wrote:
> > >>>
> > >>> Hi, all, I've added the basic UI changes to my proposed
> implementation
> > of
> > >>> TaskGroup as UI grouping concept:
> > >>> https://github.com/apache/airflow/pull/10153
> > >>>
> > >>> I think Chris had a pretty good specification of TaskGroup so i'm
> > quoting
> > >>> it here. The only thing I don't fully agree with is the restriction
> > >>> "... *cannot* have dependencies between a Task in a TaskGroup and either a
> > >>> Task in a different TaskGroup or a Task not in any group". I think
> > >>> this is overly restrictive. Since TaskGroup is a UI concept, tasks can
> > have
> > >>> dependencies on tasks in other TaskGroup or not in any TaskGroup. In
> my
> > >> PR,
> > >>> this is allowed. The graph edges will update accordingly when
> > TaskGroups
> > >>> are expanded/collapsed. TaskGroup is only helping to make the UI look
> > >> less
> > >>> crowded. Under the hood, everything is still a DAG of tasks and edges
> > so
> > >>> things work normally. Here's a screenshot
> > >>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > >>> of the UI interaction.
> > >>>
> > >>>
> > >>> - Tasks can be added to a TaskGroup
> > >>> - You *can* have dependencies between Tasks in the same TaskGroup, but
> > >>>   *cannot* have dependencies between a Task in a TaskGroup and either a
> > >>>   Task in a different TaskGroup or a Task not in any group
> > >>> - You *can* have dependencies between a TaskGroup and either other
> > >>>   TaskGroups or Tasks not in any group
> > >>> - The UI will by default render a TaskGroup as a single "object", but
> > >>>   which you expand or zoom into in some way
> > >>> - You'd need some way to determine what the "status" of a TaskGroup was
> > >>>   at least for UI display purposes
> > >>>
> > >>>
> > >>> Regarding Jake's comment, I agree it's possible to implement the
> > >> "retrying
> > >>> tasks in a group" pattern he mentioned as an optional feature of
> > >> TaskGroup
> > >>> although that may go against having TaskGroup as a pure UI concept.
> For
> > >> the
> > >>> motivating example Jake provided, I suggest implementing both
> > >>> SubmitLongRunningJobTask and PollJobStatusSensor in a single
> operator.
> > It
> > >>> can do something like BaseSensorOperator.execute() does in
> "reschedule"
> > >>> mode, i.e. it first executes some code to submit the long running job
> > to
> > >>> the external service and stores the state (e.g. in XCom). Then it reschedules
> > >>> itself. Subsequent runs then poke for the completion state.
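
The single-operator pattern described above can be sketched in plain Python without Airflow. Everything here is an assumption for illustration: `FakeJobService` stands in for the external service, the `state` dict stands in for XCom, and each call to `poke()` stands in for one rescheduled run of the task. None of it is the real BaseSensorOperator API.

```python
class FakeJobService:
    """Hypothetical external service: a job finishes after a few status checks."""

    def __init__(self, checks_until_done=2):
        self.checks_until_done = checks_until_done
        self.remaining = {}
        self.submitted = 0

    def submit(self):
        self.submitted += 1
        job_id = "job-%d" % self.submitted
        self.remaining[job_id] = self.checks_until_done
        return job_id

    def is_done(self, job_id):
        self.remaining[job_id] -= 1
        return self.remaining[job_id] <= 0


class SubmitAndPoll:
    """Submit on the first run, then only poll on later (rescheduled) runs."""

    def __init__(self, service, state):
        self.service = service
        self.state = state  # persisted between runs (stand-in for XCom)

    def poke(self):
        job_id = self.state.get("job_id")
        if job_id is None:
            # First run: submit the job, remember its id, ask to be rescheduled.
            self.state["job_id"] = self.service.submit()
            return False
        # Later runs: only check whether the job has completed.
        return self.service.is_done(job_id)


service, state = FakeJobService(checks_until_done=2), {}
results = []
for _ in range(3):
    # A fresh object per run: nothing survives between runs except `state`.
    results.append(SubmitAndPoll(service, state).poke())
```

The job is submitted exactly once even though the task body runs three times, because the decision to submit depends only on the persisted state.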
> > >>>
> > >>>
> > >>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> > >> <jferri...@google.com.invalid
> > >>>>
> > >>> wrote:
> > >>>
> > >>>> I really like this idea of a TaskGroup container as I think this
> will
> > >> be
> > >>>> much easier to use than SubDag.
> > >>>>
> > >>>> I'd like to propose an optional behavior for special retry mechanics
> > >> via
> > >>> a
> > >>>> TaskGroup.retry_all property.
> > >>>> This way I could use TaskGroup to replace my favorite use of SubDag
> > for
> > >>>> atomically retrying tasks of the pattern "act on external state then
> > >>>> reschedule poll until desired state reached".
> > >>>>
> > >>>> Motivating use case I have for a SubDag is very simple two task
> group
> > >>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > >>>> I use SubDag because it gives me an easy way to retry the
> > >>> SubmitJobTask
> > >>>> if something about the PollJobSensor fails.
> > >>>> This pattern would be really nice for jobs that are expected to run
> a
> > >>> long
> > >>>> time (because the sensor can use reschedule mode, freeing up
> > >> slots)
> > >>>> but might fail for a retryable reason.
> > >>>> However, using SubDag to meet this use case defeats the purpose
> > because
> > >>>> SubDag infamously
> > >>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> > >>>> blocks a "controller" slot for the entire duration.
> > >>>> This may feel like a cyclic behavior but in reality it is very common
> for
> > >> a
> > >>>> single operator to submit job / wait til done.
> > >>>> We could use this case to refactor many operators (e.g. BQ, Dataproc,
> > >>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask]
> with
> > >> an
> > >>>> optional reschedule mode if user knows that this job may take a long
> > >>> time.
> > >>>>
> > >>>> I'd be happy to do the development work on adding this specific retry
> > >>> behavior
> > >>>> to TaskGroup once the base concept is implemented if others in the
> > >>>> community would find this a useful feature.
> > >>>>
> > >>>> Cheers,
> > >>>> Jake
> > >>>>
> > >>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> > jarek.pot...@polidea.com
> > >>>
> > >>>> wrote:
> > >>>>
> > >>>>> All for it :) . I think we are getting closer to have regular
> > >> planning
> > >>>> and
> > >>>>> making some structured approach to 2.0 and starting task force for
> it
> > >>>> soon,
> > >>>>> so I think this should be perfectly fine to discuss and even start
> > >>>>> implementing what's beyond as soon as we make sure that we are
> > >>>> prioritizing
> > >>>>> 2.0 work.
> > >>>>>
> > >>>>> J,
> > >>>>>
> > >>>>>
> > >>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yuqian1...@gmail.com>
> > >> wrote:
> > >>>>>
> > >>>>>> Hi Jarek,
> > >>>>>>
> > >>>>>> I agree we should not change the behaviour of the existing
> > >>>> SubDagOperator
> > >>>>>> till Airflow 2.1. Is it okay to continue the discussion about
> > >>> TaskGroup
> > >>>>> as
> > >>>>>> a brand new concept/feature independent from the existing
> > >>>> SubDagOperator?
> > >>>>>> In other words, shall we add TaskGroup as a UI grouping concept
> > >> like
> > >>>> Ash
> > >>>>>> suggested, and not touch SubDagOperator at all? Whenever we are
> > >>> ready
> > >>>>> with
> > >>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
> > >>>>>>
> > >>>>>> I really like Ash's idea of simplifying the SubDagOperator idea
> > >> into
> > >>> a
> > >>>>>> simple UI grouping concept. I think Xinbin's idea of "reattaching
> > >> all
> > >>>> the
> > >>>>>> tasks to the root DAG" is the way to go. And I see James pointed
> > >> out
> > >>> we
> > >>>>>> need some helper functions to simplify dependencies setting of
> > >>>> TaskGroup.
> > >>>>>> Xinbin put up a pretty elegant example in his PR
> > >>>>>> <https://github.com/apache/airflow/pull/9243>. I think having
> > >>>> TaskGroup
> > >>>>> as
> > >>>>>> a UI concept should be a relatively small change. We can simplify
> > >>>>> Xinbin's
> > >>>>>> PR further. So I put up this alternative proposal here:
> > >>>>>> https://github.com/apache/airflow/pull/10153
> > >>>>>>
> > >>>>>> I have not done any UI changes due to lack of experience with web
> > >> UI.
> > >>>> If
> > >>>>>> anyone's interested, please take a look at the PR.
> > >>>>>>
> > >>>>>> Qian
> > >>>>>>
> > >>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > >>> jarek.pot...@polidea.com
> > >>>>>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Similar point here to the other ideas that are popping up. Maybe
> > >> we
> > >>>>>> should
> > >>>>>>> just focus on completing 2.0 and make all discussions about
> > >> further
> > >>>>>>> improvements to 2.1? While those are important discussions (and
> > >> we
> > >>>>> should
> > >>>>>>> continue them in the  near future !) I think at this point
> > >> focusing
> > >>>> on
> > >>>>>>> delivering 2.0 in its current shape should be our focus now ?
> > >>>>>>>
> > >>>>>>> J.
> > >>>>>>>
> > >>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > >>> bin.huan...@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi Daniel
> > >>>>>>>>
> > >>>>>>>> I agree that the TaskGroup should have the same API as a DAG
> > >>> object
> > >>>>>>> related
> > >>>>>>>> to task dependencies, but it will not have anything related to
> > >>>> actual
> > >>>>>>>> execution or scheduling.
> > >>>>>>>> I will update the AIP according to this over the weekend.
> > >>>>>>>>
> > >>>>>>>>> We could even make a “DAGTemplate” object s.t. when you
> > >> import
> > >>>> the
> > >>>>>>> object
> > >>>>>>>> you can import it with parameters to determine the shape of the
> > >>>> DAG.
> > >>>>>>>>
> > >>>>>>>> Can you elaborate a bit more on this? Does it serve a similar
> > >>>> purpose
> > >>>>>> as
> > >>>>>>> a
> > >>>>>>>> DAG factory function?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > >>>>>>> daniel.imber...@gmail.com
> > >>>>>>>>>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Bin,
> > >>>>>>>>>
> > >>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g.
> > >>> the
> > >>>>>>> bitwise
> > >>>>>>>>> operator for task dependencies). We could even make a
> > >>>> “DAGTemplate”
> > >>>>>>>> object
> > >>>>>>>>> s.t. when you import the object you can import it with
> > >>> parameters
> > >>>>> to
> > >>>>>>>>> determine the shape of the DAG.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > >>>>> bin.huan...@gmail.com
> > >>>>>>>
> > >>>>>>>>> wrote:
> > >>>>>>>>> The TaskGroup will not take schedule interval as a parameter
> > >>>>> itself,
> > >>>>>>> and
> > >>>>>>>> it
> > >>>>>>>>> depends on the DAG where it attaches to. In my opinion, the
> > >>>>> TaskGroup
> > >>>>>>>> will
> > >>>>>>>>> only contain a group of tasks with interdependencies, and the
> > >>>>>> TaskGroup
> > >>>>>>>>> behaves like a task. It doesn't contain any
> > >>> execution/scheduling
> > >>>>>> logic
> > >>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs etc.)
> > >>> like
> > >>>> a
> > >>>>>> DAG
> > >>>>>>>>> does.
> > >>>>>>>>>
> > >>>>>>>>>> For example, there is the scenario that the schedule
> > >> interval
> > >>>> of
> > >>>>>> DAG
> > >>>>>>> is
> > >>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 min.
> > >>>>>>>>>
> > >>>>>>>>> I am curious why you ask this. Is this a use case that you
> > >> want
> > >>>> to
> > >>>>>>>> achieve?
> > >>>>>>>>>
> > >>>>>>>>> Bin
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> > >> thanosxnicho...@gmail.com
> > >>>>
> > >>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi Bin,
> > >>>>>>>>>> Using TaskGroup, is the schedule interval of TaskGroup the
> > >>> same
> > >>>>> as
> > >>>>>>> the
> > >>>>>>>>>> parent DAG? My main concern is whether the schedule
> > >> interval
> > >>> of
> > >>>>>>>> TaskGroup
> > >>>>>>>>>> could be different from that of the DAG? For example, there
> > >>> is
> > >>>>> the
> > >>>>>>>>> scenario
> > >>>>>>>>>> that the schedule interval of DAG is 1 hour and the
> > >> schedule
> > >>>>>> interval
> > >>>>>>>> of
> > >>>>>>>>>> TaskGroup is 20 min.
> > >>>>>>>>>>
> > >>>>>>>>>> Cheers,
> > >>>>>>>>>> Nicholas
> > >>>>>>>>>>
> > >>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > >>>>>> bin.huan...@gmail.com
> > >>>>>>>>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi Nicholas,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I am not sure about the old behavior of SubDagOperator,
> > >>> maybe
> > >>>>> it
> > >>>>>>> will
> > >>>>>>>>>> throw
> > >>>>>>>>>>> an error? But in the original proposal, the subdag's
> > >>>>>>>> schedule_interval
> > >>>>>>>>>> will
> > >>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to replace
> > >>>> SubDag,
> > >>>>>>> there
> > >>>>>>>>>> will
> > >>>>>>>>>>> be no subdag schedule_interval.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Bin
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > >>>> thanosxnicho...@gmail.com
> > >>>>>>
> > >>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi Bin,
> > >>>>>>>>>>>> Thanks for your good proposal. I was confused whether
> > >> the
> > >>>>>>> schedule
> > >>>>>>>>>>>> interval of SubDAG is different from that of the parent
> > >>>> DAG?
> > >>>>> I
> > >>>>>>> have
> > >>>>>>>>>>>> discussed with Jiajie Zhong about the schedule interval
> > >>> of
> > >>>>>>> SubDAG.
> > >>>>>>>> If
> > >>>>>>>>>> the
> > >>>>>>>>>>>> SubDagOperator has a different schedule interval, what
> > >>> will
> > >>>>>>> happen
> > >>>>>>>>> for
> > >>>>>>>>>>> the
> > >>>>>>>>>>>> scheduler to schedule the parent DAG?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Regards,
> > >>>>>>>>>>>> Nicholas Jiang
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > >>>>>>>> bin.huan...@gmail.com>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback!
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I have rethought the concept of subdag and task
> > >>>>>> groups. I
> > >>>>>>>>> think
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>> better way to approach this is to entirely remove
> > >>> subdag
> > >>>>> and
> > >>>>>>>>>> introduce
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>> concept of TaskGroup, which is a container of tasks
> > >>> along
> > >>>>>> with
> > >>>>>>>>> their
> > >>>>>>>>>>>>> dependencies *without execution/scheduling logic as a
> > >>>> DAG*.
> > >>>>>> The
> > >>>>>>>>> only
> > >>>>>>>>>>>>> purpose of it is to group a list of tasks, but you
> > >>> still
> > >>>>> need
> > >>>>>>> to
> > >>>>>>>>> add
> > >>>>>>>>>> it
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>>> a DAG for execution.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Here is a small code snippet.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> ```
> > >>>>>>>>>>>>> class TaskGroup:
> > >>>>>>>>>>>>>     """
> > >>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> > >>>>>>>>>>>>>     """
> > >>>>>>>>>>>>>     def __init__(self, group_id, default_args=None):
> > >>>>>>>>>>>>>         pass
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> """
> > >>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to a DAG
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> This can be declared in a separate file from the dag file
> > >>>>>>>>>>>>> """
> > >>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
> > >>>>>>>>>>>>> download_group.add_task(task1)
> > >>>>>>>>>>>>> task2.dag = download_group
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> with download_group:
> > >>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> [task1, task2] >> task3
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> """Add it to a DAG for execution"""
> > >>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> > >>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
> > >>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> > >>>>>>>>>>>>>     start >> download_group
> > >>>>>>>>>>>>>     # this is equivalent to
> > >>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> > >>>>>>>>>>>>> ```
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> With this, we can still reuse a group of tasks and
> > >> set
> > >>>>>>>> dependencies
> > >>>>>>>>>>>> between
> > >>>>>>>>>>>>> them; it avoids the boilerplate code from using
> > >>>>>> SubDagOperator,
> > >>>>>>>> and
> > >>>>>>>>>> we
> > >>>>>>>>>>>> can
> > >>>>>>>>>>>>> declare dependencies as `task >> task_group >> task`.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> User migration wise, we can introduce it before
> > >> Airflow
> > >>>> 2.0
> > >>>>>> and
> > >>>>>>>>> allow
> > >>>>>>>>>>>>> gradual transition. Then we can decide if we still
> > >> want
> > >>>> to
> > >>>>>> keep
> > >>>>>>>> the
> > >>>>>>>>>>>>> SubDagOperator or simply remove it.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Any thoughts?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > >>>>>>>>>>>>> maximebeauche...@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> +1, proposal looks good.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> The original intention was really to have tasks
> > >>> groups
> > >>>>> and
> > >>>>>> a
> > >>>>>>>>>>>> zoom-in/out
> > >>>>>>>>>>>>> in
> > >>>>>>>>>>>>>> the UI. The original reasoning was to reuse the DAG
> > >>>>> object
> > >>>>>>>> since
> > >>>>>>>>> it
> > >>>>>>>>>>> is
> > >>>>>>>>>>>> a
> > >>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> > >>> create
> > >>>>>>>> underlying
> > >>>>>>>>>>>>>> confusions since a DAG is much more than just a
> > >> group
> > >>>> of
> > >>>>>>> tasks.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Max
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > >>>>>>>>>>>>> joshipoornim...@gmail.com>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thank you for your email.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > >>>>>>>>>>> bin.huan...@gmail.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> > >>>>>> rewrites
> > >>>>>>>> the
> > >>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > >>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > >> it
> > >>>>> will
> > >>>>>>>> give a
> > >>>>>>>>>>>> flat
> > >>>>>>>>>>>>>>>>>> structure at
> > >>>>>>>>>>>>>>>>>> the task level
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> The serialized_dag representation already
> > >> does
> > >>>>> this I
> > >>>>>>>>> think.
> > >>>>>>>>>> At
> > >>>>>>>>>>>>> least
> > >>>>>>>>>>>>>>> if
> > >>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > >>> representation,
> > >>>>> but
> > >>>>>> at
> > >>>>>>>>> least
> > >>>>>>>>>>> it
> > >>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
> > >> In
> > >>> my
> > >>>>>>>> proposal
> > >>>>>>>>> as
> > >>>>>>>>>>>> also
> > >>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
> > >> from
> > >>>> the
> > >>>>>>> subdag
> > >>>>>>>>> and
> > >>>>>>>>>>> add
> > >>>>>>>>>>>>>> them
> > >>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG graph
> > >>>> will
> > >>>>>> look
> > >>>>>>>>>> exactly
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>> same as without subdag but with metadata
> > >> attached
> > >>>> to
> > >>>>>>> those
> > >>>>>>>>>>>> sections.
> > >>>>>>>>>>>>>>> These
> > >>>>>>>>>>>>>>>> metadata will be later on used to render in the
> > >>> UI.
> > >>>>> So
> > >>>>>>>> after
> > >>>>>>>>>>>> parsing
> > >>>>>>>>>>>>> (
> > >>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
> > >> the
> > >>>>>>> *root_dag
> > >>>>>>>>>>>> *instead
> > >>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>> *root_dag +
> > >>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > >>>>>>>>>> current_group=section-1,
> > >>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> > >>> naming
> > >>>>>>>>>>> suggestions),
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>> reason for parent_group is that we can have
> > >>> nested
> > >>>>>> group
> > >>>>>>>> and
> > >>>>>>>>>>>> still
> > >>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>> able to capture the dependency.
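
A minimal sketch of the flattening idea, using a nested dict as a stand-in for a DAG with subdags. The function name and the input shape are invented for illustration; only the `current_group` / `parent_group` metadata names come from the text above.

```python
def flatten(group_id, children, parent_group=None):
    """Yield (task_id, metadata) for every task under group_id, recursively.

    children maps a name to None (a plain task) or to a dict (a nested group).
    """
    for name, sub in children.items():
        if sub is None:
            # A task keeps a pointer to its own group and to that group's parent.
            yield name, {"current_group": group_id, "parent_group": parent_group}
        else:
            # A nested group: its tasks are pulled up into the same flat stream.
            yield from flatten(name, sub, parent_group=group_id)


root = {
    "start": None,
    "section-1": {
        "section-1-task-1": None,
        "section-1-task-2": None,
    },
    "end": None,
}

flat = dict(flatten("root_dag", root))
```

Every task ends up at the same level, as if no subdag had been used, and the UI can rebuild the nesting from the two metadata fields alone.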
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Runtime DAG:
> > >>>>>>>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> While at the UI, what we see would be something
> > >>>> like
> > >>>>>> this
> > >>>>>>>> by
> > >>>>>>>>>>>>> utilizing
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into
> > >> in
> > >>>> some
> > >>>>>>> way.
> > >>>>>>>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> The benefits I can see are that:
> > >>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > >>> complexity
> > >>>> of
> > >>>>>>>> SubDag
> > >>>>>>>>>> for
> > >>>>>>>>>>>>>>> execution
> > >>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> > >> using
> > >>>>>> SubDag.
> > >>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
> > >>>>> reusable
> > >>>>>>> dag
> > >>>>>>>>> code
> > >>>>>>>>>>> and
> > >>>>>>>>>>>>>>>> declare dependencies between them. And with the
> > >>> new
> > >>>>>>>>>>> SubDagOperator
> > >>>>>>>>>>>>> (see
> > >>>>>>>>>>>>>>> AIP
> > >>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
> > >>>>> function
> > >>>>>>> for
> > >>>>>>>>>>>>> generating 1
> > >>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
> > >>> (in
> > >>>>> this
> > >>>>>>>> case,
> > >>>>>>>>>> it
> > >>>>>>>>>>>> will
> > >>>>>>>>>>>>>>> just
> > >>>>>>>>>>>>>>>> extract all underlying tasks and append to the
> > >>> root
> > >>>>>> dag).
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> > >>>> with a
> > >>>>>>>>>> simpler
> > >>>>>>>>>>>>>> concept
> > >>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> > >> out
> > >>>> the
> > >>>>>>>>>> contents
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>> SubDag
> > >>>>>>>>>>>>>>>> and becomes more like
> > >>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > >>>>>>>>>>>>>>> (forgive
> > >>>>>>>>>>>>>>>> me about the crazy name..). In this case, is it
> > >>>> still
> > >>>>>>>>>>> necessary
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>>>>> keep the
> > >>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a
> > >>>> name?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> > >>>> Chris
> > >>>>>>> Palmer
> > >>>>>>>>> for
> > >>>>>>>>>>>>> helping
> > >>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup, I
> > >>>> will
> > >>>>>> just
> > >>>>>>>>> paste
> > >>>>>>>>>>> it
> > >>>>>>>>>>>>>> here.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > >> in
> > >>>> the
> > >>>>>> same
> > >>>>>>>>>>>> TaskGroup,
> > >>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > >> a
> > >>>>>>> TaskGroup
> > >>>>>>>>>> and
> > >>>>>>>>>>>>>> either a
> > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > >> in
> > >>>> any
> > >>>>>>> group
> > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > >>> TaskGroup
> > >>>>> and
> > >>>>>>>>>> either
> > >>>>>>>>>>>>> other
> > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > >> as
> > >>> a
> > >>>>>> single
> > >>>>>>>>>>>> "object",
> > >>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > >>>>> "status"
> > >>>>>>> of a
> > >>>>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>>>> was
> > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I agree with Chris:
> > >>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > >>> executor), I
> > >>>>>> think
> > >>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>>>> should
> > >>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> > >> to
> > >>>>>>> implement
> > >>>>>>>>>> some
> > >>>>>>>>>>>>>> metadata
> > >>>>>>>>>>>>>>>> operations that allows start/stop a group of
> > >>> tasks
> > >>>>>> etc.)
> > >>>>>>>>>>>>>>>> - From the UI's View, it should be able to pick
> > >>> up
> > >>>>> the
> > >>>>>>>>>> individual
> > >>>>>>>>>>>>>> tasks'
> > >>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > >> status
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > >> Imberman
> > >>> <
> > >>>>>>>>>>>>>>>> daniel.imber...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator
> > >>> to
> > >>>>> tie
> > >>>>>>> dags
> > >>>>>>>>>>>> together
> > >>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>> I
> > >>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if we
> > >>>> could
> > >>>>>>>>>> essentially
> > >>>>>>>>>>>>> write
> > >>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > >>>> starter-tasks
> > >>>>>> for
> > >>>>>>>>> that
> > >>>>>>>>>>> DAG.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
> > >> UI
> > >>>>>> concept.
> > >>>>>>>> It
> > >>>>>>>>>>>> doesn’t
> > >>>>>>>>>>>>>> need
> > >>>>>>>>>>>>>>>>> to execute separately, you’re just adding more
> > >>>> tasks
> > >>>>>> to
> > >>>>>>>> the
> > >>>>>>>>>>> queue
> > >>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>> be executed when there are resources
> > >> available.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
> > >> <
> > >>>>>>>>>>> ch...@crpalmer.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
> > >>>>>> abstraction.
> > >>>>>>> I
> > >>>>>>>>>> think
> > >>>>>>>>>>>> what
> > >>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
> > >> high
> > >>>>> level
> > >>>>>> I
> > >>>>>>>>> think
> > >>>>>>>>>>> you
> > >>>>>>>>>>>>> want
> > >>>>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>> functionality:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in
> > >>> the
> > >>>>>> same
> > >>>>>>>>>>> TaskGroup,
> > >>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in a
> > >>>>>> TaskGroup
> > >>>>>>>> and
> > >>>>>>>>>>>> either
> > >>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not in
> > >>> any
> > >>>>>> group
> > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > >>> TaskGroup
> > >>>>> and
> > >>>>>>>> either
> > >>>>>>>>>>> other
> > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > >> as a
> > >>>>>> single
> > >>>>>>>>>>> "object",
> > >>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > >>>> "status"
> > >>>>>> of
> > >>>>>>> a
> > >>>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>>> was
> > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Not sure if it would need to be a top level object with its own
> > >>>>>>>>>>>>>>>>> database table and model or just another attribute on tasks. I
> > >>>>>>>>>>>>>>>>> think you could build it in a way such that from the
> > >>>>>>>>>>>>>>>>> scheduler's point of view a DAG with TaskGroups doesn't get
> > >>>>>>>>>>>>>>>>> treated any differently. So it really just becomes a shortcut
> > >>>>>>>>>>>>>>>>> for setting dependencies between sets of Tasks, and allows the
> > >>>>>>>>>>>>>>>>> UI to simplify the render of the DAG structure.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Chris
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > >>>>>>>>>>>>>>>>> <ddavy...@twitter.com.invalid> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Agree with James (and think it's actually the more important
> > >>>>>>>>>>>>>>>>>> issue to fix), but I am still convinced Ash's idea is the
> > >>>>>>>>>>>>>>>>>> right way forward (just it might require a bit more work to
> > >>>>>>>>>>>>>>>>>> deprecate than adding visual grouping in the UI).
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> There was a previous thread about this FYI with more context
> > >>>>>>>>>>>>>>>>>> on why subdags are bad and potential solutions:
> > >>>>>>>>>>>>>>>>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > >>>>>>>>>>>>>>>>>> . A solution I outline there to James's problem is e.g.
> > >>>>>>>>>>>>>>>>>> enabling the >> operator for Airflow operators to work with
> > >>>>>>>>>>>>>>>>>> DAGs as well. I see this being separate from Ash's solution
> > >>>>>>>>>>>>>>>>>> for DAG grouping in the UI but one of the two items required
> > >>>>>>>>>>>>>>>>>> to replace all existing subdag functionality.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years and they are always
> > >>>>>>>>>>>>>>>>>> a giant pain to use. They are a constant source of user
> > >>>>>>>>>>>>>>>>>> confusion and breakages during upgrades. Would love to see
> > >>>>>>>>>>>>>>>>>> them gone :).
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James Coder
> > >>>>>>>>>>>>>>>>>> <jcode...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a UI concept. I use
> > >>>>>>>>>>>>>>>>>>> the subdag operator to simplify dependencies too. If you have
> > >>>>>>>>>>>>>>>>>>> a group of tasks that need to finish before another group of
> > >>>>>>>>>>>>>>>>>>> tasks start, using a subdag is a pretty quick way to set
> > >>>>>>>>>>>>>>>>>>> those dependencies and I think also makes it easier to follow
> > >>>>>>>>>>>>>>>>>>> the dag code.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin
> > >>>>>>>>>>>>>>>>>>> <hamlin...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor
> > >>>>>>>>>>>>>>>>>>>> <a...@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Question:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator anymore?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just replacing it with a UI
> > >>>>>>>>>>>>>>>>>>>>> grouping concept be conceptually simpler, less to get
> > >>>>>>>>>>>>>>>>>>>>> wrong, and closer to what users actually want to achieve
> > >>>>>>>>>>>>>>>>>>>>> with subdags?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in subdags could start
> > >>>>>>>>>>>>>>>>>>>>> running in parallel (a good change) -- so should we not
> > >>>>>>>>>>>>>>>>>>>>> also just _entirely_ remove the concept of a sub dag and
> > >>>>>>>>>>>>>>>>>>>>> replace it with something simpler?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I haven't used them
> > >>>>>>>>>>>>>>>>>>>>> extensively so may be wrong on some of these):
> > >>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it has(?) to be of the
> > >>>>>>>>>>>>>>>>>>>>> form `parent_dag_id.subdag_id`.
> > >>>>>>>>>>>>>>>>>>>>> - They need their own schedule_interval, but it has to
> > >>>>>>>>>>>>>>>>>>>>> match the parent dag
> > >>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own. (Does it make sense
> > >>>>>>>>>>>>>>>>>>>>> to do this? Pausing just a sub dag would mean the sub dag
> > >>>>>>>>>>>>>>>>>>>>> would never execute, so the SubDagOperator would fail too.)
> > >>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to operate a subdag with
> > >>>>>>>>>>>>>>>>>>>>> -- always a bit of a kludge.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Thoughts?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> -ash
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor
> > >>>>>>>>>>>>>>>>>>>>> <a...@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm excited to see how
> > >>>>>>>>>>>>>>>>>>>>>> this progresses.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> > >>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* method to unpack subdag while parsing,
> > >>>>>>>>>>>>>>>>>>>>>>> and it will give a flat structure at the task level
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I
> > >>>>>>>>>>>>>>>>>>>>>> think. At least if I've understood your idea here
> > >>>>>>>>>>>>>>>>>>>>>> correctly.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> -ash
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin Huang
> > >>>>>>>>>>>>>>>>>>>>>> <bin.huan...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone to collect feedback on the
> > >>>>>>>>>>>>>>>>>>>>>>> AIP-34 on rewriting SubDagOperator. This was previously
> > >>>>>>>>>>>>>>>>>>>>>>> briefly mentioned in the discussion about what needs to
> > >>>>>>>>>>>>>>>>>>>>>>> be done for Airflow 2.0, and one of the ideas is to make
> > >>>>>>>>>>>>>>>>>>>>>>> SubDagOperator attach tasks back to the root DAG.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving SubDagOperator related
> > >>>>>>>>>>>>>>>>>>>>>>> issues by reattaching all tasks back to the root dag
> > >>>>>>>>>>>>>>>>>>>>>>> while respecting dependencies during parsing. The
> > >>>>>>>>>>>>>>>>>>>>>>> original grouping effect on the UI will be achieved
> > >>>>>>>>>>>>>>>>>>>>>>> through grouping related tasks by metadata.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory function more reusable
> > >>>>>>>>>>>>>>>>>>>>>>> because you don't need to have parent_dag_name and
> > >>>>>>>>>>>>>>>>>>>>>>> child_dag_name in the function signature anymore.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> > >>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* method to unpack subdag while parsing,
> > >>>>>>>>>>>>>>>>>>>>>>> and it will give a flat structure at the task level
> > >>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts
> > >>>>>>>>>>>>>>>>>>>>>>> like a container and most of the original methods are
> > >>>>>>>>>>>>>>>>>>>>>>> removed. The signature is also changed to
> > >>>>>>>>>>>>>>>>>>>>>>> *subdag_factory* with *subdag_args* and *subdag_kwargs*.
> > >>>>>>>>>>>>>>>>>>>>>>> This is similar to the PythonOperator signature.
> > >>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group &
> > >>>>>>>>>>>>>>>>>>>>>>> parent_group attributes to BaseOperator*: This metadata
> > >>>>>>>>>>>>>>>>>>>>>>> is used to group tasks for rendering at UI level. It may
> > >>>>>>>>>>>>>>>>>>>>>>> potentially extend further to group arbitrary tasks
> > >>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to allow group-level
> > >>>>>>>>>>>>>>>>>>>>>>> operations (e.g. stop/trigger a group of tasks within the
> > >>>>>>>>>>>>>>>>>>>>>>> dag)
> > >>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to
> > >>>>>>>>>>>>>>>>>>>>>>> allow (un)collapsing a group of tasks for a flat
> > >>>>>>>>>>>>>>>>>>>>>>> structure to pair with the first change instead of the
> > >>>>>>>>>>>>>>>>>>>>>>> original hierarchical structure.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Please see related documents and PRs for details:
> > >>>>>>>>>>>>>>>>>>>>>>> AIP:
> > >>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
> > >>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any aspects that you
> > >>>>>>>>>>>>>>>>>>>>>>> agree/disagree with or need more clarification
> > >>>>>>>>>>>>>>>>>>>>>>> (especially the third change regarding TaskGroup). Any
> > >>>>>>>>>>>>>>>>>>>>>>> comments are welcome and I am looking forward to it!
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>> Thanks & Regards
> > >>>>>>>>>>>>>>> Poornima
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>>
> > >>>>>>> Jarek Potiuk
> > >>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>>>>>>
> > >>>>>>> M: +48 660 796 129
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>>
> > >>>>> Jarek Potiuk
> > >>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>>>>
> > >>>>> M: +48 660 796 129
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>>
> > >>>> *Jacob Ferriero*
> > >>>>
> > >>>> Strategic Cloud Engineer: Data Engineering
> > >>>>
> > >>>> jferri...@google.com
> > >>>>
> > >>>> 617-714-2509
> > >>>>
> > >>>
> > >>
> >
> >
>
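
The TaskGroup idea quoted above (groups as a shortcut for setting dependencies between sets of tasks, with the scheduler still seeing one flat DAG) can be sketched in plain Python. This is only an illustration under stated assumptions, not Airflow code: `FlatDag`, `add_task`, `set_downstream`, and `collapsed_edges` are hypothetical names, and wiring the upstream group's leaf tasks to the downstream group's root tasks is one possible reading of a group-level dependency.

```python
from itertools import product


class FlatDag:
    """Toy flat DAG (hypothetical, not an Airflow class): groups are labels."""

    def __init__(self):
        self.group_of = {}   # task_id -> group name, or None if ungrouped
        self.edges = set()   # (upstream_task_id, downstream_task_id)

    def add_task(self, task_id, group=None):
        self.group_of[task_id] = group

    def members(self, name):
        """Resolve a name to task ids: a group's tasks, or the task itself."""
        tasks = [t for t, g in self.group_of.items() if g == name]
        if tasks:
            return tasks
        if name in self.group_of:
            return [name]
        raise KeyError(name)

    def _roots(self, tasks):
        # Tasks with no upstream task inside the same set.
        inside = set(tasks)
        fed = {d for u, d in self.edges if u in inside and d in inside}
        return [t for t in tasks if t not in fed]

    def _leaves(self, tasks):
        # Tasks with no downstream task inside the same set.
        inside = set(tasks)
        feeding = {u for u, d in self.edges if u in inside and d in inside}
        return [t for t in tasks if t not in feeding]

    def set_downstream(self, upstream, downstream):
        """Expand a task- or group-level dependency into task-level edges."""
        ups = self._leaves(self.members(upstream))
        downs = self._roots(self.members(downstream))
        self.edges.update(product(ups, downs))

    def collapsed_edges(self):
        """Edges as a UI might draw them with every group collapsed."""
        def label(task_id):
            return self.group_of[task_id] or task_id

        return {
            (label(u), label(d))
            for u, d in self.edges
            if label(u) != label(d)
        }


dag = FlatDag()
dag.add_task("extract_a", group="extract")
dag.add_task("extract_b", group="extract")
dag.add_task("clean", group="transform")
dag.add_task("join", group="transform")
dag.add_task("report")

dag.set_downstream("clean", "join")          # dependency inside one group
dag.set_downstream("extract", "transform")   # group to group
dag.set_downstream("transform", "report")    # group to ungrouped task
```

In this sketch the scheduler-facing structure is just `dag.edges` (four task-level edges), while `dag.collapsed_edges()` yields the two group-level edges a collapsed graph view would need to draw.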
