What's your wiki ID? If you haven't created an account yet, please create one at https://cwiki.apache.org/confluence/signup.action and send us your ID, and we will add edit permissions.
Thanks. I'll edit the AIP. May I request permission to edit it? My wiki user email is yuqian1...@gmail.com.

On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yuqian1...@gmail.com> wrote:

> Re Xinbin: Thanks. I'll edit the AIP. May I request permission to edit it?
> My wiki user email is yuqian1...@gmail.com.
>
> Re Gerard: yes, the UI loads all the nodes as JSON from the web server at
> once. However, it only adds the top-level nodes and edges to the graph when
> the Graph View page is first opened, and then adds the expanded nodes to
> the graph as the user expands them. From what I've experienced with DAGs
> containing around 400 tasks (not using TaskGroup or SubDagOperator),
> opening the whole DAG in Graph View usually takes 5 seconds. Less than 60 ms
> of that is taken by loading the data from the webserver. The remaining 4.9 s+
> is taken by JavaScript functions in dagre-d3.min.js such as createNodes,
> createEdgeLabels, etc. and by rendering the graph. With TaskGroup being used
> to group tasks into a smaller number of top-level nodes, the amount of data
> loaded from the webserver will remain about the same compared to a flat DAG
> of the same size, but the number of nodes and edges that need to be plotted
> on the graph can be reduced significantly. So in theory this should reduce
> the time it takes to open Graph View even without lazy-loading the data (I'll
> experiment to find out). That said, if it comes to a point where lazy-loading
> helps, we can still implement it as an improvement.
>
> Re James: the Tree View looks as if all the groups are fully expanded
> (because under the hood all the tasks are in a single DAG). I'm less
> worried about Tree View at the moment because it already has a mechanism
> for collapsing tasks by the dependency tree. That said, the Tree View can
> definitely be improved too with TaskGroup (e.g. collapse tasks in the same
> TaskGroup when Tree View is first opened).
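
The Graph View point above (grouping leaves the data unchanged but shrinks what has to be plotted) can be illustrated with a minimal, standalone sketch. It is not the code from the PR: `collapse_edges`, `group_of`, and `expanded` are hypothetical names, groups are assumed to be non-nested, and group ids are assumed not to collide with task ids.

```python
def collapse_edges(edges, group_of, expanded=frozenset()):
    """Map task-level edges to the edges drawn when some groups are collapsed.

    edges    -- iterable of (upstream_task_id, downstream_task_id)
    group_of -- dict mapping task_id -> group_id (tasks outside any group are absent)
    expanded -- set of group ids the user has expanded in the UI
    """
    def visible(task_id):
        group = group_of.get(task_id)
        # A task inside a collapsed group is drawn as the group node itself.
        if group is not None and group not in expanded:
            return group
        return task_id

    drawn = set()
    for src, dst in edges:
        a, b = visible(src), visible(dst)
        if a != b:  # edges inside a collapsed group are not drawn
            drawn.add((a, b))
    return drawn


# Toy DAG: start -> {a1, a2} -> {b1, b2} -> end, with two groups "A" and "B".
edges = [
    ("start", "a1"), ("start", "a2"),
    ("a1", "b1"), ("a2", "b1"), ("a1", "b2"),
    ("b1", "end"), ("b2", "end"),
]
group_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

all_collapsed = collapse_edges(edges, group_of)
a_expanded = collapse_edges(edges, group_of, expanded={"A"})
print(len(edges), len(all_collapsed), len(a_expanded))
```

The full task graph is still the single source of truth here; collapsing only changes which nodes and edges are handed to the layout step, which is where most of the reported time goes.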
>
> For both suggestions, implementing them doesn't require fundamental changes
> to the idea. I think we can have a basic working TaskGroup first, and then
> improve it incrementally in several PRs as we get more feedback from the
> community. What do you think?
>
> Qian
>
> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jcode...@gmail.com> wrote:
>
> > I agree this looks great. One question: how does the tree view look?
> >
> > James Coder
> >
> > > On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gcasass...@twitter.com.invalid> wrote:
> > >
> > > First of all, this is awesome!!
> > >
> > > Secondly, checking your UI code, it seems you are loading all operators at
> > > once. Wondering if we can load them as needed (aka load whenever we click
> > > the TaskGroup). Some of our DAGs are so large that they take forever to load
> > > on the Graph view, so I'm worried about this still being an issue here. It
> > > may be easily solvable by implementing lazy loading of the graph. Not sure
> > > how easy it is to implement/add to the UI extension (and I don't want to
> > > push for early optimization, as it's the root of all evil).
> > >
> > > Gerard Casas Saez
> > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >
> > >> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bin.huan...@gmail.com> wrote:
> > >>
> > >> Hi Yu,
> > >>
> > >> Thank you so much for taking this on. I was fairly distracted previously
> > >> and I didn't have the time to update the proposal. In fact, after
> > >> discussing with Ash, Kaxil and Daniel, the direction of this AIP has been
> > >> changed to favor the concept of TaskGroup instead of rewriting
> > >> SubDagOperator (though it may make sense to deprecate SubDag at a future
> > >> date).
> > >>
> > >> Your PR is amazing and it has implemented the desired features. I think we
> > >> can focus on your new PR instead. Do you mind updating the AIP based on
> > >> what you have done in your PR?
> > >>
> > >> Best,
> > >> Bin
> > >>
> > >>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yuqian1...@gmail.com> wrote:
> > >>>
> > >>> Hi all, I've added the basic UI changes to my proposed implementation of
> > >>> TaskGroup as a UI grouping concept:
> > >>> https://github.com/apache/airflow/pull/10153
> > >>>
> > >>> I think Chris had a pretty good specification of TaskGroup, so I'm quoting
> > >>> it here. The only thing I don't fully agree with is the restriction
> > >>> "... *cannot* have dependencies between a Task in a TaskGroup and either a
> > >>> Task in a different TaskGroup or a Task not in any group". I think
> > >>> this is overly restrictive. Since TaskGroup is a UI concept, tasks can have
> > >>> dependencies on tasks in other TaskGroups or not in any TaskGroup. In my PR,
> > >>> this is allowed. The graph edges will update accordingly when TaskGroups
> > >>> are expanded/collapsed. TaskGroup only helps make the UI look less
> > >>> crowded. Under the hood, everything is still a DAG of tasks and edges, so
> > >>> things work normally. Here's a screenshot
> > >>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > >>> of the UI interaction.
> > >>>
> > >>> - Tasks can be added to a TaskGroup
> > >>> - You *can* have dependencies between Tasks in the same TaskGroup, but
> > >>>   *cannot* have dependencies between a Task in a TaskGroup and either a
> > >>>   Task in a different TaskGroup or a Task not in any group
> > >>> - You *can* have dependencies between a TaskGroup and either other
> > >>>   TaskGroups or Tasks not in any group
> > >>> - The UI will by default render a TaskGroup as a single "object", but
> > >>>   which you expand or zoom into in some way
> > >>> - You'd need some way to determine what the "status" of a TaskGroup was
> > >>>   at least for UI display purposes
> > >>>
> > >>> Regarding Jake's comment, I agree it's possible to implement the "retrying
> > >>> tasks in a group" pattern he mentioned as an optional feature of TaskGroup,
> > >>> although that may go against having TaskGroup as a pure UI concept. For the
> > >>> motivating example Jake provided, I suggest implementing both
> > >>> SubmitLongRunningJobTask and PollJobStatusSensor in a single operator. It
> > >>> can do something like what BaseSensorOperator.execute() does in "reschedule"
> > >>> mode, i.e. it first executes some code to submit the long-running job to
> > >>> the external service and stores the state (e.g. in XCom). Then it
> > >>> reschedules itself. Subsequent runs then poke for the completion state.
> > >>>
> > >>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero <jferri...@google.com.invalid> wrote:
> > >>>
> > >>>> I really like this idea of a TaskGroup container as I think this will be
> > >>>> much easier to use than SubDag.
> > >>>>
> > >>>> I'd like to propose an optional behavior for special retry mechanics via a
> > >>>> TaskGroup.retry_all property.
> > >>>> This way I could use TaskGroup to replace my favorite use of SubDag > > for > > >>>> atomically retrying tasks of the pattern "act on external state then > > >>>> reschedule poll until desired state reached". > > >>>> > > >>>> Motivating use case I have for a SubDag is very simple two task > group > > >>>> [SubmitLongRunningJobTask >> PollJobStatusSensor]. > > >>>> I use SubDag is because it gives me an easy way to retry the > > >>> SubmitJobTask > > >>>> if something about the PollJobSensor fails. > > >>>> This pattern would be really nice for jobs that are expected to run > a > > >>> long > > >>>> time (because we can use sensor can use reschedule mode freeing up > > >> slots) > > >>>> but might fail for a retryable reason. > > >>>> However, using SubDag to meet this use case defeats the purpose > > because > > >>>> SubDag infamously > > >>>> < > > >>>> > > >>> > > >> > > > https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10 > > >>>>> > > >>>> blocks a "controller" slot for the entire duration. > > >>>> This may feel like a cyclic behavior but reality it is very common > for > > >> a > > >>>> single operator to submit job / wait til done. > > >>>> We could use this case refactor many operators (e.g. BQ, Dataproc, > > >>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] > with > > >> an > > >>>> optional reschedule mode if user knows that this job may take a long > > >>> time. > > >>>> > > >>>> I'd be happy to the development work on adding this specific retry > > >>> behavior > > >>>> to TaskGroup once the base concept is implemented if others in the > > >>>> community would find this a useful feature. > > >>>> > > >>>> Cheers, > > >>>> Jake > > >>>> > > >>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk < > > jarek.pot...@polidea.com > > >>> > > >>>> wrote: > > >>>> > > >>>>> All for it :) . 
I think we are getting closer to have regular > > >> planning > > >>>> and > > >>>>> making some structured approach to 2.0 and starting task force for > it > > >>>> soon, > > >>>>> so I think this should be perfectly fine to discuss and even start > > >>>>> implementing what's beyond as soon as we make sure that we are > > >>>> prioritizing > > >>>>> 2.0 work. > > >>>>> > > >>>>> J, > > >>>>> > > >>>>> > > >>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yuqian1...@gmail.com> > > >> wrote: > > >>>>> > > >>>>>> Hi Jarek, > > >>>>>> > > >>>>>> I agree we should not change the behaviour of the existing > > >>>> SubDagOperator > > >>>>>> till Airflow 2.1. Is it okay to continue the discussion about > > >>> TaskGroup > > >>>>> as > > >>>>>> a brand new concept/feature independent from the existing > > >>>> SubDagOperator? > > >>>>>> In other words, shall we add TaskGroup as a UI grouping concept > > >> like > > >>>> Ash > > >>>>>> suggested, and not touch SubDagOperator atl all. Whenever we are > > >>> ready > > >>>>> with > > >>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow 2.1. > > >>>>>> > > >>>>>> I really like Ash's idea of simplifying the SubDagOperator idea > > >> into > > >>> a > > >>>>>> simple UI grouping concept. I think Xinbin's idea of "reattaching > > >> all > > >>>> the > > >>>>>> tasks to the root DAG" is the way to go. And I see James pointed > > >> out > > >>> we > > >>>>>> need some helper functions to simplify dependencies setting of > > >>>> TaskGroup. > > >>>>>> Xinbin put up a pretty elegant example in his PR > > >>>>>> <https://github.com/apache/airflow/pull/9243>. I think having > > >>>> TaskGroup > > >>>>> as > > >>>>>> a UI concept should be a relatively small change. We can simplify > > >>>>> Xinbin's > > >>>>>> PR further. So I put up this alternative proposal here: > > >>>>>> https://github.com/apache/airflow/pull/10153 > > >>>>>> > > >>>>>> I have not done any UI changes due to lack of experience with web > > >> UI. 
> > >>>> If > > >>>>>> anyone's interested, please take a look at the PR. > > >>>>>> > > >>>>>> Qian > > >>>>>> > > >>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk < > > >>> jarek.pot...@polidea.com > > >>>>> > > >>>>>> wrote: > > >>>>>> > > >>>>>>> Similar point here to the other ideas that are popping up. Maybe > > >> we > > >>>>>> should > > >>>>>>> just focus on completing 2.0 and make all discussions about > > >> further > > >>>>>>> improvements to 2.1? While those are important discussions (and > > >> we > > >>>>> should > > >>>>>>> continue them in the near future !) I think at this point > > >> focusing > > >>>> on > > >>>>>>> delivering 2.0 in its current shape should be our focus now ? > > >>>>>>> > > >>>>>>> J. > > >>>>>>> > > >>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang < > > >>> bin.huan...@gmail.com> > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>>> Hi Daniel > > >>>>>>>> > > >>>>>>>> I agree that the TaskGroup should have the same API as a DAG > > >>> object > > >>>>>>> related > > >>>>>>>> to task dependencies, but it will not have anything related to > > >>>> actual > > >>>>>>>> execution or scheduling. > > >>>>>>>> I will update the AIP according to this over the weekend. > > >>>>>>>> > > >>>>>>>>> We could even make a “DAGTemplate” object s.t. when you > > >> import > > >>>> the > > >>>>>>> object > > >>>>>>>> you can import it with parameters to determine the shape of the > > >>>> DAG. > > >>>>>>>> > > >>>>>>>> Can you elaborate a bit more on this? Does it serve a similar > > >>>> purpose > > >>>>>> as > > >>>>>>> a > > >>>>>>>> DAG factory function? > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman < > > >>>>>>> daniel.imber...@gmail.com > > >>>>>>>>> > > >>>>>>>> wrote: > > >>>>>>>> > > >>>>>>>>> Hi Bin, > > >>>>>>>>> > > >>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g. > > >>> the > > >>>>>>> bitwise > > >>>>>>>>> operator fro task dependencies). 
We could even make a > > >>>> “DAGTemplate” > > >>>>>>>> object > > >>>>>>>>> s.t. when you import the object you can import it with > > >>> parameters > > >>>>> to > > >>>>>>>>> determine the shape of the DAG. > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang < > > >>>>> bin.huan...@gmail.com > > >>>>>>> > > >>>>>>>>> wrote: > > >>>>>>>>> The TaskGroup will not take schedule interval as a parameter > > >>>>> itself, > > >>>>>>> and > > >>>>>>>> it > > >>>>>>>>> depends on the DAG where it attaches to. In my opinion, the > > >>>>> TaskGroup > > >>>>>>>> will > > >>>>>>>>> only contain a group of tasks with interdependencies, and the > > >>>>>> TaskGroup > > >>>>>>>>> behaves like a task. It doesn't contain any > > >>> execution/scheduling > > >>>>>> logic > > >>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs etc.) > > >>> like > > >>>> a > > >>>>>> DAG > > >>>>>>>>> does. > > >>>>>>>>> > > >>>>>>>>>> For example, there is the scenario that the schedule > > >> interval > > >>>> of > > >>>>>> DAG > > >>>>>>> is > > >>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 min. > > >>>>>>>>> > > >>>>>>>>> I am curious why you ask this. Is this a use case that you > > >> want > > >>>> to > > >>>>>>>> achieve? > > >>>>>>>>> > > >>>>>>>>> Bin > > >>>>>>>>> > > >>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 < > > >> thanosxnicho...@gmail.com > > >>>> > > >>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>>> Hi Bin, > > >>>>>>>>>> Using TaskGroup, Is the schedule interval of TaskGroup the > > >>> same > > >>>>> as > > >>>>>>> the > > >>>>>>>>>> parent DAG? My main concern is whether the schedule > > >> interval > > >>> of > > >>>>>>>> TaskGroup > > >>>>>>>>>> could be different with that of the DAG? For example, there > > >>> is > > >>>>> the > > >>>>>>>>> scenario > > >>>>>>>>>> that the schedule interval of DAG is 1 hour and the > > >> schedule > > >>>>>> interval > > >>>>>>>> of > > >>>>>>>>>> TaskGroup is 20 min. 
> > >>>>>>>>>> > > >>>>>>>>>> Cheers, > > >>>>>>>>>> Nicholas > > >>>>>>>>>> > > >>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang < > > >>>>>> bin.huan...@gmail.com > > >>>>>>>> > > >>>>>>>>>> wrote: > > >>>>>>>>>> > > >>>>>>>>>>> Hi Nicholas, > > >>>>>>>>>>> > > >>>>>>>>>>> I am not sure about the old behavior of SubDagOperator, > > >>> maybe > > >>>>> it > > >>>>>>> will > > >>>>>>>>>> throw > > >>>>>>>>>>> an error? But in the original proposal, the subdag's > > >>>>>>>> schedule_interval > > >>>>>>>>>> will > > >>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to replace > > >>>> SubDag, > > >>>>>>> there > > >>>>>>>>>> will > > >>>>>>>>>>> be no subdag schedule_interval. > > >>>>>>>>>>> > > >>>>>>>>>>> Bin > > >>>>>>>>>>> > > >>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 < > > >>>> thanosxnicho...@gmail.com > > >>>>>> > > >>>>>>>> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>>> Hi Bin, > > >>>>>>>>>>>> Thanks for your good proposal. I was confused whether > > >> the > > >>>>>>> schedule > > >>>>>>>>>>>> interval of SubDAG is different from that of the parent > > >>>> DAG? > > >>>>> I > > >>>>>>> have > > >>>>>>>>>>>> discussed with Jiajie Zhong about the schedule interval > > >>> of > > >>>>>>> SubDAG. > > >>>>>>>> If > > >>>>>>>>>> the > > >>>>>>>>>>>> SubDagOperator has a different schedule interval, what > > >>> will > > >>>>>>> happen > > >>>>>>>>> for > > >>>>>>>>>>> the > > >>>>>>>>>>>> scheduler to schedule the parent DAG? > > >>>>>>>>>>>> > > >>>>>>>>>>>> Regards, > > >>>>>>>>>>>> Nicholas Jiang > > >>>>>>>>>>>> > > >>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang < > > >>>>>>>> bin.huan...@gmail.com> > > >>>>>>>>>>>> wrote: > > >>>>>>>>>>>> > > >>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback! > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> I have rethought about the concept of subdag and task > > >>>>>> groups. 
> > >>>>>>>>>>>>> I think the better way to approach this is to remove subdag entirely and
> > >>>>>>>>>>>>> introduce the concept of TaskGroup, which is a container of tasks along
> > >>>>>>>>>>>>> with their dependencies *without the execution/scheduling logic of a DAG*.
> > >>>>>>>>>>>>> Its only purpose is to group a list of tasks, but you still need to add it
> > >>>>>>>>>>>>> to a DAG for execution.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Here is a small code snippet.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> ```
> > >>>>>>>>>>>>> class TaskGroup:
> > >>>>>>>>>>>>>     """
> > >>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> > >>>>>>>>>>>>>     """
> > >>>>>>>>>>>>>     def __init__(self, group_id, default_args=None):
> > >>>>>>>>>>>>>         pass
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> # You can add tasks to a task group similar to adding tasks to a DAG.
> > >>>>>>>>>>>>> # This can be declared in a separate file from the dag file.
> > >>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
> > >>>>>>>>>>>>> download_group.add_task(task1)
> > >>>>>>>>>>>>> task2.dag = download_group
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> with download_group:
> > >>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> [task1, task2] >> task3
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> # Add it to a DAG for execution
> > >>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> > >>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
> > >>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> > >>>>>>>>>>>>>     start >> download_group
> > >>>>>>>>>>>>>     # this is equivalent to
> > >>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> > >>>>>>>>>>>>> ```
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> With this, we can still reuse a group of tasks and set dependencies between
> > >>>>>>>>>>>>> them; it avoids the boilerplate code from using SubDagOperator, and we can
> > >>>>>>>>>>>>> declare dependencies as `task >> task_group >> task`.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> In terms of user migration, we can introduce it before Airflow 2.0 and
> > >>>>>>>>>>>>> allow a gradual transition. Then we can decide if we still want to keep the
> > >>>>>>>>>>>>> SubDagOperator or simply remove it.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Any thoughts?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> +1, proposal looks good.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> The original intention was really to have task groups and a zoom-in/out
> > >>>>>>>>>>>>>> in the UI. The original reasoning was to reuse the DAG object since it is
> > >>>>>>>>>>>>>> a group of tasks, but as highlighted here it does create underlying
> > >>>>>>>>>>>>>> confusion since a DAG is much more than just a group of tasks.
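
The equivalence Bin describes (`start >> download_group` standing for `start >> [task1, task2] >> task3`) can be sketched with plain Python operator overloading. This is a toy model under stated assumptions, not Airflow code: `Task`, `ToyTaskGroup`, and `entry_tasks` are hypothetical names, and the group's roots and leaves are computed at the moment `>>` is applied, so dependencies inside the group must be set first.

```python
class Task:
    """Toy stand-in for an operator (hypothetical, not Airflow's BaseOperator)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream_ids = set()
        self.downstream_ids = set()

    def __rshift__(self, other):
        # Handles task >> task, task >> group, and task >> [tasks].
        for target in entry_tasks(other):
            self.downstream_ids.add(target.task_id)
            target.upstream_ids.add(self.task_id)
        return other

    def __rrshift__(self, other):
        # Handles [a, b] >> task: lists define no __rshift__, so Python falls back here.
        for source in other:
            source >> self
        return self


class ToyTaskGroup:
    """A container of tasks with no scheduling logic of its own."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = {}

    def add_task(self, task):
        self.tasks[task.task_id] = task

    def roots(self):
        # Tasks with no upstream task inside this group.
        return [t for t in self.tasks.values() if t.upstream_ids.isdisjoint(self.tasks)]

    def leaves(self):
        # Tasks with no downstream task inside this group.
        return [t for t in self.tasks.values() if t.downstream_ids.isdisjoint(self.tasks)]

    def __rshift__(self, other):
        # group >> x wires every leaf of the group to x.
        for leaf in self.leaves():
            leaf >> other
        return other


def entry_tasks(node):
    """Return the tasks an incoming edge should attach to."""
    if isinstance(node, ToyTaskGroup):
        return node.roots()
    if isinstance(node, Task):
        return [node]
    return [t for item in node for t in entry_tasks(item)]


task1, task2, task3 = Task("task1"), Task("task2"), Task("task3")
download_group = ToyTaskGroup("download")
for t in (task1, task2, task3):
    download_group.add_task(t)
[task1, task2] >> task3

start, end = Task("start"), Task("end")
start >> download_group >> end
print(sorted(start.downstream_ids))  # ['task1', 'task2']
```

Only task-to-task edges are ever stored, which matches the idea that the group is a shortcut for setting dependencies and carries no execution semantics.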
> > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Max > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi < > > >>>>>>>>>>>>> joshipoornim...@gmail.com> > > >>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Thank you for your email. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang < > > >>>>>>>>>>> bin.huan...@gmail.com > > >>>>>>>>>>>>> > > >>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This > > >>>>>> rewrites > > >>>>>>>> the > > >>>>>>>>>>>>>>>> *DagBag.bag_dag* > > >>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and > > >> it > > >>>>> will > > >>>>>>>> give a > > >>>>>>>>>>>> flat > > >>>>>>>>>>>>>>>>>> structure at > > >>>>>>>>>>>>>>>>>> the task level > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> The serialized_dag representation already > > >> does > > >>>>> this I > > >>>>>>>>> think. > > >>>>>>>>>> At > > >>>>>>>>>>>>> least > > >>>>>>>>>>>>>>> if > > >>>>>>>>>>>>>>>>> I've understood your idea here correctly. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> I am not sure about serialized_dag > > >>> representation, > > >>>>> but > > >>>>>> at > > >>>>>>>>> least > > >>>>>>>>>>> it > > >>>>>>>>>>>>> will > > >>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table? > > >> In > > >>> my > > >>>>>>>> proposal > > >>>>>>>>> as > > >>>>>>>>>>>> also > > >>>>>>>>>>>>> in > > >>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks > > >> from > > >>>> the > > >>>>>>> subdag > > >>>>>>>>> and > > >>>>>>>>>>> add > > >>>>>>>>>>>>>> them > > >>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG graph > > >>>> will > > >>>>>> look > > >>>>>>>>>> exactly > > >>>>>>>>>>>> the > > >>>>>>>>>>>>>>>> same as without subdag but with metadata > > >> attached > > >>>> to > > >>>>>>> those > > >>>>>>>>>>>> sections. 
> > >>>>>>>>>>>>>>> These > > >>>>>>>>>>>>>>>> metadata will be later on used to render in the > > >>> UI. > > >>>>> So > > >>>>>>>> after > > >>>>>>>>>>>> parsing > > >>>>>>>>>>>>> ( > > >>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output > > >> the > > >>>>>>> *root_dag > > >>>>>>>>>>>> *instead > > >>>>>>>>>>>>> of > > >>>>>>>>>>>>>>> *root_dag + > > >>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata > > >>>>>>>>>> current_group=section-1, > > >>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for > > >>> naming > > >>>>>>>>>>> suggestions), > > >>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>> reason for parent_group is that we can have > > >>> nested > > >>>>>> group > > >>>>>>>> and > > >>>>>>>>>>>> still > > >>>>>>>>>>>>>> be > > >>>>>>>>>>>>>>>> able to capture the dependency. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Runtime DAG: > > >>>>>>>>>>>>>>>> [image: image.png] > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> While at the UI, what we see would be something > > >>>> like > > >>>>>> this > > >>>>>>>> by > > >>>>>>>>>>>>> utilizing > > >>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into > > >> in > > >>>> some > > >>>>>>> way. > > >>>>>>>>>>>>>>>> [image: image.png] > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> The benefits I can see is that: > > >>>>>>>>>>>>>>>> 1. We don't need to deal with the extra > > >>> complexity > > >>>> of > > >>>>>>>> SubDag > > >>>>>>>>>> for > > >>>>>>>>>>>>>>> execution > > >>>>>>>>>>>>>>>> and scheduling. It will be the same as not > > >> using > > >>>>>> SubDag. > > >>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and > > >>>>> reusable > > >>>>>>> dag > > >>>>>>>>> code > > >>>>>>>>>>> and > > >>>>>>>>>>>>>>>> declare dependencies between them. 
And with the > > >>> new > > >>>>>>>>>>> SubDagOperator > > >>>>>>>>>>>>> (see > > >>>>>>>>>>>>>>> AIP > > >>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory > > >>>>> function > > >>>>>>> for > > >>>>>>>>>>>>> generating 1 > > >>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag > > >>> (in > > >>>>> this > > >>>>>>>> case, > > >>>>>>>>>> it > > >>>>>>>>>>>> will > > >>>>>>>>>>>>>>> just > > >>>>>>>>>>>>>>>> extract all underlying tasks and append to the > > >>> root > > >>>>>> dag). > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag > > >>>> with a > > >>>>>>>>>> simpler > > >>>>>>>>>>>>>> concept > > >>>>>>>>>>>>>>>> by Ash: the proposed change basically drains > > >> out > > >>>> the > > >>>>>>>>>> contents > > >>>>>>>>>>>> of > > >>>>>>>>>>>>> a > > >>>>>>>>>>>>>>> SubDag > > >>>>>>>>>>>>>>>> and becomes more like > > >>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator > > >>>>>>>>>>>>>>> (forgive > > >>>>>>>>>>>>>>>> me about the crazy name..). In this case, it is > > >>>> still > > >>>>>>>>>>> necessary > > >>>>>>>>>>>> to > > >>>>>>>>>>>>>>> keep the > > >>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a > > >>>> name? > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks > > >>>> Chris > > >>>>>>> Palmer > > >>>>>>>>> for > > >>>>>>>>>>>>> helping > > >>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup, I > > >>>> will > > >>>>>> just > > >>>>>>>>> paste > > >>>>>>>>>>> it > > >>>>>>>>>>>>>> here. 
> > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks > > >> in > > >>>> the > > >>>>>> same > > >>>>>>>>>>>> TaskGroup, > > >>>>>>>>>>>>>> but > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in > > >> a > > >>>>>>> TaskGroup > > >>>>>>>>>> and > > >>>>>>>>>>>>>> either a > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not > > >> in > > >>>> any > > >>>>>>> group > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a > > >>> TaskGroup > > >>>>> and > > >>>>>>>>>> either > > >>>>>>>>>>>>> other > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup > > >> as > > >>> a > > >>>>>> single > > >>>>>>>>>>>> "object", > > >>>>>>>>>>>>>> but > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the > > >>>>> "status" > > >>>>>>> of a > > >>>>>>>>>>>>> TaskGroup > > >>>>>>>>>>>>>>> was > > >>>>>>>>>>>>>>>>> at least for UI display purposes > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> I agree with Chris: > > >>>>>>>>>>>>>>>> - From the backend's view (scheduler & > > >>> executor), I > > >>>>>> think > > >>>>>>>>>>> TaskGroup > > >>>>>>>>>>>>>>> should > > >>>>>>>>>>>>>>>> be ignored during execution. (unless we decide > > >> to > > >>>>>>> implement > > >>>>>>>>>> some > > >>>>>>>>>>>>>> metadata > > >>>>>>>>>>>>>>>> operations that allows start/stop a group of > > >>> tasks > > >>>>>> etc.) 
> > >>>>>>>>>>>>>>>> - From the UI's View, it should be able to pick > > >>> up > > >>>>> the > > >>>>>>>>>> individual > > >>>>>>>>>>>>>> tasks' > > >>>>>>>>>>>>>>>> status and then determine the TaskGroup's > > >> status > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Bin > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel > > >> Imberman > > >>> < > > >>>>>>>>>>>>>>>> daniel.imber...@gmail.com> wrote: > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator > > >>> to > > >>>>> tie > > >>>>>>> dags > > >>>>>>>>>>>> together > > >>>>>>>>>>>>>> but > > >>>>>>>>>>>>>>> I > > >>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if we > > >>>> could > > >>>>>>>>>> essentially > > >>>>>>>>>>>>> write > > >>>>>>>>>>>>>>> in > > >>>>>>>>>>>>>>>>> the ability to set dependencies to all > > >>>> starter-tasks > > >>>>>> for > > >>>>>>>>> that > > >>>>>>>>>>> DAG. > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly > > >> UI > > >>>>>> concept. > > >>>>>>>> It > > >>>>>>>>>>>> doesn’t > > >>>>>>>>>>>>>> need > > >>>>>>>>>>>>>>>>> to execute separately, you’re just adding more > > >>>> tasks > > >>>>>> to > > >>>>>>>> the > > >>>>>>>>>>> queue > > >>>>>>>>>>>>> that > > >>>>>>>>>>>>>>> will > > >>>>>>>>>>>>>>>>> be executed when there are resources > > >> available. 
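
Bin's point that the UI could derive a TaskGroup's status from its individual tasks can be sketched as a simple priority rule. The ordering below is an assumption made for illustration only; the thread does not specify one, and `STATE_PRIORITY` and `group_state` are hypothetical names.

```python
# Assumed priority: the first state in this list that any task has becomes the
# group's displayed state. None stands for a task that has no state yet.
STATE_PRIORITY = [
    "failed",
    "upstream_failed",
    "up_for_retry",
    "running",
    "queued",
    "scheduled",
    None,
    "skipped",
    "success",
]


def group_state(task_states):
    """Return a display-only state for a group, given its tasks' states."""
    present = set(task_states)
    for state in STATE_PRIORITY:
        if state in present:
            return state
    return None  # empty group, or only states outside the assumed list


print(group_state(["success", "running", "success"]))
```

Because the result is computed from task states on demand, nothing about it needs to be visible to the scheduler or executor.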
> > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> via Newton Mail [ > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>> > > >> > > > https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2 > > >>>>>>>>>>>>>>>>> ] > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer > > >> < > > >>>>>>>>>>> ch...@crpalmer.com > > >>>>>>>>>>>>> > > >>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex > > >>>>>> abstraction. > > >>>>>>> I > > >>>>>>>>>> think > > >>>>>>>>>>>> what > > >>>>>>>>>>>>>> is > > >>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a > > >> high > > >>>>> level > > >>>>>> I > > >>>>>>>>> think > > >>>>>>>>>>> you > > >>>>>>>>>>>>> want > > >>>>>>>>>>>>>>>>> this > > >>>>>>>>>>>>>>>>> functionality: > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in > > >>> the > > >>>>>> same > > >>>>>>>>>>> TaskGroup, > > >>>>>>>>>>>>> but > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in a > > >>>>>> TaskGroup > > >>>>>>>> and > > >>>>>>>>>>>> either > > >>>>>>>>>>>>> a > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not in > > >>> any > > >>>>>> group > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a > > >>> TaskGroup > > >>>>> and > > >>>>>>>> either > > >>>>>>>>>>> other > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup > > >> as a > > >>>>>> single > > >>>>>>>>>>> "object", > > >>>>>>>>>>>>> but > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the > > >>>> "status" > > >>>>>> of > > >>>>>>> a > > >>>>>>>>>>>> TaskGroup > > >>>>>>>>>>>>>> was > > 
>>>>>>>>>>>>>>>>> at least for UI display purposes > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Not sure if it would need to be a top level > > >>> object > > >>>>>> with > > >>>>>>>> its > > >>>>>>>>>> own > > >>>>>>>>>>>>>> database > > >>>>>>>>>>>>>>>>> table and model or just another attribute on > > >>>> tasks. > > >>>>> I > > >>>>>>>> think > > >>>>>>>>>> you > > >>>>>>>>>>>>> could > > >>>>>>>>>>>>>>>>> build > > >>>>>>>>>>>>>>>>> it in a way such that from the schedulers > > >> point > > >>> of > > >>>>>> view > > >>>>>>> a > > >>>>>>>>> DAG > > >>>>>>>>>>> with > > >>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any > > >> differently. > > >>> So > > >>>>> it > > >>>>>>>> really > > >>>>>>>>>>> just > > >>>>>>>>>>>>>>> becomes > > >>>>>>>>>>>>>>>>> a > > >>>>>>>>>>>>>>>>> shortcut for setting dependencies between sets > > >>> of > > >>>>>> Tasks, > > >>>>>>>> and > > >>>>>>>>>>>> allows > > >>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>> UI > > >>>>>>>>>>>>>>>>> to simplify the render of the DAG structure. > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Chris > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov > > >>>>>>>>>>>>>>> <ddavy...@twitter.com.invalid > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Agree with James (and think it's actually > > >> the > > >>>> more > > >>>>>>>>> important > > >>>>>>>>>>>> issue > > >>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>>> fix), > > >>>>>>>>>>>>>>>>>> but I am still convinced Ash' idea is the > > >>> right > > >>>>> way > > >>>>>>>>> forward > > >>>>>>>>>>>> (just > > >>>>>>>>>>>>> it > > >>>>>>>>>>>>>>>>> might > > >>>>>>>>>>>>>>>>>> require a bit more work to deprecate than > > >>> adding > > >>>>>>> visual > > >>>>>>>>>>> grouping > > >>>>>>>>>>>>> in > > >>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>> UI). 

There was a previous thread about this FYI with more context on why subdags are bad and potential solutions:
https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
A solution I outline there to James's problem is e.g. enabling the >> operator for Airflow operators to work with DAGs as well. I see this being separate from Ash's solution for DAG grouping in the UI but one of the two items required to replace all existing subdag functionality.

I've been working with subdags for 3 years and they are always a giant pain to use. They are a constant source of user confusion and breakages during upgrades. Would love to see them gone :).

On Fri, Jun 12, 2020 at 11:11 AM James Coder <jcode...@gmail.com> wrote:

I'm not sure I totally agree it's just a UI concept. I use the subdag operator to simplify dependencies too.
If you have a group of tasks that need to finish before another group of tasks start, using a subdag is a pretty quick way to set those dependencies and I think also makes it easier to follow the dag code.

On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin...@gmail.com> wrote:

I second Ash's grouping concept.

On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <a...@apache.org> wrote:

Question:

Do we even need the SubDagOperator anymore?

Would removing it entirely and just replacing it with a UI grouping concept be conceptually simpler, less to get wrong, and closer to what users actually want to achieve with subdags?

With your proposed change, tasks in subdags could start running in parallel (a good change) -- so should we not also just _entirely_ remove the concept of a sub dag and replace it with something simpler?

Problems with subdags (I think. I haven't used them extensively so may be wrong on some of these):
- They need their own dag_id, but it has(?) to be of the form `parent_dag_id.subdag_id`.
- They need their own schedule_interval, but it has to match the parent dag
- Sub dags can be paused on their own. (Does it make sense to do this? Pausing just a sub dag would mean the sub dag would never execute, so the SubDagOperator would fail too.)
- You had to choose the executor to operate a subdag with -- always a bit of a kludge.

Thoughts?

-ash

On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <a...@apache.org> wrote:

Work on sub-dags is much needed, I'm excited to see how this progresses.

> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
> method to unpack subdag while parsing, and it will give a flat structure at
> the task level

The serialized_dag representation already does this I think. At least if I've understood your idea here correctly.

-ash

On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huan...@gmail.com> wrote:

Hi everyone,

Sending a message to everyone to collect feedback on the AIP-34 on rewriting SubDagOperator.
This was previously briefly mentioned in the discussion about what needs to be done for Airflow 2.0, and one of the ideas is to make SubDagOperator attach tasks back to the root DAG.

This AIP-34 focuses on solving SubDagOperator related issues by reattaching all tasks back to the root dag while respecting dependencies during parsing. The original grouping effect on the UI will be achieved through grouping related tasks by metadata.

This also makes the dag_factory function more reusable because you don't need to have parent_dag_name and child_dag_name in the function signature anymore.

Changes proposed:

- *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag* method to unpack subdag while parsing, and it will give a flat structure at the task level
- *Simplify SubDagOperator*: The new SubDagOperator acts like a container and most of the original methods are removed. The signature is also changed to *subdag_factory* with *subdag_args* and *subdag_kwargs*. This is similar to the PythonOperator signature.
- *Add a TaskGroup model and add current_group & parent_group attributes to BaseOperator*: This metadata is used to group tasks for rendering at UI level. It may potentially extend further to group arbitrary tasks outside the context of subdag to allow group-level operations (i.e.
  stop/trigger a group of tasks within the dag)
- *Webserver UI for SubDag*: Proposed UI modification to allow (un)collapsing a group of tasks for a flat structure to pair with the first change instead of the original hierarchical structure.

Please see related documents and PRs for details:
AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator

Original Issue: https://github.com/apache/airflow/issues/8078
Draft PR: https://github.com/apache/airflow/pull/9243

Please let me know if there are any aspects that you agree/disagree with or need more clarification (especially the third change regarding TaskGroup). Any comments are welcome and I am looking forward to it!

Cheers
Bin
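
The first and third changes proposed in Bin's message (a flat task structure, plus `current_group` / `parent_group` metadata that the UI can use to fold tasks together) can be sketched roughly in plain Python. This is only an illustration under stated assumptions, not the AIP-34 implementation: the nested-dict input format and the `flatten` and `collapsed_view` helpers are hypothetical, and only the two attribute names are taken from the proposal.

```python
def flatten(children, current_group=None, parent_group=None):
    """Return a flat list of task records from a nested spec.

    `children` maps a name to None (a task) or to another dict (a nested group).
    """
    tasks = []
    for name, sub in children.items():
        if sub is None:
            tasks.append(
                {
                    "task_id": name,
                    "current_group": current_group,
                    "parent_group": parent_group,
                }
            )
        else:
            # A nested group: recurse, remembering which group encloses it.
            tasks.extend(flatten(sub, current_group=name, parent_group=current_group))
    return tasks


def collapsed_view(tasks, collapsed):
    """Node labels a UI might draw when the groups in `collapsed` are folded.

    Only looks one level up, so it handles at most two levels of nesting.
    """
    nodes = []
    for t in tasks:
        if t["parent_group"] in collapsed:
            label = t["parent_group"]
        elif t["current_group"] in collapsed:
            label = t["current_group"]
        else:
            label = t["task_id"]
        if label not in nodes:
            nodes.append(label)
    return nodes


dag_spec = {
    "start": None,
    "etl": {"extract": None, "clean": {"dedupe": None, "validate": None}},
    "report": None,
}

flat = flatten(dag_spec)
print([t["task_id"] for t in flat])
print(collapsed_view(flat, collapsed={"etl"}))
print(collapsed_view(flat, collapsed={"clean"}))
```

All tasks end up in one flat list, as they would in a single root DAG, while the grouping survives only as metadata that a view can choose to fold or unfold.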