The vote for this AIP-34 <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator> passed. However, there's an interesting discussion going on here <https://github.com/apache/airflow/pull/10153#discussion_r480247681> regarding whether task_id should be automatically prefixed with group_id of TaskGroup. So I'm bringing it up in this email thread for discussion.
Plan A: Prefix task_id with group_id of TaskGroup. This is the original plan in AIP-34 <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>. The task_id argument passed to an operator just needs to be unique across the TaskGroup. The actual task_id is prefixed with the group_id so task_id is guaranteed to be unique across the DAG. Plan B: Do not prefix task_id with group_id of TaskGroup. The task_id argument passed to the operator is the actual task_id. So the user is forced to make sure task_id is unique across the whole DAG. Obviously the convenience of Plan A is not free of charge. I’m summarizing some of the pros and cons in this table. There are two examples at the bottom illustrating the different usage. I was convinced by houqp on the github comments and some of my own experiments that Plan B has more advantages and avoids surprises. I'm going to update AIP-34 <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator> according to Plan B unless I hear strong objections before 20200903 7am UTC. Plan A Plan B Ease of Use Easier to use for new DAGs Slightly more work on the user to maintain task_id uniqueness Implementation A little more complicated. Each group needs to know its parent’s group_id in order to prefix the group_id correctly. Implementation is simpler. No need to know the parent TaskGroup’s group_id. Ease of Migration task_id will change if TaskGroup is introduced into an existing DAG. Existing tasks put into a TaskGroup will appear like new tasks if the DAG already has some historical DagRun. This may pose a barrier to adoption of TaskGroup. No change in task_id when an existing task is put into a TaskGroup. Migrating existing DAGs to adopt TaskGroup will be easier. Actual task_id Actual task_id tend to be longer because it’s always prefixed with group_id, especially if the task is in a nested TaskGroup. Actual task_id tend to be shorter because users control the actual task_id themselves. Graph label Labels on Graph View tend to be shorter because task_id only needs to be unique within the TaskGroup Labels on Graph View tend to be longer because it displays the actual task_id, which is a unique str across the DAG. Plan A Example: def create_section(): dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(5)] with TaskGroup("inside_section_1") as inside_section_1: _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(3)] with TaskGroup("inside_section_2") as inside_section_2: _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(3)] dummies[-1] >> inside_section_1 dummies[-2] >> inside_section_2 inside_section_1 >> inside_section_2 with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag: start = DummyOperator(task_id="start") with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1: create_section() some_other_task = DummyOperator(task_id="some-other-task") with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2: create_section() end = DummyOperator(task_id='end') start >> section_1 >> some_other_task >> section_2 >> end Plan B Example: def create_section(section_num): dummies = [DummyOperator(task_id=f'task-{section_num}.{i + 1}') for i in range(5)] with TaskGroup(f"section_{section_num}.1") as inside_section_1: _ = [DummyOperator(task_id=f'task-{section_num}.1.{i + 1}',) for i in range(3)] with TaskGroup(f"section_{section_num}.2") as inside_section_2: _ = [DummyOperator(task_id=f'task-{section_num}.2.{i + 1}',) for i in range(3)] dummies[-1] >> inside_section_1 dummies[-2] >> inside_section_2 inside_section_1 >> inside_section_2 with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag: start = DummyOperator(task_id="start") with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1: create_section(1) some_other_task = DummyOperator(task_id="some-other-task") with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2: create_section(2) end = DummyOperator(task_id='end') start >> section_1 >> some_other_task >> section_2 >> end On Sat, Aug 22, 2020 at 1:02 AM Gerard Casas Saez <gcasass...@twitter.com.invalid> wrote: > Agree on this being non-blocking. > > Regarding moving to vote, you can take care. Just open a new email thread > on dev list and call for a vote. You can see this example from Tomek for > AIP-31: > > https://lists.apache.org/thread.html/r16ee2b6263f5849a2e2b1b6c3ba39fb3d643022f052216cb19a0a8da%40%3Cdev.airflow.apache.org%3E > > Best, > > > Gerard Casas Saez > Twitter | Cortex | @casassaez <http://twitter.com/casassaez> > > > On Thu, Aug 20, 2020 at 7:10 PM Yu Qian <yuqian1...@gmail.com> wrote: > > > Hi, Gerard, yes I agree it's possible to do this at UI level without any > > fundamental change to the implementation. If expand_group() sees that two > > groups are fully connected (i.e. every task in one parent group depends > on > > every task in another parent group), it can decide to collapse all those > > children edges into a single edge between the parent groups to reduce the > > burden of the layout() function. However, I did not find any existing > > algorithm to do this within dagre so we'll likely need to implement this > > ourselves. Another hiccup is that at the moment it doesn't seem to be > > possible to call setEdge() between two parent groups (aka clusters). If > > someone has ideas how to do this please feel free to contribute. > > > > One other consideration is that this example is only an extreme case. > There > > are other in-between cases that still require user intervention. Let's > say > > if 90% of tasks in group1 depends on 90% of tasks in group2 and both > groups > > have more than 100 tasks. This will still cause a lot of edges on the > graph > > and it's even harder to reduce because the parent groups are not fully > > connected so it's inaccurate to reduce them to a single edge between the > > parents. In those cases, the user may still need to do something > > themselves. e.g. adding some DummyOperator to the DAG to cut down the > > edges. There will be some tradeoff because DummyOperator takes a short > > while to execute like you mentioned. > > > > There are lots of room for improvements, but I don't think that's a > > blocking issue for this AIP? So if you can move it to the voting stage > > that'll be fantastic. > > > > > > On Thu, Aug 20, 2020 at 4:21 PM 耀 周 <zhouyao1...@icloud.com.invalid> > > wrote: > > > > > +1 > > > > > > > 2020年8月18日 23:55,Gerard Casas Saez <gcasass...@twitter.com.INVALID> > > 写道: > > > > > > > > Is it not possible to solve this at the UI level? Aka tell dagre to > > only > > > > add 1 edge to the group instead of to all nodes in the group? No need > > to > > > do > > > > SubDag behaviour, but just reduce the edges on the graph. Should > reduce > > > > load time if I understand correctly. > > > > > > > > I would strongly avoid the Dummy operator since it will introduce > > delays > > > on > > > > operator execution (as it will need to execute 1 dummy operator and > > that > > > > can be expensive imo). > > > > > > > > Overall though proposal looks good, unless anyone opposes it, I would > > > move > > > > this to vote mode :D > > > > > > > > Gerard Casas Saez > > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez> > > > > > > > > > > > > On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yuqian1...@gmail.com> > wrote: > > > > > > > >> Hi, All, > > > >> Here's the updated AIP-34 > > > >> < > > > >> > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator > > > >>> . > > > >> The PR has been fine-tuned with better UI interactions and added > > > >> serialization of TaskGroup: > > > https://github.com/apache/airflow/pull/10153 > > > >> > > > >> Here's some experiment results: > > > >> A made up dag containing 403 tasks, and 5696 edges. Grouped like > this. > > > Note > > > >> there's a inside_section_2 is intentionally made to depend on all > > tasks > > > >> in inside_section_1 to generate a large number of edges. The > > > observation is > > > >> that opening the top level graph is very quick, around 270ms. > > Expanding > > > >> groups that don't have a lot of dense dependencies on other groups > are > > > also > > > >> hardly noticeable. E.g expanding section_1 takes 330ms. The part > that > > > takes > > > >> time is when expanding both groups inside_section_1 and > > inside_section_2 > > > >> Because there are 2500 edges between these two inner groups, it took > > 63 > > > >> seconds to expand both of them. Majority of the time (more than > > > 62seconds) > > > >> is actually taken by the layout() function in dagre. In other words, > > > it's > > > >> very fast to add nodes and edges, but laying them out on the graph > > takes > > > >> time. This issue is not actually a problem specific to TaskGroup. > > > Without > > > >> TaskGroup, if a DAG contains too many edges, it takes time to layout > > the > > > >> graph too. > > > >> > > > >> On the other hand, a more realistic experiment with production DAG > > > >> containing about 400 tasks and 700 edges showed that grouping tasks > > into > > > >> three levels of nested TaskGroup cut the upfront page opening time > > from > > > >> around 6s to 500ms. (Obviously the time is paid back when user > > gradually > > > >> expands all the groups one by one, but normally people don't need to > > > expand > > > >> every group every time so it's still a big saving). The experiments > > are > > > >> done on OS X Mojave, 2.2 GHz, Intel Core i7, 16GB Memory, Chrome. > > > >> > > > >> I can see a few possible improvements to TaskGroup (or how it's > used) > > > that > > > >> can be done as a next-step: > > > >> 1). Like Gerard suggested, we can implement lazy-loading. Instead of > > > >> displaying the whole DAG, we can limit the Graph View to show only a > > > single > > > >> TaskGroup, omitting its edges going out to other TaskGroups. This > > > behaviour > > > >> is more like SubDagOperator where users can zoom into/out of a > > TaskGroup > > > >> and look at only tasks within that TaskGroup as if those are the > only > > > tasks > > > >> on the DAG. This can be done with either background javascript calls > > or > > > by > > > >> making a new get request with filtering parameters. Obviously the > > > downside > > > >> is that it's not as explicit as showing all the dependencies on the > > > graph. > > > >> 2). Users can improve the organization of the DAG themselves to > reduce > > > the > > > >> number of edges. E.g. if every task in group2 depends on every tasks > > in > > > >> group1, instead of doing group1 >> group2, they can add a > > DummyOperator > > > in > > > >> between and do this: group1 >> dummy >> group2. This cuts down the > > > number > > > >> of edges significantly and page load becomes much faster. > > > >> 3). If we really want, we can improve the >> operator of TaskGroup > to > > > do 2) > > > >> automatically. If it sees that both sides of >> are TaskGroup, it > can > > > >> create a DummyOperator on behalf of the user. The downside is that > it > > > may > > > >> be too much magic. > > > >> > > > >> Thanks, > > > >> Qian > > > >> > > > >> def create_section(): > > > >> """ > > > >> Create tasks in the outer section. > > > >> """ > > > >> dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in > range(100)] > > > >> > > > >> with TaskGroup("inside_section_1") as inside_section_1: > > > >> _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)] > > > >> > > > >> with TaskGroup("inside_section_2") as inside_section_2: > > > >> _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)] > > > >> > > > >> dummies[-1] >> inside_section_1 > > > >> dummies[-2] >> inside_section_2 > > > >> inside_section_1 >> inside_section_2 > > > >> > > > >> > > > >> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as > dag: > > > >> start = DummyOperator(task_id="start") > > > >> > > > >> with TaskGroup("section_1") as section_1: > > > >> create_section() > > > >> > > > >> some_other_task = DummyOperator(task_id="some-other-task") > > > >> > > > >> with TaskGroup("section_2") as section_2: > > > >> create_section() > > > >> > > > >> end = DummyOperator(task_id='end') > > > >> > > > >> start >> section_1 >> some_other_task >> section_2 >> end > > > >> > > > >> > > > >> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez > > > >> <gcasass...@twitter.com.invalid> wrote: > > > >> > > > >>> Re graph times. That makes sense. Let me know what you find. We may > > be > > > >> able > > > >>> to contribute on the lazy loading part. > > > >>> > > > >>> Looking forward to see the updated AIP! > > > >>> > > > >>> > > > >>> Gerard Casas Saez > > > >>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez> > > > >>> > > > >>> > > > >>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <kaxiln...@gmail.com> > > > wrote: > > > >>> > > > >>>> Permissions granted, let me know if you face any issues. > > > >>>> > > > >>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yuqian1...@gmail.com> > > wrote: > > > >>>> > > > >>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you! > > > >>>>> > > > >>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <kaxiln...@gmail.com> > > > >>> wrote: > > > >>>>> > > > >>>>>> What's your ID i.e. if you haven't created an account yet, > please > > > >>>> create > > > >>>>>> one at https://cwiki.apache.org/confluence/signup.action and > send > > > >> us > > > >>>>> your > > > >>>>>> ID and we will add permissions. > > > >>>>>> > > > >>>>>> Thanks. I'll edit the AIP. May I request permission to edit it? > > > >>>>>>> My wiki user email is yuqian1...@gmail.com. > > > >>>>>> > > > >>>>>> > > > >>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yuqian1...@gmail.com> > > > >>> wrote: > > > >>>>>> > > > >>>>>>> Re, Xinbin. Thanks. I'll edit the AIP. May I request permission > > > >> to > > > >>>> edit > > > >>>>>> it? > > > >>>>>>> My wiki user email is yuqian1...@gmail.com. > > > >>>>>>> > > > >>>>>>> Re Gerard: yes the UI loads all the nodes as json from the web > > > >>> server > > > >>>>> at > > > >>>>>>> once. However, it only adds the top level nodes and edges to > the > > > >>>> graph > > > >>>>>> when > > > >>>>>>> the Graph View page is first opened. And then adds the expanded > > > >>> nodes > > > >>>>> to > > > >>>>>>> the graph as the user expands them. From what I've experienced > > > >> with > > > >>>>> DAGs > > > >>>>>>> containing around 400 tasks (not using TaskGroup or > > > >>> SubDagOperator), > > > >>>>>>> opening the whole dag in Graph View usually takes 5 seconds. > Less > > > >>>> than > > > >>>>>> 60ms > > > >>>>>>> of that is taken by loading the data from webserver. The > > > >> remaining > > > >>>>> 4.9s+ > > > >>>>>> is > > > >>>>>>> taken by javascript functions in dagre-d3.min.js such as > > > >>> createNodes, > > > >>>>>>> createEdgeLabels, etc and by rendering the graph. With > TaskGroup > > > >>>> being > > > >>>>>> used > > > >>>>>>> to group tasks into a smaller number of top-level nodes, the > > > >> amount > > > >>>> of > > > >>>>>> data > > > >>>>>>> loaded from webserver will remain about the same compared to a > > > >> flat > > > >>>> dag > > > >>>>>> of > > > >>>>>>> the same size, but the number of nodes and edges needed to be > > > >> plot > > > >>> on > > > >>>>> the > > > >>>>>>> graph can be reduced significantly. So in theory this should > > > >> speed > > > >>> up > > > >>>>> the > > > >>>>>>> time it takes to open Graph View even without lazy-loading the > > > >> data > > > >>>>> (I'll > > > >>>>>>> experiment to find out). That said, if it comes to a point > > > >>>> lazy-loading > > > >>>>>>> helps, we can still implement it as an improvement. > > > >>>>>>> > > > >>>>>>> Re James: the Tree View looks as if all all the groups are > fully > > > >>>>>> expanded. > > > >>>>>>> (because under the hood all the tasks are in a single DAG). I'm > > > >>> less > > > >>>>>>> worried about Tree View at the moment because it already has a > > > >>>>> mechanism > > > >>>>>>> for collapsing tasks by the dependency tree. That said, the > Tree > > > >>> View > > > >>>>> can > > > >>>>>>> definitely be improved too with TaskGroup. (e.g. collapse tasks > > > >> in > > > >>>> the > > > >>>>>> same > > > >>>>>>> TaskGroup when Tree View is first opened). > > > >>>>>>> > > > >>>>>>> For both suggestions, implementing them don't require > fundamental > > > >>>>> changes > > > >>>>>>> to the idea. I think we can have a basic working TaskGroup > first, > > > >>> and > > > >>>>>> then > > > >>>>>>> improve it incrementally in several PRs as we get more feedback > > > >>> from > > > >>>>> the > > > >>>>>>> community. What do you think? > > > >>>>>>> > > > >>>>>>> Qian > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder < > jcode...@gmail.com> > > > >>>>> wrote: > > > >>>>>>> > > > >>>>>>>> I agree this looks great, one question, how does the tree view > > > >>>> look? > > > >>>>>>>> > > > >>>>>>>> James Coder > > > >>>>>>>> > > > >>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez < > > > >>>>>> gcasass...@twitter.com > > > >>>>>>> .invalid> > > > >>>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>> First of all, this is awesome!! > > > >>>>>>>>> > > > >>>>>>>>> Secondly, checking your UI code, seems you are loading all > > > >>>>> operators > > > >>>>>> at > > > >>>>>>>>> once. Wondering if we can load them as needed (aka load > > > >>> whenever > > > >>>> we > > > >>>>>>> click > > > >>>>>>>>> the TaskGroup). Some of our DAGs are so large that take > > > >> forever > > > >>>> to > > > >>>>>> load > > > >>>>>>>> on > > > >>>>>>>>> the Graph view, so worried about this still being an issue > > > >>> here. > > > >>>> It > > > >>>>>> may > > > >>>>>>>> be > > > >>>>>>>>> easily solvable by implementing lazy loading of the graph. > > > >> Not > > > >>>> sure > > > >>>>>> how > > > >>>>>>>>> easy to implement/add to the UI extension (and dont want to > > > >>> push > > > >>>>> for > > > >>>>>>>> early > > > >>>>>>>>> optimization as its the root of all evil). > > > >>>>>>>>> Gerard Casas Saez > > > >>>>>>>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez> > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang < > > > >>>>>> bin.huan...@gmail.com> > > > >>>>>>>> wrote: > > > >>>>>>>>>> > > > >>>>>>>>>> Hi Yu, > > > >>>>>>>>>> > > > >>>>>>>>>> Thank you so much for taking on this. I was fairly > > > >> distracted > > > >>>>>>> previously > > > >>>>>>>>>> and I didn't have the time to update the proposal. In fact, > > > >>>> after > > > >>>>>>>>>> discussing with Ash, Kaxil and Daniel, the direction of this > > > >>> AIP > > > >>>>> has > > > >>>>>>>> been > > > >>>>>>>>>> changed to favor the concept of TaskGroup instead of > > > >> rewriting > > > >>>>>>>>>> SubDagOperator (though it may may sense to deprecate SubDag > > > >>> in a > > > >>>>>>> future > > > >>>>>>>>>> date.). > > > >>>>>>>>>> > > > >>>>>>>>>> Your PR is amazing and it has implemented the desire > > > >>> features. I > > > >>>>>> think > > > >>>>>>>> we > > > >>>>>>>>>> can focus on your new PR instead. Do you mind updating the > > > >> AIP > > > >>>>> based > > > >>>>>>> on > > > >>>>>>>>>> what you have done in your PR? > > > >>>>>>>>>> > > > >>>>>>>>>> Best, > > > >>>>>>>>>> Bin > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian < > > > >>> yuqian1...@gmail.com> > > > >>>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>> Hi, all, I've added the basic UI changes to my proposed > > > >>>>>>> implementation > > > >>>>>>>> of > > > >>>>>>>>>>> TaskGroup as UI grouping concept: > > > >>>>>>>>>>> https://github.com/apache/airflow/pull/10153 > > > >>>>>>>>>>> > > > >>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup > > > >> so > > > >>>> i'm > > > >>>>>>>> quoting > > > >>>>>>>>>>> it here. The only thing I don't fully agree with is the > > > >>>>> restriction > > > >>>>>>>>>>> "... **cannot* > > > >>>>>>>>>>> have dependencies between a Task in a TaskGroup and either > > > >> a* > > > >>>>>>>>>>> * Task in a different TaskGroup or a Task not in any > > > >>>> group*". I > > > >>>>>>> think > > > >>>>>>>>>>> this is over restrictive. Since TaskGroup is a UI concept, > > > >>>> tasks > > > >>>>>> can > > > >>>>>>>> have > > > >>>>>>>>>>> dependencies on tasks in other TaskGroup or not in any > > > >>>> TaskGroup. > > > >>>>>> In > > > >>>>>>> my > > > >>>>>>>>>> PR, > > > >>>>>>>>>>> this is allowed. The graph edges will update accordingly > > > >> when > > > >>>>>>>> TaskGroups > > > >>>>>>>>>>> are expanded/collapsed. TaskGroup is only helping to make > > > >> the > > > >>>> UI > > > >>>>>> look > > > >>>>>>>>>> less > > > >>>>>>>>>>> crowded. Under the hood, everything is still a DAG of tasks > > > >>> and > > > >>>>>> edges > > > >>>>>>>> so > > > >>>>>>>>>>> things work normally. Here's a screenshot > > > >>>>>>>>>>> < > > > >>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>> > > > >> > > > > > > https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif > > > >>>>>>>>>>>> > > > >>>>>>>>>>> of the UI interaction. > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> * - Tasks can be added to a TaskGroup - You *can* have > > > >>>>>>> dependencies > > > >>>>>>>>>>> between Tasks in the same TaskGroup, but *cannot* have > > > >>>>>> dependencies > > > >>>>>>>>>>> between a Task in a TaskGroup and either a Task in a > > > >>>> different > > > >>>>>>>>>> TaskGroup > > > >>>>>>>>>>> or a Task not in any group - You *can* have dependencies > > > >>>>> between > > > >>>>>> a > > > >>>>>>>>>>> TaskGroup and either other TaskGroups or Tasks not in any > > > >>>> group > > > >>>>>> - > > > >>>>>>>> The > > > >>>>>>>>>>> UI will by default render a TaskGroup as a single "object", > > > >>> but > > > >>>>>>> which > > > >>>>>>>>>> you > > > >>>>>>>>>>> expand or zoom into in some way - You'd need some way to > > > >>>>>> determine > > > >>>>>>>> what > > > >>>>>>>>>>> the "status" of a TaskGroup was at least for UI display > > > >>>>> purposes* > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> Regarding Jake's comment, I agree it's possible to > > > >> implement > > > >>>> the > > > >>>>>>>>>> "retrying > > > >>>>>>>>>>> tasks in a group" pattern he mentioned as an optional > > > >> feature > > > >>>> of > > > >>>>>>>>>> TaskGroup > > > >>>>>>>>>>> although that may go against having TaskGroup as a pure UI > > > >>>>> concept. > > > >>>>>>> For > > > >>>>>>>>>> the > > > >>>>>>>>>>> motivating example Jake provided, I suggest implementing > > > >> both > > > >>>>>>>>>>> SubmitLongRunningJobTask and PollJobStatusSensor in a > > > >> single > > > >>>>>>> operator. > > > >>>>>>>> It > > > >>>>>>>>>>> can do something like BaseSensorOperator.execute() does in > > > >>>>>>> "reschedule" > > > >>>>>>>>>>> mode, i.e. it first executes some code to submit the long > > > >>>> running > > > >>>>>> job > > > >>>>>>>> to > > > >>>>>>>>>>> the external service, and store the state (e.g. in XCom). > > > >>> Then > > > >>>>>>>> reschedule > > > >>>>>>>>>>> itself. Subsequent runs then pokes for the completion > > > >> state. > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero > > > >>>>>>>>>> <jferri...@google.com.invalid > > > >>>>>>>>>>>> > > > >>>>>>>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>>> I really like this idea of a TaskGroup container as I > > > >> think > > > >>>> this > > > >>>>>>> will > > > >>>>>>>>>> be > > > >>>>>>>>>>>> much easier to use than SubDag. > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> I'd like to propose an optional behavior for special retry > > > >>>>>> mechanics > > > >>>>>>>>>> via > > > >>>>>>>>>>> a > > > >>>>>>>>>>>> TaskGroup.retry_all property. > > > >>>>>>>>>>>> This way I could use TaskGroup to replace my favorite use > > > >> of > > > >>>>>> SubDag > > > >>>>>>>> for > > > >>>>>>>>>>>> atomically retrying tasks of the pattern "act on external > > > >>>> state > > > >>>>>> then > > > >>>>>>>>>>>> reschedule poll until desired state reached". > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> Motivating use case I have for a SubDag is very simple two > > > >>>> task > > > >>>>>>> group > > > >>>>>>>>>>>> [SubmitLongRunningJobTask >> PollJobStatusSensor]. > > > >>>>>>>>>>>> I use SubDag is because it gives me an easy way to retry > > > >> the > > > >>>>>>>>>>> SubmitJobTask > > > >>>>>>>>>>>> if something about the PollJobSensor fails. > > > >>>>>>>>>>>> This pattern would be really nice for jobs that are > > > >> expected > > > >>>> to > > > >>>>>> run > > > >>>>>>> a > > > >>>>>>>>>>> long > > > >>>>>>>>>>>> time (because we can use sensor can use reschedule mode > > > >>>> freeing > > > >>>>> up > > > >>>>>>>>>> slots) > > > >>>>>>>>>>>> but might fail for a retryable reason. > > > >>>>>>>>>>>> However, using SubDag to meet this use case defeats the > > > >>>> purpose > > > >>>>>>>> because > > > >>>>>>>>>>>> SubDag infamously > > > >>>>>>>>>>>> < > > > >>>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>> > > > >> > > > > > > https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10 > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>> blocks a "controller" slot for the entire duration. > > > >>>>>>>>>>>> This may feel like a cyclic behavior but reality it is > > > >> very > > > >>>>> common > > > >>>>>>> for > > > >>>>>>>>>> a > > > >>>>>>>>>>>> single operator to submit job / wait til done. > > > >>>>>>>>>>>> We could use this case refactor many operators (e.g. BQ, > > > >>>>> Dataproc, > > > >>>>>>>>>>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> > > > >>>> PollTask] > > > >>>>>>> with > > > >>>>>>>>>> an > > > >>>>>>>>>>>> optional reschedule mode if user knows that this job may > > > >>> take > > > >>>> a > > > >>>>>> long > > > >>>>>>>>>>> time. > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> I'd be happy to the development work on adding this > > > >> specific > > > >>>>> retry > > > >>>>>>>>>>> behavior > > > >>>>>>>>>>>> to TaskGroup once the base concept is implemented if > > > >> others > > > >>> in > > > >>>>> the > > > >>>>>>>>>>>> community would find this a useful feature. > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> Cheers, > > > >>>>>>>>>>>> Jake > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk < > > > >>>>>>>> jarek.pot...@polidea.com > > > >>>>>>>>>>> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>>> All for it :) . I think we are getting closer to have > > > >>> regular > > > >>>>>>>>>> planning > > > >>>>>>>>>>>> and > > > >>>>>>>>>>>>> making some structured approach to 2.0 and starting task > > > >>>> force > > > >>>>>> for > > > >>>>>>> it > > > >>>>>>>>>>>> soon, > > > >>>>>>>>>>>>> so I think this should be perfectly fine to discuss and > > > >>> even > > > >>>>>> start > > > >>>>>>>>>>>>> implementing what's beyond as soon as we make sure that > > > >> we > > > >>>> are > > > >>>>>>>>>>>> prioritizing > > > >>>>>>>>>>>>> 2.0 work. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> J, > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian < > > > >>>> yuqian1...@gmail.com> > > > >>>>>>>>>> wrote: > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Hi Jarek, > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> I agree we should not change the behaviour of the > > > >> existing > > > >>>>>>>>>>>> SubDagOperator > > > >>>>>>>>>>>>>> till Airflow 2.1. Is it okay to continue the discussion > > > >>>> about > > > >>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>> as > > > >>>>>>>>>>>>>> a brand new concept/feature independent from the > > > >> existing > > > >>>>>>>>>>>> SubDagOperator? > > > >>>>>>>>>>>>>> In other words, shall we add TaskGroup as a UI grouping > > > >>>>> concept > > > >>>>>>>>>> like > > > >>>>>>>>>>>> Ash > > > >>>>>>>>>>>>>> suggested, and not touch SubDagOperator atl all. > > > >> Whenever > > > >>> we > > > >>>>> are > > > >>>>>>>>>>> ready > > > >>>>>>>>>>>>> with > > > >>>>>>>>>>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow > > > >>> 2.1. > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> I really like Ash's idea of simplifying the > > > >> SubDagOperator > > > >>>>> idea > > > >>>>>>>>>> into > > > >>>>>>>>>>> a > > > >>>>>>>>>>>>>> simple UI grouping concept. I think Xinbin's idea of > > > >>>>>> "reattaching > > > >>>>>>>>>> all > > > >>>>>>>>>>>> the > > > >>>>>>>>>>>>>> tasks to the root DAG" is the way to go. And I see James > > > >>>>> pointed > > > >>>>>>>>>> out > > > >>>>>>>>>>> we > > > >>>>>>>>>>>>>> need some helper functions to simplify dependencies > > > >>> setting > > > >>>> of > > > >>>>>>>>>>>> TaskGroup. > > > >>>>>>>>>>>>>> Xinbin put up a pretty elegant example in his PR > > > >>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I think > > > >>>> having > > > >>>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>> as > > > >>>>>>>>>>>>>> a UI concept should be a relatively small change. We can > > > >>>>>> simplify > > > >>>>>>>>>>>>> Xinbin's > > > >>>>>>>>>>>>>> PR further. So I put up this alternative proposal here: > > > >>>>>>>>>>>>>> https://github.com/apache/airflow/pull/10153 > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> I have not done any UI changes due to lack of experience > > > >>>> with > > > >>>>>> web > > > >>>>>>>>>> UI. > > > >>>>>>>>>>>> If > > > >>>>>>>>>>>>>> anyone's interested, please take a look at the PR. > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Qian > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk < > > > >>>>>>>>>>> jarek.pot...@polidea.com > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Similar point here to the other ideas that are popping > > > >>> up. > > > >>>>>> Maybe > > > >>>>>>>>>> we > > > >>>>>>>>>>>>>> should > > > >>>>>>>>>>>>>>> just focus on completing 2.0 and make all discussions > > > >>> about > > > >>>>>>>>>> further > > > >>>>>>>>>>>>>>> improvements to 2.1? While those are important > > > >>> discussions > > > >>>>> (and > > > >>>>>>>>>> we > > > >>>>>>>>>>>>> should > > > >>>>>>>>>>>>>>> continue them in the near future !) I think at this > > > >>> point > > > >>>>>>>>>> focusing > > > >>>>>>>>>>>> on > > > >>>>>>>>>>>>>>> delivering 2.0 in its current shape should be our focus > > > >>>> now ? > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> J. > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang < > > > >>>>>>>>>>> bin.huan...@gmail.com> > > > >>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> Hi Daniel > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same API > > > >> as a > > > >>>> DAG > > > >>>>>>>>>>> object > > > >>>>>>>>>>>>>>> related > > > >>>>>>>>>>>>>>>> to task dependencies, but it will not have anything > > > >>>> related > > > >>>>> to > > > >>>>>>>>>>>> actual > > > >>>>>>>>>>>>>>>> execution or scheduling. > > > >>>>>>>>>>>>>>>> I will update the AIP according to this over the > > > >>> weekend. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when > > > >> you > > > >>>>>>>>>> import > > > >>>>>>>>>>>> the > > > >>>>>>>>>>>>>>> object > > > >>>>>>>>>>>>>>>> you can import it with parameters to determine the > > > >> shape > > > >>>> of > > > >>>>>> the > > > >>>>>>>>>>>> DAG. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it serve a > > > >>>>> similar > > > >>>>>>>>>>>> purpose > > > >>>>>>>>>>>>>> as > > > >>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>> DAG factory function? > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman < > > > >>>>>>>>>>>>>>> daniel.imber...@gmail.com > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> Hi Bin, > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG > > > >> object > > > >>>>> (e.g. > > > >>>>>>>>>>> the > > > >>>>>>>>>>>>>>> bitwise > > > >>>>>>>>>>>>>>>>> operator fro task dependencies). We could even make a > > > >>>>>>>>>>>> “DAGTemplate” > > > >>>>>>>>>>>>>>>> object > > > >>>>>>>>>>>>>>>>> s.t. when you import the object you can import it > > > >> with > > > >>>>>>>>>>> parameters > > > >>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>> determine the shape of the DAG. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang < > > > >>>>>>>>>>>>> bin.huan...@gmail.com > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>> The TaskGroup will not take schedule interval as a > > > >>>>> parameter > > > >>>>>>>>>>>>> itself, > > > >>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>> depends on the DAG where it attaches to. In my > > > >> opinion, > > > >>>> the > > > >>>>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>> only contain a group of tasks with interdependencies, > > > >>> and > > > >>>>> the > > > >>>>>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>>>>>> behaves like a task. It doesn't contain any > > > >>>>>>>>>>> execution/scheduling > > > >>>>>>>>>>>>>> logic > > > >>>>>>>>>>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs > > > >>>> etc.) > > > >>>>>>>>>>> like > > > >>>>>>>>>>>> a > > > >>>>>>>>>>>>>> DAG > > > >>>>>>>>>>>>>>>>> does. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> For example, there is the scenario that the schedule > > > >>>>>>>>>> interval > > > >>>>>>>>>>>> of > > > >>>>>>>>>>>>>> DAG > > > >>>>>>>>>>>>>>> is > > > >>>>>>>>>>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 > > > >>> min. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case > > > >> that > > > >>>> you > > > >>>>>>>>>> want > > > >>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>> achieve? > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> Bin > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 < > > > >>>>>>>>>> thanosxnicho...@gmail.com > > > >>>>>>>>>>>> > > > >>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Hi Bin, > > > >>>>>>>>>>>>>>>>>> Using TaskGroup, Is the schedule interval of > > > >> TaskGroup > > > >>>> the > > > >>>>>>>>>>> same > > > >>>>>>>>>>>>> as > > > >>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>> parent DAG? My main concern is whether the schedule > > > >>>>>>>>>> interval > > > >>>>>>>>>>> of > > > >>>>>>>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>>>>>>> could be different with that of the DAG? For > > > >> example, > > > >>>>> there > > > >>>>>>>>>>> is > > > >>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>> scenario > > > >>>>>>>>>>>>>>>>>> that the schedule interval of DAG is 1 hour and the > > > >>>>>>>>>> schedule > > > >>>>>>>>>>>>>> interval > > > >>>>>>>>>>>>>>>> of > > > >>>>>>>>>>>>>>>>>> TaskGroup is 20 min. > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Cheers, > > > >>>>>>>>>>>>>>>>>> Nicholas > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang < > > > >>>>>>>>>>>>>> bin.huan...@gmail.com > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> Hi Nicholas, > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of > > > >>> SubDagOperator, > > > >>>>>>>>>>> maybe > > > >>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>>> throw > > > >>>>>>>>>>>>>>>>>>> an error? But in the original proposal, the > > > >> subdag's > > > >>>>>>>>>>>>>>>> schedule_interval > > > >>>>>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to > > > >>> replace > > > >>>>>>>>>>>> SubDag, > > > >>>>>>>>>>>>>>> there > > > >>>>>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>>>> be no subdag schedule_interval. > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> Bin > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 < > > > >>>>>>>>>>>> thanosxnicho...@gmail.com > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> Hi Bin, > > > >>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was confused > > > >>> whether > > > >>>>>>>>>> the > > > >>>>>>>>>>>>>>> schedule > > > >>>>>>>>>>>>>>>>>>>> interval of SubDAG is different from that of the > > > >>>> parent > > > >>>>>>>>>>>> DAG? > > > >>>>>>>>>>>>> I > > > >>>>>>>>>>>>>>> have > > > >>>>>>>>>>>>>>>>>>>> discussed with Jiajie Zhong about the schedule > > > >>>> interval > > > >>>>>>>>>>> of > > > >>>>>>>>>>>>>>> SubDAG. > > > >>>>>>>>>>>>>>>> If > > > >>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>> SubDagOperator has a different schedule interval, > > > >>> what > > > >>>>>>>>>>> will > > > >>>>>>>>>>>>>>> happen > > > >>>>>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>> scheduler to schedule the parent DAG? > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> Regards, > > > >>>>>>>>>>>>>>>>>>>> Nicholas Jiang > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang < > > > >>>>>>>>>>>>>>>> bin.huan...@gmail.com> > > > >>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback! > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> I have rethought about the concept of subdag and > > > >>> task > > > >>>>>>>>>>>>>> groups. I > > > >>>>>>>>>>>>>>>>> think > > > >>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>> better way to approach this is to entirely remove > > > >>>>>>>>>>> subdag > > > >>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>> introduce > > > >>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>> concept of TaskGroup, which is a container of > > > >> tasks > > > >>>>>>>>>>> along > > > >>>>>>>>>>>>>> with > > > >>>>>>>>>>>>>>>>> their > > > >>>>>>>>>>>>>>>>>>>>> dependencies *without execution/scheduling logic > > > >>> as a > > > >>>>>>>>>>>> DAG*. > > > >>>>>>>>>>>>>> The > > > >>>>>>>>>>>>>>>>> only > > > >>>>>>>>>>>>>>>>>>>>> purpose of it is to group a list of tasks, but > > > >> you > > > >>>>>>>>>>> still > > > >>>>>>>>>>>>> need > > > >>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>> add > > > >>>>>>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>>> a DAG for execution. > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> Here is a small code snippet. > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> ``` > > > >>>>>>>>>>>>>>>>>>>>> class TaskGroup: > > > >>>>>>>>>>>>>>>>>>>>> """ > > > >>>>>>>>>>>>>>>>>>>>> A TaskGroup contains a group of tasks. > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> If default_args is missing, it will take default > > > >>> args > > > >>>>>>>>>>>> from > > > >>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>> DAG. > > > >>>>>>>>>>>>>>>>>>>>> """ > > > >>>>>>>>>>>>>>>>>>>>> def __init__(self, group_id, default_args): > > > >>>>>>>>>>>>>>>>>>>>> pass > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> """ > > > >>>>>>>>>>>>>>>>>>>>> You can add tasks to a task group similar to > > > >> adding > > > >>>>>>>>>>> tasks > > > >>>>>>>>>>>>> to > > > >>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>> DAG > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> This can be declared in a separate file from the > > > >>> dag > > > >>>>>>>>>>> file > > > >>>>>>>>>>>>>>>>>>>>> """ > > > >>>>>>>>>>>>>>>>>>>>> download_group = TaskGroup(group_id='download', > > > >>>>>>>>>>>>>>>>>>>> default_args=default_args) > > > >>>>>>>>>>>>>>>>>>>>> download_group.add_task(task1) > > > >>>>>>>>>>>>>>>>>>>>> task2.dag = download_group > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> with download_group: > > > >>>>>>>>>>>>>>>>>>>>> task3 = DummyOperator(task_id='task3') > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> [task, task2] >> task3 > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> """Add it to a DAG for execution""" > > > >>>>>>>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', > > > >>>>>>>>>>>>>>> default_args=default_args, > > > >>>>>>>>>>>>>>>>>>>>> schedule_interval='@daily', ...) as dag: > > > >>>>>>>>>>>>>>>>>>>>> start = DummyOperator(task_id='start') > > > >>>>>>>>>>>>>>>>>>>>> start >> download_group > > > >>>>>>>>>>>>>>>>>>>>> # this is equivalent to > > > >>>>>>>>>>>>>>>>>>>>> # start >> [task, task2] >> task3 > > > >>>>>>>>>>>>>>>>>>>>> ``` > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> With this, we can still reuse a group of tasks > > > >> and > > > >>>>>>>>>> set > > > >>>>>>>>>>>>>>>> dependencies > > > >>>>>>>>>>>>>>>>>>>> between > > > >>>>>>>>>>>>>>>>>>>>> them; it avoids the boilerplate code from using > > > >>>>>>>>>>>>>> SubDagOperator, > > > >>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>> we > > > >>>>>>>>>>>>>>>>>>>> can > > > >>>>>>>>>>>>>>>>>>>>> declare dependencies as `task >> task_group >> > > > >>> task`. > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> User migration wise, we can introduce it before > > > >>>>>>>>>> Airflow > > > >>>>>>>>>>>> 2.0 > > > >>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>> allow > > > >>>>>>>>>>>>>>>>>>>>> gradual transition. Then we can decide if we > > > >> still > > > >>>>>>>>>> want > > > >>>>>>>>>>>> to > > > >>>>>>>>>>>>>> keep > > > >>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>> SubDagOperator or simply remove it. > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> Any thoughts? > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> Cheers, > > > >>>>>>>>>>>>>>>>>>>>> Bin > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime > > > >> Beauchemin < > > > >>>>>>>>>>>>>>>>>>>>> maximebeauche...@gmail.com> wrote: > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> +1, proposal looks good. > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> The original intention was really to have tasks > > > >>>>>>>>>>> groups > > > >>>>>>>>>>>>> and > > > >>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>>>>> zoom-in/out > > > >>>>>>>>>>>>>>>>>>>>> in > > > >>>>>>>>>>>>>>>>>>>>>> the UI. The original reasoning was to reuse the > > > >>> DAG > > > >>>>>>>>>>>>> object > > > >>>>>>>>>>>>>>>> since > > > >>>>>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>>>> is > > > >>>>>>>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>>>>>>> group of tasks, but as highlighted here it does > > > >>>>>>>>>>> create > > > >>>>>>>>>>>>>>>> underlying > > > >>>>>>>>>>>>>>>>>>>>>> confusions since a DAG is much more than just a > > > >>>>>>>>>> group > > > >>>>>>>>>>>> of > > > >>>>>>>>>>>>>>> tasks. > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> Max > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi < > > > >>>>>>>>>>>>>>>>>>>>> joshipoornim...@gmail.com> > > > >>>>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> Thank you for your email. > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang < > > > >>>>>>>>>>>>>>>>>>> bin.huan...@gmail.com > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This > > > >>>>>>>>>>>>>> rewrites > > > >>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* > > > >>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and > > > >>>>>>>>>> it > > > >>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>> give a > > > >>>>>>>>>>>>>>>>>>>> flat > > > >>>>>>>>>>>>>>>>>>>>>>>>>> structure at > > > >>>>>>>>>>>>>>>>>>>>>>>>>> the task level > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already > > > >>>>>>>>>> does > > > >>>>>>>>>>>>> this I > > > >>>>>>>>>>>>>>>>> think. > > > >>>>>>>>>>>>>>>>>> At > > > >>>>>>>>>>>>>>>>>>>>> least > > > >>>>>>>>>>>>>>>>>>>>>>> if > > > >>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here correctly. > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> I am not sure about serialized_dag > > > >>>>>>>>>>> representation, > > > >>>>>>>>>>>>> but > > > >>>>>>>>>>>>>> at > > > >>>>>>>>>>>>>>>>> least > > > >>>>>>>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table? > > > >>>>>>>>>> In > > > >>>>>>>>>>> my > > > >>>>>>>>>>>>>>>> proposal > > > >>>>>>>>>>>>>>>>> as > > > >>>>>>>>>>>>>>>>>>>> also > > > >>>>>>>>>>>>>>>>>>>>> in > > > >>>>>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks > > > >>>>>>>>>> from > > > >>>>>>>>>>>> the > > > >>>>>>>>>>>>>>> subdag > > > >>>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>> add > > > >>>>>>>>>>>>>>>>>>>>>> them > > > >>>>>>>>>>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG > > > >> graph > > > >>>>>>>>>>>> will > > > >>>>>>>>>>>>>> look > > > >>>>>>>>>>>>>>>>>> exactly > > > >>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>> same as without subdag but with metadata > > > >>>>>>>>>> attached > > > >>>>>>>>>>>> to > > > >>>>>>>>>>>>>>> those > > > >>>>>>>>>>>>>>>>>>>> sections. > > > >>>>>>>>>>>>>>>>>>>>>>> These > > > >>>>>>>>>>>>>>>>>>>>>>>> metadata will be later on used to render in > > > >> the > > > >>>>>>>>>>> UI. > > > >>>>>>>>>>>>> So > > > >>>>>>>>>>>>>>>> after > > > >>>>>>>>>>>>>>>>>>>> parsing > > > >>>>>>>>>>>>>>>>>>>>> ( > > > >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output > > > >>>>>>>>>> the > > > >>>>>>>>>>>>>>> *root_dag > > > >>>>>>>>>>>>>>>>>>>> *instead > > > >>>>>>>>>>>>>>>>>>>>> of > > > >>>>>>>>>>>>>>>>>>>>>>> *root_dag + > > > >>>>>>>>>>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc. > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata > > > >>>>>>>>>>>>>>>>>> current_group=section-1, > > > >>>>>>>>>>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for > > > >>>>>>>>>>> naming > > > >>>>>>>>>>>>>>>>>>> suggestions), > > > >>>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>> reason for parent_group is that we can have > > > >>>>>>>>>>> nested > > > >>>>>>>>>>>>>> group > > > >>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>>> still > > > >>>>>>>>>>>>>>>>>>>>>> be > > > >>>>>>>>>>>>>>>>>>>>>>>> able to capture the dependency. > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> Runtime DAG: > > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png] > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> While at the UI, what we see would be > > > >> something > > > >>>>>>>>>>>> like > > > >>>>>>>>>>>>>> this > > > >>>>>>>>>>>>>>>> by > > > >>>>>>>>>>>>>>>>>>>>> utilizing > > > >>>>>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into > > > >>>>>>>>>> in > > > >>>>>>>>>>>> some > > > >>>>>>>>>>>>>>> way. > > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png] > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> The benefits I can see is that: > > > >>>>>>>>>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra > > > >>>>>>>>>>> complexity > > > >>>>>>>>>>>> of > > > >>>>>>>>>>>>>>>> SubDag > > > >>>>>>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>>>>>> execution > > > >>>>>>>>>>>>>>>>>>>>>>>> and scheduling. It will be the same as not > > > >>>>>>>>>> using > > > >>>>>>>>>>>>>> SubDag. > > > >>>>>>>>>>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and > > > >>>>>>>>>>>>> reusable > > > >>>>>>>>>>>>>>> dag > > > >>>>>>>>>>>>>>>>> code > > > >>>>>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>>>>>>> declare dependencies between them. And with > > > >> the > > > >>>>>>>>>>> new > > > >>>>>>>>>>>>>>>>>>> SubDagOperator > > > >>>>>>>>>>>>>>>>>>>>> (see > > > >>>>>>>>>>>>>>>>>>>>>>> AIP > > > >>>>>>>>>>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory > > > >>>>>>>>>>>>> function > > > >>>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>>>> generating 1 > > > >>>>>>>>>>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag > > > >>>>>>>>>>> (in > > > >>>>>>>>>>>>> this > > > >>>>>>>>>>>>>>>> case, > > > >>>>>>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>>>>>>>> just > > > >>>>>>>>>>>>>>>>>>>>>>>> extract all underlying tasks and append to the > > > >>>>>>>>>>> root > > > >>>>>>>>>>>>>> dag). > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag > > > >>>>>>>>>>>> with a > > > >>>>>>>>>>>>>>>>>> simpler > > > >>>>>>>>>>>>>>>>>>>>>> concept > > > >>>>>>>>>>>>>>>>>>>>>>>> by Ash: the proposed change basically drains > > > >>>>>>>>>> out > > > >>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>> contents > > > >>>>>>>>>>>>>>>>>>>> of > > > >>>>>>>>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>>>>>>>> SubDag > > > >>>>>>>>>>>>>>>>>>>>>>>> and becomes more like > > > >>>>>>>>>>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator > > > >>>>>>>>>>>>>>>>>>>>>>> (forgive > > > >>>>>>>>>>>>>>>>>>>>>>>> me about the crazy name..). In this case, it > > > >> is > > > >>>>>>>>>>>> still > > > >>>>>>>>>>>>>>>>>>> necessary > > > >>>>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>>>>> keep the > > > >>>>>>>>>>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a > > > >>>>>>>>>>>> name? > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks > > > >>>>>>>>>>>> Chris > > > >>>>>>>>>>>>>>> Palmer > > > >>>>>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>>>> helping > > > >>>>>>>>>>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup, > > > >> I > > > >>>>>>>>>>>> will > > > >>>>>>>>>>>>>> just > > > >>>>>>>>>>>>>>>>> paste > > > >>>>>>>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>>>>>>> here. > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks > > > >>>>>>>>>> in > > > >>>>>>>>>>>> the > > > >>>>>>>>>>>>>> same > > > >>>>>>>>>>>>>>>>>>>> TaskGroup, > > > >>>>>>>>>>>>>>>>>>>>>> but > > > >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in > > > >>>>>>>>>> a > > > >>>>>>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>>>>> either a > > > >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not > > > >>>>>>>>>> in > > > >>>>>>>>>>>> any > > > >>>>>>>>>>>>>>> group > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a > > > >>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>> either > > > >>>>>>>>>>>>>>>>>>>>> other > > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group > > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup > > > >>>>>>>>>> as > > > >>>>>>>>>>> a > > > >>>>>>>>>>>>>> single > > > >>>>>>>>>>>>>>>>>>>> "object", > > > >>>>>>>>>>>>>>>>>>>>>> but > > > >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the > > > >>>>>>>>>>>>> "status" > > > >>>>>>>>>>>>>>> of a > > > >>>>>>>>>>>>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>>>>>>>>>>>> was > > > >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> I agree with Chris: > > > >>>>>>>>>>>>>>>>>>>>>>>> - From the backend's view (scheduler & > > > >>>>>>>>>>> executor), I > > > >>>>>>>>>>>>>> think > > > >>>>>>>>>>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>>>>>>>>>>>> should > > > >>>>>>>>>>>>>>>>>>>>>>>> be ignored during execution. (unless we decide > > > >>>>>>>>>> to > > > >>>>>>>>>>>>>>> implement > > > >>>>>>>>>>>>>>>>>> some > > > >>>>>>>>>>>>>>>>>>>>>> metadata > > > >>>>>>>>>>>>>>>>>>>>>>>> operations that allows start/stop a group of > > > >>>>>>>>>>> tasks > > > >>>>>>>>>>>>>> etc.) > > > >>>>>>>>>>>>>>>>>>>>>>>> - From the UI's View, it should be able to > > > >> pick > > > >>>>>>>>>>> up > > > >>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>> individual > > > >>>>>>>>>>>>>>>>>>>>>> tasks' > > > >>>>>>>>>>>>>>>>>>>>>>>> status and then determine the TaskGroup's > > > >>>>>>>>>> status > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> Bin > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel > > > >>>>>>>>>> Imberman > > > >>>>>>>>>>> < > > > >>>>>>>>>>>>>>>>>>>>>>>> daniel.imber...@gmail.com> wrote: > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` > > > >> operator > > > >>>>>>>>>>> to > > > >>>>>>>>>>>>> tie > > > >>>>>>>>>>>>>>> dags > > > >>>>>>>>>>>>>>>>>>>> together > > > >>>>>>>>>>>>>>>>>>>>>> but > > > >>>>>>>>>>>>>>>>>>>>>>> I > > > >>>>>>>>>>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if > > > >> we > > > >>>>>>>>>>>> could > > > >>>>>>>>>>>>>>>>>> essentially > > > >>>>>>>>>>>>>>>>>>>>> write > > > >>>>>>>>>>>>>>>>>>>>>>> in > > > >>>>>>>>>>>>>>>>>>>>>>>>> the ability to set dependencies to all > > > >>>>>>>>>>>> starter-tasks > > > >>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>> that > > > >>>>>>>>>>>>>>>>>>> DAG. > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly > > > >>>>>>>>>> UI > > > >>>>>>>>>>>>>> concept. > > > >>>>>>>>>>>>>>>> It > > > >>>>>>>>>>>>>>>>>>>> doesn’t > > > >>>>>>>>>>>>>>>>>>>>>> need > > > >>>>>>>>>>>>>>>>>>>>>>>>> to execute separately, you’re just adding > > > >> more > > > >>>>>>>>>>>> tasks > > > >>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>> queue > > > >>>>>>>>>>>>>>>>>>>>> that > > > >>>>>>>>>>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>>>>>>>>>> be executed when there are resources > > > >>>>>>>>>> available. > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> via Newton Mail [ > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>> > > > >> > > > > > > https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2 > > > >>>>>>>>>>>>>>>>>>>>>>>>> ] > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer > > > >>>>>>>>>> < > > > >>>>>>>>>>>>>>>>>>> ch...@crpalmer.com > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex > > > >>>>>>>>>>>>>> abstraction. > > > >>>>>>>>>>>>>>> I > > > >>>>>>>>>>>>>>>>>> think > > > >>>>>>>>>>>>>>>>>>>> what > > > >>>>>>>>>>>>>>>>>>>>>> is > > > >>>>>>>>>>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a > > > >>>>>>>>>> high > > > >>>>>>>>>>>>> level > > > >>>>>>>>>>>>>> I > > > >>>>>>>>>>>>>>>>> think > > > >>>>>>>>>>>>>>>>>>> you > > > >>>>>>>>>>>>>>>>>>>>> want > > > >>>>>>>>>>>>>>>>>>>>>>>>> this > > > >>>>>>>>>>>>>>>>>>>>>>>>> functionality: > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks > > > >> in > > > >>>>>>>>>>> the > > > >>>>>>>>>>>>>> same > > > >>>>>>>>>>>>>>>>>>> TaskGroup, > > > >>>>>>>>>>>>>>>>>>>>> but > > > >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in > > > >> a > > > >>>>>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>>> either > > > >>>>>>>>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not > > > >> in > > > >>>>>>>>>>> any > > > >>>>>>>>>>>>>> group > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a > > > >>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>> either > > > >>>>>>>>>>>>>>>>>>> other > > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group > > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup > > > >>>>>>>>>> as a > > > >>>>>>>>>>>>>> single > > > >>>>>>>>>>>>>>>>>>> "object", > > > >>>>>>>>>>>>>>>>>>>>> but > > > >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the > > > >>>>>>>>>>>> "status" > > > >>>>>>>>>>>>>> of > > > >>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>>>>> TaskGroup > > > >>>>>>>>>>>>>>>>>>>>>> was > > > >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> Not sure if it would need to be a top level > > > >>>>>>>>>>> object > > > >>>>>>>>>>>>>> with > > > >>>>>>>>>>>>>>>> its > > > >>>>>>>>>>>>>>>>>> own > > > >>>>>>>>>>>>>>>>>>>>>> database > > > >>>>>>>>>>>>>>>>>>>>>>>>> table and model or just another attribute on > > > >>>>>>>>>>>> tasks. > > > >>>>>>>>>>>>> I > > > >>>>>>>>>>>>>>>> think > > > >>>>>>>>>>>>>>>>>> you > > > >>>>>>>>>>>>>>>>>>>>> could > > > >>>>>>>>>>>>>>>>>>>>>>>>> build > > > >>>>>>>>>>>>>>>>>>>>>>>>> it in a way such that from the schedulers > > > >>>>>>>>>> point > > > >>>>>>>>>>> of > > > >>>>>>>>>>>>>> view > > > >>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>> DAG > > > >>>>>>>>>>>>>>>>>>> with > > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any > > > >>>>>>>>>> differently. > > > >>>>>>>>>>> So > > > >>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>> really > > > >>>>>>>>>>>>>>>>>>> just > > > >>>>>>>>>>>>>>>>>>>>>>> becomes > > > >>>>>>>>>>>>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>>>>>>>>>> shortcut for setting dependencies between > > > >> sets > > > >>>>>>>>>>> of > > > >>>>>>>>>>>>>> Tasks, > > > >>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>>> allows > > > >>>>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>> UI > > > >>>>>>>>>>>>>>>>>>>>>>>>> to simplify the render of the DAG structure. > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> Chris > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov > > > >>>>>>>>>>>>>>>>>>>>>>> <ddavy...@twitter.com.invalid > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Agree with James (and think it's actually > > > >>>>>>>>>> the > > > >>>>>>>>>>>> more > > > >>>>>>>>>>>>>>>>> important > > > >>>>>>>>>>>>>>>>>>>> issue > > > >>>>>>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>>>>>>> fix), > > > >>>>>>>>>>>>>>>>>>>>>>>>>> but I am still convinced Ash' idea is the > > > >>>>>>>>>>> right > > > >>>>>>>>>>>>> way > > > >>>>>>>>>>>>>>>>> forward > > > >>>>>>>>>>>>>>>>>>>> (just > > > >>>>>>>>>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>>>>>>>>>> might > > > >>>>>>>>>>>>>>>>>>>>>>>>>> require a bit more work to deprecate than > > > >>>>>>>>>>> adding > > > >>>>>>>>>>>>>>> visual > > > >>>>>>>>>>>>>>>>>>> grouping > > > >>>>>>>>>>>>>>>>>>>>> in > > > >>>>>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>>>> UI). > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> There was a previous thread about this FYI > > > >>>>>>>>>>> with > > > >>>>>>>>>>>>> more > > > >>>>>>>>>>>>>>>>> context > > > >>>>>>>>>>>>>>>>>>> on > > > >>>>>>>>>>>>>>>>>>>>> why > > > >>>>>>>>>>>>>>>>>>>>>>>>> subdags > > > >>>>>>>>>>>>>>>>>>>>>>>>>> are bad and potential solutions: > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>> > > > >>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html > > > >>>>>>>>>>>>>>>>>>>>>> . A > > > >>>>>>>>>>>>>>>>>>>>>>>>>> solution I outline there to Jame's problem > > > >>>>>>>>>> is > > > >>>>>>>>>>>> e.g. > > > >>>>>>>>>>>>>>>>> enabling > > > >>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> operator > > > >>>>>>>>>>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as > > > >>>>>>>>>>>> well. I > > > >>>>>>>>>>>>>> see > > > >>>>>>>>>>>>>>>>> this > > > >>>>>>>>>>>>>>>>>>>> being > > > >>>>>>>>>>>>>>>>>>>>>>>>> separate > > > >>>>>>>>>>>>>>>>>>>>>>>>>> from Ash' solution for DAG grouping in the > > > >>>>>>>>>> UI > > > >>>>>>>>>>>> but > > > >>>>>>>>>>>>>> one > > > >>>>>>>>>>>>>>> of > > > >>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>> two > > > >>>>>>>>>>>>>>>>>>>>>> items > > > >>>>>>>>>>>>>>>>>>>>>>>>>> required to replace all existing subdag > > > >>>>>>>>>>>>>> functionality. > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years > > > >>>>>>>>>> and > > > >>>>>>>>>>>>> they > > > >>>>>>>>>>>>>>> are > > > >>>>>>>>>>>>>>>>>>> always a > > > >>>>>>>>>>>>>>>>>>>>>> giant > > > >>>>>>>>>>>>>>>>>>>>>>>>> pain > > > >>>>>>>>>>>>>>>>>>>>>>>>>> to use. They are a constant source of user > > > >>>>>>>>>>>>> confusion > > > >>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>>> breakages > > > >>>>>>>>>>>>>>>>>>>>>>>>> during > > > >>>>>>>>>>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :). > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James > > > >>>>>>>>>> Coder < > > > >>>>>>>>>>>>>>>>>>>> jcode...@gmail.com> > > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a > > > >>>>>>>>>> UI > > > >>>>>>>>>>>>>>> concept. I > > > >>>>>>>>>>>>>>>>> use > > > >>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>> subdag > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If > > > >>>>>>>>>>> you > > > >>>>>>>>>>>>>> have a > > > >>>>>>>>>>>>>>>>> group > > > >>>>>>>>>>>>>>>>>>> of > > > >>>>>>>>>>>>>>>>>>>>>> tasks > > > >>>>>>>>>>>>>>>>>>>>>>>>> that > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> need to finish before another group of > > > >>>>>>>>>> tasks > > > >>>>>>>>>>>>>> start, > > > >>>>>>>>>>>>>>>>> using > > > >>>>>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>>>>>> subdag > > > >>>>>>>>>>>>>>>>>>>>>>> is > > > >>>>>>>>>>>>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies > > > >>>>>>>>>>>> and I > > > >>>>>>>>>>>>>>> think > > > >>>>>>>>>>>>>>>>>> also > > > >>>>>>>>>>>>>>>>>>>> make > > > >>>>>>>>>>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>>>>>>>>>>> easier > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> to follow the dag code. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle > > > >>>>>>>>>> Hamlin > > > >>>>>>>>>>> < > > > >>>>>>>>>>>>>>>>>>>>> hamlin...@gmail.com> > > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash > > > >>>>>>>>>>>>>> Berlin-Taylor > > > >>>>>>>>>>>>>>> < > > > >>>>>>>>>>>>>>>>>>>>>> a...@apache.org > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Question: > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator > > > >>>>>>>>>>>> anymore? > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just > > > >>>>>>>>>>>>> replacing > > > >>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>> with > > > >>>>>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>>>> UI > > > >>>>>>>>>>>>>>>>>>>>>>>>> grouping > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less > > > >>>>>>>>>> to > > > >>>>>>>>>>>> get > > > >>>>>>>>>>>>>>>> wrong, > > > >>>>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>>>> closer > > > >>>>>>>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>>>>>>>> what > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> users actually want to achieve with > > > >>>>>>>>>>>> subdags? > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in > > > >>>>>>>>>>>> subdags > > > >>>>>>>>>>>>>>> could > > > >>>>>>>>>>>>>>>>>> start > > > >>>>>>>>>>>>>>>>>>>>>> running > > > >>>>>>>>>>>>>>>>>>>>>>> in > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should > > > >>>>>>>>>> we > > > >>>>>>>>>>>> not > > > >>>>>>>>>>>>>>> also > > > >>>>>>>>>>>>>>>>> just > > > >>>>>>>>>>>>>>>>>>>>>>> _enitrely_ > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> remove > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace > > > >>>>>>>>>> it > > > >>>>>>>>>>>> with > > > >>>>>>>>>>>>>>>>> something > > > >>>>>>>>>>>>>>>>>>>>>> simpler. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I > > > >>>>>>>>>>> haven't > > > >>>>>>>>>>>>> used > > > >>>>>>>>>>>>>>>> them > > > >>>>>>>>>>>>>>>>>>>>>> extensively > > > >>>>>>>>>>>>>>>>>>>>>>> so > > > >>>>>>>>>>>>>>>>>>>>>>>>>> may > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> be wrong on some of these): > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it > > > >>>>>>>>>>>> has(?) > > > >>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>> be > > > >>>>>>>>>>>>>>>> of > > > >>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>> form > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own > > > >>>>>>>>>> schedule_interval, > > > >>>>>>>>>>>> but > > > >>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>> has > > > >>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>> match > > > >>>>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>>>> parent > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> dag > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own. > > > >>>>>>>>>>>> (Does > > > >>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>> make > > > >>>>>>>>>>>>>>>>>>> sense > > > >>>>>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>>>> do > > > >>>>>>>>>>>>>>>>>>>>>>>>>> this? > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the > > > >>>>>>>>>>> sub > > > >>>>>>>>>>>>> dag > > > >>>>>>>>>>>>>>>> would > > > >>>>>>>>>>>>>>>>>>> never > > > >>>>>>>>>>>>>>>>>>>>>>>>> execute, so > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to > > > >>>>>>>>>>>>> operator a > > > >>>>>>>>>>>>>>>>> subdag > > > >>>>>>>>>>>>>>>>>>> with > > > >>>>>>>>>>>>>>>>>>>>> -- > > > >>>>>>>>>>>>>>>>>>>>>>>>> always > > > >>>>>>>>>>>>>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> bit of a kludge. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thoughts? > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash > > > >>>>>>>>>>>>>> Berlin-Taylor < > > > >>>>>>>>>>>>>>>>>>>>>> a...@apache.org> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Workon sub-dags is much needed, I'm > > > >>>>>>>>>>>>> excited > > > >>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>> see > > > >>>>>>>>>>>>>>>>>> how > > > >>>>>>>>>>>>>>>>>>>>> this > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> progresses. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag > > > >>>>>>>>>>> parsing*: > > > >>>>>>>>>>>>> This > > > >>>>>>>>>>>>>>>>>> rewrites > > > >>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while > > > >>>>>>>>>>> parsing, > > > >>>>>>>>>>>>> and > > > >>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>>>>> give a > > > >>>>>>>>>>>>>>>>>>>>>>> flat > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> structure at > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the task level > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation > > > >>>>>>>>>>>> already > > > >>>>>>>>>>>>>> does > > > >>>>>>>>>>>>>>>>> this > > > >>>>>>>>>>>>>>>>>> I > > > >>>>>>>>>>>>>>>>>>>>> think. > > > >>>>>>>>>>>>>>>>>>>>>>> At > > > >>>>>>>>>>>>>>>>>>>>>>>>>> least > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> if > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here > > > >>>>>>>>>>>> correctly. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin > > > >>>>>>>>>>>> Huang < > > > >>>>>>>>>>>>>>>>>>>>>>> bin.huan...@gmail.com > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone, > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone and > > > >>>>>>>>>>>> collect > > > >>>>>>>>>>>>>>>>> feedback > > > >>>>>>>>>>>>>>>>>> on > > > >>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>>> AIP-34 > > > >>>>>>>>>>>>>>>>>>>>>>>>>> on > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was > > > >>>>>>>>>>>>>> previously > > > >>>>>>>>>>>>>>>>>> briefly > > > >>>>>>>>>>>>>>>>>>>>>>>>> mentioned in > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be > > > >>>>>>>>>>> done > > > >>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>> Airflow > > > >>>>>>>>>>>>>>>>>>> 2.0, > > > >>>>>>>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>>>>>>>> one of > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator > > > >>>>>>>>>>> attach > > > >>>>>>>>>>>>>> tasks > > > >>>>>>>>>>>>>>>> back > > > >>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>> root > > > >>>>>>>>>>>>>>>>>>>>>>>>> DAG. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving > > > >>>>>>>>>>>>>> SubDagOperator > > > >>>>>>>>>>>>>>>>>> related > > > >>>>>>>>>>>>>>>>>>>>>> issues > > > >>>>>>>>>>>>>>>>>>>>>>> by > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> reattaching > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag > > > >>>>>>>>>> while > > > >>>>>>>>>>>>>>> respecting > > > >>>>>>>>>>>>>>>>>>>>>> dependencies > > > >>>>>>>>>>>>>>>>>>>>>>>>>> during > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping > > > >>>>>>>>>> effect > > > >>>>>>>>>>>> on > > > >>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>> UI > > > >>>>>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>>>> be > > > >>>>>>>>>>>>>>>>>>>>>>>>> achieved > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> through > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory > > > >>>>>>>>>>>> function > > > >>>>>>>>>>>>>> more > > > >>>>>>>>>>>>>>>>>>> reusable > > > >>>>>>>>>>>>>>>>>>>>>>> because > > > >>>>>>>>>>>>>>>>>>>>>>>>> you > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> don't > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and > > > >>>>>>>>>>>>>>> child_dag_name > > > >>>>>>>>>>>>>>>>> in > > > >>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>> function > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> signature > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> anymore. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Changes proposed: > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag > > > >>>>>>>>>>> parsing*: > > > >>>>>>>>>>>>> This > > > >>>>>>>>>>>>>>>>>> rewrites > > > >>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while > > > >>>>>>>>>>> parsing, > > > >>>>>>>>>>>>> and > > > >>>>>>>>>>>>>> it > > > >>>>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>>>>> give a > > > >>>>>>>>>>>>>>>>>>>>>>> flat > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> structure at > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the task level > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The > > > >>>>>>>>>> new > > > >>>>>>>>>>>>>>>>> SubDagOperator > > > >>>>>>>>>>>>>>>>>>>> acts > > > >>>>>>>>>>>>>>>>>>>>>>> like a > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> container and most of the original > > > >>>>>>>>>>>>> methods > > > >>>>>>>>>>>>>>> are > > > >>>>>>>>>>>>>>>>>>> removed. > > > >>>>>>>>>>>>>>>>>>>>> The > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> signature is > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also changed to *subdag_factory > > > >>>>>>>>>> *with > > > >>>>>>>>>>>>>>>>> *subdag_args > > > >>>>>>>>>>>>>>>>>>> *and > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> *subdag_kwargs*. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is similar to the > > > >>>>>>>>>> PythonOperator > > > >>>>>>>>>>>>>>>> signature. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add > > > >>>>>>>>>>>>>>> current_group > > > >>>>>>>>>>>>>>>> & > > > >>>>>>>>>>>>>>>>>>>>>> parent_group > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> attributes > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to BaseOperator*: This metadata is > > > >>>>>>>>>>> used > > > >>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>> group > > > >>>>>>>>>>>>>>>>>>> tasks > > > >>>>>>>>>>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rendering at > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UI level. It may potentially extend > > > >>>>>>>>>>>>> further > > > >>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>> group > > > >>>>>>>>>>>>>>>>>>>>>>> arbitrary > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> tasks > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to > > > >>>>>>>>>>> allow > > > >>>>>>>>>>>>>>>>> group-level > > > >>>>>>>>>>>>>>>>>>>>>> operations > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> (i.e. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> stop/trigger a group of task within > > > >>>>>>>>>>> the > > > >>>>>>>>>>>>>> dag) > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: > > > >>>>>>>>>> Proposed > > > >>>>>>>>>>>> UI > > > >>>>>>>>>>>>>>>>>> modification > > > >>>>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>>>>> allow > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a > > > >>>>>>>>>>>> flat > > > >>>>>>>>>>>>>>>>> structure > > > >>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>>> pair > > > >>>>>>>>>>>>>>>>>>>>>>> with > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> first > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> change instead of the original > > > >>>>>>>>>>>>> hierarchical > > > >>>>>>>>>>>>>>>>>>> structure. > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please see related documents and > > > >>>>>>>>>> PRs > > > >>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>> details: > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AIP: > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>> > > > >> > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Original Issue: > > > >>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/8078 > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Draft PR: > > > >>>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/pull/9243 > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any > > > >>>>>>>>>>>>> aspects > > > >>>>>>>>>>>>>>>> that > > > >>>>>>>>>>>>>>>>>> you > > > >>>>>>>>>>>>>>>>>>>>>>>>>> agree/disagree > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with or > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need more clarification (especially > > > >>>>>>>>>>> the > > > >>>>>>>>>>>>>> third > > > >>>>>>>>>>>>>>>>>> change > > > >>>>>>>>>>>>>>>>>>>>>>> regarding > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup). > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Any comments are welcome and I am > > > >>>>>>>>>>>> looking > > > >>>>>>>>>>>>>>>> forward > > > >>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>> it! > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bin > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> -- > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Kyle Hamlin > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>>> -- > > > >>>>>>>>>>>>>>>>>>>>>>> Thanks & Regards > > > >>>>>>>>>>>>>>>>>>>>>>> Poornima > > > >>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> -- > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Jarek Potiuk > > > >>>>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal > > > >> Software > > > >>>>>> Engineer > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> > > > >> <+48660796129 > > > >>>>>>>>>>>>> <+48%20660%20796%20129>> > > > >>>>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> -- > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Jarek Potiuk > > > >>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software > > > >>>>> Engineer > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129 > > > >>>>>>>>>>>>> <+48%20660%20796%20129>> > > > >>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> -- > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> *Jacob Ferriero* > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> Strategic Cloud Engineer: Data Engineering > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> jferri...@google.com > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> 617-714-2509 > > > >>>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>> > > > >> > > > > > > > > >