Agreed that this is non-blocking. As for moving to a vote, you can take care of that yourself: just open a new email thread on the dev list and call for a vote. See this example from Tomek for AIP-31: https://lists.apache.org/thread.html/r16ee2b6263f5849a2e2b1b6c3ba39fb3d643022f052216cb19a0a8da%40%3Cdev.airflow.apache.org%3E
Best,
Gerard Casas Saez
Twitter | Cortex | @casassaez <http://twitter.com/casassaez>

On Thu, Aug 20, 2020 at 7:10 PM Yu Qian <yuqian1...@gmail.com> wrote:

> Hi, Gerard, yes I agree it's possible to do this at the UI level without any fundamental change to the implementation. If expand_group() sees that two groups are fully connected (i.e. every task in one parent group depends on every task in another parent group), it can decide to collapse all those children edges into a single edge between the parent groups to reduce the burden on the layout() function. However, I did not find any existing algorithm to do this within dagre, so we'll likely need to implement this ourselves. Another hiccup is that at the moment it doesn't seem to be possible to call setEdge() between two parent groups (aka clusters). If someone has ideas on how to do this, please feel free to contribute.
>
> One other consideration is that this example is only an extreme case. There are other in-between cases that still require user intervention. Say 90% of the tasks in group1 depend on 90% of the tasks in group2 and both groups have more than 100 tasks. This will still cause a lot of edges on the graph, and it's even harder to reduce because the parent groups are not fully connected, so it's inaccurate to reduce them to a single edge between the parents. In those cases, the user may still need to do something themselves, e.g. adding a DummyOperator to the DAG to cut down the edges. There will be some tradeoff because a DummyOperator takes a short while to execute, as you mentioned.
>
> There is a lot of room for improvement, but I don't think that's a blocking issue for this AIP? So if you can move it to the voting stage that'll be fantastic.
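The edge-collapsing idea in the message above can be sketched in plain Python, independent of dagre. This is only an illustration under assumptions: `collapse_fully_connected` and its arguments are hypothetical names, not part of the PR, Airflow, or dagre, and each group is modelled as a simple pair of a group id and a set of task ids.

```python
from itertools import product


def collapse_fully_connected(edges, group_a, group_b):
    """Replace the task-level edges between two groups with one group-level edge.

    Hypothetical helper for illustration; not part of Airflow or dagre.
    edges: set of (upstream_task_id, downstream_task_id) tuples.
    group_a, group_b: (group_id, set_of_task_ids) pairs.
    Returns a new edge set. The edges are returned unchanged unless every
    task in group_a points at every task in group_b.
    """
    id_a, tasks_a = group_a
    id_b, tasks_b = group_b
    full = set(product(tasks_a, tasks_b))
    if not full or not full <= edges:
        # Not fully connected: a single parent edge would misstate the dependencies.
        return set(edges)
    return (edges - full) | {(id_a, id_b)}


# Two groups of 50 tasks each, fully connected as in the made-up DAG experiment.
section_1 = ("inside_section_1", {f"inside_section_1.task-{i + 1}" for i in range(50)})
section_2 = ("inside_section_2", {f"inside_section_2.task-{i + 1}" for i in range(50)})
edges = set(product(section_1[1], section_2[1]))

collapsed = collapse_fully_connected(edges, section_1, section_2)
print(len(edges), len(collapsed))  # 2500 1
```

For comparison, routing the same dependency through one DummyOperator would leave 50 + 50 = 100 edges. A pair of groups that is only 90% connected fails the subset check and is left untouched, which matches the in-between case described above where a single parent edge would be inaccurate.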
>
> On Thu, Aug 20, 2020 at 4:21 PM 耀 周 <zhouyao1...@icloud.com.invalid> wrote:
>
> > +1
> >
> > > On Aug 18, 2020, at 23:55, Gerard Casas Saez <gcasass...@twitter.com.INVALID> wrote:
> > >
> > > Is it not possible to solve this at the UI level? Aka tell dagre to only add 1 edge to the group instead of to all nodes in the group? No need to do SubDag behaviour, but just reduce the edges on the graph. Should reduce load time if I understand correctly.
> > >
> > > I would strongly avoid the Dummy operator since it will introduce delays on operator execution (as it will need to execute 1 dummy operator and that can be expensive imo).
> > >
> > > Overall though the proposal looks good. Unless anyone opposes it, I would move this to vote mode :D
> > >
> > > Gerard Casas Saez
> > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >
> > > On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yuqian1...@gmail.com> wrote:
> > >
> > >> Hi, All,
> > >> Here's the updated AIP-34 <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> > >> The PR has been fine-tuned with better UI interactions and added serialization of TaskGroup: https://github.com/apache/airflow/pull/10153
> > >>
> > >> Here are some experiment results:
> > >> A made-up DAG containing 403 tasks and 5696 edges, grouped as in the code at the end of this message. Note that inside_section_2 is intentionally made to depend on all tasks in inside_section_1 to generate a large number of edges. The observation is that opening the top-level graph is very quick, around 270 ms. Expanding groups that don't have a lot of dense dependencies on other groups is also hardly noticeable, e.g. expanding section_1 takes 330 ms.
> > >> The part that takes time is expanding both inside_section_1 and inside_section_2. Because there are 2500 edges between these two inner groups, it took 63 seconds to expand both of them. The majority of the time (more than 62 seconds) is actually taken by the layout() function in dagre. In other words, it's very fast to add nodes and edges, but laying them out on the graph takes time. This issue is not actually specific to TaskGroup. Without TaskGroup, if a DAG contains too many edges, it takes time to lay out the graph too.
> > >>
> > >> On the other hand, a more realistic experiment with a production DAG containing about 400 tasks and 700 edges showed that grouping tasks into three levels of nested TaskGroup cut the upfront page opening time from around 6 s to 500 ms. (Obviously the time is paid back when the user gradually expands all the groups one by one, but normally people don't need to expand every group every time, so it's still a big saving.) The experiments were done on OS X Mojave, 2.2 GHz Intel Core i7, 16 GB memory, Chrome.
> > >>
> > >> I can see a few possible improvements to TaskGroup (or how it's used) that can be done as a next step:
> > >> 1). Like Gerard suggested, we can implement lazy-loading. Instead of displaying the whole DAG, we can limit the Graph View to show only a single TaskGroup, omitting its edges going out to other TaskGroups. This behaviour is more like SubDagOperator, where users can zoom into/out of a TaskGroup and look at only the tasks within that TaskGroup as if those are the only tasks on the DAG. This can be done with either background javascript calls or by making a new GET request with filtering parameters. Obviously the downside is that it's not as explicit as showing all the dependencies on the graph.
> > >> 2). Users can improve the organization of the DAG themselves to reduce the number of edges. E.g. if every task in group2 depends on every task in group1, instead of doing group1 >> group2, they can add a DummyOperator in between and do this: group1 >> dummy >> group2. This cuts down the number of edges significantly and page load becomes much faster.
> > >> 3). If we really want, we can improve the >> operator of TaskGroup to do 2) automatically. If it sees that both sides of >> are TaskGroups, it can create a DummyOperator on behalf of the user. The downside is that it may be too much magic.
> > >>
> > >> Thanks,
> > >> Qian
> > >>
> > >> def create_section():
> > >>     """
> > >>     Create tasks in the outer section.
> > >>     """
> > >>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
> > >>
> > >>     with TaskGroup("inside_section_1") as inside_section_1:
> > >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> > >>
> > >>     with TaskGroup("inside_section_2") as inside_section_2:
> > >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> > >>
> > >>     dummies[-1] >> inside_section_1
> > >>     dummies[-2] >> inside_section_2
> > >>     inside_section_1 >> inside_section_2
> > >>
> > >>
> > >> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> > >>     start = DummyOperator(task_id="start")
> > >>
> > >>     with TaskGroup("section_1") as section_1:
> > >>         create_section()
> > >>
> > >>     some_other_task = DummyOperator(task_id="some-other-task")
> > >>
> > >>     with TaskGroup("section_2") as section_2:
> > >>         create_section()
> > >>
> > >>     end = DummyOperator(task_id='end')
> > >>
> > >>     start >> section_1 >> some_other_task >> section_2 >> end
> > >>
> > >>
> > >> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez <gcasass...@twitter.com.invalid> wrote:
> > >>
> > >>> Re graph times. That makes sense. Let me know what you find.
We may > be > > >> able > > >>> to contribute on the lazy loading part. > > >>> > > >>> Looking forward to see the updated AIP! > > >>> > > >>> > > >>> Gerard Casas Saez > > >>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez> > > >>> > > >>> > > >>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <kaxiln...@gmail.com> > > wrote: > > >>> > > >>>> Permissions granted, let me know if you face any issues. > > >>>> > > >>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yuqian1...@gmail.com> > wrote: > > >>>> > > >>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you! > > >>>>> > > >>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <kaxiln...@gmail.com> > > >>> wrote: > > >>>>> > > >>>>>> What's your ID i.e. if you haven't created an account yet, please > > >>>> create > > >>>>>> one at https://cwiki.apache.org/confluence/signup.action and send > > >> us > > >>>>> your > > >>>>>> ID and we will add permissions. > > >>>>>> > > >>>>>> Thanks. I'll edit the AIP. May I request permission to edit it? > > >>>>>>> My wiki user email is yuqian1...@gmail.com. > > >>>>>> > > >>>>>> > > >>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yuqian1...@gmail.com> > > >>> wrote: > > >>>>>> > > >>>>>>> Re, Xinbin. Thanks. I'll edit the AIP. May I request permission > > >> to > > >>>> edit > > >>>>>> it? > > >>>>>>> My wiki user email is yuqian1...@gmail.com. > > >>>>>>> > > >>>>>>> Re Gerard: yes the UI loads all the nodes as json from the web > > >>> server > > >>>>> at > > >>>>>>> once. However, it only adds the top level nodes and edges to the > > >>>> graph > > >>>>>> when > > >>>>>>> the Graph View page is first opened. And then adds the expanded > > >>> nodes > > >>>>> to > > >>>>>>> the graph as the user expands them. From what I've experienced > > >> with > > >>>>> DAGs > > >>>>>>> containing around 400 tasks (not using TaskGroup or > > >>> SubDagOperator), > > >>>>>>> opening the whole dag in Graph View usually takes 5 seconds. 
Less > > >>>> than > > >>>>>> 60ms > > >>>>>>> of that is taken by loading the data from webserver. The > > >> remaining > > >>>>> 4.9s+ > > >>>>>> is > > >>>>>>> taken by javascript functions in dagre-d3.min.js such as > > >>> createNodes, > > >>>>>>> createEdgeLabels, etc and by rendering the graph. With TaskGroup > > >>>> being > > >>>>>> used > > >>>>>>> to group tasks into a smaller number of top-level nodes, the > > >> amount > > >>>> of > > >>>>>> data > > >>>>>>> loaded from webserver will remain about the same compared to a > > >> flat > > >>>> dag > > >>>>>> of > > >>>>>>> the same size, but the number of nodes and edges needed to be > > >> plot > > >>> on > > >>>>> the > > >>>>>>> graph can be reduced significantly. So in theory this should > > >> speed > > >>> up > > >>>>> the > > >>>>>>> time it takes to open Graph View even without lazy-loading the > > >> data > > >>>>> (I'll > > >>>>>>> experiment to find out). That said, if it comes to a point > > >>>> lazy-loading > > >>>>>>> helps, we can still implement it as an improvement. > > >>>>>>> > > >>>>>>> Re James: the Tree View looks as if all all the groups are fully > > >>>>>> expanded. > > >>>>>>> (because under the hood all the tasks are in a single DAG). I'm > > >>> less > > >>>>>>> worried about Tree View at the moment because it already has a > > >>>>> mechanism > > >>>>>>> for collapsing tasks by the dependency tree. That said, the Tree > > >>> View > > >>>>> can > > >>>>>>> definitely be improved too with TaskGroup. (e.g. collapse tasks > > >> in > > >>>> the > > >>>>>> same > > >>>>>>> TaskGroup when Tree View is first opened). > > >>>>>>> > > >>>>>>> For both suggestions, implementing them don't require fundamental > > >>>>> changes > > >>>>>>> to the idea. I think we can have a basic working TaskGroup first, > > >>> and > > >>>>>> then > > >>>>>>> improve it incrementally in several PRs as we get more feedback > > >>> from > > >>>>> the > > >>>>>>> community. What do you think? 
> > >>>>>>> > > >>>>>>> Qian > > >>>>>>> > > >>>>>>> > > >>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jcode...@gmail.com> > > >>>>> wrote: > > >>>>>>> > > >>>>>>>> I agree this looks great, one question, how does the tree view > > >>>> look? > > >>>>>>>> > > >>>>>>>> James Coder > > >>>>>>>> > > >>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez < > > >>>>>> gcasass...@twitter.com > > >>>>>>> .invalid> > > >>>>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>> First of all, this is awesome!! > > >>>>>>>>> > > >>>>>>>>> Secondly, checking your UI code, seems you are loading all > > >>>>> operators > > >>>>>> at > > >>>>>>>>> once. Wondering if we can load them as needed (aka load > > >>> whenever > > >>>> we > > >>>>>>> click > > >>>>>>>>> the TaskGroup). Some of our DAGs are so large that take > > >> forever > > >>>> to > > >>>>>> load > > >>>>>>>> on > > >>>>>>>>> the Graph view, so worried about this still being an issue > > >>> here. > > >>>> It > > >>>>>> may > > >>>>>>>> be > > >>>>>>>>> easily solvable by implementing lazy loading of the graph. > > >> Not > > >>>> sure > > >>>>>> how > > >>>>>>>>> easy to implement/add to the UI extension (and dont want to > > >>> push > > >>>>> for > > >>>>>>>> early > > >>>>>>>>> optimization as its the root of all evil). > > >>>>>>>>> Gerard Casas Saez > > >>>>>>>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang < > > >>>>>> bin.huan...@gmail.com> > > >>>>>>>> wrote: > > >>>>>>>>>> > > >>>>>>>>>> Hi Yu, > > >>>>>>>>>> > > >>>>>>>>>> Thank you so much for taking on this. I was fairly > > >> distracted > > >>>>>>> previously > > >>>>>>>>>> and I didn't have the time to update the proposal. 
In fact, > > >>>> after > > >>>>>>>>>> discussing with Ash, Kaxil and Daniel, the direction of this > > >>> AIP > > >>>>> has > > >>>>>>>> been > > >>>>>>>>>> changed to favor the concept of TaskGroup instead of > > >> rewriting > > >>>>>>>>>> SubDagOperator (though it may may sense to deprecate SubDag > > >>> in a > > >>>>>>> future > > >>>>>>>>>> date.). > > >>>>>>>>>> > > >>>>>>>>>> Your PR is amazing and it has implemented the desire > > >>> features. I > > >>>>>> think > > >>>>>>>> we > > >>>>>>>>>> can focus on your new PR instead. Do you mind updating the > > >> AIP > > >>>>> based > > >>>>>>> on > > >>>>>>>>>> what you have done in your PR? > > >>>>>>>>>> > > >>>>>>>>>> Best, > > >>>>>>>>>> Bin > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian < > > >>> yuqian1...@gmail.com> > > >>>>>>> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>> Hi, all, I've added the basic UI changes to my proposed > > >>>>>>> implementation > > >>>>>>>> of > > >>>>>>>>>>> TaskGroup as UI grouping concept: > > >>>>>>>>>>> https://github.com/apache/airflow/pull/10153 > > >>>>>>>>>>> > > >>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup > > >> so > > >>>> i'm > > >>>>>>>> quoting > > >>>>>>>>>>> it here. The only thing I don't fully agree with is the > > >>>>> restriction > > >>>>>>>>>>> "... **cannot* > > >>>>>>>>>>> have dependencies between a Task in a TaskGroup and either > > >> a* > > >>>>>>>>>>> * Task in a different TaskGroup or a Task not in any > > >>>> group*". I > > >>>>>>> think > > >>>>>>>>>>> this is over restrictive. Since TaskGroup is a UI concept, > > >>>> tasks > > >>>>>> can > > >>>>>>>> have > > >>>>>>>>>>> dependencies on tasks in other TaskGroup or not in any > > >>>> TaskGroup. > > >>>>>> In > > >>>>>>> my > > >>>>>>>>>> PR, > > >>>>>>>>>>> this is allowed. The graph edges will update accordingly > > >> when > > >>>>>>>> TaskGroups > > >>>>>>>>>>> are expanded/collapsed. 
TaskGroup is only helping to make > > >> the > > >>>> UI > > >>>>>> look > > >>>>>>>>>> less > > >>>>>>>>>>> crowded. Under the hood, everything is still a DAG of tasks > > >>> and > > >>>>>> edges > > >>>>>>>> so > > >>>>>>>>>>> things work normally. Here's a screenshot > > >>>>>>>>>>> < > > >>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>> > > >> > > > https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif > > >>>>>>>>>>>> > > >>>>>>>>>>> of the UI interaction. > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> * - Tasks can be added to a TaskGroup - You *can* have > > >>>>>>> dependencies > > >>>>>>>>>>> between Tasks in the same TaskGroup, but *cannot* have > > >>>>>> dependencies > > >>>>>>>>>>> between a Task in a TaskGroup and either a Task in a > > >>>> different > > >>>>>>>>>> TaskGroup > > >>>>>>>>>>> or a Task not in any group - You *can* have dependencies > > >>>>> between > > >>>>>> a > > >>>>>>>>>>> TaskGroup and either other TaskGroups or Tasks not in any > > >>>> group > > >>>>>> - > > >>>>>>>> The > > >>>>>>>>>>> UI will by default render a TaskGroup as a single "object", > > >>> but > > >>>>>>> which > > >>>>>>>>>> you > > >>>>>>>>>>> expand or zoom into in some way - You'd need some way to > > >>>>>> determine > > >>>>>>>> what > > >>>>>>>>>>> the "status" of a TaskGroup was at least for UI display > > >>>>> purposes* > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> Regarding Jake's comment, I agree it's possible to > > >> implement > > >>>> the > > >>>>>>>>>> "retrying > > >>>>>>>>>>> tasks in a group" pattern he mentioned as an optional > > >> feature > > >>>> of > > >>>>>>>>>> TaskGroup > > >>>>>>>>>>> although that may go against having TaskGroup as a pure UI > > >>>>> concept. 
> > >>>>>>> For > > >>>>>>>>>> the > > >>>>>>>>>>> motivating example Jake provided, I suggest implementing > > >> both > > >>>>>>>>>>> SubmitLongRunningJobTask and PollJobStatusSensor in a > > >> single > > >>>>>>> operator. > > >>>>>>>> It > > >>>>>>>>>>> can do something like BaseSensorOperator.execute() does in > > >>>>>>> "reschedule" > > >>>>>>>>>>> mode, i.e. it first executes some code to submit the long > > >>>> running > > >>>>>> job > > >>>>>>>> to > > >>>>>>>>>>> the external service, and store the state (e.g. in XCom). > > >>> Then > > >>>>>>>> reschedule > > >>>>>>>>>>> itself. Subsequent runs then pokes for the completion > > >> state. > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero > > >>>>>>>>>> <jferri...@google.com.invalid > > >>>>>>>>>>>> > > >>>>>>>>>>> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>>> I really like this idea of a TaskGroup container as I > > >> think > > >>>> this > > >>>>>>> will > > >>>>>>>>>> be > > >>>>>>>>>>>> much easier to use than SubDag. > > >>>>>>>>>>>> > > >>>>>>>>>>>> I'd like to propose an optional behavior for special retry > > >>>>>> mechanics > > >>>>>>>>>> via > > >>>>>>>>>>> a > > >>>>>>>>>>>> TaskGroup.retry_all property. > > >>>>>>>>>>>> This way I could use TaskGroup to replace my favorite use > > >> of > > >>>>>> SubDag > > >>>>>>>> for > > >>>>>>>>>>>> atomically retrying tasks of the pattern "act on external > > >>>> state > > >>>>>> then > > >>>>>>>>>>>> reschedule poll until desired state reached". > > >>>>>>>>>>>> > > >>>>>>>>>>>> Motivating use case I have for a SubDag is very simple two > > >>>> task > > >>>>>>> group > > >>>>>>>>>>>> [SubmitLongRunningJobTask >> PollJobStatusSensor]. > > >>>>>>>>>>>> I use SubDag is because it gives me an easy way to retry > > >> the > > >>>>>>>>>>> SubmitJobTask > > >>>>>>>>>>>> if something about the PollJobSensor fails. 
> > >>>>>>>>>>>> This pattern would be really nice for jobs that are > > >> expected > > >>>> to > > >>>>>> run > > >>>>>>> a > > >>>>>>>>>>> long > > >>>>>>>>>>>> time (because we can use sensor can use reschedule mode > > >>>> freeing > > >>>>> up > > >>>>>>>>>> slots) > > >>>>>>>>>>>> but might fail for a retryable reason. > > >>>>>>>>>>>> However, using SubDag to meet this use case defeats the > > >>>> purpose > > >>>>>>>> because > > >>>>>>>>>>>> SubDag infamously > > >>>>>>>>>>>> < > > >>>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>> > > >> > > > https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10 > > >>>>>>>>>>>>> > > >>>>>>>>>>>> blocks a "controller" slot for the entire duration. > > >>>>>>>>>>>> This may feel like a cyclic behavior but reality it is > > >> very > > >>>>> common > > >>>>>>> for > > >>>>>>>>>> a > > >>>>>>>>>>>> single operator to submit job / wait til done. > > >>>>>>>>>>>> We could use this case refactor many operators (e.g. BQ, > > >>>>> Dataproc, > > >>>>>>>>>>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> > > >>>> PollTask] > > >>>>>>> with > > >>>>>>>>>> an > > >>>>>>>>>>>> optional reschedule mode if user knows that this job may > > >>> take > > >>>> a > > >>>>>> long > > >>>>>>>>>>> time. > > >>>>>>>>>>>> > > >>>>>>>>>>>> I'd be happy to the development work on adding this > > >> specific > > >>>>> retry > > >>>>>>>>>>> behavior > > >>>>>>>>>>>> to TaskGroup once the base concept is implemented if > > >> others > > >>> in > > >>>>> the > > >>>>>>>>>>>> community would find this a useful feature. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Cheers, > > >>>>>>>>>>>> Jake > > >>>>>>>>>>>> > > >>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk < > > >>>>>>>> jarek.pot...@polidea.com > > >>>>>>>>>>> > > >>>>>>>>>>>> wrote: > > >>>>>>>>>>>> > > >>>>>>>>>>>>> All for it :) . 
I think we are getting closer to have > > >>> regular > > >>>>>>>>>> planning > > >>>>>>>>>>>> and > > >>>>>>>>>>>>> making some structured approach to 2.0 and starting task > > >>>> force > > >>>>>> for > > >>>>>>> it > > >>>>>>>>>>>> soon, > > >>>>>>>>>>>>> so I think this should be perfectly fine to discuss and > > >>> even > > >>>>>> start > > >>>>>>>>>>>>> implementing what's beyond as soon as we make sure that > > >> we > > >>>> are > > >>>>>>>>>>>> prioritizing > > >>>>>>>>>>>>> 2.0 work. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> J, > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian < > > >>>> yuqian1...@gmail.com> > > >>>>>>>>>> wrote: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>>> Hi Jarek, > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I agree we should not change the behaviour of the > > >> existing > > >>>>>>>>>>>> SubDagOperator > > >>>>>>>>>>>>>> till Airflow 2.1. Is it okay to continue the discussion > > >>>> about > > >>>>>>>>>>> TaskGroup > > >>>>>>>>>>>>> as > > >>>>>>>>>>>>>> a brand new concept/feature independent from the > > >> existing > > >>>>>>>>>>>> SubDagOperator? > > >>>>>>>>>>>>>> In other words, shall we add TaskGroup as a UI grouping > > >>>>> concept > > >>>>>>>>>> like > > >>>>>>>>>>>> Ash > > >>>>>>>>>>>>>> suggested, and not touch SubDagOperator atl all. > > >> Whenever > > >>> we > > >>>>> are > > >>>>>>>>>>> ready > > >>>>>>>>>>>>> with > > >>>>>>>>>>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow > > >>> 2.1. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I really like Ash's idea of simplifying the > > >> SubDagOperator > > >>>>> idea > > >>>>>>>>>> into > > >>>>>>>>>>> a > > >>>>>>>>>>>>>> simple UI grouping concept. I think Xinbin's idea of > > >>>>>> "reattaching > > >>>>>>>>>> all > > >>>>>>>>>>>> the > > >>>>>>>>>>>>>> tasks to the root DAG" is the way to go. 
And I see James > > >>>>> pointed > > >>>>>>>>>> out > > >>>>>>>>>>> we > > >>>>>>>>>>>>>> need some helper functions to simplify dependencies > > >>> setting > > >>>> of > > >>>>>>>>>>>> TaskGroup. > > >>>>>>>>>>>>>> Xinbin put up a pretty elegant example in his PR > > >>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I think > > >>>> having > > >>>>>>>>>>>> TaskGroup > > >>>>>>>>>>>>> as > > >>>>>>>>>>>>>> a UI concept should be a relatively small change. We can > > >>>>>> simplify > > >>>>>>>>>>>>> Xinbin's > > >>>>>>>>>>>>>> PR further. So I put up this alternative proposal here: > > >>>>>>>>>>>>>> https://github.com/apache/airflow/pull/10153 > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I have not done any UI changes due to lack of experience > > >>>> with > > >>>>>> web > > >>>>>>>>>> UI. > > >>>>>>>>>>>> If > > >>>>>>>>>>>>>> anyone's interested, please take a look at the PR. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Qian > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk < > > >>>>>>>>>>> jarek.pot...@polidea.com > > >>>>>>>>>>>>> > > >>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Similar point here to the other ideas that are popping > > >>> up. > > >>>>>> Maybe > > >>>>>>>>>> we > > >>>>>>>>>>>>>> should > > >>>>>>>>>>>>>>> just focus on completing 2.0 and make all discussions > > >>> about > > >>>>>>>>>> further > > >>>>>>>>>>>>>>> improvements to 2.1? While those are important > > >>> discussions > > >>>>> (and > > >>>>>>>>>> we > > >>>>>>>>>>>>> should > > >>>>>>>>>>>>>>> continue them in the near future !) I think at this > > >>> point > > >>>>>>>>>> focusing > > >>>>>>>>>>>> on > > >>>>>>>>>>>>>>> delivering 2.0 in its current shape should be our focus > > >>>> now ? > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> J. 
> > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang < > > >>>>>>>>>>> bin.huan...@gmail.com> > > >>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Hi Daniel > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same API > > >> as a > > >>>> DAG > > >>>>>>>>>>> object > > >>>>>>>>>>>>>>> related > > >>>>>>>>>>>>>>>> to task dependencies, but it will not have anything > > >>>> related > > >>>>> to > > >>>>>>>>>>>> actual > > >>>>>>>>>>>>>>>> execution or scheduling. > > >>>>>>>>>>>>>>>> I will update the AIP according to this over the > > >>> weekend. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when > > >> you > > >>>>>>>>>> import > > >>>>>>>>>>>> the > > >>>>>>>>>>>>>>> object > > >>>>>>>>>>>>>>>> you can import it with parameters to determine the > > >> shape > > >>>> of > > >>>>>> the > > >>>>>>>>>>>> DAG. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it serve a > > >>>>> similar > > >>>>>>>>>>>> purpose > > >>>>>>>>>>>>>> as > > >>>>>>>>>>>>>>> a > > >>>>>>>>>>>>>>>> DAG factory function? > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman < > > >>>>>>>>>>>>>>> daniel.imber...@gmail.com > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Hi Bin, > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG > > >> object > > >>>>> (e.g. > > >>>>>>>>>>> the > > >>>>>>>>>>>>>>> bitwise > > >>>>>>>>>>>>>>>>> operator fro task dependencies). We could even make a > > >>>>>>>>>>>> “DAGTemplate” > > >>>>>>>>>>>>>>>> object > > >>>>>>>>>>>>>>>>> s.t. when you import the object you can import it > > >> with > > >>>>>>>>>>> parameters > > >>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>>> determine the shape of the DAG. 
> > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang < > > >>>>>>>>>>>>> bin.huan...@gmail.com > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>> The TaskGroup will not take schedule interval as a > > >>>>> parameter > > >>>>>>>>>>>>> itself, > > >>>>>>>>>>>>>>> and > > >>>>>>>>>>>>>>>> it > > >>>>>>>>>>>>>>>>> depends on the DAG where it attaches to. In my > > >> opinion, > > >>>> the > > >>>>>>>>>>>>> TaskGroup > > >>>>>>>>>>>>>>>> will > > >>>>>>>>>>>>>>>>> only contain a group of tasks with interdependencies, > > >>> and > > >>>>> the > > >>>>>>>>>>>>>> TaskGroup > > >>>>>>>>>>>>>>>>> behaves like a task. It doesn't contain any > > >>>>>>>>>>> execution/scheduling > > >>>>>>>>>>>>>> logic > > >>>>>>>>>>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs > > >>>> etc.) > > >>>>>>>>>>> like > > >>>>>>>>>>>> a > > >>>>>>>>>>>>>> DAG > > >>>>>>>>>>>>>>>>> does. > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> For example, there is the scenario that the schedule > > >>>>>>>>>> interval > > >>>>>>>>>>>> of > > >>>>>>>>>>>>>> DAG > > >>>>>>>>>>>>>>> is > > >>>>>>>>>>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 > > >>> min. > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case > > >> that > > >>>> you > > >>>>>>>>>> want > > >>>>>>>>>>>> to > > >>>>>>>>>>>>>>>> achieve? > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Bin > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 < > > >>>>>>>>>> thanosxnicho...@gmail.com > > >>>>>>>>>>>> > > >>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Hi Bin, > > >>>>>>>>>>>>>>>>>> Using TaskGroup, Is the schedule interval of > > >> TaskGroup > > >>>> the > > >>>>>>>>>>> same > > >>>>>>>>>>>>> as > > >>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>> parent DAG? 
My main concern is whether the schedule > > >>>>>>>>>> interval > > >>>>>>>>>>> of > > >>>>>>>>>>>>>>>> TaskGroup > > >>>>>>>>>>>>>>>>>> could be different with that of the DAG? For > > >> example, > > >>>>> there > > >>>>>>>>>>> is > > >>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>> scenario > > >>>>>>>>>>>>>>>>>> that the schedule interval of DAG is 1 hour and the > > >>>>>>>>>> schedule > > >>>>>>>>>>>>>> interval > > >>>>>>>>>>>>>>>> of > > >>>>>>>>>>>>>>>>>> TaskGroup is 20 min. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Cheers, > > >>>>>>>>>>>>>>>>>> Nicholas > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang < > > >>>>>>>>>>>>>> bin.huan...@gmail.com > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Hi Nicholas, > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of > > >>> SubDagOperator, > > >>>>>>>>>>> maybe > > >>>>>>>>>>>>> it > > >>>>>>>>>>>>>>> will > > >>>>>>>>>>>>>>>>>> throw > > >>>>>>>>>>>>>>>>>>> an error? But in the original proposal, the > > >> subdag's > > >>>>>>>>>>>>>>>> schedule_interval > > >>>>>>>>>>>>>>>>>> will > > >>>>>>>>>>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to > > >>> replace > > >>>>>>>>>>>> SubDag, > > >>>>>>>>>>>>>>> there > > >>>>>>>>>>>>>>>>>> will > > >>>>>>>>>>>>>>>>>>> be no subdag schedule_interval. > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Bin > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 < > > >>>>>>>>>>>> thanosxnicho...@gmail.com > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> Hi Bin, > > >>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was confused > > >>> whether > > >>>>>>>>>> the > > >>>>>>>>>>>>>>> schedule > > >>>>>>>>>>>>>>>>>>>> interval of SubDAG is different from that of the > > >>>> parent > > >>>>>>>>>>>> DAG? 
> > >>>>>>>>>>>>> I > > >>>>>>>>>>>>>>> have > > >>>>>>>>>>>>>>>>>>>> discussed with Jiajie Zhong about the schedule > > >>>> interval > > >>>>>>>>>>> of > > >>>>>>>>>>>>>>> SubDAG. > > >>>>>>>>>>>>>>>> If > > >>>>>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>> SubDagOperator has a different schedule interval, > > >>> what > > >>>>>>>>>>> will > > >>>>>>>>>>>>>>> happen > > >>>>>>>>>>>>>>>>> for > > >>>>>>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>> scheduler to schedule the parent DAG? > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> Regards, > > >>>>>>>>>>>>>>>>>>>> Nicholas Jiang > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang < > > >>>>>>>>>>>>>>>> bin.huan...@gmail.com> > > >>>>>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback! > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> I have rethought about the concept of subdag and > > >>> task > > >>>>>>>>>>>>>> groups. I > > >>>>>>>>>>>>>>>>> think > > >>>>>>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>>> better way to approach this is to entirely remove > > >>>>>>>>>>> subdag > > >>>>>>>>>>>>> and > > >>>>>>>>>>>>>>>>>> introduce > > >>>>>>>>>>>>>>>>>>>> the > > >>>>>>>>>>>>>>>>>>>>> concept of TaskGroup, which is a container of > > >> tasks > > >>>>>>>>>>> along > > >>>>>>>>>>>>>> with > > >>>>>>>>>>>>>>>>> their > > >>>>>>>>>>>>>>>>>>>>> dependencies *without execution/scheduling logic > > >>> as a > > >>>>>>>>>>>> DAG*. > > >>>>>>>>>>>>>> The > > >>>>>>>>>>>>>>>>> only > > >>>>>>>>>>>>>>>>>>>>> purpose of it is to group a list of tasks, but > > >> you > > >>>>>>>>>>> still > > >>>>>>>>>>>>> need > > >>>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>>> add > > >>>>>>>>>>>>>>>>>> it > > >>>>>>>>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>>>>>>> a DAG for execution. > > >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> Here is a small code snippet. 
>
> ```
> class TaskGroup:
>     """
>     A TaskGroup contains a group of tasks.
>
>     If default_args is missing, it will take default args from the DAG.
>     """
>     def __init__(self, group_id, default_args=None):
>         pass
>
>
> """
> You can add tasks to a task group similar to adding tasks to a DAG
>
> This can be declared in a separate file from the dag file
> """
> download_group = TaskGroup(group_id='download', default_args=default_args)
> download_group.add_task(task1)
> task2.dag = download_group
>
> with download_group:
>     task3 = DummyOperator(task_id='task3')
>
> [task1, task2] >> task3
>
>
> """Add it to a DAG for execution"""
> with DAG(dag_id='start_download_dag', default_args=default_args,
>          schedule_interval='@daily', ...) as dag:
>     start = DummyOperator(task_id='start')
>     start >> download_group
>     # this is equivalent to
>     # start >> [task1, task2] >> task3
> ```
>
> With this, we can still reuse a group of tasks and set dependencies
> between them; it avoids the boilerplate code from using SubDagOperator,
> and we can declare dependencies as `task >> task_group >> task`.
>
> User migration wise, we can introduce it before Airflow 2.0 and allow
> gradual transition. Then we can decide if we still want to keep the
> SubDagOperator or simply remove it.
>
> Any thoughts?
>
> Cheers,
> Bin
>
> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin
> <maximebeauche...@gmail.com> wrote:
>
> +1, proposal looks good.
>
> The original intention was really to have tasks groups and a zoom-in/out
> in the UI.
> The original reasoning was to reuse the DAG object since it is a group of
> tasks, but as highlighted here it does create underlying confusions since
> a DAG is much more than just a group of tasks.
>
> Max
>
> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <joshipoornim...@gmail.com>
> wrote:
>
> Thank you for your email.
>
> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <bin.huan...@gmail.com> wrote:
>
> > > - *Unpack SubDags during dag parsing*: This rewrites the
> > >   *DagBag.bag_dag* method to unpack subdag while parsing, and it will
> > >   give a flat structure at the task level
> >
> > The serialized_dag representation already does this I think. At least
> > if I've understood your idea here correctly.
>
> I am not sure about serialized_dag representation, but at least it will
> still keep the subdag entry in the DAG table? In my proposal as also in
> the draft PR, the idea is to *extract the tasks from the subdag and add
> them back to the root_dag.* So the runtime DAG graph will look exactly the
> same as without subdag but with metadata attached to those sections. These
> metadata will be later on used to render in the UI. So after parsing
> (*DagBag.process_file()*), it will just output the *root_dag* instead of
> *root_dag + subdag + subdag + nested subdag* etc.
>
> - e.g. section-1-* will have metadata current_group=section-1,
>   parent_group=<the-root-dag-id> (welcome for naming suggestions), the
>   reason for parent_group is that we can have nested group and still be
>   able to capture the dependency.
>
> Runtime DAG:
> [image: image.png]
>
> While at the UI, what we see would be something like this by utilizing the
> metadata, and then we can expand or zoom into in some way.
> [image: image.png]
>
> The benefits I can see is that:
> 1. We don't need to deal with the extra complexity of SubDag for execution
>    and scheduling. It will be the same as not using SubDag.
> 2. Still have the benefits of modularized and reusable dag code and
>    declare dependencies between them. And with the new SubDagOperator (see
>    AIP or draft PR), we can use the same dag_factory function for
>    generating 1 dag, a lot of dynamic dags, or used for SubDag (in this
>    case, it will just extract all underlying tasks and append to the root
>    dag).
>
> - Then it gets to the idea of replacing subdag with a simpler concept by
>   Ash: the proposed change basically drains out the contents of a SubDag
>   and becomes more like ExtractSubdagTasksAndAppendToRootdagOperator
>   (forgive me about the crazy name..). In this case, it is still necessary
>   to keep the concept of subdag as it is nothing more than a name?
>
> That's why the TaskGroup idea comes up. Thanks Chris Palmer for helping
> conceptualize the functionality of TaskGroup, I will just paste it here.
>
> > - Tasks can be added to a TaskGroup
> > - You *can* have dependencies between Tasks in the same TaskGroup, but
> >   *cannot* have dependencies between a Task in a TaskGroup and either a
> >   Task in a different TaskGroup or a Task not in any group
> > - You *can* have dependencies between a TaskGroup and either other
> >   TaskGroups or Tasks not in any group
> > - The UI will by default render a TaskGroup as a single "object", but
> >   which you expand or zoom into in some way
> > - You'd need some way to determine what the "status" of a TaskGroup was
> >   at least for UI display purposes
>
> I agree with Chris:
> - From the backend's view (scheduler & executor), I think TaskGroup should
>   be ignored during execution. (unless we decide to implement some
>   metadata operations that allows start/stop a group of tasks etc.)
> - From the UI's View, it should be able to pick up the individual tasks'
>   status and then determine the TaskGroup's status
>
> Bin
>
> On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman
> <daniel.imber...@gmail.com> wrote:
>
> I hadn’t thought about using the `>>` operator to tie dags together but I
> think that sounds pretty great! I wonder if we could essentially write in
> the ability to set dependencies to all starter-tasks for that DAG.
>
> I’m personally ok with SubDag being a mostly UI concept. It doesn’t need
> to execute separately, you’re just adding more tasks to the queue that
> will be executed when there are resources available.
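
One way to read the point above, that the UI can pick up the individual tasks' statuses and derive the TaskGroup's status from them, is a simple precedence roll-up. The sketch below is only an illustration: the function name `group_state`, the state strings, the precedence order, and the `"pending"` fallback are all assumptions, not anything specified in the thread or in Airflow.

```python
from typing import Iterable

# Assumed precedence (illustrative only): the first state in this list that
# any task in the group has becomes the state shown for the whole group.
PRECEDENCE = ["failed", "running", "queued"]


def group_state(task_states: Iterable[str]) -> str:
    """Roll the states of a group's tasks up into one display state."""
    states = list(task_states)
    for candidate in PRECEDENCE:
        if candidate in states:
            return candidate
    # The group counts as successful only if it is non-empty and every
    # task in it succeeded.
    if states and all(state == "success" for state in states):
        return "success"
    # Empty groups and groups with unfinished tasks fall through to here.
    return "pending"


print(group_state(["success", "running", "failed"]))
print(group_state(["success", "success"]))
```

Because the roll-up is computed from task states at display time, nothing about it has to be stored or scheduled, which matches the idea that the backend can ignore the group entirely.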
>
> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <ch...@crpalmer.com> wrote:
>
> I agree that SubDAGs are an overly complex abstraction. I think what is
> needed/useful is a TaskGroup concept.
> On a high level I think you want this functionality:
>
> - Tasks can be added to a TaskGroup
> - You *can* have dependencies between Tasks in the same TaskGroup, but
>   *cannot* have dependencies between a Task in a TaskGroup and either a
>   Task in a different TaskGroup or a Task not in any group
> - You *can* have dependencies between a TaskGroup and either other
>   TaskGroups or Tasks not in any group
> - The UI will by default render a TaskGroup as a single "object", but
>   which you expand or zoom into in some way
> - You'd need some way to determine what the "status" of a TaskGroup was
>   at least for UI display purposes
>
> Not sure if it would need to be a top level object with its own database
> table and model or just another attribute on tasks.
> I think you could build it in a way such that from the scheduler's point
> of view a DAG with TaskGroups doesn't get treated any differently. So it
> really just becomes a shortcut for setting dependencies between sets of
> Tasks, and allows the UI to simplify the render of the DAG structure.
>
> Chris
>
> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov <ddavy...@twitter.com.invalid>
> wrote:
>
> Agree with James (and think it's actually the more important issue to
> fix), but I am still convinced Ash's idea is the right way forward (just
> it might require a bit more work to deprecate than adding visual grouping
> in the UI).
>
> There was a previous thread about this FYI with more context on why
> subdags are bad and potential solutions:
> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html . A
> solution I outline there to James's problem is e.g. enabling the >>
> operator for Airflow operators to work with DAGs as well. I see this being
> separate from Ash's solution for DAG grouping in the UI but one of the two
> items required to replace all existing subdag functionality.
>
> I've been working with subdags for 3 years and they are always a giant
> pain to use. They are a constant source of user confusion and breakages
> during upgrades. Would love to see them gone :).
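
The idea of letting `>>` work on a whole group of tasks, which also covers the case of one set of tasks having to finish before another set starts, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Airflow code: the `Task` and `TaskGroup` classes, the `roots()`/`leaves()` helpers, and the `connect` function are all hypothetical names invented here. The sketch wires every leaf of the left-hand side to every root of the right-hand side, so the group itself never needs to be scheduled.

```python
from typing import List, Set, Union


class Task:
    """Hypothetical stand-in for an operator; not Airflow's BaseOperator."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream: Set[str] = set()
        self.downstream: Set[str] = set()

    def roots(self) -> List["Task"]:
        return [self]

    def leaves(self) -> List["Task"]:
        return [self]

    def __rshift__(self, other: "Node") -> "Node":
        connect(self, other)
        return other


class TaskGroup:
    """Hypothetical container: holds tasks, carries no scheduling logic."""

    def __init__(self, group_id: str, tasks: List[Task]) -> None:
        self.group_id = group_id
        self.tasks = list(tasks)

    def roots(self) -> List[Task]:
        # Tasks with no upstream dependency inside the group.
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.upstream & ids)]

    def leaves(self) -> List[Task]:
        # Tasks with no downstream dependency inside the group.
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.downstream & ids)]

    def __rshift__(self, other: "Node") -> "Node":
        connect(self, other)
        return other


Node = Union[Task, TaskGroup]


def connect(left: Node, right: Node) -> None:
    """Make every leaf of `left` an upstream of every root of `right`."""
    ups, downs = left.leaves(), right.roots()
    for up in ups:
        for down in downs:
            up.downstream.add(down.task_id)
            down.upstream.add(up.task_id)


start, end = Task("start"), Task("end")
task1, task2, task3 = Task("task1"), Task("task2"), Task("task3")
task1 >> task3
task2 >> task3
download_group = TaskGroup("download", [task1, task2, task3])

start >> download_group >> end
print(sorted(start.downstream), sorted(task3.downstream))
```

After the last statement, `start` points at `task1` and `task2`, and only `task3` points at `end`. The dependency graph is the same flat graph one would get by writing the edges out by hand.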
>
> On Fri, Jun 12, 2020 at 11:11 AM James Coder <jcode...@gmail.com> wrote:
>
> I'm not sure I totally agree it's just a UI concept. I use the subdag
> operator to simplify dependencies too. If you have a group of tasks that
> need to finish before another group of tasks start, using a subdag is a
> pretty quick way to set those dependencies and I think also make it easier
> to follow the dag code.
>
> On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin...@gmail.com> wrote:
>
> I second Ash’s grouping concept.
>
> On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> Question:
>
> Do we even need the SubDagOperator anymore?
>
> Would removing it entirely and just replacing it with a UI grouping
> concept be conceptually simpler, less to get wrong, and closer to what
> users actually want to achieve with subdags?
>
> With your proposed change, tasks in subdags could start running in
> parallel (a good change) -- so should we not also just _entirely_ remove
> the concept of a sub dag and replace it with something simpler?
>
> Problems with subdags (I think. I haven't used them extensively so may be
> wrong on some of these):
> - They need their own dag_id, but it has(?) to be of the form
>   `parent_dag_id.subdag_id`.
> - They need their own schedule_interval, but it has to match the parent
>   dag
> - Sub dags can be paused on their own. (Does it make sense to do this?
>   Pausing just a sub dag would mean the sub dag would never execute, so
>   the SubDagOperator would fail too.)
> - You had to choose the executor to operate a subdag with -- always a bit
>   of a kludge.
>
> Thoughts?
>
> -ash
>
> On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <a...@apache.org> wrote:
>
> Work on sub-dags is much needed, I'm excited to see how this progresses.
>
> > - *Unpack SubDags during dag parsing*: This rewrites the
> >   *DagBag.bag_dag* method to unpack subdag while parsing, and it will
> >   give a flat structure at the task level
>
> The serialized_dag representation already does this I think. At least if
> I've understood your idea here correctly.
>
> -ash
>
> On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huan...@gmail.com> wrote:
>
> Hi everyone,
>
> Sending a message to everyone and collect feedback on the AIP-34 on
> rewriting SubDagOperator. This was previously briefly mentioned in the
> discussion about what needs to be done for Airflow 2.0, and one of the
> ideas is to make SubDagOperator attach tasks back to the root DAG.
>
> This AIP-34 focuses on solving SubDagOperator related issues by
> reattaching all tasks back to the root dag while respecting dependencies
> during parsing. The original grouping effect on the UI will be achieved
> through grouping related tasks by metadata.
>
> This also makes the dag_factory function more reusable because you don't
> need to have parent_dag_name and child_dag_name in the function signature
> anymore.
>
> Changes proposed:
>
> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
>   method to unpack subdags while parsing, which gives a flat structure at
>   the task level.
> - *Simplify SubDagOperator*: The new SubDagOperator acts like a
>   container and most of the original methods are removed. The signature is
>   also changed to *subdag_factory* with *subdag_args* and *subdag_kwargs*.
>   This is similar to the PythonOperator signature.
> - *Add a TaskGroup model and add current_group & parent_group attributes
>   to BaseOperator*: This metadata is used to group tasks for rendering at
>   UI level. It may potentially extend further to group arbitrary tasks
>   outside the context of subdag to allow group-level operations (e.g.
>   stop/trigger a group of tasks within the dag).
> - *Webserver UI for SubDag*: Proposed UI modification to allow
>   (un)collapsing a group of tasks, so that the flat structure from the
>   first change can be displayed instead of the original hierarchical
>   structure.
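For readers skimming the thread, the first and third changes above can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the code from the draft PR: `Task`, `unpack_subdag` and `group_for_ui` are hypothetical names, the `group.task_id` naming scheme is an assumption, and nothing here uses Airflow's real classes. Only the `current_group` / `parent_group` attribute names come from the proposal.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    """Hypothetical stand-in for an operator; not Airflow's BaseOperator."""
    task_id: str
    upstream: Set[str] = field(default_factory=set)
    current_group: Optional[str] = None  # group this task belongs to
    parent_group: Optional[str] = None   # enclosing group; unused in this flat example


def unpack_subdag(root: Dict[str, Task], group: str,
                  subdag_tasks: List[Task], upstream_of_group: Set[str]) -> None:
    """Reattach subdag tasks to the root dag and tag them with group metadata."""
    inner_ids = {t.task_id for t in subdag_tasks}
    for t in subdag_tasks:
        new_id = f"{group}.{t.task_id}"  # assumed naming scheme
        deps = {f"{group}.{u}" for u in t.upstream if u in inner_ids}
        if not deps:
            # Entry tasks of the subdag inherit the group's upstream dependencies.
            deps = set(upstream_of_group)
        root[new_id] = Task(new_id, deps, current_group=group)


def group_for_ui(root: Dict[str, Task]) -> Dict[Optional[str], List[str]]:
    """Bucket task ids by current_group so a UI could collapse each bucket."""
    buckets = defaultdict(list)
    for task in root.values():
        buckets[task.current_group].append(task.task_id)
    return {g: sorted(ids) for g, ids in buckets.items()}


root = {"start": Task("start")}
section = [Task("extract"), Task("load", upstream={"extract"})]
unpack_subdag(root, "section_1", section, upstream_of_group={"start"})
grouped = group_for_ui(root)
```

After unpacking, every task lives in the single root mapping, so dependencies are resolved at the task level, while `grouped` carries just enough metadata for a UI to draw `section_1` as one collapsible node.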
>
> Please see related documents and PRs for details:
>
> AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> Original Issue: https://github.com/apache/airflow/issues/8078
> Draft PR: https://github.com/apache/airflow/pull/9243
>
> Please let me know if there are any aspects that you agree/disagree with or
> need more clarification (especially the third change regarding TaskGroup).
> Any comments are welcome and I am looking forward to it!
>
> Cheers
> Bin