Thank you for your email. On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <bin.huan...@gmail.com> wrote:
> > > - *Unpack SubDags during dag parsing*: This rewrites the > *DagBag.bag_dag* > > > method to unpack subdag while parsing, and it will give a flat > > > structure at > > > the task level > > > > The serialized_dag representation already does this I think. At least if > > I've understood your idea here correctly. > > I am not sure about serialized_dag representation, but at least it will > still keep the subdag entry in the DAG table? In my proposal as also in the > draft PR, the idea is to *extract the tasks from the subdag and add them > back to the root_dag. *So the runtime DAG graph will look exactly the > same as without subdag but with metadata attached to those sections. These > metadata will be later on used to render in the UI. So after parsing ( > *DagBag.process_file()*), it will just output the *root_dag *instead of > *root_dag + > subdag + subdag + nested subdag* etc. > > - e.g. section-1-* will have metadata current_group=section-1, > parent_group=<the-root-dag-id> (welcome for naming suggestions), the > reason for parent_group is that we can have nested group and still be > able to capture the dependency. > > Runtime DAG: > [image: image.png] > > While at the UI, what we see would be something like this by utilizing the > metadata, and then we can expand or zoom into in some way. > [image: image.png] > > The benefits I can see is that: > 1. We don't need to deal with the extra complexity of SubDag for execution > and scheduling. It will be the same as not using SubDag. > 2. Still have the benefits of modularized and reusable dag code and > declare dependencies between them. And with the new SubDagOperator (see AIP > or draft PR), we can use the same dag_factory function for generating 1 > dag, a lot of dynamic dags, or used for SubDag (in this case, it will just > extract all underlying tasks and append to the root dag). > > - Then it gets to the idea of replacing subdag with a simpler concept > by Ash: the proposed change basically drains out the contents of a SubDag > and becomes more like ExtractSubdagTasksAndAppendToRootdagOperator (forgive > me about the crazy name..). In this case, it is still necessary to keep the > concept of subdag as it is nothing more than a name? > > That's why the TaskGroup idea comes up. Thanks Chris Palmer for helping > conceptualize the functionality of TaskGroup, I will just paste it here. > > > - Tasks can be added to a TaskGroup > > - You *can* have dependencies between Tasks in the same TaskGroup, but > > *cannot* have dependencies between a Task in a TaskGroup and either a > > Task in a different TaskGroup or a Task not in any group > > - You *can* have dependencies between a TaskGroup and either other > > TaskGroups or Tasks not in any group > > - The UI will by default render a TaskGroup as a single "object", but > > which you expand or zoom into in some way > > - You'd need some way to determine what the "status" of a TaskGroup was > > at least for UI display purposes > > I agree with Chris: > - From the backend's view (scheduler & executor), I think TaskGroup should > be ignored during execution. (unless we decide to implement some metadata > operations that allows start/stop a group of tasks etc.) > - From the UI's View, it should be able to pick up the individual tasks' > status and then determine the TaskGroup's status > > Bin > > On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman < > daniel.imber...@gmail.com> wrote: > >> I hadn’t thought about using the `>>` operator to tie dags together but I >> think that sounds pretty great! I wonder if we could essentially write in >> the ability to set dependencies to all starter-tasks for that DAG. >> >> I’m personally ok with SubDag being a mostly UI concept. It doesn’t need >> to execute separately, you’re just adding more tasks to the queue that will >> be executed when there are resources available. >> >> via Newton Mail [ >> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2 >> ] >> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <ch...@crpalmer.com> wrote: >> I agree that SubDAGs are an overly complex abstraction. I think what is >> needed/useful is a TaskGroup concept. On a high level I think you want >> this >> functionality: >> >> - Tasks can be added to a TaskGroup >> - You *can* have dependencies between Tasks in the same TaskGroup, but >> *cannot* have dependencies between a Task in a TaskGroup and either a >> Task in a different TaskGroup or a Task not in any group >> - You *can* have dependencies between a TaskGroup and either other >> TaskGroups or Tasks not in any group >> - The UI will by default render a TaskGroup as a single "object", but >> which you expand or zoom into in some way >> - You'd need some way to determine what the "status" of a TaskGroup was >> at least for UI display purposes >> >> Not sure if it would need to be a top level object with its own database >> table and model or just another attribute on tasks. I think you could >> build >> it in a way such that from the schedulers point of view a DAG with >> TaskGroups doesn't get treated any differently. So it really just becomes >> a >> shortcut for setting dependencies between sets of Tasks, and allows the UI >> to simplify the render of the DAG structure. >> >> Chris >> >> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov <ddavy...@twitter.com.invalid >> > >> wrote: >> >> > Agree with James (and think it's actually the more important issue to >> fix), >> > but I am still convinced Ash' idea is the right way forward (just it >> might >> > require a bit more work to deprecate than adding visual grouping in the >> > UI). >> > >> > There was a previous thread about this FYI with more context on why >> subdags >> > are bad and potential solutions: >> > https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html . A >> > solution I outline there to Jame's problem is e.g. enabling the >> >> operator >> > for Airflow operators to work with DAGs as well. I see this being >> separate >> > from Ash' solution for DAG grouping in the UI but one of the two items >> > required to replace all existing subdag functionality. >> > >> > I've been working with subdags for 3 years and they are always a giant >> pain >> > to use. They are a constant source of user confusion and breakages >> during >> > upgrades. Would love to see them gone :). >> > >> > On Fri, Jun 12, 2020 at 11:11 AM James Coder <jcode...@gmail.com> >> wrote: >> > >> > > I'm not sure I totally agree it's just a UI concept. I use the subdag >> > > operator to simplify dependencies too. If you have a group of tasks >> that >> > > need to finish before another group of tasks start, using a subdag is >> a >> > > pretty quick way to set those dependencies and I think also make it >> > easier >> > > to follow the dag code. >> > > >> > > On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin...@gmail.com> >> wrote: >> > > >> > > > I second Ash’s grouping concept. >> > > > >> > > > On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <a...@apache.org> >> > > wrote: >> > > > >> > > > > Question: >> > > > > >> > > > > Do we even need the SubDagOperator anymore? >> > > > > >> > > > > Would removing it entirely and just replacing it with a UI >> grouping >> > > > > concept be conceptually simpler, less to get wrong, and closer to >> > what >> > > > > users actually want to achieve with subdags? >> > > > > >> > > > > With your proposed change, tasks in subdags could start running in >> > > > > parallel (a good change) -- so should we not also just _enitrely_ >> > > remove >> > > > > the concept of a sub dag and replace it with something simpler. >> > > > > >> > > > > Problems with subdags (I think. I haven't used them extensively so >> > may >> > > > > be wrong on some of these): >> > > > > - They need their own dag_id, but it has(?) to be of the form >> > > > > `parent_dag_id.subdag_id`. >> > > > > - They need their own schedule_interval, but it has to match the >> > parent >> > > > dag >> > > > > - Sub dags can be paused on their own. (Does it make sense to do >> > this? >> > > > > Pausing just a sub dag would mean the sub dag would never >> execute, so >> > > > > the SubDagOperator would fail too. >> > > > > - You had to choose the executor to operator a subdag with -- >> always >> > a >> > > > > bit of a kludge. >> > > > > >> > > > > Thoughts? >> > > > > >> > > > > -ash >> > > > > >> > > > > On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <a...@apache.org> >> > wrote: >> > > > > >> > > > > > Workon sub-dags is much needed, I'm excited to see how this >> > > progresses. >> > > > > > >> > > > > > >> > > > > >> - *Unpack SubDags during dag parsing*: This rewrites the >> > > > > *DagBag.bag_dag* >> > > > > >> method to unpack subdag while parsing, and it will give a flat >> > > > > >> structure at >> > > > > >> the task level >> > > > > > >> > > > > > The serialized_dag representation already does this I think. At >> > least >> > > > if >> > > > > > I've understood your idea here correctly. >> > > > > > >> > > > > > -ash >> > > > > > >> > > > > > >> > > > > > On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huan...@gmail.com >> > >> > > > wrote: >> > > > > > >> > > > > >> Hi everyone, >> > > > > >> >> > > > > >> Sending a message to everyone and collect feedback on the >> AIP-34 >> > on >> > > > > >> rewriting SubDagOperator. This was previously briefly >> mentioned in >> > > the >> > > > > >> discussion about what needs to be done for Airflow 2.0, and >> one of >> > > the >> > > > > >> ideas is to make SubDagOperator attach tasks back to the root >> DAG. >> > > > > >> >> > > > > >> This AIP-34 focuses on solving SubDagOperator related issues by >> > > > > reattaching >> > > > > >> all tasks back to the root dag while respecting dependencies >> > during >> > > > > >> parsing. The original grouping effect on the UI will be >> achieved >> > > > through >> > > > > >> grouping related tasks by metadata. >> > > > > >> >> > > > > >> This also makes the dag_factory function more reusable because >> you >> > > > don't >> > > > > >> need to have parent_dag_name and child_dag_name in the function >> > > > > signature >> > > > > >> anymore. >> > > > > >> >> > > > > >> Changes proposed: >> > > > > >> >> > > > > >> - *Unpack SubDags during dag parsing*: This rewrites the >> > > > > *DagBag.bag_dag* >> > > > > >> method to unpack subdag while parsing, and it will give a flat >> > > > > >> structure at >> > > > > >> the task level >> > > > > >> - *Simplify SubDagOperator*: The new SubDagOperator acts like a >> > > > > >> container and most of the original methods are removed. The >> > > > > >> signature is >> > > > > >> also changed to *subdag_factory *with *subdag_args *and >> > > > > *subdag_kwargs*. >> > > > > >> This is similar to the PythonOperator signature. >> > > > > >> - *Add a TaskGroup model and add current_group & parent_group >> > > > > attributes >> > > > > >> to BaseOperator*: This metadata is used to group tasks for >> > > > > >> rendering at >> > > > > >> UI level. It may potentially extend further to group arbitrary >> > > tasks >> > > > > >> outside the context of subdag to allow group-level operations >> > > (i.e. >> > > > > >> stop/trigger a group of task within the dag) >> > > > > >> - *Webserver UI for SubDag*: Proposed UI modification to allow >> > > > > >> (un)collapse a group of tasks for a flat structure to pair with >> > > the >> > > > > first >> > > > > >> change instead of the original hierarchical structure. >> > > > > >> >> > > > > >> >> > > > > >> Please see related documents and PRs for details: >> > > > > >> AIP: >> > > > > >> >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator >> > > > > >> >> > > > > >> Original Issue: https://github.com/apache/airflow/issues/8078 >> > > > > >> Draft PR: https://github.com/apache/airflow/pull/9243 >> > > > > >> >> > > > > >> Please let me know if there are any aspects that you >> > agree/disagree >> > > > > >> with or >> > > > > >> need more clarification (especially the third change regarding >> > > > > TaskGroup). >> > > > > >> Any comments are welcome and I am looking forward to it! >> > > > > >> >> > > > > >> Cheers >> > > > > >> Bin >> > > > > >> >> > > > > >> > > > -- >> > > > Kyle Hamlin >> > > > >> > > >> > > > -- Thanks & Regards Poornima