Agree with James (and think it's actually the more important issue to fix), but I am still convinced Ash' idea is the right way forward (just it might require a bit more work to deprecate than adding visual grouping in the UI).
There was a previous thread about this FYI with more context on why subdags are bad and potential solutions: https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html . A solution I outline there to Jame's problem is e.g. enabling the >> operator for Airflow operators to work with DAGs as well. I see this being separate from Ash' solution for DAG grouping in the UI but one of the two items required to replace all existing subdag functionality. I've been working with subdags for 3 years and they are always a giant pain to use. They are a constant source of user confusion and breakages during upgrades. Would love to see them gone :). On Fri, Jun 12, 2020 at 11:11 AM James Coder <jcode...@gmail.com> wrote: > I'm not sure I totally agree it's just a UI concept. I use the subdag > operator to simplify dependencies too. If you have a group of tasks that > need to finish before another group of tasks start, using a subdag is a > pretty quick way to set those dependencies and I think also make it easier > to follow the dag code. > > On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin...@gmail.com> wrote: > > > I second Ash’s grouping concept. > > > > On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <a...@apache.org> > wrote: > > > > > Question: > > > > > > Do we even need the SubDagOperator anymore? > > > > > > Would removing it entirely and just replacing it with a UI grouping > > > concept be conceptually simpler, less to get wrong, and closer to what > > > users actually want to achieve with subdags? > > > > > > With your proposed change, tasks in subdags could start running in > > > parallel (a good change) -- so should we not also just _enitrely_ > remove > > > the concept of a sub dag and replace it with something simpler. > > > > > > Problems with subdags (I think. I haven't used them extensively so may > > > be wrong on some of these): > > > - They need their own dag_id, but it has(?) to be of the form > > > `parent_dag_id.subdag_id`. > > > - They need their own schedule_interval, but it has to match the parent > > dag > > > - Sub dags can be paused on their own. (Does it make sense to do this? > > > Pausing just a sub dag would mean the sub dag would never execute, so > > > the SubDagOperator would fail too. > > > - You had to choose the executor to operator a subdag with -- always a > > > bit of a kludge. > > > > > > Thoughts? > > > > > > -ash > > > > > > On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <a...@apache.org> wrote: > > > > > > > Workon sub-dags is much needed, I'm excited to see how this > progresses. > > > > > > > > > > > >> - *Unpack SubDags during dag parsing*: This rewrites the > > > *DagBag.bag_dag* > > > >> method to unpack subdag while parsing, and it will give a flat > > > >> structure at > > > >> the task level > > > > > > > > The serialized_dag representation already does this I think. At least > > if > > > > I've understood your idea here correctly. > > > > > > > > -ash > > > > > > > > > > > > On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huan...@gmail.com> > > wrote: > > > > > > > >> Hi everyone, > > > >> > > > >> Sending a message to everyone and collect feedback on the AIP-34 on > > > >> rewriting SubDagOperator. This was previously briefly mentioned in > the > > > >> discussion about what needs to be done for Airflow 2.0, and one of > the > > > >> ideas is to make SubDagOperator attach tasks back to the root DAG. > > > >> > > > >> This AIP-34 focuses on solving SubDagOperator related issues by > > > reattaching > > > >> all tasks back to the root dag while respecting dependencies during > > > >> parsing. The original grouping effect on the UI will be achieved > > through > > > >> grouping related tasks by metadata. > > > >> > > > >> This also makes the dag_factory function more reusable because you > > don't > > > >> need to have parent_dag_name and child_dag_name in the function > > > signature > > > >> anymore. > > > >> > > > >> Changes proposed: > > > >> > > > >> - *Unpack SubDags during dag parsing*: This rewrites the > > > *DagBag.bag_dag* > > > >> method to unpack subdag while parsing, and it will give a flat > > > >> structure at > > > >> the task level > > > >> - *Simplify SubDagOperator*: The new SubDagOperator acts like a > > > >> container and most of the original methods are removed. The > > > >> signature is > > > >> also changed to *subdag_factory *with *subdag_args *and > > > *subdag_kwargs*. > > > >> This is similar to the PythonOperator signature. > > > >> - *Add a TaskGroup model and add current_group & parent_group > > > attributes > > > >> to BaseOperator*: This metadata is used to group tasks for > > > >> rendering at > > > >> UI level. It may potentially extend further to group arbitrary > tasks > > > >> outside the context of subdag to allow group-level operations > (i.e. > > > >> stop/trigger a group of task within the dag) > > > >> - *Webserver UI for SubDag*: Proposed UI modification to allow > > > >> (un)collapse a group of tasks for a flat structure to pair with > the > > > first > > > >> change instead of the original hierarchical structure. > > > >> > > > >> > > > >> Please see related documents and PRs for details: > > > >> AIP: > > > >> > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator > > > >> > > > >> Original Issue: https://github.com/apache/airflow/issues/8078 > > > >> Draft PR: https://github.com/apache/airflow/pull/9243 > > > >> > > > >> Please let me know if there are any aspects that you agree/disagree > > > >> with or > > > >> need more clarification (especially the third change regarding > > > TaskGroup). > > > >> Any comments are welcome and I am looking forward to it! > > > >> > > > >> Cheers > > > >> Bin > > > >> > > > > > -- > > Kyle Hamlin > > >