I don't think fixing subdags to run in the scheduler is enough, although it's a huge improvement over the current implementation (especially the part that lets Subdags specify custom executors). From my experience with Subdags, I think what makes more sense is adding various operators to allow combining regular DAGs.
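To make "combining regular DAGs" a bit more concrete, here is a rough, self-contained sketch in plain Python. It does not use Airflow: the `DAG` and `Task` classes and the `edges` helper are hypothetical toy stand-ins, not Airflow's API, and the wiring rule (attach to the roots of the DAG on the right, continue from the leaves of the DAG on the left) is only one assumed way such an operator could behave.

```python
class DAG:
    """Toy stand-in for a DAG: a named list of tasks (not Airflow's DAG)."""

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.tasks = []

    def roots(self):
        # Tasks that no other task in this DAG points to.
        own = set(self.tasks)
        targets = {d for t in self.tasks for d in t.downstream if d in own}
        return [t for t in self.tasks if t not in targets]

    def leaves(self):
        # Tasks with no downstream task inside this DAG.
        own = set(self.tasks)
        return [t for t in self.tasks if not (t.downstream & own)]

    def __rshift__(self, other):
        # dag >> x: every leaf of this DAG runs before x.
        for leaf in self.leaves():
            leaf >> other
        return other


class Task:
    """Toy stand-in for an operator instance (hypothetical)."""

    def __init__(self, task_id, dag):
        self.task_id = task_id
        self.downstream = set()
        dag.tasks.append(self)

    def __rshift__(self, other):
        # task >> dag fans out to the DAG's roots; task >> task is a plain edge.
        targets = other.roots() if isinstance(other, DAG) else [other]
        self.downstream.update(targets)
        return other


def edges(*dags):
    """Return all (upstream, downstream) task-id pairs, sorted."""
    return sorted(
        (t.task_id, d.task_id) for dag in dags for t in dag.tasks for d in t.downstream
    )


dag = DAG("dag")
dag_task1 = Task("dag_task1", dag)
dag_task2 = Task("dag_task2", dag)

subdag = DAG("subdag")
subdag_task1 = Task("subdag_task1", subdag)
subdag_task2 = Task("subdag_task2", subdag)

dag_task1 >> subdag >> dag_task2
print(edges(dag, subdag))
```

Running it prints four edges: dag_task1 to each of the two subdag tasks, and each subdag task to dag_task2. The inner DAG needs no wrapper operator; it is an ordinary DAG whose roots and leaves get wired into the outer one.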
Here are some other issues with Subdags off the top of my head:

- Confusing/separate UI and clearing/running semantics (e.g. tasks in the
  Subdag will not get scheduled if you clear them but not the parent
  operator)
- Nested Subdags are hard to work with in the UI (and IIRC don't behave
  correctly, but I might be wrong on this).
- The abstraction is confusing, e.g. looking at the log for the
  SubDagOperator task can be a bit confusing as
- Tons of custom special-case logic in the Airflow code and schemas in the
  DB to handle Subdags, which has led to a lot of complexity and is a
  constant source of tricky bugs and upgrade issues
- Additional abstraction that users have to learn

An alternative would be allowing combining DAGs, e.g. something like:

    dag = DAG()
    dag_task1 = Op(dag=dag)
    dag_task2 = Op(dag=dag)

    subdag = DAG()
    subdag_task1 = Op(dag=subdag)
    subdag_task2 = Op(dag=subdag)

    dag_task1 >> subdag >> dag_task2

    # Results in the following topology:
    #           ,-> subdag_task1 ---v
    # dag_task1                     dag_task2
    #           '-> subdag_task2 ---^

This is also a lot more easily composable than subdags, and provides a more
powerful abstraction, e.g. you don't need additional boilerplate to create
subdags such as setting up an operator.

On Sat, Apr 13, 2019 at 12:14 PM Felix Uellendall <felix.uellend...@gmx.de> wrote:

> -1 on deprecating subdags, because of the extra level of abstraction
> some of you already mentioned.
>
> We also use subdags in production.
>
> For example in cases where we get json data from an API but since we
> mostly need it to be in csv format we have a subdag like
> /specific_ap//i_specific_endpoint/_to_s3 that has two tasks one for
> retrieving data from the API and loading it to s3 and one for
> transforming it into csv.
> This has the advantage that you don't need to think of how you transform
> the json to csv. In our case Data Analyst don't want to think about
> that. They want to work with tabular data.
>
> We also have subdags that are handling API cursoring/pagination (by
> using xcom) and merging these multiple API response data into one file.
> So you call one Task, a subdag operator with this subdag and get only
> what you really need - the data.
>
> I really like subdags and I am for improving / maybe redesigning or
> reimplementing of subdags.
>
> -feluelle
>
> On 13/04/2019 at 07:52, Chao-Han Tsai wrote:
> > +1 on keeping it.
> >
> > I think we should keep the SubDags as it provides a good abstraction layer.
> > It just need some love from us to fix the underlying
> > performance/reliability issues.
> >
> > On Fri, Apr 12, 2019 at 12:06 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> >> This is what I was thinking - the dag collector in the scheduler should
> >> "just" be able to collect the tasks for subdags up to the parent dag. I'd
> >> possibly go as far as saying no DagRun object for subdags too.
> >>
> >> (Yes, "just" will never be that simple).
> >>
> >> -a
> >>
> >> On 12 April 2019 18:37:24 BST, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>> +1
> >>>
> >>> Sub dags should be fixed within the scheduler and run normally.
> >>>
> >>> On 12 April 2019 at 19:36:27, Feng Lu (fen...@google.com.invalid) wrote:
> >>>
> >>> Agree with others who think SubDag should stay, we should fix the SubDag
> >>> implementation but not remove the abstraction itself.
> >>>
> >>> On Fri, Apr 12, 2019 at 8:42 AM Chen Tong <cix...@gmail.com> wrote:
> >>>
> >>>> Is it possible to re-implement it in the view-level, not in operator level?
> >>>> And this operator is just define a different view in GUI, that these tasks
> >>>> will be collapsed into another view.
> >>>>
> >>>> On Fri, Apr 12, 2019 at 11:31 AM James Meickle
> >>>> <jmeic...@quantopian.com.invalid> wrote:
> >>>>
> >>>>> I have avoided using them because of outstanding issues like the open JIRA
> >>>>> issues I linked above, or similar issues that I've read about on blog
> >>>>> posts. If it were just GUI or UX issues I'd use them, but many people have
> >>>>> reported issues which affect concurrency/stability, consistency, or
> >>>>> correctness of results. I believe that it's working for you, but for me,
> >>>>> it's not worth the risk to build using them in my environment (even though
> >>>>> they could be handy for many of our workflows).
> >>>>>
> >>>>> On Fri, Apr 12, 2019 at 11:18 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >>>>>> I have been using SubDags in production and haven't had much problem with
> >>>>>> it.
> >>>>>>
> >>>>>> Can you list the issues you had?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Kaxil
> >>>>>>
> >>>>>> On Fri, Apr 12, 2019, 16:16 James Meickle <jmeic...@quantopian.com.invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Given their bad reputation, would it be appropriate to deprecate subDAGs
> >>>>>>> now to advertise that they're no longer considered a suitable
> >>>>>>> implementation? If a new and better implementation is created, would it
> >>>>>>> even be similar enough to subDAGs that we'd want to continue to call the
> >>>>>>> feature that?
> >>>>>>>
> >>>>>>> They feel like a "new Airflow user trap" right now - I have had to tell my
> >>>>>>> team never to use them, because they seem appealing and are in the official
> >>>>>>> docs.
> >>>>>>>
> >>>>>>> On Fri, Apr 12, 2019 at 10:51 AM Ash Berlin-Taylor <a...@apache.org> wrote:
> >>>>>>>> I'd like to find time to fix subdags as they do provide a useful
> >>>>>>>> abstraction - but I agree right now they aren't great (I avoid them because
> >>>>>>>> of this)
> >>>>>>>>
> >>>>>>>> I have half thoughts of how to it should work, I just need to look at the
> >>>>>>>> code in depth to see if that makes sense. Now 1.10.3 is out I might have a
> >>>>>>>> bit more time to do this.
> >>>>>>>>
> >>>>>>>> -ash
> >>>>>>>>
> >>>>>>>>> On 12 Apr 2019, at 15:48, James Meickle <jmeic...@quantopian.com.INVALID>
> >>>>>>>>> wrote:
> >>>>>>>>> I think we should deprecate SubDAGs given the complexity they add and the
> >>>>>>>>> limited usage and use cases. Or, we should invest effort in redesigning
> >>>>>>>>> their API and implementation. I think that having to account for
> >>>>>>>>> subdag-introduced complexity makes Airflow's code much harder to maintain
> >>>>>>>>> and buggier, looking at how many open issues there are that reference
> >>>>>>>>> subdags (and how unrelated in topic they are):
> >>>>>>>>>
> >>>>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-3292?jql=project%20%3D%20AIRFLOW%20AND%20status%20%3D%20Open%20AND%20text%20~%20%22subdag%22