-1 on deprecating subdags, because of the extra level of abstraction
some of you already mentioned.
We also use subdags in production.
For example in cases where we get json data from an API but since we
mostly need it to be in csv format we have a subdag like
/specific_ap//i_specific_endpoint/_to_s3 that has two tasks one for
retrieving data from the API and loading it to s3 and one for
transforming it into csv.
This has the advantage that you don't need to think of how you transform
the json to csv. In our case Data Analyst don't want to think about
that. They want to work with tabular data.
We also have subdags that are handling API cursoring/pagination (by
using xcom) and merging these multiple API response data into one file.
So you call one Task, a subdag operator with this subdag and get only
what you really need - the data.
I really like subdags and I am for improving / maybe redesigning or
reimplementing of subdags.
-feluelle
Am 13/04/2019 um 07:52 schrieb Chao-Han Tsai:
+1 on keeping it.
I think we should keep the SubDags as it provides a good abstraction layer.
It just need some love from us to fix the underlying
performance/reliability issues.
On Fri, Apr 12, 2019 at 12:06 PM Ash Berlin-Taylor <[email protected]> wrote:
This is what I was thinking - the dag collector in the scheduler should
"just" be able to collect the tasks for subdags up to the parent dag. I'd
possibly go as far as saying no DagRun object for subdags too.
(Yes, "just" will never be that simple).
-a
On 12 April 2019 18:37:24 BST, Bolke de Bruin <[email protected]> wrote:
+1
Sub dags should be fixed within the scheduler and run normally.
On 12 April 2019 at 19:36:27, Feng Lu ([email protected])
wrote:
Agree with others who think SubDag should stay, we should fix the
SubDag
implementation but not remove the abstraction itself.
On Fri, Apr 12, 2019 at 8:42 AM Chen Tong <[email protected]> wrote:
Is it possible to re-implement it in the view-level, not in operator
level?
And this operator is just define a different view in GUI, that these
tasks
will be collapsed into another view.
On Fri, Apr 12, 2019 at 11:31 AM James Meickle
<[email protected]> wrote:
I have avoided using them because of outstanding issues like the
open
JIRA
issues I linked above, or similar issues that I've read about on
blog
posts. If it were just GUI or UX issues I'd use them, but many
people
have
reported issues which affect concurrency/stability, consistency, or
correctness of results. I believe that it's working for you, but
for
me,
it's not worth the risk to build using them in my environment (even
though
they could be handy for many of our workflows).
On Fri, Apr 12, 2019 at 11:18 AM Kaxil Naik <[email protected]>
wrote:
I have been using SubDags in production and haven't had much
problem
with
it.
Can you list the issues you had?
Regards,
Kaxil
On Fri, Apr 12, 2019, 16:16 James Meickle
<[email protected]
.invalid>
wrote:
Given their bad reputation, would it be appropriate to
deprecate
subDAGs
now to advertise that they're no longer considered a suitable
implementation? If a new and better implementation is created,
would
it
even be similar enough to subDAGs that we'd want to continue to
call
the
feature that?
They feel like a "new Airflow user trap" right now - I have had
to
tell
my
team never to use them, because they seem appealing and are in
the
official
docs.
On Fri, Apr 12, 2019 at 10:51 AM Ash Berlin-Taylor
<[email protected]>
wrote:
I'd like to find time to fix subdags as they do provide a
useful
abstraction - but I agree right now they aren't great (I
avoid
them
because
of this)
I have half thoughts of how to it should work, I just need to
look
at
the
code in depth to see if that makes sense. Now 1.10.3 is out I
might
have
a
bit more time to do this.
-ash
On 12 Apr 2019, at 15:48, James Meickle
<[email protected]
.INVALID>
wrote:
I think we should deprecate SubDAGs given the complexity
they
add
and
the
limited usage and use cases. Or, we should invest effort in
redesigning
their API and implementation. I think that having to
account
for
subdag-introduced complexity makes Airflow's code much
harder
to
maintain
and buggier, looking at how many open issues there are that
reference
subdags (and how unrelated in topic they are):
https://issues.apache.org/jira/browse/AIRFLOW-3292?jql=project%20%3D%20AIRFLOW%20AND%20status%20%3D%20Open%20AND%20text%20~%20%22subdag%22