Ok want to pick back up setup / teardown discussion as we get closer to 2.7.
For personal reasons I had to take some time off work just as we were
wrapping up work on 2.6, and at least partly due to that, we punted setup /
teardown to 2.7.
But we've picked it back up and continued along the path outlined in this
thread and things look good for 2.7.
In the time since the previous discussion, not a lot has changed except
that we've implemented in main some of the things discussed.
For one we added a one-liner syntax.
So now if you have an existing dag such as this:
create_cluster >> run_query >> delete_cluster
You can convert it to use setup / teardown with the following:
create_cluster >> run_query >>
delete_cluster.as_teardown(setups=create_cluster)
The context manager suggested by Ash has been added, so if you have setup
`s` and teardown `t`, you can do the following:
with s >> t:
some_work(other_work())
Notably, just as with task group context manager, you must instantiate the
task inside the context in order for it to be registered. Ephraim is
working on some improvements to the context manager to address this (so
that you can explicitly add tasks to the context in that scenario) and so
that we can throw an error on non-sensical usages such as `with a >> b >>
c:`
There is a docs PR outlining the current state of the feature here
<https://github.com/apache/airflow/pull/32169> and you are all welcome to
comment. There's still time to make adjustments if desired.
But I wanted to raise one point for discussion here. It's something I
could have gone either way on, and I want to tap into the wisdom of the
community.
The question of whether a setup with no teardown, or a teardown with no
setup, should be added.
There are two options as I see it.
Option 1: if a setup has no teardown, then all the downstreams are assumed
to be "in scope" for the setup
Note: the notion of "scope" is about the clearing behavior aspect of setup
/ teardown: you clear a task, it clears its setups too. A work task is in
the scope of s and t if it's between s and t.
So then with s1 >> w1 >> w2, clearing work task w1 would clear w1 w2 and s1.
And clearing w2 would clear w2 and s1.
This makes it easy to have a setup with no teardown. But it makes the
behavior slightly more... wiggly... for lack of a better term. With this
example, s1 is a setup for w2. But then suppose you add a teardown after
w1: s1 >> w1 >> [w2, t1.as_teardown(setups=s1)]. Now s1 is not a setup for
w2.
So the logic is ... "a work task is in scope of a setup if it is in between
the setup and its teardown, *or *if the setup has no teardowns".
Option 1: we require that a setup always have a teardown
With this approach, the rule is simpler and it forces unambiguously defined
scopes. If you just want to use a setup and don't need a teardown task,
you add an empty teardown. The rule is then "a work task is in scope of a
setup if it is in between the setup and its teardown" -- i.e. the same rule
minus the `or` clause.
So the example would have to be s1 >> w1 >> w2 >>
EmptyOperator().as_teardown(setups=s1).
Thanks in advance for your engagement.
P.S. note that those of us in the U.S. will be off for holiday soon so may
be slower to respond but just wanted to get this conversation started so
that we can get this feature to a place that people generally feel good
enough about in 2.7.
P.P.S. Random-ish note... we should consider marking some features as beta
I think. There's a difference between beta and experimental, though
historically we've only employed the latter. But I think beta can be good
to signal "we are committed to this feature we just might make a change or
two to the API". Not saying we should do this with setup / teardown
necessarily but it's an option. Sometimes I think it can be a little
overwhelming to put out features with the spectre of being locked in by
backcompat guarantees and perhaps something like a beta feature could be a
way to get things out there a little bit more freely and allow for a
feedback loop.
Thanks again,
Daniel