Just to add to the discussion: a new discussion was raised today, https://github.com/apache/airflow/discussions/38311, where the user is sure that they can use operators in the way described above, and even used the term "nested operator".
I think getting https://github.com/apache/airflow/pull/37937 in will be a good way to prevent this misunderstanding in the future, but maybe there is something to think about here in the context of "Operators need to die" by Bolke.

BTW, I have a hypothesis about why those questions have started to appear so frequently, with people being reasonably sure they can do it. It's pure speculation (and I asked the user this time to explain), but some of it might be fuelled by ChatGPT hallucinating that Airflow is able to do it. I have seen similar hallucinations before, where people suggested a (completely wrong, like this one) solution to their problem, and only after inquiry admitted that it was a solution ChatGPT gave them.

I wonder if we will see even more of those soon.

J.

On Sun, Mar 10, 2024 at 9:29 AM Elad Kalif <elad...@apache.org> wrote:

> The issue here is not just about decorators. It also happens with regular
> operators (operator inside operator) and with an operator inside
> on_x_callback.
>
> For example:
>
> https://stackoverflow.com/questions/64291042/airflow-call-a-operator-inside-a-function/
>
> https://stackoverflow.com/questions/67483542/airflow-pythonoperator-inside-pythonoperator/
>
> > I can't see which problem is solved by allowing running one operator
> > inside another.
>
> From the user's perspective, they have an operator that knows how to do
> something and it's very easy to use. So they want to leverage that.
> For example, to send a Slack message:
>
> slack_operator_post_text = SlackAPIPostOperator(
>     task_id="slack_post_text",
>     channel=SLACK_CHANNEL,
>     text=("My message"),
> )
>
> It handles everything. Now if you want to send a Slack message from a
> PythonOperator you need to initialize a hook, find the right function to
> invoke, etc.
> Thus, from the user's perspective: there is already a class that does all
> that. Why can't it just work? Why do they need to "reimplement" the
> operator logic?
> (most of the time it will be a copy-paste of the logic of the execute
> function)
>
> So, the problems they are trying to solve are code duplication and
> ease of use.
>
> Jarek - I think your solution focuses more on the templating side but I
> think the actual problem is not limited to templating.
> I think the problem is more of "I know there is an operator that does X, so
> I will just use it inside the python function I invoke from the python
> operator" - regardless of whether Jinja/templating becomes an issue or not.
>
> On Sat, Mar 9, 2024 at 9:06 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > I see that we already have (thanks David!) a PR:
> > https://github.com/apache/airflow/pull/37937 to forbid this use (which is
> > cool and I am glad my discussion had some ripple effect :D ).
> >
> > I am quite happy to get this one merged once it passes tests/reviews,
> > but I would still want to explore future departures / options we might
> > have; maybe there will be another - long term - ripple effect :). I
> > thought a bit more about the - possibly - different reasons why this
> > pattern we observe is emerging and I have a theory.
> >
> > To Andrey's comments:
> >
> > > I can't see which problem is solved by allowing running one operator
> > > inside another.
> >
> > For me, the main problem to solve is that using Hooks in the way I
> > described in
> >
> > https://medium.com/apache-airflow/generic-airflow-transfers-made-easy-5fe8e5e7d2c2
> > in 2022 is almost non-discoverable by a significant percentage of users.
> > Especially those kinds of users that mostly treat Airflow Operators as
> > black boxes and **just** discovered task flow as a way to do
> > simple things in Python - but they are not into writing their own custom
> > operators, nor do they look at the operators' code. Generally they don't
> > really see DAG authoring as writing Python code; it's mostly about using
> > a slightly weird DSL to build their DAGs.
> > Mostly copy&pasting some constructs that
> > look like putting together existing building blocks and using patterns
> > like `>>` to add dependencies.
> >
> > Yes, I try to be empathetic and try to guess how such users think about
> > DAG authoring - I might be wrong, but this is what I see as a recurring
> > pattern.
> >
> > So in this context - @task is not Python code writing, it's yet another
> > DSL that people see as appealing. And the case (here I just speculate -
> > so I might be entirely wrong) I **think** the original pattern I posted
> > above solves is that people think they can slightly improve the
> > flexibility of the operators by adding a bit of simple code beforehand,
> > when they need a bit more flexibility and JINJA is not enough. Basically
> > replacing this
> >
> > operator = AnOperator(with_param='{{ here I want some dynamicness }}')
> >
> > with:
> >
> > @task
> > def my_task():
> >     # do something more complex that is difficult to do with a JINJA
> >     # expression
> >     calculated_param = calculate_the_param()
> >     operator = AnOperator(with_param=calculated_param)
> >     operator.execute()
> >
> > And I **think** the main issue to solve here is how to make it a bit more
> > flexible to get parameters of operators pre-calculated **just** before
> > the execute() method.
> >
> > This is speculation of course - and there might be different
> > motivations - but I think addressing this need better might actually
> > solve the problem (combined with David's PR). What if we found a way to
> > pass more complex calculations to parameters of operators?
> >
> > So MAYBE (just maybe) we could do something like this (conceptual - the
> > name might be different):
> >
> > operator = AnOperator(with_param=RunThisBeforeExecute(callable=calculate_the_param))
> >
> > And let the user use a callable there:
> >
> > def calculate_the_param(context: dict) -> Any: ...
> >
> > I **think** we could extend our "rendering JINJA template" to handle this
> > special case for templated parameters. Plus, it would nicely solve the
> > "render_as_native" problem - because that method could return the
> > expected object rather than a string (and every parameter could have its
> > own method...).
> >
> > Maybe that would be a good solution?
> >
> > J.
> >
> > On Sun, Mar 3, 2024 at 12:03 AM Daniel Standish
> > <daniel.stand...@astronomer.io.invalid> wrote:
> >
> > > One wrinkle to the have-your-cake-and-eat-it-too approach is deferrable
> > > operators. It doesn't seem it would be very practical to resume back
> > > into the operator that is nested inside a taskflow function. One
> > > solution would be to run the trigger in process like we currently do
> > > with `dag.test()`. That would make it non-deferrable in effect. But at
> > > least it would run properly. There may be other better solutions.
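
For reference, the `RunThisBeforeExecute` idea from the Mar 9 message quoted above could be sketched roughly as below. This is only a conceptual illustration in plain Python, not Airflow code: `AnOperator` is a stand-in class, and `resolve_params` and `run_task` are hypothetical names invented here to play the role of the template-rendering step that runs just before `execute()`. Nothing like this exists in Airflow today.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Context = Dict[str, Any]


@dataclass
class RunThisBeforeExecute:
    """Hypothetical marker: compute this parameter right before execute()."""

    callable: Callable[[Context], Any]


class AnOperator:
    """Stand-in for an operator (an assumption, not an Airflow class)."""

    def __init__(self, with_param: Any) -> None:
        self.with_param = with_param

    def resolve_params(self, context: Context) -> None:
        # Plays the role of the rendering step: every attribute wrapped in
        # RunThisBeforeExecute is replaced by its callable's return value.
        for name, value in list(vars(self).items()):
            if isinstance(value, RunThisBeforeExecute):
                setattr(self, name, value.callable(context))

    def execute(self, context: Context) -> Any:
        # By now with_param holds the computed object, not the wrapper.
        return self.with_param


def run_task(operator: AnOperator, context: Context) -> Any:
    # Hypothetical runner: resolve parameters first, then execute.
    operator.resolve_params(context)
    return operator.execute(context)


def calculate_the_param(context: Context) -> Any:
    # Returns a native object (a dict), not a rendered string.
    return {"rows": context["params"]["limit"] * 2}


operator = AnOperator(
    with_param=RunThisBeforeExecute(callable=calculate_the_param)
)
result = run_task(operator, {"params": {"limit": 5}})
print(result)
```

The point of the sketch is that the callable runs with the task context at resolve time, so the parameter arrives in `execute()` as whatever object the callable returned, and the user never has to instantiate an operator inside a `@task` function.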