The reason users are sure they can use operators like that is that it has worked for a long time - hell I even wrote a custom nested operator in the past (pre 2.0 admittedly).
So this pr should only be a warning by default, or a config option to warn but not error Alternatively do we just document it as "this is not recommended, and if it breaks you get to keep both halves"? (I'm not a fan of the runtime enforcement of that, it seems very "heavy" for minimal benefit or protection and limits what adventurous users can do) -a On 20 March 2024 07:05:25 GMT, Jarek Potiuk <ja...@potiuk.com> wrote: >Just to add to the discussion - a discussion raised today >https://github.com/apache/airflow/discussions/38311 where the user is sure >that they can use operators in such a way as described above, and even used >the term "nested operator". > >I think getting https://github.com/apache/airflow/pull/37937 in will be a >good way in the future to prevent this misunderstanding, but maybe there is >something to think about - in the "Operators need to die" context by Bolke. > >BTW. I have a hypothesis why those questions started to appear frequently >and people being reasonably sure they can do it. It's a pure speculation >(and I asked the user this time to explain) but some of that might be >fuelled by Chat GPT hallucinating about Airflow being able to do it. I saw >similar hallucinations before - where people suggested some (completely >wrong like that) solution to their problem and only after inquiry, they >admitted that it was a solution that ChatGPT gave them > >I wonder if we have even more of those soon. > >J. > > >On Sun, Mar 10, 2024 at 9:29 AM Elad Kalif <elad...@apache.org> wrote: > >> The issue here is not just about decorators it happens also with regular >> operators (operator inside operator) and also with operator inside >> on_x_callback >> >> For example: >> >> https://stackoverflow.com/questions/64291042/airflow-call-a-operator-inside-a-function/ >> >> https://stackoverflow.com/questions/67483542/airflow-pythonoperator-inside-pythonoperator/ >> >> >> >> > I can't see which problem is solved by allowing running one operator >> inside another. >> >> From the user's perspective, they have an operator that knows how to do >> something and it's very easy to use. So they want to leverage that. >> For example send Slack message: >> >> slack_operator_post_text = SlackAPIPostOperator( >> task_id="slack_post_text", >> channel=SLACK_CHANNEL, >> text=("My message"), >> ) >> >> It handles everything. Now if you want to send a Slack message from a >> PythonOperator you need to initialize a hook, find the right function to >> invoke etc. >> Thus from the user perspective - There is already a class that does all >> that. Why can't it just work? Why do they need to "reimplement" the >> operator logic? (most of the time it will be copy paste the logic of the >> execute function) >> >> So, the problem they are trying to solve is to avoid code duplication and >> ease of use. >> >> Jarek - I think your solution focuses more on the templating side but I >> think the actual problem is not limited to the templating. >> I think the problem is more of "I know there is an operator that does X, so >> I will just use it inside the python function I invoke from the python >> operator" - regardless of whether Jinja/templating becomes an issue or not. >> >> On Sat, Mar 9, 2024 at 9:06 PM Jarek Potiuk <ja...@potiuk.com> wrote: >> >> > I see that we have already (thanks David!) a PR: >> > https://github.com/apache/airflow/pull/37937 to forbid this use (which >> is >> > cool and I am glad my discussion had some ripple effect :D ). >> > >> > I am quite happy to get this one merged once it passes tests/reviews, >> but I >> > would still want to explore future departure / options we might have, >> maybe >> > there will be another - long term - ripple effect :). I thought a bit >> more >> > about - possibly - different reasons why this pattern we observe is >> > emerging and I have a theory. >> > >> > To Andrey's comments: >> > >> > > I can't see which problem is solved by allowing running one operator >> > inside another. >> > >> > For me, the main problem to solve is that using Hooks in the way I >> > described in >> > >> > >> https://medium.com/apache-airflow/generic-airflow-transfers-made-easy-5fe8e5e7d2c2 >> > in 2022 are almost non-discoverable by significant percentage of users. >> > Especially those kinds of users that mostly treat Airflow Operators as >> > black-box and **just** discovered task flow as a way that they can do >> > simple things in Python - but they are not into writing their own custom >> > operators, nor look at the operator's code. Generally they don't really >> see >> > DAG authoring as writing Python Code, it's mostly about using a little >> > weird DSL to build their DAGs. Mostly copy&pasting some constructs that >> > look like putting together existing building blocks and using patterns >> like >> > `>>` to add dependencies. >> > >> > Yes I try to be empathetic and try to guess how such users think about >> DAG >> > authoring - I might be wrong, but this is what I see as a recurring >> > pattern. >> > >> > So in this context - @task is not Python code writing, it's yet another >> DSL >> > that people see as appealing. And the case (Here I just speculate - so I >> > might be entirely wrong) I **think** the original pattern I posted above >> > solve is that people think that they can slightly improve the flexibility >> > of the operators by adding a bit of simple code before when they need a >> bit >> > more flexibility and JINJA is not enough. Basically replacing this >> > >> > operator = AnOperator(with_param='{{ here I want some dynamicness }}') >> > >> > with: >> > >> > @task >> > def my_task(): >> > calculated_param = calculate_the_param() # do something more complex >> > that is difficult to do with JINJA expression >> > operator = AnOperator(with_param=calculated_param) >> > operator.execute() >> > >> > And I **think** the main issue to solve here is how to make it a bit more >> > flexible to get parameters of operators pre-calculated **just** before >> the >> > execute() method >> > >> > This is speculation of course - and there might be different motivations >> - >> > but I think addressing this need better - might be actually solving the >> > problem (combined with David's PR). If we find a way to pass more complex >> > calculations to parameters of operators? >> > >> > So MAYBE (just maybe) we could do something like that (conceptual - name >> > might be different) >> > >> > >> > >> > >> operator=AnOperator(with_param=RunThisBeforeExecute(callable=calculate_the_param)) >> > >> > And let the user use a callable there: >> > >> > def calculate_the_param(context: dict) -> Any >> > >> > I **think** we could extend our "rendering JINJA template" to handle this >> > special case for templated parameters. Plus, it would nicely solve the >> > "render_as_native" problem - because that method could return the >> expected >> > object rather than string (and every parameter could have its own >> > method.... >> > >> > Maybe that would be a good solution ? >> > >> > J. >> > >> > >> > >> > >> > >> > >> > On Sun, Mar 3, 2024 at 12:03 AM Daniel Standish >> > <daniel.stand...@astronomer.io.invalid> wrote: >> > >> > > One wrinkle to the have cake and eat it too approach is deferrable >> > > operators. It doesn't seem it would be very practical to resume back >> into >> > > the operator that is nested inside a taskflow function. One solution >> > would >> > > be to run the trigger in process like we currently do with >> `dag.test()`. >> > > That would make it non-deferrable in effect. But at least it would run >> > > properly. There may be other better solutions. >> > > >> > >>