Re: Bad mixing of decorated and classic operators (users shooting themselves in their foot)

Ash Berlin-Taylor Wed, 20 Mar 2024 01:37:25 -0700

The reason users are sure they can use operators like that is that it has 
worked for a long time - hell I even wrote a custom nested operator in the past 
(pre 2.0 admittedly).


So this pr should only be a warning by default, or a config option to warn but 
not error

Alternatively do we just document it as "this is not recommended, and if it 
breaks you get to keep both halves"? (I'm not a fan of the runtime enforcement 
of that, it seems very "heavy" for minimal benefit or protection and limits 
what adventurous users can do)

-a

On 20 March 2024 07:05:25 GMT, Jarek Potiuk <[email protected]> wrote:
>Just to add to the discussion - a discussion raised today
>https://github.com/apache/airflow/discussions/38311  where the user is sure
>that they can use operators in such a way as described above, and even used
>the term "nested operator".
>
>I think getting  https://github.com/apache/airflow/pull/37937  in will be a
>good way in the future to prevent this misunderstanding, but maybe there is
>something to think about - in the "Operators need to die" context by Bolke.
>
>BTW. I have a hypothesis why those questions started to appear frequently
>and people being reasonably sure they can do it. It's a pure speculation
>(and I asked the user this time to explain) but some of that might be
>fuelled by Chat GPT hallucinating about Airflow being able to do it. I saw
>similar hallucinations before - where people suggested some (completely
>wrong like that) solution to their problem and only after inquiry, they
>admitted that it was a solution that ChatGPT gave them
>
>I wonder if we have even more of those soon.
>
>J.
>
>
>On Sun, Mar 10, 2024 at 9:29 AM Elad Kalif <[email protected]> wrote:
>
>> The issue here is not just about decorators it happens also with regular
>> operators (operator inside operator) and also with operator inside
>> on_x_callback
>>
>> For example:
>>
>> https://stackoverflow.com/questions/64291042/airflow-call-a-operator-inside-a-function/
>>
>> https://stackoverflow.com/questions/67483542/airflow-pythonoperator-inside-pythonoperator/
>>
>>
>>
>> > I can't see which problem is solved by allowing running one operator
>> inside another.
>>
>> From the user's perspective, they have an operator that knows how to do
>> something and it's very easy to use. So they want to leverage that.
>> For example send Slack message:
>>
>> slack_operator_post_text = SlackAPIPostOperator(
>>         task_id="slack_post_text",
>>         channel=SLACK_CHANNEL,
>>         text=("My message"),
>>     )
>>
>> It handles everything. Now if you want to send a Slack message from a
>> PythonOperator you need to initialize a hook, find the right function to
>> invoke etc.
>> Thus from the user perspective - There is already a class that does all
>> that. Why can't it just work? Why do they need to "reimplement" the
>> operator logic? (most of the time it will be copy paste the logic of the
>> execute function)
>>
>> So, the problem they are trying to solve is to avoid code duplication and
>> ease of use.
>>
>> Jarek - I think your solution focuses more on the templating side but I
>> think the actual problem is not limited to the templating.
>> I think the problem is more of "I know there is an operator that does X, so
>> I will just use it inside the python function I invoke from the python
>> operator" - regardless of whether Jinja/templating becomes an issue or not.
>>
>> On Sat, Mar 9, 2024 at 9:06 PM Jarek Potiuk <[email protected]> wrote:
>>
>> > I see that we have already (thanks David!) a PR:
>> > https://github.com/apache/airflow/pull/37937 to forbid this use (which
>> is
>> > cool and I am glad my discussion had some ripple effect :D ).
>> >
>> > I am quite happy to get this one merged once it passes tests/reviews,
>> but I
>> > would still want to explore future departure / options we might have,
>> maybe
>> > there will be another - long term - ripple effect :). I thought a bit
>> more
>> > about  - possibly - different reasons why this pattern we observe is
>> > emerging and I have a theory.
>> >
>> > To Andrey's comments:
>> >
>> > > I can't see which problem is solved by allowing running one operator
>> > inside another.
>> >
>> > For me, the main problem to solve is that using Hooks in the way I
>> > described in
>> >
>> >
>> https://medium.com/apache-airflow/generic-airflow-transfers-made-easy-5fe8e5e7d2c2
>> > in 2022 are almost non-discoverable by significant percentage of users.
>> > Especially those kinds of users that mostly treat Airflow Operators as
>> > black-box and **just** discovered task flow as a way that they can do
>> > simple things in Python - but they are not into writing their own custom
>> > operators, nor look at the operator's code. Generally they don't really
>> see
>> > DAG authoring as writing Python Code, it's mostly about using a little
>> > weird DSL to build their DAGs. Mostly copy&pasting some constructs that
>> > look like putting together existing building blocks and using patterns
>> like
>> > `>>` to add dependencies.
>> >
>> > Yes I try to be empathetic and try to guess how such users think about
>> DAG
>> > authoring - I might be wrong, but this is what I see as a recurring
>> > pattern.
>> >
>> > So in this context - @task is not Python code writing, it's yet another
>> DSL
>> > that people see as appealing. And the case (Here I just speculate - so I
>> > might be entirely wrong) I **think** the original pattern I posted above
>> > solve is that people think that they can slightly improve the flexibility
>> > of the operators by adding a bit of simple code before when they need a
>> bit
>> > more flexibility and JINJA is not enough. Basically replacing this
>> >
>> > operator = AnOperator(with_param='{{ here I want some dynamicness }}')
>> >
>> > with:
>> >
>> > @task
>> > def my_task():
>> >     calculated_param = calculate_the_param()  # do something more complex
>> > that is difficult to do with JINJA expression
>> >     operator = AnOperator(with_param=calculated_param)
>> >     operator.execute()
>> >
>> > And I **think** the main issue to solve here is how to make it a bit more
>> > flexible to get parameters of operators pre-calculated **just** before
>> the
>> > execute() method
>> >
>> > This is speculation of course - and there might be different motivations
>> -
>> > but I think addressing this need better - might be actually solving the
>> > problem (combined with David's PR). If we find a way to pass more complex
>> > calculations to parameters of operators?
>> >
>> > So MAYBE (just maybe) we could do something like that (conceptual - name
>> > might be different)
>> >
>> >
>> >
>> >
>> operator=AnOperator(with_param=RunThisBeforeExecute(callable=calculate_the_param))
>> >
>> > And let the user use a callable there:
>> >
>> > def calculate_the_param(context: dict) -> Any
>> >
>> > I **think** we could extend our "rendering JINJA template" to handle this
>> > special case for templated parameters. Plus, it would nicely solve the
>> > "render_as_native" problem - because that method could return the
>> expected
>> > object rather than string (and every parameter could have its own
>> > method....
>> >
>> > Maybe that would be a good solution ?
>> >
>> > J.
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Sun, Mar 3, 2024 at 12:03 AM Daniel Standish
>> > <[email protected]> wrote:
>> >
>> > > One wrinkle to the have cake and eat it too approach is deferrable
>> > > operators. It doesn't seem it would be very practical to resume back
>> into
>> > > the operator that is nested inside a taskflow function.  One solution
>> > would
>> > > be to run the trigger in process like we currently do with
>> `dag.test()`.
>> > > That would make it non-deferrable in effect.  But at least it would run
>> > > properly.  There may be other better solutions.
>> > >
>> >
>>

Re: Bad mixing of decorated and classic operators (users shooting themselves in their foot)

Reply via email to