Re: Bad mixing of decorated and classic operators (users shooting themselves in their foot)

Andrey Anshin Wed, 20 Mar 2024 08:46:40 -0700

> It handles everything. Now if you want to send a Slack message from a 
> PythonOperator
you need to initialize a hook, find the right function to invoke etc.


> And to Elad point " "I know there is an operator that does X, so I will
just use it inside the python function I invoke from the python.operator".

That sounds like we do a bad job on documented Hooks, some of them, e.g.
SlackHook have more features rather than provided by the operator.

For me it still looks like an attempt to allow Operators to execute methods
outside of the worker. It is trying to solve one problem by another in a
very tricky and dangerous zone.
And there is a chance to reimplement SubDag in its worst scenario.

>     calculated_param = calculate_the_param()  # do something more complex that
is difficult to do with JINJA expression

We have airflow.utils.mixins.ResolveMixin (
https://github.com/apache/airflow/blob/1726b9372b2c00b94475e087bbaff3073e958c49/airflow/template/templater.py#L170-L171
)
and there is no problem to implement something which we might calli
EvaluateCallable aka python operator for resolve one templated field

> So this pr should only be a warning by default, or a config option to
warn but not error

I agree with it, it should warn by default, with option to suppress/disable
if users know what they do and option to raise an error in case if want to
prevent execution


On Wed, 20 Mar 2024 at 13:01, Jarek Potiuk <[email protected]> wrote:

> I am really torn on that one to be honest.
>
> I am OK with the error (with the note that it will likely break a lot of
> workflows), I am ok with the warning as well as a softer way of letting the
> user know they are doing it wrong).
> But ultimately, I'd really want we (re) consider if we cannot make it into
> a "working" solution.
>
> It should be possible IMHO to allow this usage with clever instrumentation
> - the code that we have now in Dave's PR is already detecting most (all
> that Elad mentioned as well including regular operators within regular
> operators I think Dave?) cases like that.
>
> It's really **only**  a matter of a) setting task_id to be parent's task if
> b) getting context from parent operator c) running pre-processing of
> templated fields (and in this case it can be done in the constructor of
> such a nested operator - because we already have context.
>
> This all seems doable from the technical point of view.
>
> We do not even have to handle pushing the return value to Xcom (that would
> be tricky with potentially returning value from the "upper" operator - but
> we can add a warning about that one if it happens and such "nested"
> operator has do_xcom_push and returns value from execute) and it should
> just run "as expected".
>
> I can't think of a big harm done this way to be honest - and it would make
> life of our users way easier (and also live of those who happen to look at
> the issues and discussion and attempt to help the users - because those
> kinds of discussions and issues would not at all appear if this case will
> **just work**(TM).
>
> I think this is the case where "perceived correctness" trumps "harm done".
> Unless of course I am missing some side effect here - which might well be I
> miss - but no-one pointed to an actual harm it can do.
>
> And to Elad point " "I know there is an operator that does X, so I will
> just use it inside the python function I invoke from the python.operator".
>
> Yes, many of our users might think this way. And I acknowledge that in this
> case (providing no harm is done) - I prefer to adapt to the way my user
> thinks, rather than force on them things that I think are the "ONLY" right
> way of doing things. If it makes Airflow easier to use, and increases the
> perception that it's a software written for imperfect humans who do not
> always have time to read the documentation in detail and implement things
> in the way that feels natural to them - I am all for it - voting with all
> my hands and legs. - this means better Airflow adoption, better
> word-of-mouth from our users, and most of all - our users losing as little
> time as possible on unnecessary overhead (and us on responding to user's
> worries).
>
> J.
>
> On Wed, Mar 20, 2024 at 9:37 AM Ash Berlin-Taylor <[email protected]> wrote:
>
> > The reason users are sure they can use operators like that is that it has
> > worked for a long time - hell I even wrote a custom nested operator in
> the
> > past (pre 2.0 admittedly).
> >
> > So this pr should only be a warning by default, or a config option to
> warn
> > but not error
> >
> > Alternatively do we just document it as "this is not recommended, and if
> > it breaks you get to keep both halves"? (I'm not a fan of the runtime
> > enforcement of that, it seems very "heavy" for minimal benefit or
> > protection and limits what adventurous users can do)
> >
> > -a
> >
> > On 20 March 2024 07:05:25 GMT, Jarek Potiuk <[email protected]> wrote:
> > >Just to add to the discussion - a discussion raised today
> > >https://github.com/apache/airflow/discussions/38311  where the user is
> > sure
> > >that they can use operators in such a way as described above, and even
> > used
> > >the term "nested operator".
> > >
> > >I think getting  https://github.com/apache/airflow/pull/37937  in will
> > be a
> > >good way in the future to prevent this misunderstanding, but maybe there
> > is
> > >something to think about - in the "Operators need to die" context by
> > Bolke.
> > >
> > >BTW. I have a hypothesis why those questions started to appear
> frequently
> > >and people being reasonably sure they can do it. It's a pure speculation
> > >(and I asked the user this time to explain) but some of that might be
> > >fuelled by Chat GPT hallucinating about Airflow being able to do it. I
> saw
> > >similar hallucinations before - where people suggested some (completely
> > >wrong like that) solution to their problem and only after inquiry, they
> > >admitted that it was a solution that ChatGPT gave them
> > >
> > >I wonder if we have even more of those soon.
> > >
> > >J.
> > >
> > >
> > >On Sun, Mar 10, 2024 at 9:29 AM Elad Kalif <[email protected]> wrote:
> > >
> > >> The issue here is not just about decorators it happens also with
> regular
> > >> operators (operator inside operator) and also with operator inside
> > >> on_x_callback
> > >>
> > >> For example:
> > >>
> > >>
> >
> https://stackoverflow.com/questions/64291042/airflow-call-a-operator-inside-a-function/
> > >>
> > >>
> >
> https://stackoverflow.com/questions/67483542/airflow-pythonoperator-inside-pythonoperator/
> > >>
> > >>
> > >>
> > >> > I can't see which problem is solved by allowing running one operator
> > >> inside another.
> > >>
> > >> From the user's perspective, they have an operator that knows how to
> do
> > >> something and it's very easy to use. So they want to leverage that.
> > >> For example send Slack message:
> > >>
> > >> slack_operator_post_text = SlackAPIPostOperator(
> > >>         task_id="slack_post_text",
> > >>         channel=SLACK_CHANNEL,
> > >>         text=("My message"),
> > >>     )
> > >>
> > >> It handles everything. Now if you want to send a Slack message from a
> > >> PythonOperator you need to initialize a hook, find the right function
> to
> > >> invoke etc.
> > >> Thus from the user perspective - There is already a class that does
> all
> > >> that. Why can't it just work? Why do they need to "reimplement" the
> > >> operator logic? (most of the time it will be copy paste the logic of
> the
> > >> execute function)
> > >>
> > >> So, the problem they are trying to solve is to avoid code duplication
> > and
> > >> ease of use.
> > >>
> > >> Jarek - I think your solution focuses more on the templating side but
> I
> > >> think the actual problem is not limited to the templating.
> > >> I think the problem is more of "I know there is an operator that does
> > X, so
> > >> I will just use it inside the python function I invoke from the python
> > >> operator" - regardless of whether Jinja/templating becomes an issue or
> > not.
> > >>
> > >> On Sat, Mar 9, 2024 at 9:06 PM Jarek Potiuk <[email protected]> wrote:
> > >>
> > >> > I see that we have already (thanks David!) a PR:
> > >> > https://github.com/apache/airflow/pull/37937 to forbid this use
> > (which
> > >> is
> > >> > cool and I am glad my discussion had some ripple effect :D ).
> > >> >
> > >> > I am quite happy to get this one merged once it passes
> tests/reviews,
> > >> but I
> > >> > would still want to explore future departure / options we might
> have,
> > >> maybe
> > >> > there will be another - long term - ripple effect :). I thought a
> bit
> > >> more
> > >> > about  - possibly - different reasons why this pattern we observe is
> > >> > emerging and I have a theory.
> > >> >
> > >> > To Andrey's comments:
> > >> >
> > >> > > I can't see which problem is solved by allowing running one
> operator
> > >> > inside another.
> > >> >
> > >> > For me, the main problem to solve is that using Hooks in the way I
> > >> > described in
> > >> >
> > >> >
> > >>
> >
> https://medium.com/apache-airflow/generic-airflow-transfers-made-easy-5fe8e5e7d2c2
> > >> > in 2022 are almost non-discoverable by significant percentage of
> > users.
> > >> > Especially those kinds of users that mostly treat Airflow Operators
> as
> > >> > black-box and **just** discovered task flow as a way that they can
> do
> > >> > simple things in Python - but they are not into writing their own
> > custom
> > >> > operators, nor look at the operator's code. Generally they don't
> > really
> > >> see
> > >> > DAG authoring as writing Python Code, it's mostly about using a
> little
> > >> > weird DSL to build their DAGs. Mostly copy&pasting some constructs
> > that
> > >> > look like putting together existing building blocks and using
> patterns
> > >> like
> > >> > `>>` to add dependencies.
> > >> >
> > >> > Yes I try to be empathetic and try to guess how such users think
> about
> > >> DAG
> > >> > authoring - I might be wrong, but this is what I see as a recurring
> > >> > pattern.
> > >> >
> > >> > So in this context - @task is not Python code writing, it's yet
> > another
> > >> DSL
> > >> > that people see as appealing. And the case (Here I just speculate -
> > so I
> > >> > might be entirely wrong) I **think** the original pattern I posted
> > above
> > >> > solve is that people think that they can slightly improve the
> > flexibility
> > >> > of the operators by adding a bit of simple code before when they
> need
> > a
> > >> bit
> > >> > more flexibility and JINJA is not enough. Basically replacing this
> > >> >
> > >> > operator = AnOperator(with_param='{{ here I want some dynamicness
> }}')
> > >> >
> > >> > with:
> > >> >
> > >> > @task
> > >> > def my_task():
> > >> >     calculated_param = calculate_the_param()  # do something more
> > complex
> > >> > that is difficult to do with JINJA expression
> > >> >     operator = AnOperator(with_param=calculated_param)
> > >> >     operator.execute()
> > >> >
> > >> > And I **think** the main issue to solve here is how to make it a bit
> > more
> > >> > flexible to get parameters of operators pre-calculated **just**
> before
> > >> the
> > >> > execute() method
> > >> >
> > >> > This is speculation of course - and there might be different
> > motivations
> > >> -
> > >> > but I think addressing this need better - might be actually solving
> > the
> > >> > problem (combined with David's PR). If we find a way to pass more
> > complex
> > >> > calculations to parameters of operators?
> > >> >
> > >> > So MAYBE (just maybe) we could do something like that (conceptual -
> > name
> > >> > might be different)
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> >
> operator=AnOperator(with_param=RunThisBeforeExecute(callable=calculate_the_param))
> > >> >
> > >> > And let the user use a callable there:
> > >> >
> > >> > def calculate_the_param(context: dict) -> Any
> > >> >
> > >> > I **think** we could extend our "rendering JINJA template" to handle
> > this
> > >> > special case for templated parameters. Plus, it would nicely solve
> the
> > >> > "render_as_native" problem - because that method could return the
> > >> expected
> > >> > object rather than string (and every parameter could have its own
> > >> > method....
> > >> >
> > >> > Maybe that would be a good solution ?
> > >> >
> > >> > J.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Sun, Mar 3, 2024 at 12:03 AM Daniel Standish
> > >> > <[email protected]> wrote:
> > >> >
> > >> > > One wrinkle to the have cake and eat it too approach is deferrable
> > >> > > operators. It doesn't seem it would be very practical to resume
> back
> > >> into
> > >> > > the operator that is nested inside a taskflow function.  One
> > solution
> > >> > would
> > >> > > be to run the trigger in process like we currently do with
> > >> `dag.test()`.
> > >> > > That would make it non-deferrable in effect.  But at least it
> would
> > run
> > >> > > properly.  There may be other better solutions.
> > >> > >
> > >> >
> > >>
> >
>

Re: Bad mixing of decorated and classic operators (users shooting themselves in their foot)

Reply via email to