Re: Bad mixing of decorated and classic operators (users shooting themselves in their foot)

Jarek Potiuk Wed, 20 Mar 2024 02:01:57 -0700

I am really torn on that one to be honest.

I am OK with the error (with the note that it will likely break a lot of
workflows), I am ok with the warning as well as a softer way of letting the
user know they are doing it wrong).
But ultimately, I'd really want we (re) consider if we cannot make it into
a "working" solution.


It should be possible IMHO to allow this usage with clever instrumentation
- the code that we have now in Dave's PR is already detecting most (all
that Elad mentioned as well including regular operators within regular
operators I think Dave?) cases like that.

It's really **only**  a matter of a) setting task_id to be parent's task if
b) getting context from parent operator c) running pre-processing of
templated fields (and in this case it can be done in the constructor of
such a nested operator - because we already have context.

This all seems doable from the technical point of view.

We do not even have to handle pushing the return value to Xcom (that would
be tricky with potentially returning value from the "upper" operator - but
we can add a warning about that one if it happens and such "nested"
operator has do_xcom_push and returns value from execute) and it should
just run "as expected".

I can't think of a big harm done this way to be honest - and it would make
life of our users way easier (and also live of those who happen to look at
the issues and discussion and attempt to help the users - because those
kinds of discussions and issues would not at all appear if this case will
**just work**(TM).

I think this is the case where "perceived correctness" trumps "harm done".
Unless of course I am missing some side effect here - which might well be I
miss - but no-one pointed to an actual harm it can do.

And to Elad point " "I know there is an operator that does X, so I will
just use it inside the python function I invoke from the python.operator".

Yes, many of our users might think this way. And I acknowledge that in this
case (providing no harm is done) - I prefer to adapt to the way my user
thinks, rather than force on them things that I think are the "ONLY" right
way of doing things. If it makes Airflow easier to use, and increases the
perception that it's a software written for imperfect humans who do not
always have time to read the documentation in detail and implement things
in the way that feels natural to them - I am all for it - voting with all
my hands and legs. - this means better Airflow adoption, better
word-of-mouth from our users, and most of all - our users losing as little
time as possible on unnecessary overhead (and us on responding to user's
worries).

J.

On Wed, Mar 20, 2024 at 9:37 AM Ash Berlin-Taylor <a...@apache.org> wrote:

> The reason users are sure they can use operators like that is that it has
> worked for a long time - hell I even wrote a custom nested operator in the
> past (pre 2.0 admittedly).
>
> So this pr should only be a warning by default, or a config option to warn
> but not error
>
> Alternatively do we just document it as "this is not recommended, and if
> it breaks you get to keep both halves"? (I'm not a fan of the runtime
> enforcement of that, it seems very "heavy" for minimal benefit or
> protection and limits what adventurous users can do)
>
> -a
>
> On 20 March 2024 07:05:25 GMT, Jarek Potiuk <ja...@potiuk.com> wrote:
> >Just to add to the discussion - a discussion raised today
> >https://github.com/apache/airflow/discussions/38311  where the user is
> sure
> >that they can use operators in such a way as described above, and even
> used
> >the term "nested operator".
> >
> >I think getting  https://github.com/apache/airflow/pull/37937  in will
> be a
> >good way in the future to prevent this misunderstanding, but maybe there
> is
> >something to think about - in the "Operators need to die" context by
> Bolke.
> >
> >BTW. I have a hypothesis why those questions started to appear frequently
> >and people being reasonably sure they can do it. It's a pure speculation
> >(and I asked the user this time to explain) but some of that might be
> >fuelled by Chat GPT hallucinating about Airflow being able to do it. I saw
> >similar hallucinations before - where people suggested some (completely
> >wrong like that) solution to their problem and only after inquiry, they
> >admitted that it was a solution that ChatGPT gave them
> >
> >I wonder if we have even more of those soon.
> >
> >J.
> >
> >
> >On Sun, Mar 10, 2024 at 9:29 AM Elad Kalif <elad...@apache.org> wrote:
> >
> >> The issue here is not just about decorators it happens also with regular
> >> operators (operator inside operator) and also with operator inside
> >> on_x_callback
> >>
> >> For example:
> >>
> >>
> https://stackoverflow.com/questions/64291042/airflow-call-a-operator-inside-a-function/
> >>
> >>
> https://stackoverflow.com/questions/67483542/airflow-pythonoperator-inside-pythonoperator/
> >>
> >>
> >>
> >> > I can't see which problem is solved by allowing running one operator
> >> inside another.
> >>
> >> From the user's perspective, they have an operator that knows how to do
> >> something and it's very easy to use. So they want to leverage that.
> >> For example send Slack message:
> >>
> >> slack_operator_post_text = SlackAPIPostOperator(
> >>         task_id="slack_post_text",
> >>         channel=SLACK_CHANNEL,
> >>         text=("My message"),
> >>     )
> >>
> >> It handles everything. Now if you want to send a Slack message from a
> >> PythonOperator you need to initialize a hook, find the right function to
> >> invoke etc.
> >> Thus from the user perspective - There is already a class that does all
> >> that. Why can't it just work? Why do they need to "reimplement" the
> >> operator logic? (most of the time it will be copy paste the logic of the
> >> execute function)
> >>
> >> So, the problem they are trying to solve is to avoid code duplication
> and
> >> ease of use.
> >>
> >> Jarek - I think your solution focuses more on the templating side but I
> >> think the actual problem is not limited to the templating.
> >> I think the problem is more of "I know there is an operator that does
> X, so
> >> I will just use it inside the python function I invoke from the python
> >> operator" - regardless of whether Jinja/templating becomes an issue or
> not.
> >>
> >> On Sat, Mar 9, 2024 at 9:06 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >> > I see that we have already (thanks David!) a PR:
> >> > https://github.com/apache/airflow/pull/37937 to forbid this use
> (which
> >> is
> >> > cool and I am glad my discussion had some ripple effect :D ).
> >> >
> >> > I am quite happy to get this one merged once it passes tests/reviews,
> >> but I
> >> > would still want to explore future departure / options we might have,
> >> maybe
> >> > there will be another - long term - ripple effect :). I thought a bit
> >> more
> >> > about  - possibly - different reasons why this pattern we observe is
> >> > emerging and I have a theory.
> >> >
> >> > To Andrey's comments:
> >> >
> >> > > I can't see which problem is solved by allowing running one operator
> >> > inside another.
> >> >
> >> > For me, the main problem to solve is that using Hooks in the way I
> >> > described in
> >> >
> >> >
> >>
> https://medium.com/apache-airflow/generic-airflow-transfers-made-easy-5fe8e5e7d2c2
> >> > in 2022 are almost non-discoverable by significant percentage of
> users.
> >> > Especially those kinds of users that mostly treat Airflow Operators as
> >> > black-box and **just** discovered task flow as a way that they can do
> >> > simple things in Python - but they are not into writing their own
> custom
> >> > operators, nor look at the operator's code. Generally they don't
> really
> >> see
> >> > DAG authoring as writing Python Code, it's mostly about using a little
> >> > weird DSL to build their DAGs. Mostly copy&pasting some constructs
> that
> >> > look like putting together existing building blocks and using patterns
> >> like
> >> > `>>` to add dependencies.
> >> >
> >> > Yes I try to be empathetic and try to guess how such users think about
> >> DAG
> >> > authoring - I might be wrong, but this is what I see as a recurring
> >> > pattern.
> >> >
> >> > So in this context - @task is not Python code writing, it's yet
> another
> >> DSL
> >> > that people see as appealing. And the case (Here I just speculate -
> so I
> >> > might be entirely wrong) I **think** the original pattern I posted
> above
> >> > solve is that people think that they can slightly improve the
> flexibility
> >> > of the operators by adding a bit of simple code before when they need
> a
> >> bit
> >> > more flexibility and JINJA is not enough. Basically replacing this
> >> >
> >> > operator = AnOperator(with_param='{{ here I want some dynamicness }}')
> >> >
> >> > with:
> >> >
> >> > @task
> >> > def my_task():
> >> >     calculated_param = calculate_the_param()  # do something more
> complex
> >> > that is difficult to do with JINJA expression
> >> >     operator = AnOperator(with_param=calculated_param)
> >> >     operator.execute()
> >> >
> >> > And I **think** the main issue to solve here is how to make it a bit
> more
> >> > flexible to get parameters of operators pre-calculated **just** before
> >> the
> >> > execute() method
> >> >
> >> > This is speculation of course - and there might be different
> motivations
> >> -
> >> > but I think addressing this need better - might be actually solving
> the
> >> > problem (combined with David's PR). If we find a way to pass more
> complex
> >> > calculations to parameters of operators?
> >> >
> >> > So MAYBE (just maybe) we could do something like that (conceptual -
> name
> >> > might be different)
> >> >
> >> >
> >> >
> >> >
> >>
> operator=AnOperator(with_param=RunThisBeforeExecute(callable=calculate_the_param))
> >> >
> >> > And let the user use a callable there:
> >> >
> >> > def calculate_the_param(context: dict) -> Any
> >> >
> >> > I **think** we could extend our "rendering JINJA template" to handle
> this
> >> > special case for templated parameters. Plus, it would nicely solve the
> >> > "render_as_native" problem - because that method could return the
> >> expected
> >> > object rather than string (and every parameter could have its own
> >> > method....
> >> >
> >> > Maybe that would be a good solution ?
> >> >
> >> > J.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Sun, Mar 3, 2024 at 12:03 AM Daniel Standish
> >> > <daniel.stand...@astronomer.io.invalid> wrote:
> >> >
> >> > > One wrinkle to the have cake and eat it too approach is deferrable
> >> > > operators. It doesn't seem it would be very practical to resume back
> >> into
> >> > > the operator that is nested inside a taskflow function.  One
> solution
> >> > would
> >> > > be to run the trigger in process like we currently do with
> >> `dag.test()`.
> >> > > That would make it non-deferrable in effect.  But at least it would
> run
> >> > > properly.  There may be other better solutions.
> >> > >
> >> >
> >>
>

Re: Bad mixing of decorated and classic operators (users shooting themselves in their foot)

Reply via email to