> The providers tests will soon (but possibly not before 3.0 at this point)
> need to be converted to use the TaskSDK properly which won’t/can't actually
> use the DB, so we will need to do something soon.

Just to clarify - my goal is absolutely to have all providers use Task SDK
before Airflow 3.0. And I hope we can work out a half-automated,
half-crowdsourced way to achieve it, similar to what we have almost finished
doing with the providers' move. That will be my focus as part of all the work
around packaging, and one of my top priorities.

> Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s a
> single row. I think I’m just confused or misdirected about your comment
> about database here. Can you give a concrete example of the change you
> would make, and how this will speed things up?

I think it's not about "slow" - it's about using SQLAlchemy ORM objects and
effectively requiring the DB. As outlined before - I want all our Provider
tests to be non-DB tests. Soon they will not even have a DB to talk to.
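
To make "non-DB test" a bit more concrete, here is a rough sketch of the
pattern, not actual Airflow code. `FakeConnection` and `ExampleHook` are
hypothetical stand-ins invented for this illustration; the only assumption is
that a hook obtains its connection through a single lookup method, which the
test replaces with `unittest.mock` so nothing touches a database.

```python
from dataclasses import dataclass
from unittest import mock


@dataclass
class FakeConnection:
    """Hypothetical stand-in for a connection object (not an Airflow class)."""

    conn_id: str
    host: str
    login: str
    password: str


class ExampleHook:
    """Hypothetical hook, used only for illustration."""

    def __init__(self, conn_id: str) -> None:
        self.conn_id = conn_id

    def get_connection(self, conn_id: str) -> FakeConnection:
        # Stands in for the lookup that would normally need a database.
        raise RuntimeError("no database available in this test")

    def build_url(self) -> str:
        # The logic under test: it only needs the connection's attributes.
        conn = self.get_connection(self.conn_id)
        return f"https://{conn.login}@{conn.host}"


def test_build_url_without_db() -> str:
    fake = FakeConnection("example_default", "example.com", "user", "secret")
    # Replace only the lookup; the hook's own logic still runs for real.
    with mock.patch.object(
        ExampleHook, "get_connection", return_value=fake
    ) as getter:
        url = ExampleHook("example_default").build_url()
    getter.assert_called_once_with("example_default")
    return url
```

Only the lookup is mocked here, so the test still exercises the hook's real
behaviour rather than just the mock.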

> So what are you actually proposing?
> We have to be aware of making our tests overly fragile if we replace
> everything with mocks, then we are only testing our mocks and not the real
> behaviour.

What you wrote above - requiring providers to use Task SDK's equivalent of
Connection in this case. That is exactly what Providers will be doing in
"production" for Airflow 3.

J.



On Fri, Feb 7, 2025 at 10:28 AM Ash Berlin-Taylor <a...@apache.org> wrote:

> The providers tests will soon (but possibly not before 3.0 at this point)
> need to be converted to use the TaskSDK properly which won’t/can't actually
> use the DB, so we will need to do something soon.
>
> > Hence that’s why when I do refactorings in provider unit tests, I’ve
> > already replaced those real connections with mocked ones making tests run
> > faster locally (no database needed)
>
> Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s a
> single row. I think I’m just confused or misdirected about your comment
> about database here. Can you give a concrete example of the change you
> would make, and how this will speed things up?
>
> To my mind creating/obtaining the Connection object isn’t the slow part,
> but doing anything with that connection. But connections don’t actually do
> the connecting/opening sockets/network requests — that’s all in the Hook
> classes.
>
> So what are you actually proposing?
>
> We have to be aware of making our tests overly fragile if we replace
> everything with mocks, then we are only testing our mocks and not the real
> behaviour.
>
> -ash
>
>
> > On 7 Feb 2025, at 08:50, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > +10 on that. My next step after finishing Provider's move, was to make
> > essentially all unit tests in Providers non-DB tests and removing "real
> > connection" usage is part of it.
> >
> > This is essentially stage 3 of
> > https://github.com/apache/airflow/issues/42632 that is planned and I want
> > to make POC and indeed involve others in crowd-sourcing the change (similar
> > to provider's move) after I figure out how to do it.
> >
> > J.
> >
> >
> > On Fri, Feb 7, 2025 at 8:35 AM Blain David <david.bl...@infrabel.be> wrote:
> >
> >> Hello,
> >>
> >>
> >>
> >> The caplog vote triggered me to launch this proposal as it’s also related
> >> to unit testing, and as I think we want our unit tests as clean, as
> >> simple and as fast as possible.
> >>
> >> I think it would be a good practice not to define and create real Airflow
> >> connections within the providers unit tests (which use the Airflow test
> >> database), as normally when writing unit tests those should be isolated and
> >> not depend on any external systems like a database.
> >>
> >>
> >>
> >> Also, in my case those make the tests run slower. Besides that, I’ve
> >> also noticed when working on some PRs regarding providers that sometimes
> >> there are some glitches within the CI/CD which seem to cause issues with
> >> those “real” connections, causing tests to randomly fail.
> >>
> >> Hence that’s why when I do refactorings in provider unit tests, I’ve
> >> already replaced those real connections with mocked ones, making tests run
> >> faster locally (no database needed) and with no more random failures during
> >> tests (possibly preceding tests that mess up connections).
> >>
> >> That doesn’t mean we don’t want to use the database during tests, of course;
> >> I’m just saying it’s a bit of overkill to use a database in a unit
> >> test just to get a connection.
> >>
> >>
> >>
> >> We could also create a common mocking method for connections in
> >> tests_common and use it across all unit tests; right now those are mostly
> >> redefined across different provider tests.
> >>
> >>
> >>
> >> Of course I’m willing to contribute on this one. What do you think about
> >> this idea? Personally, I think this can only make maintenance easier (and
> >> prevent random failures and give faster test results).
> >>
> >>
> >>
> >> Curious of your thoughts.
> >>
> >>
> >>
> >> Kind regards,
> >>
> >> David
> >>
> >>
> >>
> >>
> >>
> >> *David Blain*
> >>
> >> Data Engineer *at* ICT-514 - BI End User Reporting
> >>
> >>
> >>
> >>
> >>
>
>
