Just to comment on that -> this is happening with the recent changes by
Amogh !



On Sat, Feb 8, 2025 at 12:54 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> > Also it might improve performance a bit, mainly as they would run with
> xdist in parallel.
>
> Absolutely. That's one of the driving changes for me to make all our
> Provider tests non-DB ones. Xdist introduces it's own set of challenges of
> course - ie. it makes things more prone to side-effects, but so far it has
> played relatively well for us. And it allows us to spot some usage patterns
> where we are relying on "no-side effects" and get rid of those (example is
> the "caplog" one which we just consented to get rid of because it makes us
> prone to unintended side effects - and we get rid of those not by fixing
> them, but by making our code more resilient to those side-effects.
>
> I would - one day - want to enable order-randomisation in our tests, which
> is the ultimate way of making sure that there are no side effects. So far
> any time i tried it on providers, there were so many side-effects that it
> would be hugely detrimental to the sanity of our contributors, but if we
> gradually improve our test suite, get rid of DB and side-effects, and
> carefully enable randomisation - first per module, then per package and
> finally per all test suite, we might eventually get there. And Xdist is one
> of the ways to gently introduce it because it introduced a pretty
> "controlled" randomisation - i.e. mostly tests are executed in the same
> order but rarely there are variations due to adding new tests or some
> environmental conditions like speed of disk or CPU.
>
> J.
>
>
>
> On Sat, Feb 8, 2025 at 12:18 PM Jens Scheffler <j_scheff...@gmx.de.invalid>
> wrote:
>
>> Hi there,
>>
>> so then I hear that this is a very constructive proposal - compared to
>> clearing/writing/using connections for pytests in the DB there should
>> rather be a generic Fixture that can build on top of Monkeypatch.Setenv.
>> That saves time and as well can convert these tests to be non-DB,
>> correct? Also it might improve performance a bit, mainly as they would
>> run with xdist in parallel.
>>
>> Jens
>>
>> On 07.02.25 23:10, Ash Berlin-Taylor wrote:
>> > Mock and Connection don’t make sense together. A Connection is
>> effectively a dataclass that is loaded from the DB. There’s nothing to mock.
>> >
>> > Also are you aware that you can set `ARIFLOW_CONN_*` environment
>> variables and those will be looked at before the DB. That plus
>> [monkeypatch.setenv](
>> https://docs.pytest.org/en/stable/reference/reference.html#pytest.MonkeyPatch.setenv)
>> might be all we need, and a custom fixture in tests_common/pytest_plugin.py?
>> >
>> > Though again, I question how reading a row from the DB by it’s PK can
>> be slow even if you read it 1000 times in a test…
>> >
>> >
>> >
>> >> On 7 Feb 2025, at 17:43, Blain David <david.bl...@infrabel.be> wrote:
>> >>
>> >>> Pulling a connection from the DB itself shouldn’t/can’t be slow -
>> It’s a single row. I think I’m just confused or misdirected about your
>> comment about database here. Can you give a concrete example of the change
>> you would make, and how this will speed things up?
>> >> It's not only pulling, I saw some tests creating multiple test
>> connections before each tests, and the those connections being retrieved
>> multiple times within tests, not that it's super slow, but it won't make
>> tests lightning fast either imho.
>> >>
>> >>> So what are you actually proposing?
>> >> I would propose a common test helper function which returns a mocked
>> Airflow connection, and for the providers we will need to mock the SDK once
>> direct DB connection won't be possible anymore (which is good).
>> >>
>> >>> We have to be aware of making our tests overly fragile if we replace
>> everything with mocks, then we are only testing our mocks and not the real
>> behaviour.
>> >> Normally this behaviour should be tested in dedicated integration
>> tests, not on each unit test for hooks/operators/sensors/triggers and could
>> be tested in a generic matter imho.  That’s what I mean with "slow down",
>> too much tests are doing this (especially in the providers).  But I agree
>> (and very good point) we should be careful and make sure we don't purely
>> rely tests on mocks, hence why integration tests.  I've stumbled this week
>> on a test in Airflow provider where the mocked feature was indeed making
>> the test succeed, but was in fact lacking a real implementation (e.g.
>> Elasticsearch) thus not working at all.  Such things (only testing mocks or
>> tests that in fact test nothing like you suggest) can be detected through
>> mutation testing, but those are slow and expensive, not that they should
>> run each time (maybe only on master for example), but could help monitor
>> the quality of the tests, so it would have detected the case I described
>> and the point that you made (e.g. fragile tests).
>> >>
>> >> -----Original Message-----
>> >> From: Ash Berlin-Taylor <a...@apache.org <mailto:a...@apache.org>>
>> >> Sent: Friday, 7 February 2025 10:28
>> >> To: dev@airflow.apache.org <mailto:dev@airflow.apache.org>
>> >> Subject: Re: [PROPOSAL] Remove creation of real Airflow connections in
>> provider unit tests
>> >>
>> >> EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze
>> niet vertrouwt, klik niet op een link of open geen bijlages. Bij twijfel,
>> stuur deze e-mail als bijlage naar ab...@infrabel.be <mailto:
>> ab...@infrabel.be><mailto:ab...@infrabel.be>.
>> >>
>> >> The providers tests will soon (but possibly not before 3.0 at this
>> point) need to be converted to use the TaskSDK properly which won’t/can't
>> actually use the DB, so we will need to do something soon.
>> >>
>> >>> Hence that’s why when I do refactorings in provider unit tests, I’ve
>> >>> already replaced those real connections with mocked ones making tests
>> >>> run faster locally (no database needed)
>> >> Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s
>> a single row. I think I’m just confused or misdirected about your comment
>> about database here. Can you give a concrete example of the change you
>> would make, and how this will speed things up?
>> >>
>> >> To my mind creating/obtaining the Connection object isn’t the slow
>> part, but doing anything with that connection. But connections don’t
>> actually do the connecting/opening sockets/network requests — that’s all in
>> the Hook classes.
>> >>
>> >> So what are you actually proposing?
>> >>
>> >> We have to be aware of making our tests overly fragile if we replace
>> everything with mocks, then we are only testing our mocks and not the real
>> behaviour.
>> >>
>> >> -ash
>> >>
>> >>
>> >>> On 7 Feb 2025, at 08:50, Jarek Potiuk <ja...@potiuk.com> wrote:
>> >>>
>> >>> +10 on that. My next step after finishing Provider's move, was to make
>> >>> essentially all unit tests in Providers non-DB tests and removing
>> >>> "real connection" usage is part of it.
>> >>>
>> >>> This is essentially stage 3 of
>> >>>
>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
>> >>> ub.com <http://ub.com/
>> >%2Fapache%2Fairflow%2Fissues%2F42632&data=05%7C02%7Cdavid.blain%
>> >>> 40infrabel.be%7Cc6eefb61c10240de8c0d08dd4759d0d0%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638745173290138741%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=XZMKiZvZqGAUmm9tNXbewmoZW%2ByaOXyzGmoA%2BaRWEYE%3D&reserved=0
>> that is planned and I want to make POC and indeed involve others in
>> crowd-sourcing the change (similar to provider's move) after I figure out
>> how to do it.
>> >>>
>> >>> J.
>> >>>
>> >>>
>> >>> On Fri, Feb 7, 2025 at 8:35 AM Blain David <david.bl...@infrabel.be>
>> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>>
>> >>>>
>> >>>> The caplog vote triggered me to launch this proposal as it’s also
>> >>>> related to unit testing, and as I think we want our unit tests as
>> >>>> clean and as simple and as fast as possible.
>> >>>>
>> >>>> I think it would be a good practise to not define and create real
>> >>>> Airflow connections within the providers unit tests (which use the
>> >>>> Airflow test database), as normally when writing unit tests those
>> >>>> should be isolated and not be depend on any external systems like a
>> database.
>> >>>>
>> >>>>
>> >>>>
>> >>>> Also in my case those make the tests to run slower.  Beside that I
>> >>>> ‘ve also noticed when working on some PR regarding providers,
>> >>>> sometimes there are some glitches within the CI/CD which seem to
>> >>>> cause issues with those “real” connections, causing tests to
>> randomly fail.
>> >>>>
>> >>>> Hence that’s why when I do refactorings in provider unit tests, I’ve
>> >>>> already replaced those real connections with mocked ones making tests
>> >>>> run faster locally (no database needed) and no more random failures
>> >>>> during tests (possibly preceding tests that mess up connections).
>> >>>>
>> >>>> That’s doesn’t mean we don’t want to use the database of course
>> >>>> during tests, I’m just saying it’s a bit of overkill to use a
>> >>>> database in a unit test just to get a connection.
>> >>>>
>> >>>>
>> >>>>
>> >>>> We could also create a common mocking method for connections in
>> >>>> tests_common and use it across all unit tests, now those are mostly
>> >>>> redefined across different provider tests.
>> >>>>
>> >>>>
>> >>>>
>> >>>> Of course I’m willing to contribute on this one, what do you think
>> >>>> about this idea?  Personally, I think this can only make maintenance
>> >>>> easier (and prevent random failures and faster tests results).
>> >>>>
>> >>>>
>> >>>>
>> >>>> Curious of your thoughts.
>> >>>>
>> >>>>
>> >>>>
>> >>>> Kind regards,
>> >>>>
>> >>>> David
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> *David Blain*
>> >>>>
>> >>>> Data Engineer *at* ICT-514 - BI End User Reporting
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> >> For additional commands, e-mail: dev-h...@airflow.apache.org
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org <mailto:
>> dev-unsubscr...@airflow.apache.org>
>> >> For additional commands, e-mail: dev-h...@airflow.apache.org <mailto:
>> dev-h...@airflow.apache.org>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>
>

Reply via email to