Just to comment on that -> this is happening with the recent changes by Amogh !
On Sat, Feb 8, 2025 at 12:54 PM Jarek Potiuk <ja...@potiuk.com> wrote: > > Also it might improve performance a bit, mainly as they would run with > xdist in parallel. > > Absolutely. That's one of the driving changes for me to make all our > Provider tests non-DB ones. Xdist introduces it's own set of challenges of > course - ie. it makes things more prone to side-effects, but so far it has > played relatively well for us. And it allows us to spot some usage patterns > where we are relying on "no-side effects" and get rid of those (example is > the "caplog" one which we just consented to get rid of because it makes us > prone to unintended side effects - and we get rid of those not by fixing > them, but by making our code more resilient to those side-effects. > > I would - one day - want to enable order-randomisation in our tests, which > is the ultimate way of making sure that there are no side effects. So far > any time i tried it on providers, there were so many side-effects that it > would be hugely detrimental to the sanity of our contributors, but if we > gradually improve our test suite, get rid of DB and side-effects, and > carefully enable randomisation - first per module, then per package and > finally per all test suite, we might eventually get there. And Xdist is one > of the ways to gently introduce it because it introduced a pretty > "controlled" randomisation - i.e. mostly tests are executed in the same > order but rarely there are variations due to adding new tests or some > environmental conditions like speed of disk or CPU. > > J. > > > > On Sat, Feb 8, 2025 at 12:18 PM Jens Scheffler <j_scheff...@gmx.de.invalid> > wrote: > >> Hi there, >> >> so then I hear that this is a very constructive proposal - compared to >> clearing/writing/using connections for pytests in the DB there should >> rather be a generic Fixture that can build on top of Monkeypatch.Setenv. >> That saves time and as well can convert these tests to be non-DB, >> correct? Also it might improve performance a bit, mainly as they would >> run with xdist in parallel. >> >> Jens >> >> On 07.02.25 23:10, Ash Berlin-Taylor wrote: >> > Mock and Connection don’t make sense together. A Connection is >> effectively a dataclass that is loaded from the DB. There’s nothing to mock. >> > >> > Also are you aware that you can set `ARIFLOW_CONN_*` environment >> variables and those will be looked at before the DB. That plus >> [monkeypatch.setenv]( >> https://docs.pytest.org/en/stable/reference/reference.html#pytest.MonkeyPatch.setenv) >> might be all we need, and a custom fixture in tests_common/pytest_plugin.py? >> > >> > Though again, I question how reading a row from the DB by it’s PK can >> be slow even if you read it 1000 times in a test… >> > >> > >> > >> >> On 7 Feb 2025, at 17:43, Blain David <david.bl...@infrabel.be> wrote: >> >> >> >>> Pulling a connection from the DB itself shouldn’t/can’t be slow - >> It’s a single row. I think I’m just confused or misdirected about your >> comment about database here. Can you give a concrete example of the change >> you would make, and how this will speed things up? >> >> It's not only pulling, I saw some tests creating multiple test >> connections before each tests, and the those connections being retrieved >> multiple times within tests, not that it's super slow, but it won't make >> tests lightning fast either imho. >> >> >> >>> So what are you actually proposing? >> >> I would propose a common test helper function which returns a mocked >> Airflow connection, and for the providers we will need to mock the SDK once >> direct DB connection won't be possible anymore (which is good). >> >> >> >>> We have to be aware of making our tests overly fragile if we replace >> everything with mocks, then we are only testing our mocks and not the real >> behaviour. >> >> Normally this behaviour should be tested in dedicated integration >> tests, not on each unit test for hooks/operators/sensors/triggers and could >> be tested in a generic matter imho. That’s what I mean with "slow down", >> too much tests are doing this (especially in the providers). But I agree >> (and very good point) we should be careful and make sure we don't purely >> rely tests on mocks, hence why integration tests. I've stumbled this week >> on a test in Airflow provider where the mocked feature was indeed making >> the test succeed, but was in fact lacking a real implementation (e.g. >> Elasticsearch) thus not working at all. Such things (only testing mocks or >> tests that in fact test nothing like you suggest) can be detected through >> mutation testing, but those are slow and expensive, not that they should >> run each time (maybe only on master for example), but could help monitor >> the quality of the tests, so it would have detected the case I described >> and the point that you made (e.g. fragile tests). >> >> >> >> -----Original Message----- >> >> From: Ash Berlin-Taylor <a...@apache.org <mailto:a...@apache.org>> >> >> Sent: Friday, 7 February 2025 10:28 >> >> To: dev@airflow.apache.org <mailto:dev@airflow.apache.org> >> >> Subject: Re: [PROPOSAL] Remove creation of real Airflow connections in >> provider unit tests >> >> >> >> EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze >> niet vertrouwt, klik niet op een link of open geen bijlages. Bij twijfel, >> stuur deze e-mail als bijlage naar ab...@infrabel.be <mailto: >> ab...@infrabel.be><mailto:ab...@infrabel.be>. >> >> >> >> The providers tests will soon (but possibly not before 3.0 at this >> point) need to be converted to use the TaskSDK properly which won’t/can't >> actually use the DB, so we will need to do something soon. >> >> >> >>> Hence that’s why when I do refactorings in provider unit tests, I’ve >> >>> already replaced those real connections with mocked ones making tests >> >>> run faster locally (no database needed) >> >> Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s >> a single row. I think I’m just confused or misdirected about your comment >> about database here. Can you give a concrete example of the change you >> would make, and how this will speed things up? >> >> >> >> To my mind creating/obtaining the Connection object isn’t the slow >> part, but doing anything with that connection. But connections don’t >> actually do the connecting/opening sockets/network requests — that’s all in >> the Hook classes. >> >> >> >> So what are you actually proposing? >> >> >> >> We have to be aware of making our tests overly fragile if we replace >> everything with mocks, then we are only testing our mocks and not the real >> behaviour. >> >> >> >> -ash >> >> >> >> >> >>> On 7 Feb 2025, at 08:50, Jarek Potiuk <ja...@potiuk.com> wrote: >> >>> >> >>> +10 on that. My next step after finishing Provider's move, was to make >> >>> essentially all unit tests in Providers non-DB tests and removing >> >>> "real connection" usage is part of it. >> >>> >> >>> This is essentially stage 3 of >> >>> >> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith >> >>> ub.com <http://ub.com/ >> >%2Fapache%2Fairflow%2Fissues%2F42632&data=05%7C02%7Cdavid.blain% >> >>> 40infrabel.be%7Cc6eefb61c10240de8c0d08dd4759d0d0%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638745173290138741%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=XZMKiZvZqGAUmm9tNXbewmoZW%2ByaOXyzGmoA%2BaRWEYE%3D&reserved=0 >> that is planned and I want to make POC and indeed involve others in >> crowd-sourcing the change (similar to provider's move) after I figure out >> how to do it. >> >>> >> >>> J. >> >>> >> >>> >> >>> On Fri, Feb 7, 2025 at 8:35 AM Blain David <david.bl...@infrabel.be> >> wrote: >> >>> >> >>>> Hello, >> >>>> >> >>>> >> >>>> >> >>>> The caplog vote triggered me to launch this proposal as it’s also >> >>>> related to unit testing, and as I think we want our unit tests as >> >>>> clean and as simple and as fast as possible. >> >>>> >> >>>> I think it would be a good practise to not define and create real >> >>>> Airflow connections within the providers unit tests (which use the >> >>>> Airflow test database), as normally when writing unit tests those >> >>>> should be isolated and not be depend on any external systems like a >> database. >> >>>> >> >>>> >> >>>> >> >>>> Also in my case those make the tests to run slower. Beside that I >> >>>> ‘ve also noticed when working on some PR regarding providers, >> >>>> sometimes there are some glitches within the CI/CD which seem to >> >>>> cause issues with those “real” connections, causing tests to >> randomly fail. >> >>>> >> >>>> Hence that’s why when I do refactorings in provider unit tests, I’ve >> >>>> already replaced those real connections with mocked ones making tests >> >>>> run faster locally (no database needed) and no more random failures >> >>>> during tests (possibly preceding tests that mess up connections). >> >>>> >> >>>> That’s doesn’t mean we don’t want to use the database of course >> >>>> during tests, I’m just saying it’s a bit of overkill to use a >> >>>> database in a unit test just to get a connection. >> >>>> >> >>>> >> >>>> >> >>>> We could also create a common mocking method for connections in >> >>>> tests_common and use it across all unit tests, now those are mostly >> >>>> redefined across different provider tests. >> >>>> >> >>>> >> >>>> >> >>>> Of course I’m willing to contribute on this one, what do you think >> >>>> about this idea? Personally, I think this can only make maintenance >> >>>> easier (and prevent random failures and faster tests results). >> >>>> >> >>>> >> >>>> >> >>>> Curious of your thoughts. >> >>>> >> >>>> >> >>>> >> >>>> Kind regards, >> >>>> >> >>>> David >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> *David Blain* >> >>>> >> >>>> Data Engineer *at* ICT-514 - BI End User Reporting >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >> >> For additional commands, e-mail: dev-h...@airflow.apache.org >> >> >> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org <mailto: >> dev-unsubscr...@airflow.apache.org> >> >> For additional commands, e-mail: dev-h...@airflow.apache.org <mailto: >> dev-h...@airflow.apache.org> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >> For additional commands, e-mail: dev-h...@airflow.apache.org >> >