+1 Definitely a +1 from me, seems like a relatively small effort to get good returns
-- Regards, Aritra Basu On Tue, Oct 24, 2023, 11:01 PM Vincent Beck <vincb...@apache.org> wrote: > +1 I like this one. I think it is totally worth it adding this decorator, > mostly because I think the effort is not huge. I also think, as a > contributor, it is a good exercise to, when writing tests, figure out > whether the tests I am writing is using the DB. > > On 2023/10/24 16:31:57 Jarek Potiuk wrote: > > Hello everyone, > > > > *TL;DR; *I have a proposal on how we can probably significantly decrease > > time needed to run our CI tests. All that at the expense of small - I > > think- effort by the need to mark tests as "db-tests" by contributors > > enforced by our CI (in most cases that won't even be needed).. > > > > *A bit more context:* > > > > I have started to focus on AIP-44 work (internal API) and part of this > work > > is to make it possible to run subset of our tests (related to > > untrusted components and providers) in a db-isolation mode to see if we > > have not forgotten anything (and to keep db isolation mode in the > future). > > Part of this task is to see how many tests we have that actually use the > > DB. I had some interesting findings and I thought I would explore an idea > > to split our tests to DB and NON-DB tests. > > > > After some initial exploration I believe we can get at least 30-50% > > reduction in our CI test execution time (and build test time) at the > > expense of the necessity of marking tests as "db_tests" when they need > > the DB. > > > > This topic has been brought up occasionally in the past - why do we run > our > > tests on multiple versions of multiple databases. And the problem is that > > many of our tests are using the DB and we need to test them on all of > them. > > But also - turns out we have quite a lot of tests that do not use the DB. > > This - for example - does not allow us to use "pytest-xdist" to run the > > test in parallel freely, we have to use the "custom" docker based > > parallelism in CI. > > > > So the idea is that If we could separate DB from NON-DB tests we could > only > > run the DB tests on multiple databases, but then all the "non-db tests" > > could only be run once. It would only make sense though if we have a > > significant number of non-DB tests, if we could clearly separate them and > > if it was easy in the future to keep them separated (so only mark tests > as > > db_tests when they actually need DB). > > > > *Findings:* > > > > I tried it out and I found out that: > > > > a) Turns out that we have quite a significant number of tests that are > > non-db tests. Likely between 30% to 50% by my estimate - both in core and > > providers and they are scattered among our code - mostly whole "modules" > > are either db or non-db tests. > > > > b) It is rather easy to separate db from non-db tests without any > > structural change to our test structure (just by using a > > custom @pytest.mark.db_test - on test, class, or even module level). So > we > > can easily handle cases where the whole module is "mostly non-db" and > only > > few test cases are "db" - there are some cases like that, but the custom > > markers are rather nice to help with that. > > > > c) It should be easy to keep them separated in the future with only very > > little overhead. Basically when your tests will start failing in CI and > > tell you "mark your test with @pytest.mark.db_test" you will have to do > it > > - but in most cases you won't even have to do it because your test will > be > > added to the module that is already marked as db_test module > > > > d) We can use pytest-xdist for parallel execution of non-db tests which > > will make the test suites of "non-db" tests much more straightforward - > > basically `pytest tests --skip-db-tests -n auto` should do the trick). > And > > you could do the same locally as well. > > > > *Impact on contributors: * > > > > There is absolutely no impact on running the tests locally. Non-db tests > > will run when DB is there, no problem with that. I think in a vast number > > of cases it will be mostly no change for regular contributors. Only > > occasionally (when new test module is added for example) their tests will > > fail in CI with very clear instructions from the CI to mark their tests > as > > "@pytest.mark.db_test" or marking the whole module as "db_test module" by > > `pytestmark = pytest.mark.db_test`. This will happen when their tests > will > > be treated as "non-db" tests and attempted to run in CI. > > > > *Summarising:* > > > > I think we can achieve some 30%-50% reduction in time/CPU needed if we > > implement it. Of course it means again complicating our CI builds quite a > > bit. But with those gains I think it's totally worth it. > > > > I have a draft (still needs quite some work) - PR showing how it could > > look like https://github.com/apache/airflow/pull/35160. I think in 1-2 > > days I should have a ready-to-merge change with all the tests passing and > > with 30%-50% reduction of the resources needed and time. > > > > Would love your comments - is it worth it? Should I proceed with it? Any > > other comments? > > > > J. > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > For additional commands, e-mail: dev-h...@airflow.apache.org > >