I got the first "fully green pass" of the "Improve testing harness to
separate DB and non-DB tests" PR that looks stable and shows the "real"
numbers and improvements.

I'm copying it here as well from
https://github.com/apache/airflow/pull/35160#issuecomment-1784152557, since
this change will impact (I hope positively) everyone contributing to Airflow.

(Quarantined tests are green, but they are really being "skipped", so I
still need to fix that.)

I am looking for reviews while I add some best practices and update the
docs about testing and test stability (including Public Runners). I added
comments in important places explaining some of the changes/decisions, to
make the review easier.

It seems that I managed to get the promised speed improvements.

* We got just under 10 minutes for full DB tests in most cases (down from
16-20 minutes)
* Stability of the tests is greatly improved. I also added some
optimization in Python Virtualenv that should speed those tests up quite a
bit on their own and make them much more stable (using venv caching from Jens)
* The non-DB tests, run on self-hosted runners, complete in under 5
minutes, and in most PRs they will run only once
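To illustrate why venv caching helps both speed and stability, here is a minimal sketch of a cache keyed by a hash of the Python version and the requirements. It is an assumption about the general technique, not the actual Airflow implementation; `venv_cache_key` and `get_or_create_venv` are hypothetical names, and the `create` callback stands in for the real work (e.g. the standard-library `venv.create(path, with_pip=True)` followed by a pip install).

```python
import hashlib
import sys
import tempfile
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical helpers illustrating a requirements-keyed venv cache;
# this is NOT the actual Airflow implementation.


def venv_cache_key(requirements: Iterable[str], python_version: str) -> str:
    """Return a short, order-independent hash of the Python version and requirements."""
    # Strip whitespace, drop blank entries, de-duplicate and sort,
    # so that ["b", "a"] and ["a", "b", ""] map to the same key.
    normalised = sorted({r.strip() for r in requirements if r.strip()})
    payload = "\n".join([python_version, *normalised]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def get_or_create_venv(
    cache_root: Path,
    requirements: Iterable[str],
    create: Callable[[Path], None],
    python_version: str = f"{sys.version_info.major}.{sys.version_info.minor}",
) -> Path:
    """Reuse a cached venv directory, calling `create` only on a cache miss."""
    venv_dir = cache_root / f"venv-{venv_cache_key(requirements, python_version)}"
    done_marker = venv_dir / ".complete"
    if not done_marker.exists():
        venv_dir.mkdir(parents=True, exist_ok=True)
        # In real use `create` would build the venv and pip-install the requirements.
        create(venv_dir)
        # Mark the entry complete only after creation succeeded, so a crashed
        # build is retried instead of being reused half-finished.
        done_marker.touch()
    return venv_dir


if __name__ == "__main__":
    created = []
    with tempfile.TemporaryDirectory() as tmp:
        first = get_or_create_venv(Path(tmp), ["requests", "pandas"], created.append)
        second = get_or_create_venv(Path(tmp), ["pandas", "requests"], created.append)
        print(first == second, len(created))  # True 1
```

The completion marker is the stability part of the idea: a half-built environment is never reused. The sketch does not handle several processes building the same entry at once; a real cache would likely need a lock for that.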


*Overall:*
* for most PRs with core changes, instead of 7x 20 = 140 minutes of
build time for tests, we will have 7x 10 (DB tests) + 1x 5 (non-DB tests) =
75 minutes. This means ~45% less build time is needed
* for many "structural" changes (adding providers, changing the build,
and everything else that requires "full tests needed"), instead of 20x 20 =
400 minutes of build time we will have 20x 10 (DB) + 5x 5 (non-DB) = 225
minutes, which is a bit less than a 45% improvement (~44%).
* for many smaller PRs the improvements will be smaller or greater,
depending on which area of code is involved: some parts have a higher
percentage of DB tests, some lower. Still, the improvements in stability,
build time and elapsed time should be visible across the board. I have yet to
gather the "Public Runners" numbers, but they should be proportionally
similar.
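The build-time estimates in the "Overall" bullets can be double-checked by recomputing them. This sketch only multiplies the job counts by the per-job minutes quoted there (7 or 20 DB jobs, plus 1 or 5 non-DB jobs at 5 minutes) and reports the relative reduction; `build_minutes` and `reduction` are illustrative names, not part of any Airflow tooling.

```python
def build_minutes(db_jobs: int, db_minutes: int, non_db_jobs: int = 0, non_db_minutes: int = 0) -> int:
    """Total runner minutes: DB jobs plus separately scheduled non-DB jobs."""
    return db_jobs * db_minutes + non_db_jobs * non_db_minutes


def reduction(before: int, after: int) -> float:
    """Fraction of build time saved when going from `before` to `after` minutes."""
    return 1 - after / before


scenarios = {
    # (old setup: all tests in each job, new setup: DB jobs + non-DB jobs)
    "core changes": (build_minutes(7, 20), build_minutes(7, 10, 1, 5)),
    "full tests needed": (build_minutes(20, 20), build_minutes(20, 10, 5, 5)),
}

for name, (before, after) in scenarios.items():
    print(f"{name}: {before} -> {after} minutes, {reduction(before, after):.0%} less")
# core changes: 140 -> 75 minutes, 46% less
# full tests needed: 400 -> 225 minutes, 44% less
```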

Currently it looks like we have *9000 non-DB* tests and *7578 DB* tests.
Those are only "unit" tests. This change does not impact our "integration",
"helm", "kubernetes", "docker-compose" (and "system") tests, so the focus
should generally now be put there: they will become the ones with the biggest
impact on the "elapsed" time of the build (i.e. feedback time for
contributors). But those are not run for many PRs and take far less "build"
time.
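In pytest, a split like this is typically expressed with a custom marker on DB tests plus a hook such as `pytest_collection_modifyitems` that deselects items depending on the job. The sketch below models only that selection logic in plain Python, so it runs without pytest; the marker name `db_test`, the `Mode` values and the `Item` class are assumptions for illustration, not the PR's actual options.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical marker name used for illustration only.
DB_MARKER = "db_test"


class Mode(Enum):
    ALL = "all"
    DB_ONLY = "db-only"
    NON_DB_ONLY = "non-db-only"


@dataclass(frozen=True)
class Item:
    """Minimal stand-in for a collected test: a name plus its marker names."""
    name: str
    markers: frozenset[str] = frozenset()


def should_run(item: Item, mode: Mode) -> bool:
    """Decide whether a test belongs to the job running in `mode`."""
    is_db = DB_MARKER in item.markers
    if mode is Mode.DB_ONLY:
        return is_db
    if mode is Mode.NON_DB_ONLY:
        return not is_db
    return True


def select(items: list[Item], mode: Mode) -> list[str]:
    """Return the names of the tests a job running in `mode` would execute."""
    return [item.name for item in items if should_run(item, mode)]


if __name__ == "__main__":
    tests = [
        Item("test_dag_run_state", frozenset({DB_MARKER})),
        Item("test_parse_cron_expression"),
    ]
    print(select(tests, Mode.DB_ONLY))      # ['test_dag_run_state']
    print(select(tests, Mode.NON_DB_ONLY))  # ['test_parse_cron_expression']
```

Because unmarked tests default to the non-DB group, the non-DB job can run once per PR on its own, while the DB group is the part that still has to be repeated for every backend/version combination.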

Looking forward to reviews (meanwhile, I will be doing some final touches
and adding docs).

J.
