I got the first "fully green pass" of the "Improve testing harness to separate DB and non-DB tests" PR. It looks stable and shows the "real" numbers and improvements.
Copying it here as well from https://github.com/apache/airflow/pull/35160#issuecomment-1784152557, as this one will impact (I hope positively) everyone contributing to Airflow. (Quarantined tests are green, but they are really "skipped", so I will need to fix that one.)

Looking for reviews while I add some best practices and update the docs about testing and test stability (including Public Runners). I added comments in important places explaining some of the changes/decisions made, to make it easier to review.

It seems that I managed to get the promised speed improvements:

* We got just under 10 minutes for full DB tests in most cases (down from 16-20 minutes).
* Stability of the tests is greatly improved. I also added some optimization in Python Virtualenv that should speed those tests up quite a bit on their own and make them much more stable (using venv caching from Jens).
* The non-DB tests on self-hosted runners run in under 5 minutes, and in most PRs they will run only once.

*Overall:*

* For most PRs with core changes, instead of 7x 20 (140) minutes of build time for tests, we will have 7x 10 DB tests + 1x 5 non-DB tests = 75 minutes. This gives a ~46% shorter build time.
* For many "structural" changes that include adding providers, changing builds and everything else that requires "full tests needed", instead of 20x 20 (400 minutes) of build time we will have 20x 10 (DB) + 5x 5 (non-DB) minutes = 225 minutes, which is roughly a 44% improvement.
* For many smaller PRs there will be smaller or greater improvements, depending on which area of code is involved. Some parts have a higher percentage of DB tests, some lower, but the improvements in stability, build time and elapsed time should be visible across the board.

I have yet to gather the "Public Runners" numbers, but they should be proportionally similar. Currently it looks like we have *9000 non-DB* tests and *7578 DB* tests.
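To double-check the build-time arithmetic quoted above, here is a small sketch. It assumes each "Nx M" figure means N test jobs of M minutes each, and the helper names (`build_minutes`, `saving`) are purely illustrative, not part of Breeze or the Airflow CI.

```python
def build_minutes(db_jobs, db_minutes, non_db_jobs=0, non_db_minutes=0):
    """Total build (runner) minutes, assuming N jobs x M minutes per job."""
    return db_jobs * db_minutes + non_db_jobs * non_db_minutes


def saving(before, after):
    """Fraction of build time saved when going from `before` to `after` minutes."""
    return 1 - after / before


# PR with core changes: 7 jobs x 20 min before; 7 DB jobs x 10 min + 1 non-DB job x 5 min after.
core_before = build_minutes(7, 20)
core_after = build_minutes(7, 10, 1, 5)

# "Full tests needed" PR: 20 jobs x 20 min before; 20 DB x 10 min + 5 non-DB x 5 min after.
full_before = build_minutes(20, 20)
full_after = build_minutes(20, 10, 5, 5)

print(f"core: {core_before} -> {core_after} min ({saving(core_before, core_after):.1%} less)")
print(f"full: {full_before} -> {full_after} min ({saving(full_before, full_after):.1%} less)")
```

This gives 140 -> 75 minutes (about 46% less) for core-change PRs and 400 -> 225 minutes (about 44% less) for "full tests needed" PRs.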
Those are only "unit" tests. This change does not impact our "integration", "helm", "kubernetes", "docker-compose" (and "system") tests, so the focus should now generally move there, as those tests will have the biggest impact on the "elapsed" time of the build (i.e. the feedback time for contributors). But those are not run for many PRs and take far less "build" time. Looking forward to reviews (while I do some final touches and add docs). J.
