potiuk commented on code in PR #35160: URL: https://github.com/apache/airflow/pull/35160#discussion_r1376495329
########## tests/operators/test_python.py: ########## @@ -844,17 +848,30 @@ def f(exit_code): assert ti.state == expected_state +venv_cache_path = tempfile.mkdtemp(prefix="venv_cache_path")

Review Comment:
Yeah. Isolation is key. And we cannot get isolation when some of our tests (most of the DB ones) do something like this:
```python
@pytest.fixture(autouse=True)
def base_tests_setup(self, request, create_task_instance_of_operator, dag_maker):
    self.dag_id = f"dag_{slugify(request.cls.__name__)}"
    self.task_id = f"task_{slugify(request.node.name, max_length=40)}"
    self.run_id = f"run_{slugify(request.node.name, max_length=40)}"
    self.ds_templated = self.default_date.date().isoformat()
    self.ti_maker = create_task_instance_of_operator
    self.dag_maker = dag_maker
    self.dag = self.dag_maker(self.dag_id, template_searchpath=TEMPLATE_SEARCHPATH).dag
    clear_db_runs()
    yield
    clear_db_runs()
```
This is actual code from `BasePythonTest`, which is the base for all the Python tests. If you look at the last three lines: all dag_runs will always be cleared before and after every test. This is the implementation:
```python
def clear_db_runs():
    with create_session() as session:
        session.query(Job).delete()
        session.query(Trigger).delete()
        session.query(DagRun).delete()
        session.query(TaskInstance).delete()
```
This is of course key for the test to have a proper setup/teardown, but it also means that if it is run via `pytest-xdist` it will randomly delete all the dag_runs while other tests are using the same database (and likely most of the 7000 other DB tests do something very similar to achieve reproducibility). Not good. Not good at all.

And even if we could fix it in some tests, there is no way to prevent someone from sneaking in a fixture or `setup_method` with `session.query(DagRun).delete()` in order to "clean up" before and after the test (which is a good idea as long as it does not impact other tests running in parallel). Not when we merge 30 PRs a day.
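To make the hazard concrete, here is a minimal sketch (plain Python, no real DB - the dict standing in for the shared database is purely illustrative) of the race: one xdist worker's teardown wipes rows another worker is still using.

```python
# Illustration only: two xdist workers sharing one "database".
# Worker A's teardown clears *all* dag_runs, including the one
# worker B created and is still asserting against.
shared_db = {"dag_runs": {"run_worker_a", "run_worker_b"}}

def clear_db_runs():
    # Mirrors session.query(DagRun).delete(): no per-test filtering at all.
    shared_db["dag_runs"].clear()

# Worker A finishes its test and runs its teardown...
clear_db_runs()

# ...and worker B's dag_run has vanished mid-test.
assert "run_worker_b" not in shared_db["dag_runs"]
```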
And one faulty test can start randomly breaking any other tests; chasing such a "bad apple" would be a nightmare. That's why I think there are three reasonable approaches:

1) Split your DB tests into serialized test groups, where tests within a group are executed sequentially and each group has its own DB. This is what we have now. On our self-hosted runners in CI we have up to 8 (this is how many CPUs we have) complete docker compose instances running - each of them having a separate DB and each of them running sequentially a subset of tests (TEST_TYPE). On public runners we have up to 2 of those (because we only have 2 CPUs). This has powered our tests for years now and saves countless hours of build time by utilising the parallelism of those instances - I implemented it a long time ago :)

2) Separate out the tests that do not use the DB (this is what this PR aims for), run them with xdist, and add a mechanism to keep it working in the future - detecting DB tests running in an `xdist` environment. And the good thing is that we can "graduate" some tests from DB to non-DB easily after this PR gets merged (just make them not use the DB and remove the marker). This is what I mean by 7000 -> 1500 DB tests. That should be our goal IMHO (roughly - I made that 1500 number up of course, just thinking 10% of our tests should be DB tests).

3) Give each test its own DB. That's probably far too costly as an overhead (especially since with DB tests we likely want to test all supported DBs).
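For approach 2), the detection mechanism could be as simple as checking the environment variable that pytest-xdist sets in each worker process. A hypothetical sketch - the `db_test` marker name and the skip-vs-fail policy are assumptions for illustration, not this PR's actual implementation:

```python
import os

def running_under_xdist(environ=None):
    """Return True when executing inside a pytest-xdist worker.

    pytest-xdist exports PYTEST_XDIST_WORKER (e.g. "gw0") in every worker
    process; the variable is absent in a plain, non-parallel pytest run.
    """
    env = os.environ if environ is None else environ
    return env.get("PYTEST_XDIST_WORKER") is not None

# Hypothetical conftest.py hook: skip DB-marked tests under xdist, since
# they cannot be isolated while sharing a single database.
def pytest_collection_modifyitems(config, items):
    if not running_under_xdist():
        return
    import pytest

    skip_db = pytest.mark.skip(reason="DB tests are not isolated under xdist")
    for item in items:
        if item.get_closest_marker("db_test"):  # assumed marker name
            item.add_marker(skip_db)
```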
I attempted once to expand the approach from 1) and have a separate DB "per test module" rather than "per test group" - so instead of the 12 or so TEST_TYPES we have now, there would be 200 or so separately started docker-compose instances with DB and CI containers (up to 8 at a time, of course) - but the overhead of starting/stopping the images and DBs was so huge that the tests took far longer than now to complete (at least 5 times longer, I think). So I believe 1) is a good middle ground between parallelism and isolation, and once we have far fewer DB tests by following route 2) and moving more tests to be non-DB, we will get an "optimal" solution. I think chasing "let's get the DB tests to run with xdist" might not achieve the goal due to overhead/complexity/flakiness.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org