potiuk commented on code in PR #30705:
URL: https://github.com/apache/airflow/pull/30705#discussion_r1170563693
########## dev/breeze/src/airflow_breeze/utils/selective_checks.py: ##########
@@ -606,7 +606,43 @@ def parallel_test_types(self) -> str:
                 )
                 test_types_to_remove.add(test_type)
         current_test_types = current_test_types - test_types_to_remove
-        return " ".join(sorted(current_test_types))
+        for test_type in tuple(current_test_types):
+            if test_type == "Providers":
+                current_test_types.remove(test_type)
+                current_test_types.update(
+                    ("Providers[amazon]", "Providers[google]", "Providers[-amazon,google]")
+                )
+            elif test_type.startswith("Providers[") and "amazon" in test_type or "google" in test_type:
+                current_test_types.remove(test_type)
+                if "amazon" in test_type:
+                    current_test_types.add("Providers[amazon]")
+                if "google" in test_type:
+                    current_test_types.add("Providers[google]")

Review Comment:
We cannot run tests in parallel, because far too many of our tests rely on a shared database (for example, connections are not mocked, DagRuns are created, etc.). Simply speaking, a HUGE percentage of our tests are not pure unit tests with everything mocked; they rely on a shared database being there (they prepare data, use it, and sometimes delete it and sometimes not). We even run all our tests WITH a specific database. New tests that continue using the shared database are added/modified/updated in every PR. If we ran them in parallel, the tests would start overwriting each other's data in the database.

So we can do either:
1) review our 12,500 tests, separate out the "real unit tests" from the "DB tests", and add mechanisms to keep the separation - then we would be able to parallelise the "real unit tests". Possibly even rewrite the tests to be "real unit tests" and mock the DB access,
2) or do what we are doing - i.e. split the tests into more-or-less equal chunks (in terms of execution time) and run them sequentially, each test type with its own database (this is what we do now).
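For illustration, the chunking in option 2) can be sketched as a greedy longest-first assignment of test groups to the currently lightest chunk. This is only a sketch of the general idea - the group names, durations, and the `balance_chunks` helper below are invented, not Breeze's actual implementation:

```python
# Hypothetical sketch of option 2): balance test groups into N chunks of
# roughly equal total runtime, so each chunk can run sequentially against
# its own database. Names and durations are invented for illustration.

def balance_chunks(durations: dict[str, float], n_chunks: int) -> list[list[str]]:
    """Greedy longest-processing-time-first assignment to the lightest chunk."""
    chunks: list[list[str]] = [[] for _ in range(n_chunks)]
    totals = [0.0] * n_chunks
    # Place the longest-running groups first; each goes to the chunk with
    # the smallest accumulated runtime so far.
    for group, seconds in sorted(durations.items(), key=lambda kv: -kv[1]):
        lightest = totals.index(min(totals))
        chunks[lightest].append(group)
        totals[lightest] += seconds
    return chunks


# Example: five test groups split across two parallel workers.
measured = {
    "Core": 40.0,
    "Providers[amazon]": 30.0,
    "Providers[google]": 25.0,
    "WWW": 10.0,
    "CLI": 5.0,
}
print(balance_chunks(measured, 2))
# → [['Core', 'WWW', 'CLI'], ['Providers[amazon]', 'Providers[google]']]
```

Both resulting chunks total 55 seconds here; in practice the durations would have to be re-measured and re-balanced from time to time, which is the "few hours effort" mentioned below.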
Option 1) seems to require an enormous effort - but if you (or anyone) would like to take on the task, it is a good idea. I would love to have it, but it seems not feasible (though I would love to be proven wrong).
Option 2) means a deliberate effort to split the tests and balance-optimize them from time to time with a few hours of effort, plus a custom parallel-running framework (this is what we have now).
Option 3) ... I do not see a third option. But maybe there is one? Curious to hear your thoughts :)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org