*TL;DR; I have a proposal for how we can remedy the frequently failing CI tests, and I have a kind request to other committers to help me fix it in a good way.*

As we have all noticed, our tests in Travis have recently been failing often (far too often). The situation is not good and we have to remedy it fairly quickly. I think we can do it without compromising quality and without temporarily disabling some of the tests.

*Root cause of the problem*

The root cause of the problem seems to be the memory used during tests. After adding instafail we know that the tests often fail because there is not enough memory on the Travis machines. This is a combination of the way our virtual machines are allocated in the Travis infrastructure, the growing number of tests we have, the fact that our tests require a lot of "integrations" (running as separate images - cassandra, rabbitmq, postgres/mysql databases etc.) and the fact that we run them with pytest (pytest apparently uses more memory).

*Proposal*

One of the proposals on Slack was to get rid of the Cassandra tests and disable Cassandra temporarily - but I think we can do better, and I can get it merged in a day or two and get it sorted out for now (and in a way that is good for the future).

I recently wrote an integration test proposal <https://lists.apache.org/thread.html/120af497f4adf482162be9583a93651fd206b71db213255f52ad8b7a%40%3Cdev.airflow.apache.org%3E> (I will resurrect that thread now) on how we can split our integration tests into separate integrations using pytest markers, with support from Breeze and our CI testing framework. I already have working code for that (it is a result of my resumed work on the Production Image); most of the code is already green in Travis and needs a review from other committers. At the end of the message I copy the excerpt from the docs on how this will work.

Once we have that in, we will have a very easy (and maintainable for the future) way that helps with CI resources and also makes Breeze far more usable (and less resource-hungry):

- add pytest markers so that we know which tests are "integration" ones
- start Breeze locally without any external integrations (right now only Kerberos is needed) - most of the tests work there, with far less resource usage
- start Breeze easily with *--integration mongo --integration cassandra* etc. whenever we need to run tests for that integration
- run all the "non-integration" tests in CI without the integrations started
- run only the "integration-related" tests in CI with the integrations started
- we will have more jobs in CI but they should run much more reliably and faster in general
- one of the changes also improves the way we build kind/kubernetes tests in order to unblock the migration to GitHub Actions that Tomek is working on - that might be our ultimate speedup/stabilisation

For those who are curious, the updated documentation is in my PR: "Launching Breeze integrations" <https://github.com/PolideaInternal/airflow/blob/separate-integrations/BREEZE.rst#launching-breeze-integrations>, "Running tests with Kubernetes in Breeze" <https://github.com/PolideaInternal/airflow/blob/separate-integrations/BREEZE.rst#running-tests-with-kubernetes-in-breeze>

*PRs - kind request to other committers*

I have a series of PRs that already implement almost all of it (I needed that in order to implement Production Image support). They depend on each other - I added unit test support for Bash scripts plus several improvements and simplifications:

- [AIRFLOW-6489] Add BATS support for Bash unit testing <https://github.com/apache/airflow/pull/7081> [ready for review] - needed to get more control over other changes
- [AIRFLOW-6475] Remove duplication of volume mount specs in Breeze <https://github.com/apache/airflow/pull/7065> [ready for review] - improves the consistency of how we run Breeze/CI
- [AIRFLOW-6491] improve parameter handling in breeze <https://github.com/apache/airflow/pull/7084> [ready for review] - tested and improved the way we handle --options in Breeze (needed for the Kubernetes improvements)
- [AIRFLOW-5704] Improve Kind Kubernetes scripts for local testing <https://github.com/apache/airflow/pull/6516> [testing] - improves handling of Kubernetes Kind testing (fixes issues with loading images, upgrades kind to the latest version)
- [AIRFLOW-6489] Separate integrations [WIP] <https://github.com/apache/airflow/pull/7091> [WIP] - this is the change introducing different integrations - it already works for Breeze and has support for deciding which integrations should be started - I just need to separate out the "Integration tests" into separate jobs

I have a kind request to other committers - can you please take a look at those and help to merge them quickly? I also have follow-up PRs for the production image, but the above PRs should help us solve the CI crisis in a good way and let me continue with the prod image ones.

J.

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129
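
For illustration only, here is a rough sketch of the selection logic behind the CI split described in the proposal. It is plain Python, not the actual pytest marker or Breeze code from the PRs, and it is not the docs excerpt; the test names, the mapping and the `select_tests` helper are all made up for the example. The assumption is simply that every test is either untagged or tagged with the one integration it needs.

```python
# Illustrative only: the test names, the tags and select_tests() are
# hypothetical and do not come from the Airflow code base or the PRs.
from typing import Dict, List, Optional, Set

# Map each test to the integration it needs; None means no integration.
TESTS: Dict[str, Optional[str]] = {
    "test_dag_parsing": None,
    "test_cassandra_hook": "cassandra",
    "test_mongo_hook": "mongo",
    "test_rabbitmq_broker": "rabbitmq",
}


def select_tests(tests: Dict[str, Optional[str]], enabled: Set[str]) -> List[str]:
    """Return the tests a job should run, given the integrations it started.

    With no integrations enabled, only the non-integration tests are selected.
    With some enabled, only the tests tagged with one of them are selected.
    """
    if not enabled:
        return sorted(name for name, needs in tests.items() if needs is None)
    return sorted(name for name, needs in tests.items() if needs in enabled)


if __name__ == "__main__":
    # Job without integrations: the bulk of the tests, low memory usage.
    print(select_tests(TESTS, set()))
    # Job started with two integrations: only the tests that need them.
    print(select_tests(TESTS, {"mongo", "cassandra"}))
```

The point of the split is that the first kind of job never pays the memory cost of the integration containers, while each integration job runs only the handful of tests that actually need them.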
