Dear community,

I have some great news that I would like to share with you.
About three weeks ago, I submitted our application for the “Sovereign Tech Agency - Bug Resilience Program”. I have received confirmation that our application was successful. 🙀🎉🚀

Some members of the PMC have been thinking about which parts of CouchDB could be optimized and improved. We would like to share this list with you to get further feedback. If you have any additional ideas or thoughts on the existing points, please let us know by Sunday (2025-09-28).

On behalf of the CouchDB PMC,
Ronny

Initial project ideas
===============

# CouchDB STA Bug Resilience Program Projects

## Improve Testing

As a distributed database with strong durability promises, CouchDB takes automated testing very seriously. Across its various test suites, a full test run can take 10 to 20 minutes even on very modern hardware. In CI, with a wide build matrix, this can lead to PR and main-branch CI checks that run for hours. This slows down both individual developer productivity and overall project velocity.

CouchDB has multiple test suites (eunit, exunit and Python) that have grown over time with little coordination or ongoing maintenance. There are multiple areas of work that could lead to significant improvements. The main goal for all of them is to speed up the test suite so that:

- developers working on features have a faster edit-test-repeat cycle:
  - currently, even running a single test takes tens of seconds because of complex setup and teardown steps
  - in addition, all tests run in sequence, so even when a developer needs a larger subset or all tests, the available computing resources are not fully used to run them as fast as possible
- CI jobs take a long time and can benefit from improved setup and teardown mechanisms, faster test execution and additional caching of intermediate results across multiple runs.
  - this benefits the project's development velocity and cuts the time we need to get critical security fixes into releases
  - it cuts down on the computing resources needed to run CI, so we can either run even more CI with the resources the project currently has, or scale CI down

The areas of work we have identified so far:

1. Parallelising eunit tests

   eunit tests are written in Erlang, both as separate test files and (by eunit convention) inline in many modules. All eunit tests are executed sequentially. They are a combination of unit tests and integration tests. On a multi-core machine, it is easily noticeable that CPU cores sit idle while the test suite is running, which means there are computing resources available that the test suite is not using.

   Investigate how to safely parallelise the execution of eunit tests, so CPUs can be maxed out and test runs sped up. Make sure the parallelism can also be disabled for specific runs for debugging purposes.

2. Parallelising exunit tests

   exunit tests are written in Elixir as separate test files. They are integration tests. As with eunit, CPU cores on a multi-core machine are noticeably idle while this suite runs, leaving computing resources unused.

   Investigate how to safely parallelise the execution of exunit tests, so CPUs can be maxed out and test runs sped up. Make sure the parallelism can also be disabled for specific runs for debugging purposes.

3. Run test suites in parallel

   If projects 1 and 2 still leave CPU resources unused, investigate whether multiple test suites (eunit, exunit, Python…) can be run in parallel to further maximise resource use.

4. Speed up setup and teardown

   Investigate ways to safely speed up our individual test setup and teardown code. This code runs many times during a test run, so any improvement here will benefit all of the above scenarios.

5. Convert Python test suite to eunit or exunit

   A sub-component of CouchDB (Mango) has historically been tested with a Python-based test suite. It is an outlier: all other testing is done in Erlang or Elixir, both of which are native to the Erlang ecosystem, while Python is not. Writing new tests for this component and maintaining this test suite both require a completely different set of skills than the rest of the testing environment. CouchDB testing would be greatly improved by converting these Python-based tests into the existing eunit or exunit test suites; either is fine.

6. Add caching to CI

   CouchDB CI runs the various test suites on a wide variety of operating systems, in multiple versions and on multiple architectures. As a result, many operations are repeated across all of these variants within a single CI run, _as well as_ across multiple runs. Some of these operations produce intermediate results that could be cached within a single matrix build or even across multiple matrix builds.

   One example is the `npm install` run for CouchDB’s web-based admin panel. Technically, this only needs to run when we update a dependency version (which is rare), but to date we run it in every variant of every run, leading to lots of duplicated work.

   Identify more of these examples and implement a strong caching policy that reduces duplicate work and saves time. Note that this will probably depend on using a caching plugin for the Jenkins CI setup that CouchDB runs. While preparing the Statement of Work (SoW), we should investigate whether the CouchDB setup can easily make use of this plugin and whether it solves the above issues as we anticipate.

## Improve Operational Security

A completely separate area of CouchDB is its security system, which has also grown organically over a long time. While the project probably can’t overhaul this system completely within this project's scope, there is a sub-component that could be greatly improved.
CouchDB defines access to databases in three tiers:

1. Per-user authentication (prove who you are)
2. Per-user and per-role authorization (what you are allowed to do)
3. Per-database document update validations (VDUs)

All of these are optional, but 1 and 2 are enabled by default; 3 remains optional since not all applications need it. While 1 and 2 are declarative, 3 is procedural and expressed in JavaScript code. This makes VDUs very flexible, but it means that, if they are used, every single write operation into CouchDB (create, update, delete) requires shelling out to a JavaScript process (in the worst case, forking one first). In practice, this implementation is so slow that the CouchDB team does not recommend using this security mechanism even for medium-scale setups.

This is a substantial wart in the CouchDB security line-up, and the project would like to introduce a new, declarative way of achieving most of what VDUs are currently used for. VDUs currently handle both type assertion / input validation and access control. As a first order of business, we would like to separate these concerns into distinct mechanisms. There is an existing discussion with a straw proposal that gives an idea of what this could look like: https://github.com/apache/couchdb/issues/1554

The goal of this task would be to design and implement these new mechanisms.

===============
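The VDU item above asks for input validation and access control to be split into separate, declarative mechanisms. Here is a minimal Python sketch of what that separation means. It is purely hypothetical. The names `ValidationRules`, `AccessRules`, `validate_document` and `may_write` are invented for illustration. They are not CouchDB APIs and not the design from issue #1554. The sketch only shows the two concerns expressed as independent rule sets made of plain data, rather than one procedural JavaScript function.

```python
# Hypothetical sketch: NOT a CouchDB API and NOT the design from issue #1554.
# It illustrates keeping input validation and access control as two separate,
# declarative rule sets instead of one procedural JavaScript VDU.
from dataclasses import dataclass, field


@dataclass
class ValidationRules:
    """Declarative input validation: required fields and their expected types."""
    required: dict[str, type] = field(default_factory=dict)


@dataclass
class AccessRules:
    """Declarative access control: roles that are allowed to write documents."""
    writer_roles: set[str] = field(default_factory=set)


def validate_document(doc: dict, rules: ValidationRules) -> list[str]:
    """Return a list of validation errors; an empty list means the doc is valid."""
    errors = []
    for name, expected in rules.required.items():
        if name not in doc:
            errors.append(f"missing field: {name}")
        elif not isinstance(doc[name], expected):
            errors.append(f"field {name} must be {expected.__name__}")
    return errors


def may_write(user_roles: set[str], rules: AccessRules) -> bool:
    """Return True if the user holds at least one role allowed to write."""
    return bool(user_roles & rules.writer_roles)
```

Because both rule sets are plain data, a server could in principle evaluate them without starting a JavaScript process. How CouchDB would actually store and evaluate such rules is exactly what this task would need to design.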
