CouchDB STA Bug Resilience Program Projects

Ronny Berndt Wed, 24 Sep 2025 23:27:28 -0700

Dear community,

I have some great news that I would like to share with you.


About three weeks ago, I submitted our application for the “Sovereign
Tech Agency - Bug Resilience Program”.
I have received confirmation that our application was successful. 🙀🎉🚀

Some members of the PMC have been thinking about which parts of
CouchDB could be optimized and improved.
We would like to share this list with you to get further feedback.

If you have any additional ideas or thoughts on the existing points,
please let us know by Sunday (2025-09-28).

On behalf of the CouchDB PMC,
Ronny


Initial project ideas
===============

# CouchDB STA Bug Resilience Program Projects

## Improve Testing

As a distributed database with strong durability promises, CouchDB takes
automated testing very seriously. Across the various test suites, a full
test run can take 10 to 20 minutes even on very modern hardware. In CI with
a wide matrix, this can lead to PR and main branch CI checks that run for
hours. This slows down both individual developer productivity and project
velocity.

CouchDB has multiple test suites (eunit, exunit and Python) that have grown
over time with little coordination or continued maintenance.

There are multiple areas of work that could lead to significant
improvements. The main goal for all of these is to speed up the test suite
so:

- developers working on features have a faster edit-test-repeat cycle:
  - currently even running a single test takes tens of seconds because of
complex setup and teardown steps
  - in addition all tests are run in sequence, so even if a developer needs
a larger subset or even all tests, not all available computing resources
are being used to run the tests as fast as possible
- CI jobs take a long time and can benefit from improving setup and
teardown mechanisms, faster test execution and additional caching of
intermediate results across multiple runs.
  - this benefits the project development velocity and cuts the time we
need to get critical security fixes into releases
  - cuts down on computing needs to run CI, so we can either run even more
CI with the same resources the project currently has, or scale down CI
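
To make the point about idle computing resources concrete, here is a
minimal sketch of running independent test groups concurrently, with a
switch to fall back to sequential execution for debugging. It is an
illustration only: the commands are placeholders, `run_suite` and `run_all`
are hypothetical names, and none of this reflects how the CouchDB build
actually invokes its test suites. It also assumes the groups do not share
state such as ports or data directories.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_suite(command):
    """Run one test command and return (command, exit code)."""
    completed = subprocess.run(command, capture_output=True)
    return command, completed.returncode


def run_all(commands, parallel=True):
    """Run all commands, concurrently unless parallel is False."""
    # One worker means strictly sequential execution, useful for debugging.
    workers = (os.cpu_count() or 1) if parallel else 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input commands.
        return list(pool.map(run_suite, commands))


# Placeholder commands standing in for independent test groups (assumption).
commands = [
    [sys.executable, "-c", f"import time; time.sleep(0.2); print({i})"]
    for i in range(4)
]
results = run_all(commands, parallel=True)
failed = [cmd for cmd, code in results if code != 0]
```

Threads are sufficient here because each worker only waits on an external
process. Calling `run_all(commands, parallel=False)` gives the sequential
behaviour back.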

The areas of work we have identified so far:

1. Parallelising eunit tests

   eunit tests are written in Erlang both as separate test files as well as
(by eunit convention) inline with many modules. All eunit tests are
executed sequentially. They are a combination of unit tests and integration
tests.

   On a multi-core machine, it is easily noticeable that CPU cores are
idling while the test suite is running. That means there are computing
resources available that the test suite is not using.

   Investigate how to safely parallelise the execution of eunit tests, so
CPUs can be maxed out and test runs sped up. Make sure the parallelism can
also be disabled for specific runs for debugging purposes.


2. Parallelising exunit tests

   exunit tests are written in Elixir as separate test files. They are
integration tests.

   On a multi-core machine, it is easily noticeable that CPU cores are
idling while the test suite is running. That means there are computing
resources available that the test suite is not using.

   Investigate how to safely parallelise the execution of exunit tests, so
CPUs can be maxed out and test runs sped up. Make sure the parallelism can
also be disabled for specific runs for debugging purposes.


3. Run test suites in parallel

   Should projects 1. and 2. still leave CPU resources available,
investigate if multiple test suites (eunit, exunit, Python…) can be run in
parallel to further maximise resource use.


4. Speed up setup and teardown

   Investigate ways to safely speed up our individual test setup and
teardown code. These bits of code are run many times during a test run and
any improvement here will benefit all of the above scenarios.

5. Convert Python test suite to eunit or exunit

   A sub-component of CouchDB (Mango) has historically been tested with a
Python-based test suite. It is an outlier among all other testing which is
either done in Erlang or Elixir, both of which are Erlang-ecosystem-native
options while Python is not.

   Both writing new tests for this component and maintaining this test
suite require a completely different set of skills than the rest of the
testing environment.

   CouchDB testing would be greatly improved by converting these
Python-based tests into the existing eunit or exunit test suites. Either is
fine.

6. Add caching to CI

   CouchDB CI runs the various test suites on a wide variety of different
operating systems and multiple versions and architectures.

   As such, many operations are repeated during a single test run across
all of these variants _as well as_ across multiple runs of these variants.

   Some of these operations produce intermediate results that could be
cached for a single or even across multiple matrix builds.

   One example is the `npm install` run for CouchDB’s web-based Admin
panel. This technically only needs to run when we update the dependency
version (rarely), but to date, we run it on each run in each variant,
leading to lots of duplicated work.

   Identify more of these examples and implement a strong caching policy
that reduces duplicate work and saves time.

   Note this probably will depend on using a caching plugin for the Jenkins
CI setup that CouchDB is running. During preparation of the Statement of
Work (SoW), it should be investigated if the CouchDB setup can easily make
use of this plugin and if it solves the above issues as we anticipate.
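
Independent of any particular CI plugin, the caching idea from item 6 can
be sketched as a cache keyed on the content of the dependency manifest: the
expensive step only reruns when the manifest changes. This is a minimal
sketch under assumptions. `cache_key`, `restore_or_build`, the
`package-lock.json` file name and the fake install step are all
hypothetical and stand in for whatever the real CI setup would use.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(lockfile, prefix="npm"):
    """Derive a cache key from the manifest content, not its timestamp."""
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()
    return f"{prefix}-{digest[:16]}"


def restore_or_build(lockfile, cache_dir, build):
    """Return (cache entry, hit). Run build only when there is no entry."""
    entry = cache_dir / cache_key(lockfile)
    if entry.exists():
        return entry, True
    entry.mkdir(parents=True)
    build(entry)
    return entry, False


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    lock = root / "package-lock.json"
    lock.write_text('{"dependencies": {"example": "1.0.0"}}')
    cache = root / "cache"
    build_calls = []

    def fake_install(target):
        # Stand-in for the real, slow install step (assumption).
        build_calls.append(target.name)
        (target / "installed").write_text("ok")

    _, hit_first = restore_or_build(lock, cache, fake_install)
    _, hit_second = restore_or_build(lock, cache, fake_install)
    # A dependency bump changes the manifest and therefore the key.
    lock.write_text('{"dependencies": {"example": "1.1.0"}}')
    _, hit_after_bump = restore_or_build(lock, cache, fake_install)
```

In this run the first and third calls build, and the second call reuses the
existing entry.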

## Improve Operational Security

A completely separate area of CouchDB is its security system. It also has
grown organically over a long time. While the project probably can’t
overhaul this system completely within this project scope, there is a
sub-component that could be greatly improved.

CouchDB defines access to databases in three tiers:

1. Per-user authentication (prove who you are)
2. Per-user and per-role authorization (what are you allowed to do)
3. Per-database document update validations (validate_doc_update
   functions, VDUs)

All of these are optional, but 1. and 2. are enabled by default. 3. remains
optional since not all applications need it.

While 1. and 2. are declarative, 3. is procedural and expressed in
JavaScript code. This makes it a very flexible system, but it means that if
used, each and every write operation into CouchDB (create, update, delete)
requires shelling out to a JavaScript process (in the worst case, even
forking one first). In practice, the implementation of this is so slow that the
CouchDB team does not recommend using this security mechanism for even
medium-scale setups.

This presents a substantial wart in the CouchDB security line-up and the
project would like to introduce a new, declaratively provided way of
achieving most of the same things that VDUs are being used for.

VDUs currently can do both type assertion / input validation as well as
access control. As a first order of business, we would like to separate
these concerns into distinct mechanisms. We have an existing discussion
with a straw-proposal to get an idea for what this would look like:
https://github.com/apache/couchdb/issues/1554

The goal of this task would be to design and implement these new mechanisms.
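
To illustrate what separating the two concerns could mean, here is a
minimal sketch of a declarative rule set that is evaluated without running
any user-supplied code. The rule format, the field names and the functions
`validation_errors` and `may_write` are invented for this illustration.
They are not the format from the linked issue and not an existing CouchDB
API.

```python
# Hypothetical rule format, invented for illustration only.
RULES = {
    # Input validation: which fields a document needs and their types.
    "validate": {
        "type": {"required": True, "kind": str},
        "title": {"required": True, "kind": str},
        "tags": {"required": False, "kind": list},
    },
    # Access control: which roles may write at all.
    "write_roles": ["editor", "_admin"],
}


def validation_errors(doc, rules):
    """Check a document against the declarative field rules."""
    errors = []
    for field, spec in rules["validate"].items():
        if field not in doc:
            if spec["required"]:
                errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], spec["kind"]):
            errors.append(f"wrong type for field: {field}")
    return errors


def may_write(user_roles, rules):
    """Access control is decided separately from document content."""
    return any(role in rules["write_roles"] for role in user_roles)


doc = {"type": "post", "title": 42}
errors = validation_errors(doc, RULES)
allowed = may_write(["reader"], RULES)
```

Because the rules are plain data, they could in principle be evaluated
inside the database process, which avoids the round trip to a JavaScript
process that makes VDUs slow today.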

===============
