Hi,
CouchDB 2.0 introduced clustering; the ability to scale a single database
across multiple nodes, increasing both the maximum size of a database and
adding native fault-tolerance. This welcome and considerable step forward was
not without its trade-offs. In the years since 2.0 was released, users
frequently encounter the following issues as a direct consequence of the 2.0
clustering approach:
1. Conflict revisions can be created on normal concurrent updates issued to a
single database, since each replica of a database shard independently chooses
whether to accept a given update, and all replicas will eventually propagate
updates that any one of them has chosen to accept.
2. Secondary indexes ("views") do not scale the same way as document lookups,
as they are sharded by doc id, not emitted view key (thus forcing a
consultation of all shard ranges for each query).
3. The changes feed is no longer totally ordered and, worse, could replay
earlier changes in the event of a node failure (even a temporary one).
The idea is to use FoundationDB as the new CouchDB foundational layer, letting
it take care of data storage and placement. An introduction to FoundationDB
would take up too much space here so I will summarise it as a highly scalable
ordered key-value store with transactional semantics, provides strong
consistency, scaling from a single node to many. It is licensed under the ASLv2
but is not an Apache project.
By using FoundationDB we can solve all three of the problems listed above and
deliver semantics much closer to CouchDB 1.x's behaviour while improving upon
the scalability advantages that 2.0 introduced. The essential character of
CouchDB would be preserved (MVCC for documents, replication between CouchDB
databases) but the underlying plumbing would change significantly. In addition,
this new foundation will allow us to add long wished-for features more easily.
For example, multi-document transactions become possible, as does efficient
field-level reading and writing. A further thought is the ability to update
views transactionally with the database update.
For those familiar with the CouchDB 2.0 architecture, the proposal is, in
effect, to change all the functions in fabric.erl so that they work against a
(possibly remote) FoundationDB cluster instead of the current implementation of
calling into the original CouchDB 1.x code (couch_btree, couch_file, etc).
This is a large change and, for full disclosure, the IBM Cloudant team are
proposing it. We have done our due diligence in investigating FoundationDB as
well as detailed investigation into how CouchDB semantics would be built on top
of FoundationDB. Any and all decisions on that must take place here on the
CouchDB developer mailing list, of course, but we are confident that this is
feasible.
During those investigations we have identified a small number of CouchDB
features that we do not yet see a way to do on FoundationDB, the main one being
custom (Javascript) reduces. This is a direct consequence of no longer rolling
our own persistence layer (couch_btree and friends) and would likely apply to
any alternative technology.
I think this would be a great advance for CouchDB, preserving what makes
CouchDB special but taking advantage of the superbly engineered FoundationDB
software at the bottom of the stack.
Regards,
Robert Newson