(I'm excited about this list! There have been some topics I've wanted to bring
up that are too implementation-oriented for the user@ list, but I haven't been
brave enough to dive into the dev@ list because I don't know Erlang or the
internals of CouchDB. I also really appreciate folks sharing the viewpoint that
CouchDB is an ecosystem and an open replication protocol, not just a particular
database implementation.)
Anyway. One topic I'd like to bring up: in my non-scientific observations, the
major performance bottleneck in pull replication is that every revision has to
be fetched with its own GET request. I've seen very poor performance when
pulling lots of small documents from a distant server, roughly an order of
magnitude below the throughput of sending a single huge document.
(Yes, it's possible to get multiple revisions at once by POSTing to _all_docs.
Unfortunately this has limitations that make it unsuitable for replication; see
my explanation at the page linked below.)
A few months ago I experimentally implemented a new "_bulk_get" REST call in
Couchbase's replicators (Couchbase Lite and the Sync Gateway), which
significantly improves performance by allowing the puller to request any number
of revisions in a single HTTP request. Again, I have no scientific tests or
hard numbers, but the improvement was enough to convince me it's worthwhile.
I've documented it here:
https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
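For illustration, here is roughly what building such a request could look like
in Python. This is only a sketch under assumptions: the JSON body shape (a
"docs" array of id/rev pairs), the endpoint URL, and the helper name
build_bulk_get_request are all illustrative, and the wiki page above remains
the authoritative description of the call.

```python
import json
import urllib.request


def build_bulk_get_request(db_url, revisions):
    """Build (but do not send) one POST that asks for many revisions at once.

    `revisions` is an iterable of (doc_id, rev_id) pairs.
    ASSUMPTION: the body is {"docs": [{"id": ..., "rev": ...}, ...]}.
    """
    body = {"docs": [{"id": doc_id, "rev": rev_id} for doc_id, rev_id in revisions]}
    return urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_get",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical database URL and revision IDs, for illustration only.
request = build_bulk_get_request(
    "http://localhost:4984/db",
    [("doc1", "1-aaa"), ("doc2", "2-bbb")],
)
```

The point is that the puller pays for one round trip here, where per-revision
GETs would need one round trip for each entry in the list.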
It's pretty straightforward and I've tried to make it consistent with the
standard API. The only unusual thing is that the response can contain nested
MIME multipart bodies: the response format is multipart, with every requested
revision in a part, but revisions containing attachments are themselves sent as
multipart. (This shouldn't be an issue for any decent multipart parser, since
nested multipart is pretty common in email, but I think it's the first time
it has appeared in the CouchDB API.)
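To make the nesting concrete, here is a small sketch that uses the multipart
parser in Python's standard email package on a made-up response. The sample
body, boundaries, document contents, and the helper name split_revisions are
all hypothetical. It also assumes that in a nested part the JSON document comes
first with the attachments after it, and that each attachment carries a
filename in Content-Disposition; neither detail is stated above.

```python
import json
from email import message_from_bytes

# Hypothetical response: one plain revision, and one revision with an attachment
# that is itself sent as a nested multipart body.
SAMPLE = (
    b'Content-Type: multipart/mixed; boundary="outer"\r\n'
    b'\r\n'
    b'--outer\r\n'
    b'Content-Type: application/json\r\n'
    b'\r\n'
    b'{"_id": "doc1", "_rev": "1-aaa"}\r\n'
    b'--outer\r\n'
    b'Content-Type: multipart/related; boundary="inner"\r\n'
    b'\r\n'
    b'--inner\r\n'
    b'Content-Type: application/json\r\n'
    b'\r\n'
    b'{"_id": "doc2", "_rev": "2-bbb"}\r\n'
    b'--inner\r\n'
    b'Content-Type: text/plain\r\n'
    b'Content-Disposition: attachment; filename="note.txt"\r\n'
    b'\r\n'
    b'hello\r\n'
    b'--inner--\r\n'
    b'--outer--\r\n'
)


def split_revisions(raw):
    """Return a list of (document, attachments) pairs from a nested multipart body."""
    results = []
    for part in message_from_bytes(raw).get_payload():
        if part.is_multipart():
            # ASSUMPTION: first sub-part is the JSON document, the rest are attachments.
            subparts = part.get_payload()
            doc = json.loads(subparts[0].get_payload(decode=True))
            attachments = {
                sub.get_filename(): sub.get_payload(decode=True) for sub in subparts[1:]
            }
        else:
            # Revision without attachments: the part body is the JSON document itself.
            doc = json.loads(part.get_payload(decode=True))
            attachments = {}
        results.append((doc, attachments))
    return results


revisions = split_revisions(SAMPLE)
```

A real client would feed in the HTTP response's Content-Type header and body
instead of a hard-coded sample, and would probably stream large attachments
rather than hold them in memory.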
I'd be happy if this were implemented in CouchDB and made an official part of
the API. Hopefully the spec I wrote is detailed enough to make that
straightforward. (I don't have the Erlang skills to do it myself, though.)
—Jens