(I'm excited about this list! There have been some topics I've wanted to bring 
up that are too implementation-oriented for the user@ list, but I haven't been 
brave enough to dive into the dev@ list because I don't know Erlang or the 
internals of CouchDB. I also really appreciate folks sharing the viewpoint that 
CouchDB is an ecosystem and an open replication protocol, not just a particular 
database implementation.)

Anyway. One topic I'd like to bring up is that, in my non-scientific 
observations, the major performance bottleneck in pull replications is the fact 
that revisions have to be transferred using individual GET requests. I've seen 
very poor performance when pulling lots of small documents from a distant 
server: roughly an order of magnitude below the throughput of sending a single
huge document.

(Yes, it's possible to get multiple revisions at once by POSTing to _all_docs. 
Unfortunately this has limitations that make it unsuitable for replication; see 
my explanation at the page linked below.)

A few months ago I experimentally implemented a new "_bulk_get" REST call in 
Couchbase's replicators (Couchbase Lite and the Sync Gateway), which 
significantly improves performance by allowing the puller to request any number 
of revisions in a single HTTP request. Again, no scientific tests or hard 
numbers, but it was enough to convince me it's worthwhile. I've documented it 
here:
        https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
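To make the round-trip savings concrete, here is a minimal Python sketch of a
puller building one request for several revisions instead of one GET per
revision. It only constructs the request and does not send it. The endpoint
path, the `revs` query parameter, and the {"docs": [{"id": ..., "rev": ...}]}
body shape are assumptions for illustration; the wiki page above is the
authoritative description, and `build_bulk_get_request` is a hypothetical
helper.

```python
import json
import urllib.request


def build_bulk_get_request(server, db, revisions):
    """Build (but do not send) one POST asking for many revisions at once.

    ASSUMPTION: the "/{db}/_bulk_get" path, the "revs=true" query parameter
    and the {"docs": [{"id": ..., "rev": ...}]} body are illustrative; check
    the Bulk-GET spec for the real format.
    """
    body = {"docs": [{"id": doc_id, "rev": rev} for doc_id, rev in revisions]}
    return urllib.request.Request(
        url=f"{server}/{db}/_bulk_get?revs=true",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Ask for a multipart response: one part per requested revision.
            "Accept": "multipart/mixed",
        },
    )


# Three revisions fetched with one round trip instead of three GETs.
wanted = [("doc1", "1-abc"), ("doc2", "3-def"), ("doc3", "2-123")]
req = build_bulk_get_request("http://example.com", "mydb", wanted)
print(req.get_method(), req.full_url)  # POST http://example.com/mydb/_bulk_get?revs=true
```

Against a distant server the per-request latency dominates for small
documents, which is why collapsing N requests into one helps so much more
than any change to the per-document payload.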
It's pretty straightforward and I've tried to make it consistent with the 
standard API. The only unusual thing is that the response can contain nested 
MIME multipart bodies: the response format is multipart, with every requested 
revision in a part, but revisions containing attachments are themselves sent as 
multipart. (This shouldn't be an issue for any decent multipart parser, since 
nested multipart is pretty common in emails, but I think it's the first time 
it's happened in the CouchDB API.)
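As a rough illustration that nested multipart is not hard to consume, here is
a minimal sketch using Python's standard `email` package. The sample body and
boundary names are made up, and it assumes (as an illustration, not from the
spec) that a nested part's first sub-part is the revision's JSON, with each
following sub-part being an attachment named via Content-Disposition.
`parse_bulk_get_response` is a hypothetical helper.

```python
import email
import email.policy
import json


def parse_bulk_get_response(content_type, body):
    """Return a list of (doc, {attachment_name: bytes}) pairs."""
    # The boundary lives in the HTTP Content-Type header, not in the body,
    # so prepend that header before handing the bytes to the MIME parser.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    outer = email.message_from_bytes(raw, policy=email.policy.default)
    results = []
    for part in outer.iter_parts():
        if part.is_multipart():
            # Revision with attachments: a nested multipart whose first
            # sub-part (ASSUMED) is the JSON and the rest are attachments.
            subparts = list(part.iter_parts())
            doc = json.loads(subparts[0].get_payload(decode=True))
            attachments = {p.get_filename(): p.get_payload(decode=True)
                           for p in subparts[1:]}
        else:
            # Revision without attachments: a single JSON part.
            doc = json.loads(part.get_payload(decode=True))
            attachments = {}
        results.append((doc, attachments))
    return results


# Made-up response: doc1 has no attachments, doc2 has one.
SAMPLE_BODY = b"\r\n".join([
    b"--outer",
    b"Content-Type: application/json",
    b"",
    b'{"_id": "doc1", "_rev": "1-abc"}',
    b"--outer",
    b'Content-Type: multipart/related; boundary="inner"',
    b"",
    b"--inner",
    b"Content-Type: application/json",
    b"",
    b'{"_id": "doc2", "_rev": "3-def", "_attachments": {"photo.jpg": {"follows": true}}}',
    b"--inner",
    b"Content-Type: image/jpeg",
    b'Content-Disposition: attachment; filename="photo.jpg"',
    b"",
    b"JPEGDATA",
    b"--inner--",
    b"--outer--",
    b"",
])

for doc, atts in parse_bulk_get_response('multipart/mixed; boundary="outer"', SAMPLE_BODY):
    print(doc["_id"], doc["_rev"], sorted(atts))
```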

I'd be happy if this were implemented in CouchDB and made an official part of 
the API. Hopefully the spec I wrote is detailed enough to make that 
straightforward. (I don't have the Erlang skills to do it myself, though.)

—Jens
