Hi all, me again. This one will be shorter :) As I see it we have three 
different options for serving the _all_docs endpoint from FDB: 

## Option 1: Read the document data, discard the bodies

We likely will have the documents stored in docid order already; we could do 
range reads and discard everything but the ID and _rev by default. This can be 
a very efficient implementation of include_docs=true (though one needs to be 
careful about skipping the conflict bodies), but pretty wasteful otherwise.
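As a rough illustration of Option 1, here is a minimal sketch that uses an in-memory sorted list in place of FDB. The key/value layout (one entry per doc ID holding the winning _rev and body) and the names `DOCS`, `range_read` and `all_docs` are assumptions made for the example, not the actual schema, and conflict bodies are left out entirely.

```python
from bisect import bisect_left

# Hypothetical layout: one KV per document, key = doc ID,
# value = (winning _rev, body). Conflict revisions are not modelled.
DOCS = sorted([
    ("apple", ("2-abc", {"color": "red"})),
    ("banana", ("1-def", {"color": "yellow"})),
    ("cherry", ("3-ghi", {"color": "dark red"})),
])


def range_read(kvs, start_key, limit):
    """Stand-in for an ordered range read: up to `limit` pairs with key >= start_key."""
    keys = [k for k, _ in kvs]
    i = bisect_left(keys, start_key)
    return kvs[i:i + limit]


def all_docs(start_key="", limit=100, include_docs=False):
    rows = []
    for doc_id, (rev, body) in range_read(DOCS, start_key, limit):
        row = {"id": doc_id, "key": doc_id, "value": {"rev": rev}}
        if include_docs:
            # The body we already read is put to use.
            row["doc"] = {"_id": doc_id, "_rev": rev, **body}
        # Without include_docs the body was still read, then dropped here.
        rows.append(row)
    return rows
```

Either way the same bytes come back from the range read; the only difference is whether the body ends up in the response, which is why this is attractive for include_docs=true and wasteful for the default.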

## Option 2: Read the “revisions” subspace

We also have an entry for every document in ID order in the “revisions” 
subspace. The disadvantage of this approach is that every deleted edit branch 
shows up there, too, and some databases will have lots of deleted documents. We 
may need to build skiplists to know how to scan efficiently. This subspace is 
also doing a lot of heavy lifting for us already, and if we wanted to toy with 
alternative revision history representations in the future it could get 
complicated.
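To make the cost of deleted entries concrete, here is a small sketch that counts how many entries a scan touches versus how many IDs it returns. The layout (key = doc ID plus revision, value = a deleted flag) and the names `REVISIONS` and `scan_live_ids` are assumptions for illustration only; the real “revisions” subspace may look quite different.

```python
# Hypothetical layout: one entry per edit branch,
# key = (doc_id, rev), value = True if that branch is deleted.
REVISIONS = sorted([
    (("apple", "2-abc"), False),
    (("banana", "2-def"), True),   # deleted document
    (("cherry", "1-ghi"), False),
    (("cherry", "2-jkl"), True),   # deleted conflict branch
    (("date", "3-mno"), True),     # deleted document
])


def scan_live_ids(revisions):
    """Return (IDs with at least one live branch, number of entries scanned)."""
    scanned = 0
    live = []
    for (doc_id, _rev), deleted in revisions:
        scanned += 1
        # Emit each doc ID once, and only if some branch is not deleted.
        if not deleted and (not live or live[-1] != doc_id):
            live.append(doc_id)
    return live, scanned
```

In this toy data set the scan reads five entries to produce two rows; a database with many deleted documents would skew that ratio much further.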

## Option 3: Add specific entries to support _all_docs

We can also write an extra KV containing the ID and winning _rev in a special 
subspace just to support this endpoint. It would be a blind write because we’re 
already coordinating concurrent transactions through reads on the “revisions” 
subspace. This would be conceptually quite clean and simple, and the fastest 
implementation for constructing the default response.
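A minimal sketch of what Option 3 could look like, again with plain dicts standing in for FDB subspaces. The class and method names are hypothetical, winner selection is simplified to "last write wins", and removing the entry when the winning revision is a deletion is my assumption rather than something settled above.

```python
class Store:
    """Toy model: two dicts stand in for two FDB subspaces."""

    def __init__(self):
        self.revisions = {}  # doc_id -> winning rev (stand-in for "revisions")
        self.all_docs = {}   # doc_id -> winning rev (the extra subspace)

    def update(self, doc_id, new_rev, deleted=False):
        # In FDB, this read is what would add the conflict range that
        # coordinates concurrent writers; the dict lookup only marks the spot.
        _current = self.revisions.get(doc_id)
        self.revisions[doc_id] = new_rev

        # Blind write: nothing is read from the extra subspace first.
        if deleted:
            self.all_docs.pop(doc_id, None)
        else:
            self.all_docs[doc_id] = new_rev

    def list_all_docs(self, start_key="", limit=100):
        # Stand-in for a single range read over the extra subspace.
        ids = sorted(k for k in self.all_docs if k >= start_key)[:limit]
        return [{"id": i, "key": i, "value": {"rev": self.all_docs[i]}}
                for i in ids]
```

The default response is then built from one range read that returns exactly the ID and _rev, with no bodies or deleted entries to skip.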

===

My sense is Option 2 is a non-starter, but I include it for completeness in case 
anyone else thought of the same approach. I think Option 3 is a reasonable space / 
efficiency / simplicity tradeoff, and it might also be worth testing out Option 
1 as an optimized implementation for include_docs=true.

Thoughts? I imagine we can move quickly to an RFC for at least having the extra 
KVs for Option 3, and in that design also acknowledge the option for scanning 
the docs space directly to support include_docs.

Adam
