+1 on what Bob said.

> On 21. Mar 2019, at 20:57, Robert Newson <rnew...@apache.org> wrote:
> 
> Hi,
> 
> Thanks for pushing this forward, and I owe feedback on other threads you've
> started.
> 
> Rather feebly, I'm just agreeing with you. Option 3 for include_docs=false
> and Option 1 for include_docs=true sounds ideal. Both flavours are very
> common, so it makes sense to build a solution for each. At a pinch we can just
> do Option 3 + async doc lookups in a first release and then circle back, but
> the RFC should propose 1 and 3 as our design intention.
> 
> -- 
>  Robert Samuel Newson
>  rnew...@apache.org
> 
> On Thu, 21 Mar 2019, at 19:50, Adam Kocoloski wrote:
>> Hi all, me again. This one will be shorter :) As I see it, we have three
>> different options for serving the _all_docs endpoint from FDB:
>> 
>> ## Option 1: Read the document data, discard the bodies
>> 
>> We will likely have the documents stored in docid order already, so we
>> could do range reads and discard everything but the ID and _rev by
>> default. This can be a very efficient implementation of
>> include_docs=true (though one needs to be careful to skip the
>> conflict bodies), but it is wasteful for the default include_docs=false
>> case, since every body is read only to be thrown away.
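
A minimal sketch of Option 1, assuming a hypothetical layout where each revision body lives at ("docs", docid, rev) with a `winner` flag marking the winning revision. A plain dict stands in for the FDB keyspace, and `range_read` and `all_docs_from_doc_bodies` are illustrative names, not FDB or CouchDB APIs. It shows both points above: conflict bodies must be skipped, and with include_docs=false each body is fetched and then dropped.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for an FDB keyspace: tuple keys -> values.
Store = Dict[Tuple[str, ...], Any]


def range_read(store: Store, prefix: Tuple[str, ...]) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (key, value) pairs under `prefix` in key order.

    Sorting the whole dict is only for illustration; a real FDB range
    read returns keys in order without scanning everything.
    """
    for key in sorted(store):
        if key[:len(prefix)] == prefix:
            yield key, store[key]


def all_docs_from_doc_bodies(store: Store, include_docs: bool = False) -> List[dict]:
    """Option 1: build _all_docs rows by scanning the ("docs", docid, rev) subspace."""
    rows = []
    for (_, docid, rev), value in range_read(store, ("docs",)):
        # Conflict bodies sit next to the winner in docid order; skip them,
        # and skip documents whose winning revision is a deletion.
        if not value["winner"] or value.get("deleted", False):
            continue
        row = {"id": docid, "key": docid, "value": {"rev": rev}}
        if include_docs:
            row["doc"] = value["body"]
        # With include_docs=False the body was still read, then discarded.
        rows.append(row)
    return rows
```

With a winner and a losing conflict stored under the same ID, only the winner produces a row; the conflict's body is still part of the range read.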
>> 
>> ## Option 2: Read the “revisions” subspace
>> 
>> We also have an entry for every document in ID order in the “revisions” 
>> subspace. The disadvantage of this approach is that every deleted edit 
>> branch shows up there, too, and some databases will have lots of 
>> deleted documents. We may need to build skiplists to know how to scan 
>> efficiently. This subspace is also doing a lot of heavy lifting for us
>> already, and if we wanted to toy with alternative revision history
>> representations in the future, having _all_docs depend on it could
>> complicate that work.
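
A minimal sketch of why Option 2 scans more than it returns, assuming a hypothetical ("revisions", docid, rev) layout with one entry per edit branch, each carrying `winner` and `deleted` flags. The dict-backed store and the function names are illustrative stand-ins, not the actual FDB layout. The function reports how many entries were read alongside the rows produced.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for an FDB keyspace: tuple keys -> values.
Store = Dict[Tuple[str, ...], Any]


def range_read(store: Store, prefix: Tuple[str, ...]) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (key, value) pairs under `prefix` in key order (illustrative only)."""
    for key in sorted(store):
        if key[:len(prefix)] == prefix:
            yield key, store[key]


def all_docs_from_revisions(store: Store) -> Tuple[List[dict], int]:
    """Option 2: scan one entry per edit branch; return (rows, entries_scanned)."""
    rows, scanned = [], 0
    for (_, docid, rev), meta in range_read(store, ("revisions",)):
        scanned += 1
        # Deleted edit branches and losing conflicts are read but yield no
        # row; in databases with many deletions most of the scan is wasted.
        if meta["deleted"] or not meta["winner"]:
            continue
        rows.append({"id": docid, "key": docid, "value": {"rev": rev}})
    return rows, scanned
```

The gap between `scanned` and `len(rows)` is the cost that a skiplist, as suggested above, would try to reduce.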
>> 
>> ## Option 3: Add specific entries to support _all_docs
>> 
>> We can also write an extra KV containing the ID and winning _rev in a 
>> special subspace just to support this endpoint. It would be a blind 
>> write because we’re already coordinating concurrent transactions 
>> through reads on the “revisions” subspace. This would be conceptually 
>> quite clean and simple, and the fastest implementation for constructing 
>> the default response.
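
A minimal sketch of Option 3, assuming a hypothetical ("all_docs", docid) -> winning_rev subspace. `update_all_docs_entry` would run inside the transaction that has already read the "revisions" subspace to pick the winner, so it writes without reading its own key. Clearing the entry when the winner is a deletion is an assumption made for this sketch, not something stated in the proposal. The dict-backed store and all names are illustrative.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for an FDB keyspace: tuple keys -> values.
Store = Dict[Tuple[str, ...], Any]


def range_read(store: Store, prefix: Tuple[str, ...]) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (key, value) pairs under `prefix` in key order (illustrative only)."""
    for key in sorted(store):
        if key[:len(prefix)] == prefix:
            yield key, store[key]


def update_all_docs_entry(store: Store, docid: str, winning_rev: str, deleted: bool) -> None:
    """Write path: a blind write, since conflicts were already resolved
    by the caller's reads on the "revisions" subspace."""
    key = ("all_docs", docid)
    if deleted:
        store.pop(key, None)  # assumption: deleted docs are cleared from this subspace
    else:
        store[key] = winning_rev


def all_docs_from_index(store: Store) -> List[dict]:
    """Read path: a single range read where every entry becomes a row."""
    return [
        {"id": docid, "key": docid, "value": {"rev": rev}}
        for (_, docid), rev in range_read(store, ("all_docs",))
    ]
```

The read path needs no filtering at all, which is what makes this the cheapest way to build the default response.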
>> 
>> ===
>> 
>> My sense is that Option 2 is a non-starter, but I include it for
>> completeness in case anyone else thought of it. I think Option 3 is a
>> reasonable space / efficiency / simplicity tradeoff, and it might also
>> be worth testing out Option 1 as an optimized implementation for
>> include_docs=true.
>> 
>> Thoughts? I imagine we can move quickly to an RFC for at least having
>> the extra KVs for Option 3, and in that design also acknowledge the
>> option of scanning the docs space directly to support include_docs=true.
>> 
>> Adam

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
