This is pretty interesting. I find myself doing a mental double-take when I 
realize that it is

- storing an ordered set of KVs,
- in an ordered KV storage engine,
- without relying on the ordering of the storage engine!

If I’m understanding things correctly (and that’s a big if), adding a reduce 
function to a view completely changes how the “map” portion of that view is 
stored. A view with no reduce function stores the rows of the view as 
individual KVs in FDB, but the same map output when paired with a reduce 
function would be chunked up and stored in the leaf nodes of this b+tree. Has 
anyone compared the performance of those two different data models, e.g. by 
querying a map-only view and an mr-view with reduce=false?

If it turns out that there’s a significant difference, is it worth considering 
a model where the leaf nodes of the b+tree just point to KV ranges in FDB, 
rather than holding the actual user-emitted KV data themselves? Then the inner 
nodes of the b+tree would just provide the data structure to incrementally 
maintain the aggregations. Inserts to that range of KVs during indexing would 
still go through the b+tree code path, but queries for the map portion of the 
view could go directly to the relevant range of KVs in FDB, skipping the 
traversal through the inner nodes and limiting the data transfer only to the 
rows requested by the client.

Conversely, if the b+tree approach is actually better even without a 
user-supplied reduce function, shouldn’t we use it for all views?

As an aside, I’m very glad to see a departure from couch_btree’s approach of 
dynamically modifying the “order” of the “b+tree” based on the size of the 
reduction. Such an ugly edge case there ...

Cheers, Adam

> On Jul 21, 2020, at 8:48 AM, Robert Newson <rnew...@apache.org> wrote:
> 
> Thank you for those kind words. 
> 
> -- 
>  Robert Samuel Newson
>  rnew...@apache.org
> 
> On Tue, 21 Jul 2020, at 13:45, Jan Lehnardt wrote:
>> Heya Garren an Bob,
>> 
>> this looks really nice. I remember when this was a twinkle in our
>> planning eyes. Seeing the full thing realised is very very cool.
>> 
>> I’m additionally impressed by the rather pretty and clean code.
>> This doesn’t have to to be hard :)
>> 
>> Looking forward to see this in action.
>> 
>> Best
>> Jan
>> —
>> 
>>> On 21. Jul 2020, at 14:01, Garren Smith <gar...@apache.org> wrote:
>>> 
>>> Hi All
>>> 
>>> We have a new reduce design for FoundationDB and we think this one will
>>> work.
>>> Recently I proposed a simpler reduce design [1] and at the same time, Bob
>>> (rnewson) looked at implementing a B+tree [2], called ebtree, on top of
>>> FoundationDB. The b+tree implementation has turned out really nicely, the
>>> code is quite readable and works really well. I would like to propose that
>>> instead of using the simpler reduce design I mentioned in the previous
>>> email, we rather go with a reduce implementation on top of ebtree. The big
>>> advantage of ebtree is that it allows us to keep the behaviour of CouchDB
>>> 3.x.
>>> 
>>> We have run some basic performance tests on the Cloudant performance
>>> clusters and so far the performance is looking quite good and performs very
>>> similar to my simpler reduce work.
>>> 
>>> There is an unknown around the ebtree Order value. The Order is the number
>>> of key/values stored for a node. We need to determine the optimal order
>>> value for ebtree so that it doesn't exceed FoundationDB's key/value limits
>>> and still performs well. This is something we will be looking at as we
>>> finish up the reduce work. The work in progress for the reduce PR is
>>> https://github.com/apache/couchdb/pull/3018.
>>> 
>>> A great thanks to Bob for implementing the B+tree. I would love to hear
>>> your thoughts or questions around this?
>>> 
>>> Cheers
>>> Garren
>>> 
>>> [1]
>>> https://lists.apache.org/thread.html/r1d77cf9bb9c86eddec57ca6ea2aad90f396ee5f0dfe43450f730b1cf%40%3Cdev.couchdb.apache.org%3E
>>> 
>>> [2] https://github.com/apache/couchdb/pull/3017
>>> [3] https://github.com/apache/couchdb/pull/3018
>> 
>> 

Reply via email to