This is pretty interesting. I find myself doing a mental double-take when I realize that it is
- storing an ordered set of KVs, - in an ordered KV storage engine, - without relying on the ordering of the storage engine! If I’m understanding things correctly (and that’s a big if), adding a reduce function to a view completely changes how the “map” portion of that view is stored. A view with no reduce function stores the rows of the view as individual KVs in FDB, but the same map output when paired with a reduce function would be chunked up and stored in the leaf nodes of this b+tree. Has anyone compared the performance of those two different data models, e.g. by querying a map-only view and an mr-view with reduce=false? If it turns out that there’s a significant difference, is it worth considering a model where the leaf nodes of the b+tree just point to KV ranges in FDB, rather than holding the actual user-emitted KV data themselves? Then the inner nodes of the b+tree would just provide the data structure to incrementally maintain the aggregations. Inserts to that range of KVs during indexing would still go through the b+tree code path, but queries for the map portion of the view could go directly to the relevant range of KVs in FDB, skipping the traversal through the inner nodes and limiting the data transfer only to the rows requested by the client. Conversely, if the b+tree approach is actually better even without a user-supplied reduce function, shouldn’t we use it for all views? As an aside, I’m very glad to see a departure from couch_btree’s approach of dynamically modifying the “order” of the “b+tree” based on the size of the reduction. Such an ugly edge case there ... Cheers, Adam > On Jul 21, 2020, at 8:48 AM, Robert Newson <[email protected]> wrote: > > Thank you for those kind words. > > -- > Robert Samuel Newson > [email protected] > > On Tue, 21 Jul 2020, at 13:45, Jan Lehnardt wrote: >> Heya Garren an Bob, >> >> this looks really nice. I remember when this was a twinkle in our >> planning eyes. Seeing the full thing realised is very very cool. >> >> I’m additionally impressed by the rather pretty and clean code. >> This doesn’t have to to be hard :) >> >> Looking forward to see this in action. >> >> Best >> Jan >> — >> >>> On 21. Jul 2020, at 14:01, Garren Smith <[email protected]> wrote: >>> >>> Hi All >>> >>> We have a new reduce design for FoundationDB and we think this one will >>> work. >>> Recently I proposed a simpler reduce design [1] and at the same time, Bob >>> (rnewson) looked at implementing a B+tree [2], called ebtree, on top of >>> FoundationDB. The b+tree implementation has turned out really nicely, the >>> code is quite readable and works really well. I would like to propose that >>> instead of using the simpler reduce design I mentioned in the previous >>> email, we rather go with a reduce implementation on top of ebtree. The big >>> advantage of ebtree is that it allows us to keep the behaviour of CouchDB >>> 3.x. >>> >>> We have run some basic performance tests on the Cloudant performance >>> clusters and so far the performance is looking quite good and performs very >>> similar to my simpler reduce work. >>> >>> There is an unknown around the ebtree Order value. The Order is the number >>> of key/values stored for a node. We need to determine the optimal order >>> value for ebtree so that it doesn't exceed FoundationDB's key/value limits >>> and still performs well. This is something we will be looking at as we >>> finish up the reduce work. The work in progress for the reduce PR is >>> https://github.com/apache/couchdb/pull/3018. >>> >>> A great thanks to Bob for implementing the B+tree. I would love to hear >>> your thoughts or questions around this? >>> >>> Cheers >>> Garren >>> >>> [1] >>> https://lists.apache.org/thread.html/r1d77cf9bb9c86eddec57ca6ea2aad90f396ee5f0dfe43450f730b1cf%40%3Cdev.couchdb.apache.org%3E >>> >>> [2] https://github.com/apache/couchdb/pull/3017 >>> [3] https://github.com/apache/couchdb/pull/3018 >> >>
