Hi, I've updated this RFC https://github.com/apache/couchdb-documentation/pull/441 with an implementation of the skiplist algorithm using FDB.
Cheers
Garren

On Tue, Sep 24, 2019 at 4:01 PM Garren Smith <gar...@apache.org> wrote:
> Hi All,
>
> Digging up this thread again just to let you know I wrote an RFC on
> implementing reduce functions on top of FoundationDB
> https://github.com/apache/couchdb-documentation/pull/441
>
> Cheers
> Garren
>
> On Thu, Apr 25, 2019 at 6:05 PM Adam Kocoloski <kocol...@apache.org>
> wrote:
>
>> Hiya Garren, thanks for starting this one. A few comments inline:
>>
>> > On Apr 23, 2019, at 11:38 AM, Garren Smith <gar...@apache.org> wrote:
>> >
>> > Hi All,
>> >
>> > Following on from the map discussion, I want to start the discussion
>> > on built-in reduce indexes.
>> >
>> > ## Builtin reduces
>> > Builtin reduces are definitely the easier of the two reduce options
>> > to reason about and design. The one factor to keep in mind for
>> > reduces is that we need to be able to reduce at different group
>> > levels. So a data model for that would look like this:
>> >
>> > {?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
>> > <group_level>, <group_key>, <_reduce_function_name>} ->
>> > <aggregate_value>
>>
>> - I think this should be <view_signature>, not ?VIEW_SIGNATURE —
>> you’re intending for the actual checksum to go in that element of the
>> tuple, right? I wonder about using a Directory in that element of the
>> tuple; adding the full signature to every key will blow up the index
>> substantially.
>>
>> - I don’t think the second ?VIEWS is needed — you already scoped to
>> ?VIEWS above it.
>>
>> - Why is the <_reduce_function_name> after the <group_key>? If someone
>> defines multiple reduce functions on a single view and then wants to
>> do a range query on the grouped result, this will mean reading and
>> discarding the output from the other reduce function. It seems to me
>> that it would make more sense to put this directly after ?REDUCE.
>>
>> - You could consider having specific constants for each builtin reduce
>> function, e.g. ?SUM, instead of the full “_sum” byte string in
>> <_reduce_function_name>, which would save several bytes per key.
>>
>> > Most of that is similar to the map data model; where it differs is
>> > from the ?REDUCE subspace on: we add the group_level (from 1 ->
>> > number of keys emitted in the map function), then the group key used
>> > in the reduce, then the reduce function name, e.g. _sum or _count,
>> > and then we store the aggregated value as the FDB value.
>>
>> I think we need a better definition of <group_level> here. It’s not
>> really the number of keys emitted from the map function, it’s the
>> number of elements of an array key emitted by the map function that
>> are used to determine the degree of aggregation.
>>
>> Also, before we even get to group_level we should describe how reduce
>> functions work with keys that are not arrays. In the current API this
>> is group=true (separate aggregations for each set of KV pairs that
>> share the same key) or group=false (one aggregated result for the
>> entire view). Internally in the current codebase we map these two
>> settings to group_level=exact and group_level=0, respectively.
>> Basically, we need a way to represent “exact” in the <group_level>.
>>
>> > ### Index management
>> >
>> > To update the reduce indexes, we will rely on the `id_index` and the
>> > `update_seq` defined in the map discussion. Then to apply changes,
>> > we calculate the change to an aggregate value for the keys at the
>> > highest group level, then apply that change to all the group levels
>> > below it using FDB’s atomic operations [1].
>>
>> This is a really important point and worth calling out more
>> explicitly. One reason most of the builtins are significantly simpler
>> than the custom functions is because we know exactly how the aggregate
>> values change when we get the map output for each updated doc in
>> isolation. We don’t need to go retrieve all the rest of the map output
>> that happens to share the same map key. Moreover, the change to the
>> aggregate value can be applied using atomic operations, so we don’t
>> need to do anything special to avoid txn conflicts for views where
>> lots of documents emit the same map key.
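
A minimal sketch of that update path, to make the atomic-ops point
concrete. This is not code from the RFC: it assumes the erlfdb bindings
(`erlfdb:add/3`, `erlfdb_tuple:pack/2`), adopts Adam's suggestion of a
per-function constant directly after ?REDUCE, and sidesteps the open
question of representing group_level=exact by simply writing one row per
key prefix length:

```erlang
%% Illustrative subspace constants; the RFC would define its own.
-define(REDUCE, 2).
-define(SUM, 1).

%% Apply one emitted key's delta to every group level, from 0 (the
%% whole-view aggregate) up to the length of the emitted array key.
%% FDB's atomic ADD needs no read, so many docs emitting the same map
%% key cannot cause transaction conflicts.
update_sum(Tx, ViewPrefix, KeyParts, Delta) when is_list(KeyParts) ->
    lists:foreach(fun(GroupLevel) ->
        GroupKey = list_to_tuple(lists:sublist(KeyParts, GroupLevel)),
        FdbKey = erlfdb_tuple:pack(
            {?REDUCE, ?SUM, GroupLevel, GroupKey},
            ViewPrefix
        ),
        %% assumes erlfdb encodes an integer param for the ADD mutation
        erlfdb:add(Tx, FdbKey, Delta)
    end, lists:seq(0, length(KeyParts))).
```

`_count` would be the same call with a delta of 1 per row, and the
min/max fields of `_stats` could presumably use FDB's atomic MIN/MAX
mutations in the same loop, one key per stat field as the next section
describes.
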
>> > ### Reducer functions
>> >
>> > FDB’s atomic operations support all the built-in reduce functions
>> > CouchDB supports, so we can use those as part of our reduce
>> > functions. For the `_stats` reduce function, we will have to split
>> > that across multiple key-values. So its data model will have an
>> > extra element in the key to record which stat it is for the _stats
>> > reducer:
>> >
>> > {?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
>> > <group_level>, <group_key>, <_reduce_function_name>, <_stat_field>}
>> > -> <aggregate_value>
>> >
>> > We do have some problems with `_approx_count_distinct` because it
>> > does not support removing keys from the filter. So we have three
>> > options:
>> >
>> > 1. We can ignore key removal entirely in the filter since this is
>> > just an estimate
>> > 2. Implement a real COUNT DISTINCT function, which we can do because
>> > we’re not trying to merge results from different local shards
>> > 3. Don’t support it going forward
>>
>> There’s actually a fourth option here, which is to maintain
>> _approx_count_distinct using the future machinery for custom reduce
>> functions, generating a new filter from the raw map output at the
>> lowest level when keys are removed. I don’t have an informed opinion
>> yet about that approach.
>>
>> > ### Group_level=0
>> >
>> > One trickier situation is if a user does a group_level=0 query with
>> > a key range; this would require us to do some client-level
>> > aggregation. We would have to get the aggregate values for a
>> > `group_level=1` query over the supplied key range and then aggregate
>> > those values together.
>>
>> This isn’t unique to group_level=0, it could happen for any
>> group_level!=exact. It can also require descending more than one
>> group_level depending on the precision of the supplied key range;
>> e.g. for a date-based key like [YYYY, MM, DD], someone could ask for
>> yearly aggregations with group_level=1 but supply a startkey and
>> endkey with specific dates rather than months.
>>
>> Good start! Cheers, Adam
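
To round out Adam's last point, a similarly hedged sketch of answering a
group_level=0 query that carries a startkey/endkey by re-aggregating the
stored group_level=1 rows client-side. It reuses the assumed layout and
constants from the sketch above; `range_sum/4` is hypothetical,
inclusive-endkey handling is elided, and per the date example a real
implementation would also descend below level 1 when the key range is
more precise than the requested grouping:

```erlang
%% Hypothetical helper: sum the group_level=1 aggregates in the range
%% to produce the group_level=0 answer for a startkey/endkey query.
%% Assumes values written by atomic ADD are 8-byte little-endian ints.
range_sum(Db, ViewPrefix, StartKey, EndKey) ->
    erlfdb:transactional(Db, fun(Tx) ->
        Start = erlfdb_tuple:pack({?REDUCE, ?SUM, 1, {StartKey}}, ViewPrefix),
        End = erlfdb_tuple:pack({?REDUCE, ?SUM, 1, {EndKey}}, ViewPrefix),
        erlfdb:fold_range(Tx, Start, End, fun({_K, V}, Acc) ->
            <<Part:64/little-signed>> = V,
            Acc + Part
        end, 0)
    end).
```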