Hiya Garren, thanks for starting this one. A few comments inline:

> On Apr 23, 2019, at 11:38 AM, Garren Smith <gar...@apache.org> wrote:
> 
> Hi All,
> 
> Following on from the map discussion, I want to start the discussion on
> built-in reduce indexes.
> 
> ## Builtin reduces
> Builtin reduces are definitely the easier of the two reduce options to
> reason about and design. The one factor to keep in mind for reduces is that
> we need to be able to reduce at different group levels. So a data model for
> that would look like this:
> 
> {?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
> <group_level>, <group_key>, <_reduce_function_name>} -> <aggregate_value>

- I think this should be <view_signature>, not ?VIEW_SIGNATURE — you’re 
intending for the actual checksum to go in that element of the tuple, right? I 
wonder about using a Directory in that element of the tuple; adding the full 
signature to every key will blow up the index substantially.

- I don’t think the second ?VIEWS is needed — you already scoped to ?VIEWS 
above it.

- Why is the <_reduce_function_name> after the <group_key>? If someone defines 
multiple reduce functions on a single view and then wants to do a range query 
on the grouped result this will mean reading and discarding the output from the 
other reduce function. It seems to me that it would make more sense to put this 
directly after ?REDUCE.

- You could consider having specific constants for each builtin reduce 
function, e.g. ?SUM, instead of the full “_sum” byte string in 
<_reduce_function_name>, which would save several bytes per key.
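
Putting those comments together, here’s a rough sketch of the key layout I
have in mind (Erlang against the erlfdb tuple layer; every constant and
helper name below is a placeholder for illustration, not a proposal for the
real values):

    %% Revised layout: the actual signature as a key element (or a
    %% Directory prefix), no second ?VIEWS, and the reduce function
    %% constant ahead of the group key so each function's rows sort
    %% contiguously.
    -define(VIEWS, 2).
    -define(REDUCE, 1).
    -define(SUM, 0).

    reduce_key(DbPrefix, ViewSig, ViewId, ReduceFun, GroupLevel, GroupKey) ->
        %% GroupKey must already be a tuple-layer term; an array map key
        %% would go in as a nested tuple (encoding details elided)
        erlfdb_tuple:pack(
            {?VIEWS, ViewSig, ViewId, ?REDUCE, ReduceFun, GroupLevel, GroupKey},
            DbPrefix
        ).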

> 
> Most of that is similar to the map data model; where it differs is from the
> ?REDUCE subspace on: we add the group_level (from 1 -> number of keys emitted
> in the map function), then the group key used in the reduce, then the reduce
> function name, e.g. _sum or _count, and then we store the aggregated value as
> the FDB value.

I think we need a better definition of <group_level> here. It’s not really the 
number of keys emitted from the map function, it’s the number of elements of an 
array key emitted by the map function that are used to determine the degree of 
aggregation.

Also, before we even get to group_level we should describe how reduce functions 
work with keys that are not arrays. In the current API this is group=true 
(separate aggregations for each set of KV pairs that share the same key) or 
group=false (one aggregated result for the entire view). Internally in the 
current codebase we map these two settings to group_level=exact and 
group_level=0, respectively. Basically, we need a way to represent “exact” in 
the <group_level>.
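
To make that concrete, the group key derivation could look something like
this (illustrative only; how "exact" gets encoded in the key itself is still
an open question):

    %% Derive the group key from an emitted map key. exact (group=true)
    %% keeps the whole key; level 0 (group=false) collapses everything
    %% into one group; level N keeps the first N elements of an array
    %% key. A scalar key has no prefix structure, so a numeric level
    %% degrades to exact there, as in the current codebase.
    group_key(_MapKey, 0) ->
        null;
    group_key(MapKey, exact) ->
        MapKey;
    group_key(MapKey, Level) when is_list(MapKey), is_integer(Level) ->
        lists:sublist(MapKey, Level);
    group_key(MapKey, _Level) ->
        MapKey.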

> ### Index management
> 
> To update the reduce indexes, we will rely on the `id_index` and the
> `update_seq` defined in the map discussion. Then to apply changes, we
> calculate the change of an aggregate value for the keys at the highest
> group level, then apply that change to all the group levels lower than it
> using fdb’s atomic operations [1].

This is a really important point and worth calling out more explicitly. One 
reason most of the builtins are significantly simpler than the custom functions 
is that we know exactly how the aggregate values change when we get the map 
output for each updated doc in isolation. We don’t need to go retrieve all the 
rest of the map output that happens to share the same map key. Moreover, the 
change to the aggregate value can be applied using atomic operations so we 
don’t need to do anything special to avoid txn conflicts for views where lots 
of documents emit the same map key.
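
As a sketch of that update path, building on the helpers above (the sentinel
encoding for "exact" and the assumption that erlfdb:add/3 accepts an integer
delta and stores it as a little-endian 64-bit value are mine, not settled
design):

    %% Apply one emitted key's _sum delta at every group level with
    %% atomic ADDs; concurrent writers touching the same group keys
    %% never conflict.
    -define(LEVEL_EXACT, 16#FF). %% placeholder encoding for "exact"

    update_sum(Tx, DbPrefix, ViewSig, ViewId, MapKey, Delta) ->
        Levels = case MapKey of
            K when is_list(K) -> [exact | lists:seq(0, length(K))];
            _ -> [exact, 0]
        end,
        lists:foreach(fun(Level) ->
            GroupKey = encode(group_key(MapKey, Level)),
            EncLevel = case Level of exact -> ?LEVEL_EXACT; N -> N end,
            Key = reduce_key(DbPrefix, ViewSig, ViewId, ?SUM, EncLevel,
                GroupKey),
            erlfdb:add(Tx, Key, Delta)
        end, Levels).

    %% array keys become nested tuples for the tuple layer
    encode(K) when is_list(K) -> list_to_tuple(K);
    encode(K) -> K.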

> ### Reducer functions
> 
> FDB’s atomic functions support all the built-in reduce functions CouchDB
> supports, so we can use those as part of our reduce function. For the
> `_stats` reduce function, we will have to split the result across multiple
> key values. So its data model will have an extra element in the key to
> record which stat it is for the _stats reducer:
> 
> {?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
> <group_level>, <group_key>, <_reduce_function_name>, <_stat_field>} ->
> <aggregate_value>
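
Concretely, I'd expect the per-field writes to look roughly like this (sketch
only; note that FDB's atomic MIN/MAX compare values as unsigned little-endian
integers, so negative or float stats would need a different encoding or a
read-modify-write, and the sketch assumes integer emitted values):

    %% One KV per stat field; KeyFun is an illustrative closure over the
    %% rest of the key that appends the stat field name.
    update_stats(Tx, KeyFun, Value) ->
        erlfdb:add(Tx, KeyFun(<<"sum">>), Value),
        erlfdb:add(Tx, KeyFun(<<"count">>), 1),
        erlfdb:add(Tx, KeyFun(<<"sumsqr">>), Value * Value),
        erlfdb:min(Tx, KeyFun(<<"min">>), Value),
        erlfdb:max(Tx, KeyFun(<<"max">>), Value).
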
> 
> We do have some problems with `_approx_count_distinct`, because it does
> not support removing keys from the filter. So we have three options:
> 
> 1. We can ignore key removal entirely in the filter since this is just an
> estimate
> 2. Implement a real COUNT DISTINCT function, which we can do because we’re
> not trying to merge results from different local shards
> 3. Don’t support it going forward

There’s actually a fourth option here, which is to maintain 
_approx_count_distinct using the future machinery for custom reduce functions, 
generating a new filter from the raw map output at the lowest level when keys 
are removed. I don’t have an informed opinion yet about that approach.

> ### Group_level=0
> 
> One trickier situation is if a user does a group_level=0 query with a key
> range; this would require us to do some client-level aggregation. We would
> have to get the aggregate values for a `group_level=1` for the supplied key
> range and then aggregate those values together.

This isn’t unique to group_level=0; it could happen for any group_level != 
exact. It can also require descending more than one group_level, depending on 
the precision of the supplied key range: for a date-based key like [YYYY, MM, 
DD], someone could ask for yearly aggregations with group_level=1 but supply a 
startkey and endkey with specific dates rather than months.
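
For a _sum that fold might look roughly like this (reusing the sketched
layout; range-endpoint handling and descending more than one level are
elided, the erlfdb:fold_range/6 usage is an assumption, and the value
decoding assumes the little-endian 64-bit integers the atomic ADDs produce):

    %% Read the stored group_level=1 rows in [StartKey, EndKey) and
    %% re-aggregate them into a single group_level=0 result.
    ranged_sum(Tx, DbPrefix, ViewSig, ViewId, StartKey, EndKey) ->
        Prefix = erlfdb_tuple:pack(
            {?VIEWS, ViewSig, ViewId, ?REDUCE, ?SUM, 1}, DbPrefix),
        Start = erlfdb_tuple:pack({encode(StartKey)}, Prefix),
        End = erlfdb_tuple:pack({encode(EndKey)}, Prefix),
        Fold = fun({_K, <<V:64/little-signed>>}, Acc) -> Acc + V end,
        erlfdb:fold_range(Tx, Start, End, Fold, 0, []).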

Good start! Cheers, Adam
