Hi All,

Following on from the map discussion, I want to start the discussion on built-in reduce indexes.
## Builtin reduces

Builtin reduces are definitely the easier of the two reduce options to reason about and design. The one factor to keep in mind for reduces is that we need to be able to reduce at different group levels. So a data model for that would look like this:

{?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE, <group_level>, <group_key>, <_reduce_function_name>} -> <aggregate_value>

Most of that is similar to the map data model. Where it changes is from the ?REDUCE subspace: we add the group_level (from 1 up to the number of keys emitted in the map function), then the group key used in the reduce, then the reduce function name (e.g. _sum, _count), and then we store the aggregated value as the FDB value.

### Index management

To update the reduce indexes, we will rely on the `id_index` and the `update_seq` defined in the map discussion. Then to apply changes, we calculate the change to an aggregate value for the keys at the highest group level and apply that change to all the group levels below it using FDB's atomic operations [1].

### Reducer functions

FDB's atomic functions support all the built-in reduce functions CouchDB supports, so we can use those as part of our reduce implementation.

For the `_stats` reduce function, we will have to split the value across multiple key/value pairs, so its data model has an extra key element to record which stat field the entry holds:

{?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE, <group_level>, <group_key>, <_reduce_function_name>, <_stat_field>} -> <aggregate_value>

We do have a problem with `_approx_count_distinct`, because it does not support removing keys from the filter. So we have three options:

1. Ignore key removal entirely in the filter, since this is just an estimate
2. Implement a real COUNT DISTINCT function, which we can do because we're not trying to merge results from different local shards
3. Don't support it going forward

### Group_level=0

One trickier situation is if a user does a group_level=0 query with a key range. This would require us to do some client-level aggregation: we would have to get the aggregate values at `group_level=1` for the supplied key range and then aggregate those values together.

I would love to hear your thoughts and ideas on this. If you are wondering about custom reduce indexes, I'm still working on that and will start a discussion email on it a little later.

Cheers
Garren
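
To make the update flow concrete, here's a minimal Python sketch of the "apply the delta at every group level" idea. The key layout mirrors the `{..., <view_id>, ?REDUCE, <group_level>, <group_key>, <fun_name>}` tuple from the data model, but all names here are illustrative, not actual CouchDB code, and a plain dict stands in for FDB; in the real index the `+=` would be FDB's atomic ADD mutation so concurrent updates don't conflict.

```python
REDUCE = "reduce"  # stands in for the ?REDUCE subspace marker

def reduce_key(view_id, group_level, group_key, fun_name):
    # Mirrors {..., <view_id>, ?REDUCE, <group_level>, <group_key>, <fun_name>}
    return (view_id, REDUCE, group_level, tuple(group_key), fun_name)

def apply_delta(index, view_id, emitted_key, fun_name, delta):
    """Apply a _sum/_count style delta at every group level.

    `emitted_key` is the full key emitted by the map function; group
    level N groups on its first N elements, so the delta computed at the
    highest group level is also applied to every level below it.
    """
    for level in range(1, len(emitted_key) + 1):
        k = reduce_key(view_id, level, emitted_key[:level], fun_name)
        index[k] = index.get(k, 0) + delta  # atomic ADD in real FDB
    return index

index = {}
# A map row emitted key ["a", "b"] with value 2 for a _sum reduce:
apply_delta(index, "view1", ["a", "b"], "_sum", 2)
# Removing that row later is just a negative delta:
apply_delta(index, "view1", ["a", "b"], "_sum", -2)
```

Because every write is a blind atomic increment, updating many group levels per changed row stays conflict-free in FDB.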
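
For the `_stats` split, a sketch of what "one key/value per stat field" could look like. Field names and the helper are assumptions for illustration; a dict again stands in for FDB. Each field maps onto a matching FDB atomic op (ADD for sum/count/sumsqr, MIN/MAX for min/max), which is what makes the split layout attractive.

```python
def stats_update(index, base_key, value):
    # One FDB key/value per _stats field, distinguished by an extra
    # <_stat_field> element appended to the reduce key.
    ops = {
        "sum":    lambda old: old + value,          # atomic ADD
        "count":  lambda old: old + 1,              # atomic ADD
        "sumsqr": lambda old: old + value * value,  # atomic ADD
        "min":    lambda old: min(old, value),      # atomic MIN
        "max":    lambda old: max(old, value),      # atomic MAX
    }
    defaults = {"sum": 0, "count": 0, "sumsqr": 0,
                "min": value, "max": value}
    for field, op in ops.items():
        k = base_key + ("_stats", field)
        index[k] = op(index[k]) if k in index else op(defaults[field])
    return index

index = {}
stats_update(index, ("view1", 1, ("a",)), 3)
stats_update(index, ("view1", 1, ("a",)), 5)
```

Note that MIN/MAX, unlike ADD, cannot undo a deleted row's contribution, so removals for those fields would still need a re-read of the remaining rows.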
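
And a sketch of the tricky group_level=0 + key range case: fetch the `group_level=1` aggregates inside the range and combine them client-side. The function and key shape are illustrative, not CouchDB internals, and the combine step shown is for `_sum`/`_count` style reducers (a `_min` would take the min of the level-1 values instead).

```python
def group0_in_range(index, view_id, fun_name, start_key, end_key):
    """Client-side aggregation for group_level=0 with a key range.

    Scans the group_level=1 entries whose group key falls in
    [start_key, end_key) and sums their aggregate values, since no
    single stored value covers an arbitrary range at level 0.
    """
    total = 0
    for (vid, _sub, level, gkey, fun), value in index.items():
        if (vid == view_id and level == 1 and fun == fun_name
                and start_key <= gkey < end_key):
            total += value
    return total

index = {
    ("view1", "reduce", 1, ("a",), "_sum"): 2,
    ("view1", "reduce", 1, ("b",), "_sum"): 5,
    ("view1", "reduce", 1, ("c",), "_sum"): 7,
}
group0_in_range(index, "view1", "_sum", ("a",), ("c",))
```

In the real index this would be a single FDB range read over the level-1 subspace rather than a full scan, but the client-side combine is the same.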