On Tue, Mar 8, 2011 at 2:28 PM, Randall Leeds <randall.le...@gmail.com> wrote: > When I start hacking on COUCHDB-1076 a couple nights ago I found that the > cleanest way I could see for allowing kp_nodes to be skipped during > traversal was for the fold function to couch_btree:foldl to be arity 4, with > kp_node | kv_node as the first argument. > > Evaluating that function for every node (instead of just kv_nodes) lets us > return {skip, Acc} to skip whole subtrees. > Diving back into the couch_btree code to read over Damien's patch for > COUCHDB-1084, it hit me that ModifyFun could instead be a function that was > called for all the actions. We wouldn't need to return the query results > because this action function could send them back to the client as they're > found. Then we just keep the actions list as part of the accumulator and > query_modify collapses into a btree:foldl and we no longer need so many > similar-looking recursive functions. Off the cuff I envision modify_node > being exported and simplified to be non-recursive and query_modify being a > helper function to generate a fold function that uses modify_node (or > something like this). > > Is this similar to anything you've done already, Paul? Would you all be > interested if I took a crack at doing this kind of refactor? >
If we want to talk about refactoring, I think the first place would be to go back and finish up getting [1] into trunk. I almost got to it one night but had a failing test and then was distracted by something shiny. I'd have to say I'm fairly against making the fold function use a single function that does double duty on handling kp and kv nodes. It just strikes me as not very elegant, not to mention, imagine rewriting the httpd view functions with such a construct. Though riffing off your idea, I could see passing a second function that takes a Red/Acc pair (the same Acc as the normal fold fun) and then uses that information to do whatever it is it needs to do. The separation of concerns here seems quite a bit more elegant. Though I'd also have a bit of concern about how the various edge cases interact, ie, do we use the SkipFun before we reach the start key? Also, how does this interact with reduce folds? Supported? Not supported? Etc etc. As to munging stream/query_modify into a single function that uses a ModifyFun or some such also concerns me a bit. I would prefer to keep them separate to make my brain hurt a bit less when considering things. Especially to avoid errors like "omg! we lost docs cause of a bug in handling _all_docs traversal" (which would become not impossible). Though I did already mention that we might want to reconsider changing the ModifyFun idea from 1084 to be a function that's passed and evaluated for each (Key, OldVal, NewVal) triple. Whether we go ahead and do that for all actions and then make anyone wanting to use the lookup part of query_modify use an Acc to gather the interesting bits I'm not certain. I can already say, that if we did go that route, it'd probably end up easiest to change all uses of query_modify to have a signature of something like couch_btree:modify(Bt, Keys, ModifyFun, InitAcc) and then in the node handling for the modify function it'd allow people to do whatever they wanted. The thing about this method is that it could theoretically handle our current situation directly to involve zero rewrites to existing usage of btree's yet allow us to systematically go through and make similar sorts of improvements that Damien's proposed in 1084. [1] https://github.com/davisp/couchdb/blob/btree/src/couchdb/couch_btree.erl