> On Mar 30, 2016, at 6:26 AM, Jan Lehnardt <j...@apache.org> wrote:
>
>>
>> On 29 Mar 2016, at 20:14, Adam Kocoloski <kocol...@apache.org> wrote:
>>
>> Neat stuff. Years ago I actually committed this feature to the codebase
>> using a table scan and then Damien backed it out because of the scalability
>> concern. Glad to see we’re approaching it in a more considered fashion this
>> time around :)
>>
>> One thing we might consider is to maintain a *count* of the number of
>> conflicted documents in the database automatically. If the count is nonzero
>> when you expected it to be zero, build the conflicted documents index and do
>> your inspection. In the happy case where there are no conflicts we just
>> saved you a bunch of effort.
>>
>> We don’t really need a separate index to accomplish this; we just need to
>> modify the reducer function supplied to the by_id btree. We’ve played that
>> game before to add things like data size accumulators to the DB info object.
>> There may be a modest hit to the write performance to count the number of
>> non-deleted leafs in the rev tree on document update, but honestly that says
>> as much about the inefficiencies in couch_key_tree as anything else - that
>> quantity ought to be very cheap to uncover.
>
> Bob Newson and I talked about this on IRC some more and I think this is all
> similar if not the same thinking: remember how we optimised `skip` in view
> results? We could keep track of the number of conflicts per b-tree node and
> then easily skip over the subtrees that don’t have any conflicts, so a
> table-scan would be relatively cheap.
>
> Best
> Jan
Yes, even better. I only started with the count, but it’s easy to add the efficient scan once each subtree carries its own count.

Adam
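A minimal Python model of the idea, for illustration only. CouchDB itself is Erlang, and the `DocEntry`, `Node`, `reduce_counts`, and `conflicted_ids` names below are hypothetical, not part of `couch_btree` or `couch_key_tree`. It assumes that a document counts as conflicted when its rev tree has more than one non-deleted leaf. Each inner node caches a reduction, the number of conflicted documents beneath it, computed by summing its children. The scan then descends only into subtrees whose cached count is nonzero, which is the same trick that makes `skip` cheap.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DocEntry:
    """Hypothetical leaf entry in a by_id tree: a doc ID plus its rev tree leaves."""
    doc_id: str
    # One flag per leaf revision: True if that leaf revision is deleted.
    leaf_deleted: List[bool]

    def is_conflicted(self) -> bool:
        # Conflicted = more than one non-deleted leaf in the rev tree.
        return sum(1 for deleted in self.leaf_deleted if not deleted) > 1


@dataclass
class Node:
    """Hypothetical inner node that caches a conflict-count reduction."""
    children: List[Union["Node", DocEntry]]
    # Cached reduction: number of conflicted docs anywhere in this subtree.
    conflict_count: int = field(init=False)

    def __post_init__(self) -> None:
        self.conflict_count = reduce_counts(self.children)


def count_of(child: Union[Node, DocEntry]) -> int:
    # Inner nodes contribute their cached count; leaves contribute 0 or 1.
    if isinstance(child, Node):
        return child.conflict_count
    return 1 if child.is_conflicted() else 0


def reduce_counts(children: List[Union[Node, DocEntry]]) -> int:
    # The "reducer": a plain sum over children, so it also works as a rereduce.
    return sum(count_of(c) for c in children)


def conflicted_ids(node: Node) -> List[str]:
    """Collect conflicted doc IDs, skipping every subtree whose count is zero."""
    out: List[str] = []
    if node.conflict_count == 0:
        return out
    for child in node.children:
        if isinstance(child, Node):
            if child.conflict_count > 0:
                out.extend(conflicted_ids(child))
        elif child.is_conflicted():
            out.append(child.doc_id)
    return out


left = Node([DocEntry("a", [False]), DocEntry("b", [False, False])])
right = Node([DocEntry("c", [False, True]), DocEntry("d", [True, True])])
root = Node([left, right])
print(root.conflict_count)   # 1
print(conflicted_ids(root))  # ['b']
```

Here the whole `right` subtree is skipped without visiting its entries. In this model, the extra write cost is counting non-deleted leaves for the updated document. The summed counts only change on the path from that entry to the root.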