On Mon, Dec 26, 2011 at 08:49, Jason Smith <[email protected]> wrote:
> Hi, Bob. Thanks for your feedback.
>
> On Mon, Dec 26, 2011 at 12:24 PM, Robert Dionne
> <[email protected]> wrote:
>> Jason,
>>
>> After looking into this a bit I do not think it's a bug, at most poor
>> documentation. update_seq != last_seq
>
> Nobody knows what update_seq means. Even a CouchDB committer got it wrong.
>
> Fine. It is "poor documentation."
>
> Adding last_seq into db_info is not helpful because last_seq also does
> not mean what we think it means. My last email demonstrates that
> last_seq is in fact incoherent.
<snip>

On Mon, Dec 26, 2011 at 23:03, Benoit Chesneau <[email protected]> wrote:
> Mmm right that confusing (maybe except if you consider update_seq as a
> way to know the numbers of updates in the databases but in this case
> the wording is confiusing) . Imo changes seq & commited_seq should be
> quites the same. At least a changes seq should only happen when there
> is a doc update ie each time and only if a revision is created. Does
> that make sense?
>
> - benoiît

Yes, it does. There is a mostly consistent relationship between the update sequence (seq, update_seq, last_seq, committed_seq) and the by_seq index. It seems entirely too confusing that there are things which affect update_seq but do not appear in the by_seq btree. That is either just plain wrong or a massive confusion of vocabulary. Benoit, I believe you are right to suggest that none of these sequence-related values should change unless a revision is created.

Bear with me, for I believe there is a related discussion about replicability for _security, _local docs, etc. It's clear that there are clustering and operational motivations for making this information replicable, which would mean turning these things into proper documents with a place in the by_seq index and the _changes feed, and having them affect update_seq. Either these things have a proper place in the sequential history of a database or they do not. That some things affect update_seq but do not appear in the by_seq index and the _changes feed feels like a mistake. Placing additional metadata in the db header feels like rubbing salt in this wound.

Right now only replicable documents surface in the _changes feed and are added to the by_seq btree, but some other things also affect update_seq. I've just checked, as described in my previous email, that none of these appear to require a change to update_seq for any technical reason, though Jason rightly points out that it is perhaps useful for operational tasks such as knowing when to back up a .couch file.
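To make the mismatch concrete, here is a toy Python model. It is only an illustration, not CouchDB's Erlang code, and the ToyDb class and its methods are hypothetical. It assumes, as described in this thread, that a replicable document update bumps update_seq and adds a by_seq entry, while a _security or _revs_limit change bumps update_seq without touching by_seq.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDb:
    """Hypothetical in-memory model of a database file.

    Not CouchDB's implementation; it only mirrors the relationship
    between update_seq and the by_seq index assumed in this thread.
    """
    update_seq: int = 0
    by_seq: dict = field(default_factory=dict)  # seq -> doc id
    security: dict = field(default_factory=dict)
    revs_limit: int = 1000

    def update_doc(self, doc_id):
        # Replicable document update: bump the sequence AND index it.
        self.update_seq += 1
        # by_seq keeps only the latest seq for each doc id.
        self.by_seq = {s: d for s, d in self.by_seq.items() if d != doc_id}
        self.by_seq[self.update_seq] = doc_id

    def set_security(self, obj):
        # Metadata write (assumed behaviour): bumps update_seq only.
        self.security = obj
        self.update_seq += 1

    def set_revs_limit(self, limit):
        # Same assumed behaviour: no by_seq entry is created.
        self.revs_limit = limit
        self.update_seq += 1

    def changes_last_seq(self):
        # Highest seq a _changes consumer could ever observe.
        return max(self.by_seq, default=0)
```

After two document updates and one _security change, update_seq is 3, while the highest seq visible through by_seq (and therefore _changes) is 2. That gap is exactly what makes update_seq and last_seq disagree.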
I see two reasonable ways forward:

1) Stop incrementing update_seq for anything but replicable document changes.
2) Make the things which already affect update_seq but do not appear in _changes appear there, likely by turning them into proper MVCC documents.

Regarding option 1: This is easy. I already outlined how to do this; it requires removing about 3 characters from our codebase. However, it spits at Jason's operational concerns, which I think are quite valid, and misses an opportunity for great improvement.

Regarding option 2: There is a cluster-aware use case, an operations use case, and, I think, a purity argument here. As for how to accomplish this feat without terrible API breakage, we get a lot of help from our URL structure. We have reserved paths which cannot conflict with documents, so it creates no ambiguity if '{"seq":20,"id":"_security", ...}' appears in a changes feed. However, I think _security is a bad name for this document, because using that name would break compatibility with the existing /_security API.

One solution I like right now is to add a _meta document (without loss of generality; insert your own preferred name) with the normal MVCC document API, referenced by the by_seq index and appearing in the _changes feed. It would contain both _revs_limit and _security, while the legacy, clobbering, MVCC-oblivious APIs are preserved. Voila! No breaking changes. Keep a pointer to the latest revision of this document in the database header for fast access (and perhaps cache it in memory).

It would probably be acceptable to keep these out of a vanilla changes request (after all, they require db admin credentials to modify, and in the case of _security to view). Opening the door to additional flags for _changes also allows us to provide a natural extension of this idea to replicable _local docs for the clustering use case.

Thoughts, concerns, emotions and relevant, famous quotations encouraged.

-Randall
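The _meta proposal above can be sketched with a similar toy model. Everything here is hypothetical: the ToyMetaDb class, the integer revisions (real CouchDB revisions look like "N-hash"), and the include_meta flag on changes(), which stands in for whatever _changes parameter might eventually be chosen. Legacy-style writes to _security or _revs_limit clobber the current _meta revision instead of requiring a rev, but each one still produces a new revision with a by_seq entry, so update_seq and the by_seq index stay consistent.

```python
class Conflict(Exception):
    """Raised when an MVCC write does not name the current revision."""


class ToyMetaDb:
    """Hypothetical sketch of option 2: _security and _revs_limit live in
    a replicable _meta document. Not an existing CouchDB API."""

    META_ID = "_meta"

    def __init__(self):
        self.update_seq = 0
        self.docs = {}    # doc id -> (integer rev, body)
        self.by_seq = {}  # seq -> doc id (latest seq per doc only)
        # "Header" pointer to the latest _meta revision for fast access.
        self.meta_header = None

    def put_doc(self, doc_id, body, rev=None):
        # Normal MVCC write: the caller must name the current revision
        # (None or 0 for a document that does not exist yet).
        current_rev = self.docs.get(doc_id, (0, None))[0]
        if (rev or 0) != current_rev:
            raise Conflict(doc_id)
        new_rev = current_rev + 1
        self.docs[doc_id] = (new_rev, dict(body))
        # Every revision bumps update_seq AND lands in by_seq.
        self.update_seq += 1
        self.by_seq = {s: d for s, d in self.by_seq.items() if d != doc_id}
        self.by_seq[self.update_seq] = doc_id
        if doc_id == self.META_ID:
            self.meta_header = self.docs[doc_id]
        return new_rev

    def _clobber_meta(self, key, value):
        # Legacy, MVCC-oblivious write: always builds on the current
        # _meta revision, so callers never see a conflict.
        rev, body = self.docs.get(self.META_ID, (0, {}))
        return self.put_doc(self.META_ID, dict(body, **{key: value}), rev=rev)

    def put_security(self, security):
        # Stand-in for the legacy PUT /db/_security behaviour.
        return self._clobber_meta("_security", security)

    def put_revs_limit(self, limit):
        # Stand-in for the legacy PUT /db/_revs_limit behaviour.
        return self._clobber_meta("_revs_limit", limit)

    def changes(self, include_meta=False):
        # A vanilla changes request hides _meta; the flag exposes it.
        return [
            {"seq": s, "id": d}
            for s, d in sorted(self.by_seq.items())
            if include_meta or d != self.META_ID
        ]
```

With one document write followed by a _security write and a _revs_limit write, update_seq is 3 and the _meta entry sits at seq 3 in by_seq; a plain changes() call shows only the document, while changes(include_meta=True) also shows _meta. Keeping the clobbering helpers separate from put_doc is what preserves the old APIs while the stored data gains full MVCC history.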
