What's interesting is that modulo one edge case, last_seq in the changes feed 
and update_seq in the db_info record are exactly as defined on the WIKI. 

update_seq:

Current number of updates to the database (int)

last_seq:

last_seq is the sequence number of the last update returned. (Currently it will 
always be the same as the seq of the last item in results.)
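A toy model may make the two definitions concrete. The Python sketch below is illustrative only and is not CouchDB's implementation. It assumes, as this thread suggests, that _set_revs_limit bumps update_seq without adding a row to the by_seq index. The class and method names are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDb:
    """Toy model of the two sequence numbers (illustrative, not CouchDB code)."""

    update_seq: int = 0  # bumped by every update to the database
    # (seq, doc_id) rows; only document revisions are indexed here, and each
    # document keeps just one row, at the seq of its latest revision.
    by_seq: list[tuple[int, str]] = field(default_factory=list)
    revs_limit: int = 1000

    def save_doc(self, doc_id: str) -> None:
        # A new revision bumps update_seq and moves the doc's by_seq row.
        self.update_seq += 1
        self.by_seq = [row for row in self.by_seq if row[1] != doc_id]
        self.by_seq.append((self.update_seq, doc_id))

    def set_revs_limit(self, limit: int) -> None:
        # Assumed behaviour: a non-document update bumps update_seq but adds
        # nothing to by_seq, so it never shows up in _changes.
        self.revs_limit = limit
        self.update_seq += 1

    def changes(self, since: int = 0) -> dict:
        # Rows come back in seq order; last_seq is the seq of the last row
        # returned, or `since` when nothing is returned.
        results = [{"seq": s, "id": d} for s, d in self.by_seq if s > since]
        last_seq = results[-1]["seq"] if results else since
        return {"results": results, "last_seq": last_seq}
```

With no documents at all, one call to set_revs_limit leaves update_seq at 1 while changes()["last_seq"] stays 0, which is exactly the update_seq != last_seq gap under discussion.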

These definitions hold even when there are no changes to documents; in that
case the value of last_seq is zero. The one edge case (which is a bit odd) is
seen when you retrieve last_seq using ?descending=true&limit=1. If there are no
changes, the value will still be zero, unless you call _set_revs_limit first, in
which case the value will be one. The value will still be zero if the normal
_changes is called with no args. What makes it odd is that calling
_changes?descending... after a call to _set_revs_limit does not impact the
value of last_seq. This is a bug.
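To see both values side by side on a real server, something like the following Python sketch could be used. It assumes a CouchDB 1.x server at http://localhost:5984 with no authentication, where both sequences are plain integers; the helper names are hypothetical and only standard-library calls are used.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: a local CouchDB 1.x server on the default port, no auth.
COUCH = "http://localhost:5984"


def get_json(url):
    # Fetch a URL and parse the JSON body.
    with urlopen(url) as resp:
        return json.load(resp)


def compare_seqs(db_info, changes):
    """Return (update_seq, last_seq, equal) from already-parsed responses."""
    update_seq = db_info["update_seq"]
    last_seq = changes["last_seq"]
    return update_seq, last_seq, update_seq == last_seq


def check(db, descending=False):
    # GET /db for the db_info record and GET /db/_changes for last_seq.
    base = f"{COUCH}/{quote(db, safe='')}"
    db_info = get_json(base)
    query = "?descending=true&limit=1" if descending else ""
    changes = get_json(f"{base}/_changes{query}")
    return compare_seqs(db_info, changes)
```

For example, on an empty database (the name "mydb" is hypothetical), run check("mydb") and check("mydb", descending=True), then PUT a number to /mydb/_revs_limit and run both again to compare.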

So yes, it's a bit weird, but it does pretty much agree with the documentation. 
The quote I'm looking for is the one about angels on the head of a pin. 

I guess it needs more thought. In general I don't like metadata because I think 
it creates more things that need to be handled differently, adding complexity 
for the sake of something that doesn't exist (metadata).

Do you have any more swatches in magenta?



On Dec 27, 2011, at 12:04 AM, Randall Leeds wrote:

> On Mon, Dec 26, 2011 at 08:49, Jason Smith <j...@iriscouch.com> wrote:
>> Hi, Bob. Thanks for your feedback.
>> 
>> On Mon, Dec 26, 2011 at 12:24 PM, Robert Dionne
>> <dio...@dionne-associates.com> wrote:
>>> Jason,
>>> 
>>>  After looking into this a bit I do not think it's a bug, at most poor 
>>> documentation. update_seq != last_seq
>> 
>> Nobody knows what update_seq means. Even a CouchDB committer got it wrong.
>> 
>> Fine. It is "poor documentation."
>> 
>> Adding last_seq into db_info is not helpful because last_seq also does
>> not mean what we think it means. My last email demonstrates that
>> last_seq is in fact incoherent.
> 
> <snip>
> 
> On Mon, Dec 26, 2011 at 23:03, Benoit Chesneau <bchesn...@gmail.com> wrote:
>> Mmm, right, that's confusing (except maybe if you consider update_seq as a
>> way to know the number of updates in the database, but in that case
>> the wording is confusing). IMO changes seq & committed_seq should be
>> quite the same. At least a changes seq should only happen when there
>> is a doc update, i.e. each time and only if a revision is created. Does
>> that make sense?
>> 
>> - benoît
> 
> Yes, it does. There is a mostly consistent relationship between the update
> sequence (seq, update_seq, last_seq, committed_seq) and the by_seq
> index. It seems entirely too confusing that there are things which
> affect update_seq but do not appear in the by_seq btree. That is either
> just plain wrong or a massive confusion of vocabulary. Benoit, I believe
> you are right to suggest that none of these sequence-related things
> should change unless a revision is created.
> 
> Bear with me, for I believe there is a related discussion about
> replicability for _security, _local docs, etc. It's clear that there
> are clustering and operational motivations for making this information
> replicable, thus making them proper documents with a place in the
> by_seq index, in the _changes feed, and affecting update_seq. Either
> these things have a proper place in the sequential history of a
> database or they do not. That there are things which affect update_seq
> but do not appear in the by_seq index and _changes feed feels like a
> mistake. Placing additional metadata in the db header feels like
> rubbing salt in this wound.
> 
> Right now only replicable documents surface in the _changes feed and
> are added to the by_seq btree but some other things affect the
> update_seq. I've just gone and checked, as described in my previous
> email, that none of these appear to require a change to update_seq for
> any technical reason, though Jason properly points out that it is
> perhaps useful for operational tasks such as knowing when to back up a
> .couch file.
> 
> I see two reasonable ways forward.
> 
> 1) Stop incrementing update_seq for anything but replicable document changes
> 2) Make things which already affect update_seq but do not appear in
> _changes appear there, likely by turning them into proper MVCC
> documents.
> 
> Regarding option 1:
> This is easy. I already outlined how to do this. It requires removing
> about 3 characters from our codebase. However, it spits at Jason's
> operations concerns, which I think are quite valid, and misses an
> opportunity for great improvement.
> 
> Regarding option 2:
> There is a cluster-aware use case, an operations use case, and, I
> think, a purity argument here. As for how to accomplish this feat
> without terrible API breakage, we get a lot of help from our URL
> structure. We have reserved paths which cannot conflict with documents
> so it does not create ambiguity if '{"seq":20,"id":"_security", ...}'
> appears in a changes feed. However, I think _security is a bad name
> for this document because it requires that /_security API
> compatibility is broken.
> 
> One solution I like right now is to add a _meta (without loss of
> generality -- insert your own preferred name) document, with the
> normal MVCC document API, referenced by the by_seq index and appearing
> in the _changes feed, which contains both _revs_limit and _security
> while preserving the legacy, clobbering, MVCC-oblivious APIs. Voila!
> No breaking changes. Keep a pointer to the latest revision of this document
> in the database header for fast access (and perhaps cache it in
> memory).
> 
> It would probably be acceptable to keep these out of a vanilla changes
> request (after all, they require db admin credentials to modify, and
> in the case of _security to view). Opening the door to additional
> flags for _changes also allows us to provide a natural extension of
> this idea to replicable _local docs for the clustering use case.
> 
> Thoughts, concerns, emotions and relevant, famous quotations encouraged.
> 
> -Randall
