Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes

Robert Dionne Mon, 26 Dec 2011 04:24:55 -0800

Jason,

  After looking into this a bit, I do not think it's a bug; at most it's poor 
documentation. update_seq != last_seq. Most of the time they are equal, but as 
we now know, sometimes they are not. They are different things. I'm not sure 
where else in the code we depend on update_seq reflecting all the changes to 
the database; perhaps, as Randall suggests, we might be able to *not* bump it 
in those other calls.


  Another way to handle this is to hang on to the last_seq when a changes call 
is made and use it as the since parameter in the next call. To me, this seems 
like what's needed in this use case anyway.
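
  A minimal sketch of that idea, in Python. The helper name next_changes_path 
and the shape of the previous_response argument (the parsed JSON body of the 
previous _changes call) are assumptions made for illustration; only the since 
parameter and the last_seq field come from CouchDB itself.

```python
from urllib.parse import urlencode


def next_changes_path(db, previous_response=None):
    """Build the path for the next _changes request (hypothetical helper).

    previous_response is the parsed JSON body of the previous _changes
    call, or None on the very first call.
    """
    # On the first call there is nothing to resume from, so start at 0.
    # Afterwards, resume from the last_seq the server handed back.
    since = 0 if previous_response is None else previous_response["last_seq"]
    return "/" + db + "/_changes?" + urlencode({"since": since})
```

  The caller never looks at update_seq at all; it only carries forward the 
last_seq it was given.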

  In any event, it's likely easy to add last_seq to the db_info record, and I'm 
more than happy to do that. We should open a new ticket for it.

Cheers,

Bob

  




On Dec 26, 2011, at 4:10 AM, Jason Smith wrote:

> Hi, Randall. Thanks for inviting me to argue a bit more. I hope you'll
> be persuaded that, if -1367 is not a bug, at least there is *some*
> bug.
> 
> tl;dr summary:
> 
> This is a real bug--a paper cut with a workaround, but still a real bug.
> 
> 1. Apps want a changes feed since 0, but they want to know when
> they've "caught up" (defined below)
> 2. These apps (and robust apps generally) probably start out by
> pinging the /db anyway. Bob N. and I independently did so.
> 3. update_seq looks deceptively like the sequence id of the latest
> change, and people assume so. They define "caught up" as receiving a
> change at or above this value. They expect to "catch up" in finite
> time, and even if the db receives no subsequent updates.
> 4. In fact, CouchDB does not disclose the sequence id of the latest
> change in the /db response. To know that value:
>  4a. If you want to process every change anyway, just get _changes
> and use last_seq
>  4b. If you just want the last sequence id, query
> _changes?descending=true&limit=1
>    4b(1). If the response has a change, use its last_seq value
>    4b(2). If the response has no changes, ignore the last_seq value
> (it is really the update_seq) and use 0
> 
> Step 3 is the major paper cut. That step 4 exists and is complicated
> is the minor paper cut.
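
Step 4b above can be sketched as a small Python function. The name 
latest_change_seq is hypothetical, and the function assumes its argument is 
the parsed JSON body of GET /db/_changes?descending=true&limit=1; it makes no 
request itself.

```python
def latest_change_seq(response):
    """Return the sequence id of the most recent change, or 0 if none.

    response is assumed to be the parsed JSON body of
    GET /db/_changes?descending=true&limit=1.
    """
    if response["results"]:
        # 4b(1): with limit=1, last_seq is the seq of the single change.
        return response["last_seq"]
    # 4b(2): no changes yet, so last_seq is really the update_seq; use 0.
    return 0
```

For the empty database shown later in this message, where the descending 
query reports "last_seq":3 with no results, this returns 0.
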
> 
> On Mon, Dec 26, 2011 at 5:36 AM, Randall Leeds (Commented) (JIRA)
> <j...@apache.org> wrote:
>> 
>>    [ 
>> https://issues.apache.org/jira/browse/COUCHDB-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175892#comment-13175892
>>  ]
>> 
>> Randall Leeds commented on COUCHDB-1367:
>> ----------------------------------------
>> 
>>> Wait a second. Robert, you are not fixing a bug in C-L, you are working 
>>> around a deficiency in CouchDB.
>> 
>> Can't both be true?
> 
> Only in the trivial sense. This ticket reveals that app
> developers--Henrik and me, but also a committer--misunderstand
> update_seq, thinking it is last_seq. last_seq is not easy to learn.
> 
>> Nope. You can not ever know. You always know the latest sequence number at 
>> some arbitrarily recent point in time.
> 
> Sorry, I cut corners and was not clear. Of course, nobody ever really
> knows anything except events in the very recent past. But I mean in
> the context of a _changes query one-two punch: get the last_seq, then
> begin a continuous feed since that value.
> 
> The bug is that users cannot readily know the id of the most recent
> change. In fact, "the id of the most recent change" has no explicit
> label or name in the CouchDB interface. Neither update_seq nor
> last_seq means exactly that.
> 
>>> What if I want to see the most recent five changes? What if there are a 
>>> hundred million documents? What if 99% of the time, update_seq equals 
>>> last_seq and so developers assume it means something it doesn't?
>> 
>> In order:
>>  * /_changes?descending=true&limit=5
> 
> I stand corrected. I had forgotten about a descending changes query.
> That resolves the hundred-million-docs problem. (My erroneous point
> was, 100M docs makes it too expensive to learn last_seq.)
> 
> But that response looks bizarre.
> 
> GET /db/_changes?descending=true&limit=5
> {"results":[
> {"seq":22,"id":"after_3","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
> {"seq":21,"id":"after_2","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
> {"seq":20,"id":"after_1","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
> {"seq":19,"id":"conc","changes":[{"rev":"2-584a4a504a97009241d2587fee8b5eb8"}]},
> {"seq":17,"id":"preload_create","changes":[{"rev":"1-28bf6cd8af83c40c6e3fb82b608ce98f"}]}
> ],
> "last_seq":17}
> 
> last_seq is the *least recent* change. If you query with &limit=1 then
> they will be equal, and that is nice. *Except* if there were no
> changes yet.
> 
>    $ curl -X PUT localhost:5984/x
>    {"ok":true}
> 
>    $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
>    {"ok":true}
>    $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
>    {"ok":true}
>    $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
>    {"ok":true}
> 
>    $ curl localhost:5984/x/_changes
>    {"results":[
> 
>    ],
>    "last_seq":0}
> 
>    $ curl localhost:5984/x/_changes?descending=true
>    {"results":[
> 
>    ],
>    "last_seq":3}
> 
> Weird.
> 
>>  * Add additional information to the changes feed, perhaps with a query 
>> parameter (almost the reverse of include docs)
>>  * Stop incrementing the update sequence on certain kinds of non-document 
>> changes
>>  * Add more information to the db information response
> 
> A commonly-needed and valuable piece of data like this seems most
> appropriate cached in the db header and served in the db information.
> 
> -- 
> Iris Couch
