Jason,

After looking into this a bit, I do not think it's a bug; at most it's poor documentation. update_seq != last_seq. Most of the time they are equal, but as we now know, sometimes they aren't. They are different things. I'm not sure where else in the code we depend on update_seq reflecting all the changes to the database; perhaps, as Randall suggests, we might be able to *not* bump it in those other calls.

Another way to handle this is to hang on to the last_seq when a changes call is made and use that as the since parameter in the next call. To me, this seems like what's needed in this use case anyway. In any event, it's likely easy to add last_seq to the db_info record, and I'm more than happy to do that; we should open a new ticket for it.

Cheers,

Bob

On Dec 26, 2011, at 4:10 AM, Jason Smith wrote:

> Hi, Randall. Thanks for inviting me to argue a bit more. I hope you'll
> be persuaded that, if -1367 is not a bug, at least there is *some*
> bug.
>
> tl;dr summary:
>
> This is a real bug--a paper cut with a workaround, but still a real bug.
>
> 1. Apps want a changes feed since 0, but they want to know when
> they've "caught up" (defined below)
> 2. These apps (and robust apps generally) probably start out by
> pinging the /db anyway. Bob N. and I independently did so.
> 3. update_seq looks deceptively like the sequence id of the latest
> change, and people assume so. They define "caught up" as receiving a
> change at or above this value. They expect to "catch up" in finite
> time, and even if the db receives no subsequent updates.
> 4. In fact, CouchDB does not disclose the sequence id of the latest
> change in the /db response. To know that value:
> 4a. If you want to process every change anyway, just get _changes
> and use last_seq
> 4b. If you just want the last sequence id, query
> _changes?descending=true&limit=1
> 4b(1). If the response has a change, use its last_seq value
> 4b(2). If the response has no changes, ignore the last_seq value
> (it is really the update_seq) and use 0
>
> Step 3 is the major paper cut. That step 4 exists and is complicated
> is the minor paper cut.
>
> On Mon, Dec 26, 2011 at 5:36 AM, Randall Leeds (Commented) (JIRA)
> <j...@apache.org> wrote:
>>
>> [
>> https://issues.apache.org/jira/browse/COUCHDB-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175892#comment-13175892
>> ]
>>
>> Randall Leeds commented on COUCHDB-1367:
>> ----------------------------------------
>>
>>> Wait a second. Robert, you are not fixing a bug in C-L, you are working
>>> around a deficiency in CouchDB.
>>
>> Can't both be true?
>
> Only in the trivial sense. This ticket reveals that app
> developers--Henrik and me, but also a committer--misunderstand
> update_seq, thinking it is last_seq. last_seq is not easy to learn.
>
>> Nope. You can not ever know. You always know the latest sequence number at
>> some arbitrarily recent point in time.
>
> Sorry, I cut corners and was not clear. Of course, nobody ever really
> knows anything except events in the very recent past. But I mean in
> the context of a _changes query one-two punch: get the last_seq, then
> begin a continuous feed since that value.
>
> The bug is that users cannot readily know the id of the most recent
> change. In fact, "the id of the most recent change" has no explicit
> label or name in the CouchDB interface. Neither update_seq nor
> last_seq mean exactly that.
>
>>> What if I want to see the most recent five changes? What if there are a
>>> hundred million documents? What if 99% of the time, update_seq equals
>>> last_seq and so developers assume it means something it doesn't?
>>
>> In order:
>> * /_changes?descending=true&limit=5
>
> I stand corrected. I had forgotten about a descending changes query.
> That resolves the hundred-million-docs problem. (My erroneous point
> was, 100M docs makes it too expensive to learn last_seq.)
>
> But that response looks bizarre.
>
> GET /db/_changes?descending=true\&limit=5
> {"results":[
> {"seq":22,"id":"after_3","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
> {"seq":21,"id":"after_2","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
> {"seq":20,"id":"after_1","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
> {"seq":19,"id":"conc","changes":[{"rev":"2-584a4a504a97009241d2587fee8b5eb8"}]},
> {"seq":17,"id":"preload_create","changes":[{"rev":"1-28bf6cd8af83c40c6e3fb82b608ce98f"}]}
> ],
> "last_seq":17}
>
> last_seq is the *least recent* change. If you query with &limit=1 then
> they will be equal, and that is nice. *Except* if there were no
> changes yet.
>
> $ curl -X PUT localhost:5984/x
> {"ok":true}
>
> $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
> {"ok":true}
> $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
> {"ok":true}
> $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
> {"ok":true}
>
> $ curl localhost:5984/x/_changes
> {"results":[
>
> ],
> "last_seq":0}
>
> $ curl localhost:5984/x/_changes?descending=true
> {"results":[
>
> ],
> "last_seq":3}
>
> Weird.
>
>> * Add additional information to the changes feed, perhaps with a query
>> parameter (almost the reverse of include docs)
>> * Stop incrementing the update sequence on certain kinds of non-document
>> changes
>> * Add more information to the db information response
>
> A commonly-needed and valuable piece of data like this seems most
> appropriate cached in the db header and served in the db information.
>
> --
> Iris Couch
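
For reference, the workaround in steps 4b(1) and 4b(2) above, together with the "hang on to last_seq" idea, can be sketched roughly as follows. This is only an illustration and not anything that exists in CouchDB: the function names latest_change_seq and next_since are made up, the input is assumed to be the already-parsed JSON body of a _changes response, and sequence ids are assumed to be plain integers as in the examples in this thread.

```python
def latest_change_seq(response):
    """Sequence id of the most recent change, or 0 if there are none.

    `response` is assumed to be the parsed JSON body of
    GET /db/_changes?descending=true&limit=1 (hypothetical helper).
    """
    results = response.get("results", [])
    if not results:
        # Step 4b(2): with no changes, last_seq is really update_seq,
        # so ignore it and report 0.
        return 0
    # Step 4b(1): in a descending feed the first row is the newest change;
    # with limit=1 its seq is the same value as last_seq.
    return results[0]["seq"]


def next_since(response, previous_since=0):
    """Value to pass as `since` on the next (non-descending) _changes call.

    Keeps the previous value if the response carries no last_seq.
    """
    return response.get("last_seq", previous_since)
```

Fed the empty-database response shown above ({"results":[], "last_seq":3}), latest_change_seq returns 0 rather than 3, which is the behaviour step 4b(2) asks for.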