Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes
On Tue, Dec 27, 2011 at 5:04 AM, Randall Leeds randall.le...@gmail.com wrote:

> Yes it does. There is a mostly consistent relationship between update sequence (seq, update_seq, last_seq, committed_seq) and the by_seq index. It seems entirely too confusing that there are things which affect update_seq but do not appear in the by_seq btree. That is just plain wrong, else a massive confusion of vocabulary.

I think it is confused definitions. If by_seq is defined as the sequence ids of regular documents then it is implemented correctly.

> Bear with me, for I believe there is a related discussion about replicability for _security, _local docs, etc. It's clear that there are clustering and operational motivations for making this information replicable, thus making them proper documents with a place in the by_seq index, in the _changes feed, and affecting update_seq.

It is common to track changes to data vs. changes to metadata independently. Changing Unix file permissions updates ctime but not mtime. Changing a Unix file's contents updates both ctime and mtime. In CouchDB, update_seq plays the role of ctime (data or metadata updates), and nobody's been cast for mtime (data-only updates).

> Either these things have a proper place in the sequential history of a database or they do not. That there are things which affect update_seq but do not appear in the by_seq index and _changes feed feels like a mistake.

The first sentence is, well, a tautology actually, but it asks the right question and the answer is they DO NOT belong. _changes shows data, not metadata. By definition, _changes is anything worth replicating.

But I hope my filesystem example above shows why it is okay to increment update_seq but not change by_seq. The bug with update_seq is not that it is too eager (increments for _security, _revs_limit), but that it is not eager enough (it should bump for _local too).

2. As a frequent consumer of _changes, I would prefer *not* to see _local documents, nor _security or other updates in there. They are metadata, not data. Maybe I misunderstood, but nobody wants to *replicate* _security objects or _local docs; they just want MVCC semantics (Adam on _security, IIRC) and a simplified API (me, on making all metadata a _local doc, and making _local docs full MVCC).

> Placing additional metadata in the db header feels like rubbing salt in this wound.

On the contrary, IMHO, we want

1. A new value: the sequence id of the most recent document update (pretty sure)
2. Available to the client alongside existing values like doc_count and doc_del_count (somewhat sure)

> Right now only replicable documents surface in the _changes feed and are added to the by_seq btree but some other things affect the update_seq. I've just gone and checked, as described in my previous email, that none of these appear to require a change to update_seq for any technical reason, though Jason properly points out that it is perhaps useful for operational tasks such as knowing when to back up a .couch file.

Here is where we get into migrating to more _local docs. I am actually not sure if that's good for this discussion. But anyway, my basic feeling is

* All metadata that clients can change is _local docs, with MVCC, *not in* the by_seq tree
* update_seq counts changes to data or metadata
* update_sikh (WLOG) counts changes to documents only (changes to the by_seq tree)

Doable?

It bears mentioning that I haven't any idea what I am talking about. Thoughts, concerns, emotions and relevant, famous quotations encouraged.

WELCOME TO THE PARTY, PAL!

-- Iris Couch
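The two-counter proposal in the message above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not CouchDB's implementation: the class `ToyDb`, its methods, and the field `last_doc_seq` (standing in for the tongue-in-cheek "update_sikh") are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDb:
    """Toy model of the proposal; every name here is hypothetical."""
    update_seq: int = 0    # bumped by every data or metadata change
    last_doc_seq: int = 0  # stand-in for "update_sikh": seq of the latest document update
    by_seq: dict = field(default_factory=dict)  # seq -> doc id, documents only

    def save_doc(self, doc_id: str) -> int:
        """A regular document update: bumps both counters and enters by_seq."""
        self.update_seq += 1
        self.last_doc_seq = self.update_seq
        self.by_seq[self.update_seq] = doc_id
        return self.update_seq

    def save_metadata(self, name: str) -> int:
        """A metadata update (_security, _revs_limit, a _local doc):
        bumps update_seq only and stays out of by_seq."""
        self.update_seq += 1
        return self.update_seq


db = ToyDb()
db.save_doc("a")
db.save_metadata("_security")
db.save_doc("b")
db.save_metadata("_revs_limit")
print(db.update_seq, db.last_doc_seq, sorted(db.by_seq))
```

In this model the two counters diverge as soon as a metadata update follows the last document update, which is the ctime/mtime distinction from the filesystem analogy.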
Re: browserid support
On Tue, Dec 27, 2011 at 6:09 AM, Randall Leeds randall.le...@gmail.com wrote:

> On Sun, Dec 25, 2011 at 22:02, Jason Smith j...@iriscouch.com wrote:
>
>> On Mon, Dec 26, 2011 at 9:51 AM, Michiel de Jong mich...@unhosted.org wrote:
>>
>>> The other thing, CouchDB as a BrowserId RP, would simply be instead of clicking 'login' at the bottom right in futon, there would be a BrowserId sign in button there. This is nice because then people don't have to remember their CouchDB password all the time. Or for that matter, their password in whatever app uses CouchDB. This would have to be something in front of CouchDB, which checks the BrowserId assertion, and opens a session - which may involve storing the plain text admin password and sending this to the client, or creating a session token and staying in between as a proxy, or creating a session token and adding this into the _users database as you send it in plain text to the client.
>>
>> We are further along than that. CouchDB can confirm a valid BrowserID identity (however it uses the mozilla.org web service). But the experience for the Couch application developer is quite good (IMO).
>>
>> https://github.com/iriscouch/browserid_couchdb
>>
>> -- Iris Couch
>
> As Jason points out, CouchDB can already act as an RP with the BrowserID plugin mentioned. I still have a lot of interest in making CouchDB both a primary identity provider and a verifier, but I've lost track of the state of BrowserID. I'm including dev@ in the hopes that a discussion about implementation can grow there.
>
> -Randall

I exchanged some mails recently on the browserid ml to learn the status of primary services; it sounds like the spec isn't finished yet. I will wait for that before doing anything myself. Current implementation of browserid is worthless imo, since it needs to rely on a centralized service. It's good to show how it could work, but I'm eagerly waiting for the final spec, so we could use any mail server for ID validation.
Once it's done, there are some pretty interesting libs in Erlang that will make the implementation easy. - benoît
Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes
What's interesting is that, modulo one edge case, last_seq in the changes feed and update_seq in the db_info record are exactly as defined on the WIKI.

update_seq: Current number of updates to the database (int)
last_seq: last_seq is the sequence number of the last update returned. (Currently it will always be the same as the seq of the last item in results.)

This holds true also when there are no changes to documents: the value of last_seq is zero.

The one edge case (which is a bit odd) is seen when you retrieve last_seq using ?descending=true&limit=1. If there are no changes the value will still be zero unless you call _set_revs_limit first, in which case the value will be one. The value will still be zero if the normal _changes is called with no args. What makes it odd is that calling _changes?descending... after a call to _set_revs_limit does not impact the value of last_seq. This is a bug.

So yes it's a bit weird but it does pretty much agree with the documentation. The quote I'm looking for is the one about angels on the head of a pin.

I guess it needs more thought. In general I don't like metadata because I think it creates more things that need to be handled differently, adding complexity for the sake of something that doesn't exist (metadata).

Do you have any more swatches in magenta?

On Dec 27, 2011, at 12:04 AM, Randall Leeds wrote:

> On Mon, Dec 26, 2011 at 08:49, Jason Smith j...@iriscouch.com wrote:
>
>> Hi, Bob. Thanks for your feedback.
>>
>> On Mon, Dec 26, 2011 at 12:24 PM, Robert Dionne dio...@dionne-associates.com wrote:
>>
>>> Jason, After looking into this a bit I do not think it's a bug, at most poor documentation. update_seq != last_seq
>>
>> Nobody knows what update_seq means. Even a CouchDB committer got it wrong. Fine. It is poor documentation. Adding last_seq into db_info is not helpful because last_seq also does not mean what we think it means. My last email demonstrates that last_seq is in fact incoherent.
> snip
>
> On Mon, Dec 26, 2011 at 23:03, Benoit Chesneau bchesn...@gmail.com wrote:
>
>> Mmm, right, that's confusing (except maybe if you consider update_seq as a way to know the number of updates in the database, but in that case the wording is confusing). IMO changes seq and committed_seq should be quite the same. At least, a changes seq should only happen when there is a doc update, i.e. each time, and only if, a revision is created. Does that make sense?
>>
>> - benoît
>
> Yes it does. There is a mostly consistent relationship between update sequence (seq, update_seq, last_seq, committed_seq) and the by_seq index. It seems entirely too confusing that there are things which affect update_seq but do not appear in the by_seq btree. That is just plain wrong, else a massive confusion of vocabulary. Benoit, I believe you are right to suggest that none of these sequence-related things should change unless a revision is created.
>
> Bear with me, for I believe there is a related discussion about replicability for _security, _local docs, etc. It's clear that there are clustering and operational motivations for making this information replicable, thus making them proper documents with a place in the by_seq index, in the _changes feed, and affecting update_seq.
>
> Either these things have a proper place in the sequential history of a database or they do not. That there are things which affect update_seq but do not appear in the by_seq index and _changes feed feels like a mistake. Placing additional metadata in the db header feels like rubbing salt in this wound.
>
> Right now only replicable documents surface in the _changes feed and are added to the by_seq btree but some other things affect the update_seq. I've just gone and checked, as described in my previous email, that none of these appear to require a change to update_seq for any technical reason, though Jason properly points out that it is perhaps useful for operational tasks such as knowing when to back up a .couch file.
>
> I see two reasonable ways forward.
> 1) Stop incrementing update_seq for anything but replicable document changes
> 2) Make things which already affect update_seq but do not appear in _changes appear there, likely by turning them into proper MVCC documents.
>
> Regarding option 1: This is easy. I already outlined how to do this. It requires removing about 3 characters from our codebase. However, it spits at Jason's operations concerns, which I think are quite valid, and misses an opportunity for great improvement.
>
> Regarding option 2: There is a cluster-aware use case, an operations use case, and, I think, a purity argument here. As for how to accomplish this feat without terrible API breakage, we get a lot of help from our URL structure. We have reserved paths which cannot conflict with documents so it does not create ambiguity if '{"seq":20,"id":"_security", ...}' appears in a changes feed. However, I think _security is a bad name for this document because it requires
Re: browserid support
On Tue, Dec 27, 2011 at 8:45 AM, Benoit Chesneau bchesn...@gmail.com wrote:

> Current implementation of browserid is worthless imo

Indeed.

> It's good to show how it could work,

Cue Inigo Montoya.

-- Iris Couch
[jira] [Commented] (COUCHDB-1371) configure erroneously warns against using a new spidermonkey with old spidermonkeys
[ https://issues.apache.org/jira/browse/COUCHDB-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176267#comment-13176267 ]

Paul Joseph Davis commented on COUCHDB-1371:

While I understand your general frustration, deleting code just so we don't have to test a couple versions isn't a convincing argument to me. Though it does suck having to go back and test older spidermonkeys.

I would be up for "not officially supported, use at your own risk" type of support, though that just makes me worry that we'll piss off anyone that happens to have an old SpiderMonkey.

Bottom line, we chose a funky ass dependency years ago and the chickens have come home to roost. If we want to maintain any semblance of "It just works" then we'll need to keep maintaining this dependency for quite some time.

configure erroneously warns against using a new spidermonkey with old spidermonkeys
---
Key: COUCHDB-1371
URL: https://issues.apache.org/jira/browse/COUCHDB-1371
Project: CouchDB
Issue Type: Bug
Reporter: Randall Leeds
Assignee: Randall Leeds
Priority: Minor
Fix For: 1.2, 1.3, 1.1.2
Attachments: 0001-fix-bad-configure-warning-on-old-SpiderMonkey.patch

Paul added the check for JSOPTION_ANONFUNFIX in 7ce9e103e, but js-1.7 doesn't have this constant so configure gives a warning.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (COUCHDB-1371) configure erroneously warns against using a new spidermonkey with old spidermonkeys
[ https://issues.apache.org/jira/browse/COUCHDB-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176280#comment-13176280 ] Randall Leeds commented on COUCHDB-1371: I hope there does come a time when we drop support. How's this configure.ac patch? I can apply it and fix the INSTALL files if you'd like.
[jira] [Commented] (COUCHDB-1371) configure erroneously warns against using a new spidermonkey with old spidermonkeys
[ https://issues.apache.org/jira/browse/COUCHDB-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176306#comment-13176306 ] Paul Joseph Davis commented on COUCHDB-1371: +1
Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes
On Tue, Dec 27, 2011 at 05:22, Jason Smith j...@iriscouch.com wrote:

> On Tue, Dec 27, 2011 at 5:04 AM, Randall Leeds randall.le...@gmail.com wrote:
>
>> Either these things have a proper place in the sequential history of a database or they do not. That there are things which affect update_seq but do not appear in the by_seq index and _changes feed feels like a mistake.
>
> The first sentence is, well, a tautology actually, but it asks the right question and the answer is they DO NOT belong. _changes shows data, not metadata. By definition, _changes is anything worth replicating.

That strikes me as incorrect. The _changes feed is purely metadata unless ?include_docs=true is specified.

> But I hope my filesystem example above shows why it is okay to increment update_seq but not change by_seq.

You show a nice precedent for separating metadata and data, but CouchDB has a decent precedent of avoiding this same thing. For example, _id and _rev are in the returned document body rather than part of the HTTP request (it could have been just URL and entity tag headers only for this).

> The bug with update_seq is not that it is too eager (increments for _security, _revs_limit), but that it is not eager enough (it should bump for _local too).

I agree, but for different reasons. I think _local docs may have a place in by_seq even if the default _changes request still only shows the default, replicable documents.

> 2. As a frequent consumer of _changes, I would prefer *not* to see _local documents, nor _security or other updates in there. They are metadata, not data. Maybe I misunderstood, but nobody wants to *replicate* _security objects or _local docs; they just want MVCC semantics (Adam on _security, IIRC) and a simplified API (me, on making all metadata a _local doc, and making _local docs full MVCC).

I think you misunderstand, maybe. In the case of BigCouch, MVCC is all that's needed because the replication does not go over HTTP.
I see no reason to require that special care be taken to copy these objects when a flag on the _changes feed might cause them to be transferred very naturally. In particular, I would use this feature in a hypothetical Lounge 3.0. It also means that with admin privileges we could do full backup replications. -R
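
The idea of keeping non-replicable items in by_seq while hiding them from the default feed could look roughly like the sketch below. It is a toy model, not CouchDB code: the `Entry` type, the `replicable` field, and the `include_all` flag are all hypothetical names chosen for illustration, and no such flag is claimed to exist on the real _changes API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    seq: int
    id: str
    replicable: bool  # False for _local docs, _security, and similar items


def changes(by_seq: List[Entry], include_all: bool = False) -> List[dict]:
    """Return feed rows; by default only replicable documents are shown."""
    return [
        {"seq": e.seq, "id": e.id}
        for e in by_seq
        if include_all or e.replicable
    ]


# Hypothetical sequence history mixing documents and non-replicable items.
by_seq = [
    Entry(1, "doc-a", True),
    Entry(2, "_security", False),
    Entry(3, "_local/checkpoint", False),
    Entry(4, "doc-b", True),
]

default_rows = changes(by_seq)
all_rows = changes(by_seq, include_all=True)
print([r["seq"] for r in default_rows])
print([r["seq"] for r in all_rows])
```

With the flag off, consumers see the same feed as before; with it on, every sequence number is accounted for, which is what a full backup replication would need in this model.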
Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes
On Wed, Dec 28, 2011 at 9:38 AM, Randall Leeds randall.le...@gmail.com wrote:

> On Tue, Dec 27, 2011 at 05:22, Jason Smith j...@iriscouch.com wrote:
>
>> On Tue, Dec 27, 2011 at 5:04 AM, Randall Leeds randall.le...@gmail.com wrote:
>>
>>> Either these things have a proper place in the sequential history of a database or they do not. That there are things which affect update_seq but do not appear in the by_seq index and _changes feed feels like a mistake.
>>
>> The first sentence is, well, a tautology actually, but it asks the right question and the answer is they DO NOT belong. _changes shows data, not metadata. By definition, _changes is anything worth replicating.
>
> That strikes me as incorrect. The _changes feed is purely metadata unless ?include_docs=true is specified.

Yes, data and metadata are problematic words. I'll stop using them. Do you agree that _changes is, by definition, anything worth replicating?

>> But I hope my filesystem example above shows why it is okay to increment update_seq but not change by_seq.
>
> You show a nice precedent for separating metadata and data, but CouchDB has a decent precedent of avoiding this same thing. For example, _id and _rev are in the returned document body rather than part of the HTTP request (it could have been just URL and entity tag headers only for this).

Yeah that's a good point.

>> The bug with update_seq is not that it is too eager (increments for _security, _revs_limit), but that it is not eager enough (it should bump for _local too).
>
> I agree, but for different reasons. I think _local docs may have a place in by_seq even if the default _changes request still only shows the default, replicable documents.

That's an interesting idea. IMO, _security, _revs_limit, apply to a specific database and URL, and consequently must never replicate. _local docs are those which don't replicate. If _local would replicate, I'd worry about spurious checkpoints spreading to where they don't belong; and unchecked _security replication is even worse.
Your idea improves consistency and orthogonality. It also solves the problem of how to enumerate _local docs. (AFAIK there is no way to list them all, not via _all_docs, or _changes, or a view). But it doesn't solve the larger problem: How to follow a _changes feed and know when you have caught up.

Both Bob N. and I independently did the following for our projects:

1. GET /db and wrongly assume update_seq will appear in the changes feed
2. GET /db/_changes?feed=continuous
3. Break when a change has .seq = update_seq

Suppose you have step 0: Update _security or _revs_limit. The loop will never break.

You propose (WLOG) _changes?comprehensive=true which guarantees a change equal or greater than update_seq. That's cool, but IMO app developers now have to add code to ignore irrelevant changes like those containing replication checkpoints.

I propose (WLOG) update_sikh in the db header which is the seq id of the latest *document* update. App developers modify their step 1 to use update_sikh instead of update_seq.

Is that an accurate synopsis?

>> 2. As a frequent consumer of _changes, I would prefer *not* to see _local documents, nor _security or other updates in there. They are metadata, not data. Maybe I misunderstood, but nobody wants to *replicate* _security objects or _local docs; they just want MVCC semantics (Adam on _security, IIRC) and a simplified API (me, on making all metadata a _local doc, and making _local docs full MVCC).
>
> I think you misunderstand, maybe. In the case of BigCouch, MVCC is all that's needed because the replication does not go over HTTP. I see no reason to require that special care be taken to copy these objects when a flag on the _changes feed might cause them to be transferred very naturally. In particular, I would use this feature in a hypothetical Lounge 3.0. It also means that with admin privileges we could do full backup replications.

If couch could do this, then cool.
But consider that both examples are the same special case: sharding and simulating a normal database API when there are actually multiple parts. That sounds like an application concern.

Is it really true that shards need the same _revs_limit as the simulated whole? Maybe they really want _revs_limit / number_of_shards? Is it really necessary that _security be identical in each shard? Actually, yes it is, because validate_doc_update uses it. But still...

How are you computing doc_count in the /db response? You have to sum doc_count from each shard. But every shard needs a copy of every design doc for validation. So you have to subtract those back out?

My broader point is, sharding applications already do lots of magic. I'm not sure if replicating _security and _local docs buys you much. But you've definitely persuaded me that your idea works. It is the second-best proposal I've seen in this thread :)

-- Iris Couch
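
The catch-up loop from steps 1 to 3 in the message above, and the reason it can hang, can be sketched with a finite list standing in for the continuous feed. This is a minimal illustration under stated assumptions: `follow_until` is a hypothetical helper, the feed is a plain Python list rather than an HTTP stream, and `last_doc_seq` stands in for the proposed (not existing) "update_sikh" value.

```python
from typing import Iterable, Optional


def follow_until(changes: Iterable[dict], target_seq: int) -> Optional[int]:
    """Consume changes and stop at the first one whose seq equals target_seq.

    Returns the matching seq, or None if the feed ends first. A real
    continuous feed would block forever instead of ending.
    """
    for change in changes:
        if change["seq"] == target_seq:
            return change["seq"]
    return None


# Step 0 from the message: two document updates (seq 1 and 2), then a
# _security update that bumps update_seq to 3 without appearing in the feed.
feed = [{"seq": 1, "id": "a"}, {"seq": 2, "id": "b"}]
update_seq = 3    # what GET /db would report in this toy scenario
last_doc_seq = 2  # hypothetical "update_sikh": seq of the latest document update

naive = follow_until(feed, update_seq)     # never sees seq 3
fixed = follow_until(feed, last_doc_seq)   # stops at the last document change
print(naive, fixed)
```

The first call finds no matching change, which corresponds to the loop that never breaks; the second terminates because its target is guaranteed to appear in the feed.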