Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes

2011-12-27 Thread Jason Smith
On Tue, Dec 27, 2011 at 5:04 AM, Randall Leeds randall.le...@gmail.com wrote:
 Yes it does. There is a mostly consistent relationship between update
 sequence (seq, update_seq, last_seq, committed_seq) and the by_seq
 index. It seems entirely too confusing that there are things which
 affect update_seq but do not appear in the by_seq btree. That is just
 plain wrong, else a massive confusion of vocabulary.

I think it is confused definitions.

If by_seq is defined as the sequence ids of regular documents then
it is implemented correctly.

 Bear with me for I believe there is a related discussion about
 replicability for _security, _local docs, etc. It's clear that there
 are clustering and operational motivations for making this information
 replicable, thus making them proper documents with a place in the
 by_seq index, in the _changes feed, and affecting update_seq.

It is common to track changes to data vs. changes to metadata
independently. Changing Unix file permissions updates ctime but not
mtime. Changing a Unix file's contents updates both ctime and mtime. In CouchDB,
update_seq plays the role of ctime (data or metadata updates), and
nobody's been cast for mtime (data-only updates).
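To make the ctime/mtime analogy concrete, here is a small Python sketch (assuming a POSIX system; on Windows `st_ctime` reports creation time instead) that first changes a scratch file's permissions and then its contents, recording both timestamps after each step. The one-second pauses are only there to outlast coarse timestamp granularity on some filesystems.

```python
import os
import stat
import tempfile
import time

# Scratch file; mkstemp creates it with mode 0o600.
fd, path = tempfile.mkstemp()
os.close(fd)
before = os.stat(path)

time.sleep(1.1)  # outlast coarse timestamp granularity

# Metadata-only change: permissions (0o600 -> 0o640).
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)
after_chmod = os.stat(path)

time.sleep(1.1)

# Data change: append to the file's contents.
with open(path, "a") as f:
    f.write("hello\n")
after_write = os.stat(path)
os.remove(path)

print("chmod moved mtime:", after_chmod.st_mtime_ns != before.st_mtime_ns)
print("chmod moved ctime:", after_chmod.st_ctime_ns != before.st_ctime_ns)
print("write moved mtime:", after_write.st_mtime_ns != after_chmod.st_mtime_ns)
print("write moved ctime:", after_write.st_ctime_ns != after_chmod.st_ctime_ns)
```

On a typical Linux filesystem the chmod step moves only ctime while the write moves both, which is the split the analogy suggests for CouchDB: one counter for every change, another for data changes only.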

 Either
 these things have a proper place in the sequential history of a
 database or they do not. That there are things which affect update_seq
 but do not appear in the by_seq index and _changes feed feels like a
 mistake.

The first sentence is, well, a tautology actually, but it asks the
right question and the answer is they DO NOT belong. _changes shows
data, not metadata. By definition, _changes is anything worth
replicating.

But I hope my filesystem example above shows why it is okay to
increment update_seq but not change by_seq.

The bug with update_seq is not that it is too eager (increments for
_security, _revs_limit), but that it is not eager enough (it should bump
for _local too).

2. As a frequent consumer of _changes, I would prefer *not* to see
_local documents, nor _security or other updates in there. They are
metadata, not data. Maybe I misunderstood, but nobody wants to
*replicate* _security objects or _local docs; they just want MVCC
semantics (Adam on _security, IIRC) and a simplified API (me, on
making all metadata a _local doc, and making _local docs full MVCC).


 Placing additional metadata in the db header feels like
 rubbing salt in this wound.

On the contrary, IMHO, we want

1. A new value: the sequence id of the most recent document update (pretty sure)
2. Available to the client alongside existing values like doc_count and
doc_del_count (somewhat sure)

 Right now only replicable documents surface in the _changes feed and
 are added to the by_seq btree but some other things affect the
 update_seq. I've just gone and checked, as described in my previous
 email, that none of these appear to require a change to update_seq for
 any technical reason, though Jason properly points out that it is
 perhaps useful for operational tasks such as knowing when to back up a
 .couch file.

Here is where we get into migrating to more _local docs. I am actually
not sure if that's good for this discussion. But anyway, my basic
feeling is

* All metadata that clients can change is _local docs, with MVCC, *not
in* the by_seq tree
* update_seq counts changes to data or metadata
* update_sikh (WLOG) counts changes to documents only (changes to the
by_seq tree)
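A toy in-memory model shows how the two counters in the list above would diverge. It is not CouchDB code: the `ToyDb` class is hypothetical, and `update_sikh` (Jason's placeholder name) is modeled here as the seq of the latest document write. Every write advances update_seq, but only document writes add a by_seq entry and move update_sikh.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDb:
    """Hypothetical in-memory stand-in for a database header; not CouchDB code."""
    update_seq: int = 0   # advanced by every write, document or metadata ("ctime")
    update_sikh: int = 0  # hypothetical: seq of the latest document write ("mtime")
    by_seq: dict = field(default_factory=dict)  # seq -> doc id, like the by_seq btree

    def update_doc(self, doc_id: str) -> int:
        self.update_seq += 1
        # An updated document moves to the new seq, so drop its old by_seq entry.
        self.by_seq = {s: d for s, d in self.by_seq.items() if d != doc_id}
        self.by_seq[self.update_seq] = doc_id
        self.update_sikh = self.update_seq
        return self.update_seq

    def set_metadata(self) -> None:
        # e.g. _security or _revs_limit: counts as an update, leaves by_seq alone
        self.update_seq += 1


db = ToyDb()
db.update_doc("a")
db.update_doc("b")
db.set_metadata()  # stands in for PUT /db/_revs_limit
print(db.update_seq, db.update_sikh, max(db.by_seq))  # 3 2 2
```

The gap between update_seq (3) and the highest by_seq key (2) is exactly the condition this thread is about.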

Doable? It bears mentioning that I haven't any idea what I am talking about.

 Thoughts, concerns, emotions and relevant, famous quotations encouraged.

WELCOME TO THE PARTY, PAL!

-- 
Iris Couch


Re: browserid support

2011-12-27 Thread Benoit Chesneau
On Tue, Dec 27, 2011 at 6:09 AM, Randall Leeds randall.le...@gmail.com wrote:
 On Sun, Dec 25, 2011 at 22:02, Jason Smith j...@iriscouch.com wrote:
 On Mon, Dec 26, 2011 at 9:51 AM, Michiel de Jong mich...@unhosted.org 
 wrote:
 The other thing, CouchDB as a BrowserId RP, would simply be instead of
 clicking 'login' at the bottom right in futon, there would be a BrowserId
 sign in button there. This is nice because then people don't have to
 remember their CouchDB password all the time. Or for that matter, their
 password in whatever app uses CouchDB. This would have to be something in
 front of CouchDB, which checks the BrowserId assertion, and opens a session
 - which may involve storing the plain text admin password and sending this
 to the client, or creating a session token and staying in between as a
 proxy, or creating a session token and adding this into the _users database
 as you send it in plain text to the client.

 We are further along than that. CouchDB can confirm a valid BrowserID
 identity (however it uses the mozilla.org web service). But the
 experience for the Couch application developer is quite good (IMO).

 https://github.com/iriscouch/browserid_couchdb

 --
 Iris Couch

 As Jason points out, CouchDB can already act as an RP with the
 BrowserID plugin mentioned. I still have a lot of interest in making
 CouchDB both a primary identity provider and a verifier, but I've lost
 track of the state of BrowserID. I'm including dev@ in the hopes that
 a discussion about implementation can grow there.

 -Randall
I exchanged some mails recently on the browserid ml, to know the
status of primary services; it sounds like the spec isn't finished
yet. I will wait for that before doing anything myself.

Current implementation of browserid is worthless imo, since it needs to
rely on a centralized service. It's good to show how it could work,
but I'm eagerly waiting for the final spec, so we could use any mail
server as an ID validation. Once it's done, there are some pretty
interesting libs in Erlang that will make the implementation easy.

- benoît


Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes

2011-12-27 Thread Robert Dionne
What's interesting is that modulo one edge case, last_seq in the changes feed 
and update_seq in the db_info record are exactly as defined on the WIKI. 

update_seq:

Current number of updates to the database (int)

last_seq:

last_seq is the sequence number of the last update returned. (Currently it will 
always be the same as the seq of the last item in results.)

This also holds true when there are no changes to documents: the value of
last_seq is zero. The one edge case (which is a bit odd) is seen when you
retrieve last_seq using ?descending=true&limit=1. If there are no changes the
value will still be zero unless you call _set_revs_limit first in which case 
the value will be one. The value will still be zero if the normal _changes is 
called with no args. What makes it odd is that calling _changes?descending... 
after a call to _set_revs_limit does not impact the value of last_seq. This is 
a bug.

So yes it's a bit weird but it does pretty much agree with the documentation. 
The quote I'm looking for is the one about angels on the head of a pin. 

I guess it needs more thought. In general I don't like metadata because I think 
it creates more things that need to be handled differently, adding complexity 
for the sake of something that doesn't exist (metadata).

Do you have any more swatches in magenta?



On Dec 27, 2011, at 12:04 AM, Randall Leeds wrote:

 On Mon, Dec 26, 2011 at 08:49, Jason Smith j...@iriscouch.com wrote:
 Hi, Bob. Thanks for your feedback.
 
 On Mon, Dec 26, 2011 at 12:24 PM, Robert Dionne
 dio...@dionne-associates.com wrote:
 Jason,
 
  After looking into this a bit I do not think it's a bug, at most poor 
 documentation. update_seq != last_seq
 
 Nobody knows what update_seq means. Even a CouchDB committer got it wrong.
 
 Fine. It is poor documentation.
 
 Adding last_seq into db_info is not helpful because last_seq also does
 not mean what we think it means. My last email demonstrates that
 last_seq is in fact incoherent.
 
 snip
 
 On Mon, Dec 26, 2011 at 23:03, Benoit Chesneau bchesn...@gmail.com wrote:
 Mmm, right, that's confusing (maybe except if you consider update_seq as a
 way to know the number of updates in the database, but in this case
 the wording is confusing). IMO changes seq and committed_seq should be
 quite the same. At least a changes seq should only happen when there
 is a doc update, i.e. each time and only if a revision is created. Does
 that make sense?
 
 - benoît
 
 Yes it does. There is a mostly consistent relationship between update
 sequence (seq, update_seq, last_seq, committed_seq) and the by_seq
 index. It seems entirely too confusing that there are things which
 affect update_seq but do not appear in the by_seq btree. That is just
 plain wrong, else a massive confusion of vocabulary. Benoit, I believe
 you are right to suggest that none of these sequences-related things
 should change unless a revision is created.
 
 Bear with me for I believe there is a related discussion about
 replicability for _security, _local docs, etc. It's clear that there
 are clustering and operational motivations for making this information
 replicable, thus making them proper documents with a place in the
 by_seq index, in the _changes feed, and affecting update_seq. Either
 these things have a proper place in the sequential history of a
 database or they do not. That there are things which affect update_seq
 but do not appear in the by_seq index and _changes feed feels like a
 mistake. Placing additional metadata in the db header feels like
 rubbing salt in this wound.
 
 Right now only replicable documents surface in the _changes feed and
 are added to the by_seq btree but some other things affect the
 update_seq. I've just gone and checked, as described in my previous
 email, that none of these appear to require a change to update_seq for
 any technical reason, though Jason properly points out that it is
 perhaps useful for operational tasks such as knowing when to back up a
 .couch file.
 
 I see two reasonable ways forward.
 
 1) Stop incrementing update_seq for anything but replicable document changes
 2) Make things which already affect update_seq but do not appear in
 _changes appear there, likely by turning them into proper MVCC
 documents.
 
 Regarding option 1:
 This is easy. I already outlined how to do this. It requires removing
 about 3 characters from our codebase. However, it spits at Jason's
 operations concerns, which I think are quite valid, and misses an
 opportunity for great improvement.
 
 Regarding option 2:
 There is a cluster-aware use case, an operations use case, and, I
 think, a purity argument here. As for how to accomplish this feat
 without terrible API breakage, we get a lot of help from our URL
 structure. We have reserved paths which cannot conflict with documents
 so it does not create ambiguity if '{"seq":20,"id":"_security", ...}'
 appears in a changes feed. However, I think _security is a bad name
 for this document because it requires 

Re: browserid support

2011-12-27 Thread Jason Smith
On Tue, Dec 27, 2011 at 8:45 AM, Benoit Chesneau bchesn...@gmail.com wrote:
 Current implementation of browserid is worthless imo

Indeed.

 It's good to show how it could work,

Cue Inigo Montoya.

-- 
Iris Couch


[jira] [Commented] (COUCHDB-1371) configure erroneously warns against using a new spidermonkey with old spidermonkeys

2011-12-27 Thread Paul Joseph Davis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176267#comment-13176267
 ] 

Paul Joseph Davis commented on COUCHDB-1371:


While I understand your general frustration, deleting code just so we don't 
have to test a couple of versions isn't a convincing argument to me. Though it
does suck having to go back and test older spidermonkeys. I would be up for 
not officially supported, use at your own risk type of support, though that 
just makes me worry that we'll piss off anyone that happens to have an old 
SpiderMonkey.

Bottom line, we chose a funky ass dependency years ago and the chickens have 
come home to roost. If we want to maintain any semblance of "It just works"
then we'll need to keep maintaining this dependency for quite some time.

 configure erroneously warns against using a new spidermonkey with old 
 spidermonkeys
 ---

 Key: COUCHDB-1371
 URL: https://issues.apache.org/jira/browse/COUCHDB-1371
 Project: CouchDB
  Issue Type: Bug
Reporter: Randall Leeds
Assignee: Randall Leeds
Priority: Minor
 Fix For: 1.2, 1.3, 1.1.2

 Attachments: 0001-fix-bad-configure-warning-on-old-SpiderMonkey.patch


 Paul added the check for JSOPTION_ANONFUNFIX in 7ce9e103e, but js-1.7 doesn't 
 have this constant so configure gives a warning.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (COUCHDB-1371) configure erroneously warns against using a new spidermonkey with old spidermonkeys

2011-12-27 Thread Randall Leeds (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176280#comment-13176280
 ] 

Randall Leeds commented on COUCHDB-1371:


I hope there does come a time when we drop support.

How's this configure.ac patch? I can apply it and fix the INSTALL files if 
you'd like.


[jira] [Commented] (COUCHDB-1371) configure erroneously warns against using a new spidermonkey with old spidermonkeys

2011-12-27 Thread Paul Joseph Davis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176306#comment-13176306
 ] 

Paul Joseph Davis commented on COUCHDB-1371:


+1


Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes

2011-12-27 Thread Randall Leeds
On Tue, Dec 27, 2011 at 05:22, Jason Smith j...@iriscouch.com wrote:
 On Tue, Dec 27, 2011 at 5:04 AM, Randall Leeds randall.le...@gmail.com 
 wrote:
 Either
 these things have a proper place in the sequential history of a
 database or they do not. That there are things which affect update_seq
 but do not appear in the by_seq index and _changes feed feels like a
 mistake.

 The first sentence is, well, a tautology actually, but it asks the
 right question and the answer is they DO NOT belong. _changes shows
 data, not metadata. By definition, _changes is anything worth
 replicating.

That strikes me as incorrect. The _changes feed is purely metadata
unless ?include_docs=true is specified.


 But I hope my filesystem example above shows why it is okay to
 increment update_seq but not change by_seq.

You show a nice precedent for separating metadata and data, but
CouchDB has a decent precedent of avoiding this same thing. For
example, _id and _rev are returned in the document body rather than
only in the HTTP layer (the URL and the ETag header alone could have
carried them).


 The bug with update_seq is not that it is too eager (increments for
 _security, _revs_limit), but that it is not eager enough (it should bump
 for _local too).


I agree, but for different reasons. I think _local docs may have a
place in by_seq even if the default _changes request still only shows
the default, replicable documents.

 2. As a frequent consumer of _changes, I would prefer *not* to see
 _local documents, nor _security or other updates in there. They are
 metadata, not data. Maybe I misunderstood, but nobody wants to
 *replicate* _security objects or _local docs; they just want MVCC
 semantics (Adam on _security, IIRC) and a simplified API (me, on
 making all metadata a _local doc, and making _local docs full MVCC).

I think you misunderstand, maybe. In the case of BigCouch, MVCC is all
that's needed because the replication does not go over HTTP. I see no
reason to require that special care be taken to copy these objects
when a flag on the _changes feed might cause them to be transferred
very naturally. In particular, I would use this feature in a
hypothetical Lounge 3.0. It also means that with admin privileges we
could do full backup replications.

-R


Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes

2011-12-27 Thread Jason Smith
On Wed, Dec 28, 2011 at 9:38 AM, Randall Leeds randall.le...@gmail.com wrote:
 On Tue, Dec 27, 2011 at 05:22, Jason Smith j...@iriscouch.com wrote:
 On Tue, Dec 27, 2011 at 5:04 AM, Randall Leeds randall.le...@gmail.com 
 wrote:
 Either
 these things have a proper place in the sequential history of a
 database or they do not. That there are things which affect update_seq
 but do not appear in the by_seq index and _changes feed feels like a
 mistake.

 The first sentence is, well, a tautology actually, but it asks the
 right question and the answer is they DO NOT belong. _changes shows
 data, not metadata. By definition, _changes is anything worth
 replicating.

 That strikes me as incorrect. The _changes feed is purely metadata
 unless ?include_docs=true is specified.

Yes, data and metadata are problematic words. I'll stop using them.

Do you agree that _changes is, by definition, anything worth replicating?

 But I hope my filesystem example above shows why it is okay to
 increment update_seq but not change by_seq.

 You show a nice precedent for separating metadata and data, but
 CouchDB has a decent precedent of avoiding this same thing. For
 example, _id and _rev are returned in the document body rather than
 only in the HTTP layer (the URL and the ETag header alone could have
 carried them).

Yeah that's a good point.

 The bug with update_seq is not that it is too eager (increments for
 _security, _revs_limit), but that it is not eager enough (it should bump
 for _local too).


 I agree, but for different reasons. I think _local docs may have a
 place in by_seq even if the default _changes request still only shows
 the default, replicable documents.

That's an interesting idea.

IMO, _security and _revs_limit apply to a specific database and URL, and
consequently must never replicate. _local docs are those which don't
replicate. If _local would replicate, I'd worry about spurious
checkpoints spreading to where they don't belong; and unchecked
_security replication is even worse.

Your idea improves consistency and orthogonality. It also solves the
problem of how to enumerate _local docs. (AFAIK there is no way to
list them all, not via _all_docs, or _changes, or a view).

But it doesn't solve the larger problem: How to follow a _changes feed
and know when you have caught up. Both Bob N. and I independently did
the following for our projects:

1. GET /db and wrongly assume update_seq will appear in the changes feed
2. GET /db/_changes?feed=continuous
3. Break when a change has .seq = update_seq

Suppose you have step 0: Update _security or _revs_limit. The loop
will never break.
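The failure mode can be reproduced without a server. In this Python sketch a plain list stands in for the continuous _changes feed (a real one would block waiting for more rows rather than end), and `follow_until` is a hypothetical helper implementing steps 2 and 3.

```python
def follow_until(changes, target_seq):
    """Hypothetical catch-up loop: consume rows until one has seq == target_seq.

    `changes` stands in for GET /db/_changes?feed=continuous.
    """
    for row in changes:
        if row["seq"] == target_seq:
            return row
    return None  # a real continuous feed would just keep waiting here


# Step 0: two document writes, then a _revs_limit update bumps update_seq to 3.
feed = [{"seq": 1, "id": "a"}, {"seq": 2, "id": "b"}]
update_seq = 3  # step 1: what GET /db reports
print(follow_until(feed, update_seq))  # None: seq 3 never appears in the feed
```

Comparing with `>=` instead of `==` does not rescue this loop either: no row with seq 3 or higher arrives until some document is written again.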

You propose (WLOG) _changes?comprehensive=true which guarantees a
change equal to or greater than update_seq. That's cool, but IMO app
developers now have to add code to ignore irrelevant changes like
those containing replication checkpoints.
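For a sense of the extra client code this implies, here is a Python sketch of the filter an app might need. It assumes the hypothetical `comprehensive=true` feed would emit rows for reserved ids such as `_local/...` and `_security`; the row shapes and ids below are illustrative, not real CouchDB output.

```python
def is_app_change(row: dict) -> bool:
    """Keep ordinary and design documents; skip other reserved (underscore) ids."""
    doc_id = row["id"]
    return not doc_id.startswith("_") or doc_id.startswith("_design/")


rows = [
    {"seq": 1, "id": "a"},
    {"seq": 2, "id": "_local/checkpoint-example"},  # hypothetical checkpoint row
    {"seq": 3, "id": "_design/app"},
    {"seq": 4, "id": "_security"},                  # hypothetical metadata row
]
print([r["seq"] for r in rows if is_app_change(r)])  # [1, 3]
```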

I propose (WLOG) update_sikh in the db header which is the seq id of
the latest *document* update. App developers modify their step 1 to
use update_sikh instead of update_seq.
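Under this proposal, step 1 would read a document-only sequence instead. A Python sketch, assuming GET /db returned a hypothetical `update_sikh` field (CouchDB does not return one today) and again using a list in place of the continuous feed:

```python
def caught_up(changes, db_info: dict) -> bool:
    """Hypothetical catch-up check keyed on update_sikh rather than update_seq."""
    target = db_info["update_sikh"]  # seq of the latest *document* update
    if target == 0:
        return True  # no documents yet: nothing to wait for
    for row in changes:
        # >= rather than ==: if that document is updated again before we read
        # the feed, its old seq disappears from by_seq and a higher one appears.
        if row["seq"] >= target:
            return True
    return False  # list exhausted; a real continuous feed would keep waiting


db_info = {"update_seq": 3, "update_sikh": 2}  # a _revs_limit change bumped update_seq only
feed = [{"seq": 1, "id": "a"}, {"seq": 2, "id": "b"}]
print(caught_up(feed, db_info))  # True
```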

Is that an accurate synopsis?

 2. As a frequent consumer of _changes, I would prefer *not* to see
 _local documents, nor _security or other updates in there. They are
 metadata, not data. Maybe I misunderstood, but nobody wants to
 *replicate* _security objects or _local docs; they just want MVCC
 semantics (Adam on _security, IIRC) and a simplified API (me, on
 making all metadata a _local doc, and making _local docs full MVCC).

 I think you misunderstand, maybe. In the case of BigCouch, MVCC is all
 that's needed because the replication does not go over HTTP. I see no
 reason to require that special care be taken to copy these objects
 when a flag on the _changes feed might cause them to be transferred
 very naturally. In particular, I would use this feature in a
 hypothetical Lounge 3.0. It also means that with admin privileges we
 could do full backup replications.

If couch could do this, then cool. But consider that both examples are
the same special-case: sharding and simulating a normal database API
when there are actually multiple parts. That sounds like an
application concern.

Is it really true that shards need the same _revs_limit as the
simulated whole? Maybe they really want _revs_limit /
number_of_shards?

Is it really necessary that _security be identical in each shard?
Actually, yes it is, because validate_doc_update uses it. But still...

How are you computing doc_count in the /db response? You have to sum
doc_count from each shard. But every shard needs a copy of every
design doc for validation. So you have to subtract those back out? My
broader point is, sharding applications already do lots of magic. I'm
not sure if replicating _security and _local docs buys you much.
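The doc_count arithmetic can be sketched in Python. This assumes a hypothetical layout where each shard's doc_count includes its own full copy of every design document, so all but one copy must be subtracted; the numbers are made up for illustration.

```python
def cluster_doc_count(shards, design_doc_count):
    """Sum per-shard doc_count, then remove the extra design-doc copies.

    Hypothetical model: every shard stores all design documents, and the
    simulated whole should count each design document only once.
    """
    total = sum(s["doc_count"] for s in shards)
    extra_copies = design_doc_count * (len(shards) - 1)
    return total - extra_copies


# 3 shards, each with 2 design docs plus 10, 12 and 8 ordinary docs
shards = [{"doc_count": 12}, {"doc_count": 14}, {"doc_count": 10}]
print(cluster_doc_count(shards, design_doc_count=2))  # 32
```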

But you've definitely persuaded me that your idea works. It is the
second-best proposal I've seen in this thread :)

-- 
Iris Couch