Re: [jira] [Commented] (COUCHDB-1367) When setting revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes

2011-12-27 Thread Robert Dionne
What's interesting is that modulo one edge case, last_seq in the changes feed 
and update_seq in the db_info record are exactly as defined on the WIKI. 

update_seq:

Current number of updates to the database (int)

last_seq:

last_seq is the sequence number of the last update returned. (Currently it will 
always be the same as the seq of the last item in results.)

This holds true even when there are no changes to documents: the value of 
last_seq is zero. The one edge case (which is a bit odd) is seen when you 
retrieve last_seq using ?descending=true&limit=1. If there are no changes the 
value will still be zero, unless you call _set_revs_limit first, in which case 
the value will be one. The value will still be zero if the normal _changes is 
called with no args. What makes it odd is that a call to _set_revs_limit shows 
up in the last_seq reported by _changes?descending=true but not in the one 
reported by plain _changes. This is a bug.
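
A minimal repro of the edge case (assuming a fresh database on a local CouchDB; the db name is arbitrary):

$ curl -X PUT localhost:5984/edge
{"ok":true}
$ curl -X PUT localhost:5984/edge/_revs_limit -d 500
{"ok":true}
$ curl localhost:5984/edge/_changes
{"results":[],"last_seq":0}
$ curl 'localhost:5984/edge/_changes?descending=true&limit=1'
{"results":[],"last_seq":1}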

So yes it's a bit weird but it does pretty much agree with the documentation. 
The quote I'm looking for is the one about angels on the head of a pin. 

I guess it needs more thought. In general I don't like metadata because I think 
it creates more things that need to be handled differently, adding complexity 
for the sake of something that doesn't exist (metadata).

Do you have any more swatches in magenta?



On Dec 27, 2011, at 12:04 AM, Randall Leeds wrote:

 On Mon, Dec 26, 2011 at 08:49, Jason Smith j...@iriscouch.com wrote:
 Hi, Bob. Thanks for your feedback.
 
 On Mon, Dec 26, 2011 at 12:24 PM, Robert Dionne
 dio...@dionne-associates.com wrote:
 Jason,
 
  After looking into this a bit I do not think it's a bug, at most poor 
 documentation. update_seq != last_seq
 
 Nobody knows what update_seq means. Even a CouchDB committer got it wrong.
 
 Fine. It is poor documentation.
 
 Adding last_seq into db_info is not helpful because last_seq also does
 not mean what we think it means. My last email demonstrates that
 last_seq is in fact incoherent.
 
 snip
 
 On Mon, Dec 26, 2011 at 23:03, Benoit Chesneau bchesn...@gmail.com wrote:
 Mmm right that's confusing (maybe except if you consider update_seq as a
 way to know the number of updates in the database, but in this case
 the wording is confusing). Imo changes seq & committed_seq should be
 quite the same. At least a changes seq should only happen when there
 is a doc update, ie each time and only if a revision is created. Does
 that make sense?
 
 - benoît
 
 Yes it does. There is a mostly consistent relationship between update
 sequence (seq, update_seq, last_seq, committed_seq) and the by_seq
 index. It seems entirely too confusing that there are things which
 affect update_seq but do not appear in the by_seq btree. That is just
 plain wrong, or else a massive confusion of vocabulary. Benoit, I believe
 you are right to suggest that none of these sequence-related things
 should change unless a revision is created.
 
 Bear with me for I believe there is a related discussion about
 replicability for _security, _local docs, etc. It's clear that there
 are clustering and operational motivations for making this information
 replicable, thus making them proper documents with a place in the
 by_seq index, in the _changes feed, and affecting update_seq. Either
 these things have a proper place in the sequential history of a
 database or they do not. That there are things which affect update_seq
 but do not appear in the by_seq index and _changes feed feels like a
 mistake. Placing additional metadata in the db header feels like
 rubbing salt in this wound.
 
 Right now only replicable documents surface in the _changes feed and
 are added to the by_seq btree but some other things affect the
 update_seq. I've just gone and checked, as described in my previous
 email, that none of these appear to require a change to update_seq for
 any technical reason, though Jason properly points out that it is
 perhaps useful for operational tasks such as knowing when to back up a
 .couch file.
 
 I see two reasonable ways forward.
 
 1) Stop incrementing update_seq for anything but replicable document changes
 2) Make things which already affect update_seq but do not appear in
 _changes appear there, likely by turning them into proper MVCC
 documents.
 
 Regarding option 1:
 This is easy. I already outlined how to do this. It requires removing
 about 3 characters from our codebase. However, it spits at Jason's
 operations concerns, which I think are quite valid, and misses an
 opportunity for great improvement.
 
 Regarding option 2:
 There is a cluster-aware use case, an operations use case, and, I
 think, a purity argument here. As for how to accomplish this feat
 without terrible API breakage, we get a lot of help from our URL
 structure. We have reserved paths which cannot conflict with documents
 so it does not create ambiguity if '{"seq":20,"id":"_security", ...}'
 appears in a changes feed. However, I think _security is a bad name
 for this document because it requires

Re: [jira] [Commented] (COUCHDB-1367) When setting revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes

2011-12-26 Thread Robert Dionne
Jason,

  After looking into this a bit I do not think it's a bug, at most poor 
documentation. update_seq != last_seq. Most of the time they agree, but as we 
know now, sometimes they don't. It's a different thing. I'm not sure where else 
in the code we depend on update_seq reflecting all the changes to the database; 
perhaps as Randall suggests we might be able to *not* bump it in those other 
calls.

  Another way to handle this is to hang on to the last_seq when a changes call 
is made and use that as a since parameter in the next call. This to me seems 
like what's needed in this use case anyway.

  In any event it's likely easy to add last_seq to the db_info record, and I'm 
more than happy to do that; we should open a new ticket for it.

Cheers,

Bob

  




On Dec 26, 2011, at 4:10 AM, Jason Smith wrote:

 Hi, Randall. Thanks for inviting me to argue a bit more. I hope you'll
 be persuaded that, if -1367 is not a bug, at least there is *some*
 bug.
 
 tl;dr summary:
 
 This is a real bug--a paper cut with a workaround, but still a real bug.
 
 1. Apps want a changes feed since 0, but they want to know when
 they've "caught up" (defined below)
 2. These apps (and robust apps generally) probably start out by
 pinging the /db anyway. Bob N. and I independently did so.
 3. update_seq looks deceptively like the sequence id of the latest
 change, and people assume it is. They define "caught up" as receiving a
 change at or above this value. They expect to catch up in finite
 time, even if the db receives no subsequent updates.
 4. In fact, CouchDB does not disclose the sequence id of the latest
 change in the /db response. To know that value:
  4a. If you want to process every change anyway, just get _changes
 and use last_seq
  4b. If you just want the last sequence id, query
 _changes?descending=true&limit=1
    4b(1). If the response has a change, use its last_seq value
    4b(2). If the response has no changes, ignore the last_seq value
 (it is really the update_seq) and use 0
 
 Step 3 is the major paper cut. That step 4 exists and is complicated
 is the minor paper cut.
 
 On Mon, Dec 26, 2011 at 5:36 AM, Randall Leeds (Commented) (JIRA)
 j...@apache.org wrote:
 
[ 
 https://issues.apache.org/jira/browse/COUCHDB-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175892#comment-13175892
  ]
 
 Randall Leeds commented on COUCHDB-1367:
 
 
 Wait a second. Robert, you are not fixing a bug in C-L, you are working 
 around a deficiency in CouchDB.
 
 Can't both be true?
 
 Only in the trivial sense. This ticket reveals that app
 developers--Henrik and me, but also a committer--misunderstand
 update_seq, thinking it is last_seq. last_seq is not easy to learn.
 
 Nope. You can not ever know. You always know the latest sequence number at 
 some arbitrarily recent point in time.
 
 Sorry, I cut corners and was not clear. Of course, nobody ever really
 knows anything except events in the very recent past. But I mean in
 the context of a _changes query one-two punch: get the last_seq, then
 begin a continuous feed since that value.
 
 The bug is that users cannot readily know the id of the most recent
 change. In fact, the id of the most recent change has no explicit
 label or name in the CouchDB interface. Neither update_seq nor
 last_seq mean exactly that.
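
 A quick illustration of the mismatch (same shape as the session further
 down; here update_seq was bumped by _revs_limit calls, not by any
 document write):
 
 $ curl -s localhost:5984/x
 {"db_name":"x","doc_count":0, ... ,"update_seq":3, ... }
 $ curl -s localhost:5984/x/_changes
 {"results":[],"last_seq":0}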
 
 What if I want to see the most recent five changes? What if there are a 
 hundred million documents? What if 99% of the time, update_seq equals 
 last_seq and so developers assume it means something it doesn't?
 
 In order:
  * /_changes?descending=true&limit=5
 
 I stand corrected. I had forgotten about a descending changes query.
 That resolves the hundred-million-docs problem. (My erroneous point
 was, 100M docs makes it too expensive to learn last_seq.)
 
 But that response looks bizarre.
 
 GET /db/_changes?descending=true&limit=5
 {"results":[
 {"seq":22,"id":"after_3","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
 {"seq":21,"id":"after_2","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
 {"seq":20,"id":"after_1","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
 {"seq":19,"id":"conc","changes":[{"rev":"2-584a4a504a97009241d2587fee8b5eb8"}]},
 {"seq":17,"id":"preload_create","changes":[{"rev":"1-28bf6cd8af83c40c6e3fb82b608ce98f"}]}
 ],
 "last_seq":17}
 
 last_seq is the *least recent* change. If you query with limit=1 then
 they will be equal, and that is nice. *Except* if there were no
 changes yet.
 
 $ curl -X PUT localhost:5984/x
 {"ok":true}
 
 $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
 {"ok":true}
 $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
 {"ok":true}
 $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
 {"ok":true}
 
 $ curl localhost:5984/x/_changes
 {"results":[
 ],
 "last_seq":0}
 
 $ curl 'localhost:5984/x/_changes?descending=true'
 {"results":[
 ],
 "last_seq":3}
 
 Weird.
 
  * Add additional information to the changes feed, 

Re: Understanding the CouchDB file format

2011-12-21 Thread Robert Dionne
) to cache frequently accessed data.
 
 I am trying to understand the logic used by CouchDB to answer a query
 using the index once updates to the tree have been appended to the data
 file... for example, consider a CouchDB datastore like the one Filipe
 has... 10 million documents and let's say it is freshly compacted.
 
 If I send in a request to that Couch instance, it hits the header of the
 data file along with the index and walks the B+ tree to the leaf node,
 where it finds the offset into the data file where the actual doc lives...
 let's say 1,000,000 bytes away.
 
 These B+ trees are shallow, so it might look something like this:
 
 Level 1: 1 node, root node.
 Level 2: 100 nodes, inner child nodes
 Level 3: 10,000 nodes, inner child nodes
 Level 4: 1,000,000 leaf nodes... all with pointers to the data offsets in
 the data file.
 
 Now let's say I write 10 updates to documents in that file. There are 10
 new revisions appended to the end of the data file, *each one* separated by
 a rewritten B+ path to a leaf node with its new location at the end of the
 file. Each of those paths written between each doc revision (say roughly 2k
 like Filipe mentioned) are just 4 item paths... root -> level1 -> level2 ->
 level3 -> level4... showing the discrete path from the root to that
 individual updated doc. The intermediary levels (l1, 2, 3) are not fully
 fleshed out with all the OTHER children from the original b+ tree index.
 
 [[ is this correct so far? If not, please point out my mistakes... ]]
 
 Now I issue a query for a document that WAS NOT updated...
 
 *** this is where I get confused on the logic ***
 
 this would mean I need to access the original B+ tree index at the root of
 the data file, because the revised B+ paths that are written between each
 of the updated doc revisions at the end of the file are not full indices.
 
 NOW consider I want to query for one of the changed docs... now I suddenly
 need to scan backwards from the data file's end to find the updated path to
 the new revision of that document.
 
 (obviously) this isn't what Couch is actually doing... it's doing
 something more elegant, I just can't figure out what or how and that is
 what I was hoping for help with.
 
 Much thanks guys, I know this is a heavy question to ask.
 
 Best wishes,
 R
 
 
 On Tue, Dec 20, 2011 at 1:35 PM, Robert Dionne 
 dio...@dionne-associates.com wrote:
 
 
 Robert Dionne
 Computer Programmer
 dio...@dionne-associates.com
 203.231.9961
 
 
 
 
 On Dec 20, 2011, at 3:27 PM, Riyad Kalla wrote:
 
 Filipe,
 
 Thank you for the reply.
 
 Maybe I am misunderstanding exactly what couch is writing out; the docs
 I've read say that it rewrites the root node -- I can't tell if the
 docs
 mean the parent node of the child doc that was changed (as one of the b+
 leaves) or if it means the direct path, from the root, to that
 individual
 doc... or if it means the *entire* index...
 
 In the case of even rewriting the single parent, with such a shallow
 tree,
 each internal leaf will have a huge fan of nodes; let's say 1-10k in a
 decent sized data set.
 
 If you are seeing a few K of extra written out after each changed doc
 then
 that cannot be right... I almost assumed my understanding was wrong
 because
 the sheer volume of data would make performance abysmal if it was true.
 
 Given that... is it just the changed path, from the root to the new leaf
 that is rewritten?
 
 Hi Riyad,
 
 You are correct, it's only the changed path. Interestingly I've just
 started to document all these internals[1] along with links to the code and
 other references available.
 
 Cheers,
 
 Bob
 
 
 [1] http://bdionne.github.com/couchdb/
 
 That makes me all sorts of curious as to how Couch
 updates/searches the new modified index with the small diff that is
 written
 out.
 
 Any pointers to reading that will help me dig down on this (even early
 bugs
 in JIRA?) would be appreciated. I've tried skimming back in 2007/08 on
 Damien's blog to see if it wrote about it in depth and so far haven't
 found
 anything as detailed as I am hoping for on this architecture.
 
 Best,
 Riyad
 
 On Tue, Dec 20, 2011 at 1:07 PM, Filipe David Manana 
 fdman...@apache.orgwrote:
 
 On Tue, Dec 20, 2011 at 6:24 PM, Riyad Kalla rka...@gmail.com wrote:
 I've been reading everything I can find on the CouchDB file format[1]
 and
 am getting bits and pieces here and there, but not a great, concrete,
 step-by-step explanation of the process.
 
 I'm clear on the use of B+ trees and after reading a few papers on the
 benefits of log-structured file formats, I understand the benefits of
 inlining the B+ tree indices directly into the data file as well
 (locality
 + sequential I/O)... what I'm flummoxed about is how much of the B+
 tree's
 index is rewritten after every modified document.
 
 Consider a CouchDB file that looks more or less like this:
 
 [idx/header][doc1, rev1][idx/header][doc1, rev2]
 
 After each revised doc is written and the b-tree root

Re: Understanding the CouchDB file format

2011-12-21 Thread Robert Dionne
Riyad,

You're welcome. At a quick glance your post has one error: internal nodes do 
contain values (from the reductions). The appendix in the couchdb book also 
makes this error[1], which I've opened a ticket for.

Cheers,

Bob


[1] https://github.com/oreilly/couchdb-guide/issues/450




On Dec 21, 2011, at 3:28 PM, Riyad Kalla wrote:

 Bob,
 
 Really appreciate the link; Rick has a handful of articles that helped a
 lot.
 
 Along side all the CouchDB reading I've been looking at SSD-optimized data
 storage mechanisms and tried to coalesce all of this information into this
 post on Couch's file storage format:
 https://plus.google.com/u/0/107397941677313236670/posts/CyvwRcvh4vv
 
 It is uncanny how many things Couch seems to have gotten right with regard
 to existing storage systems and future flash-based storage systems. I'd
 appreciate any corrections, additions or feedback to the post for anyone
 interested.
 
 Best,
 R
 
 On Wed, Dec 21, 2011 at 12:53 PM, Robert Dionne 
 dio...@dionne-associates.com wrote:
 
 I think this is largely correct Riyad. I dug out an old article[1] by Ricky
 Ho that you may also find helpful, though it might be slightly dated.
 Generally the best performance will be had if the ids are sequential and
 updates are done in bulk. Write heavy applications will eat up a lot of
 space and require compaction. At the leaf nodes what are stored are either
 full_doc_info records or doc_info records, which store pointers to the data,
 so the main things that impact the branching at each level are the key size
 and, in the case of views, the sizes of the reductions, as these are stored
 with the intermediate nodes.
 
 All in all it works pretty well but as always you need to test and
 evaluate it for your specific case to see what the limits are.
 
 Regards,
 
 Bob
 
 
 [1] http://horicky.blogspot.com/2008/10/couchdb-implementation.html
 
 
 
 
 On Dec 21, 2011, at 2:17 PM, Riyad Kalla wrote:
 
 Adding to this conversation, I found this set of slides by Chris
 explaining
 the append-only index update format:
 http://www.slideshare.net/jchrisa/btree-nosql-oak?from=embed
 
 Specifically slides 16, 17 and 18.
 
 Using this example tree, rewriting the updated path (in reverse order)
 appended to the end of the file makes sense... you see how index queries
 can simply read backwards from the end of the file and not only find the
 latest revisions of docs, but also every other doc that wasn't touched
 (it
 will just seek into the existing inner nodes of the b+ tree for
 searching).
 
 What I am hoping for clarification on is the following pain-point that I
 perceive with this approach:
 
 1. In a sufficiently shallow B+ tree (like CouchDB), the paths themselves
 to elements are short (typically no more than 3 to 5 levels deep) as
 opposed to a trie or some other construct that would have much longer
 paths
 to elements.
 
 2. Because the depth of the tree is so shallow, the breadth of it becomes
 large to compensate... more specifically, each internal node can have
 100s,
 1000s or more children. Using the example slides, consider the nodes
 [A...M] and [R...Z] -- in a good sized CouchDB database, those internal
 index nodes would have 100s (or more) elements in them pointing at deeper
 internal nodes that themselves had thousands of elements; instead of the
 13
 or so as implied by [A...M].
 
 3. Looking at slide 17 and 18, where you see the direct B+ tree path to
 the
 update node getting appended to the end of the file after the revision is
 written (leaf to root ordering: [J' M] -> [A M] -> [A Z]) it implies that
 those internal nodes with *all* their child elements are getting
 rewritten
 as well.
 
 In this example tree, it isn't such a big issue... but in a
 sufficiently
 large CouchDB database, these nodes denoted by [A...M] and [A...Z] could
 be
 quite large... I don't know the format of the node elements in the B+
 tree,
 but it would be whatever the size of a node is times however many
 elements
 are contained at each level (1 for root, say 100 for level 2, 1000 for
 level 3 and 10,000 for level 4 -- there is a lot of hand-waving going on
 here, of course it depends on the size of the data store).
 
 Am I missing something or is CouchDB really rewriting that much index
 information between document revisions on every update?
 
 What was previously confusing me is I thought it was *only* rewriting a
 direct path to the updated revision, like [B][E][J'] and Couch was
 some-how patching in that updated path info to the B+ index at runtime.
 
 If couch is rewriting entire node paths with all their elements then I am
 no longer confused about the B+ index updates, but am curious about the
 on-disk cost of this.
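
 A back-of-envelope estimate (not a measurement -- it borrows Filipe's
 ~1.2k-per-node figure from the other thread and assumes a 4-level path):
 
 $ echo $((4 * 1200))  # bytes of index path appended per single-doc update
 4800
 $ echo $((4 * 1200 * 1000000 / 1024 / 1024))  # MB of superseded index after a million single inserts
 4577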
 
 In my own rough insertion testing, that would explain why I see my
 collections absolutely explode in size until they are compacted (not
 using
 bulk insert, but intentionally doing single inserts for a million(s) of
 docs to see what kind of cost the index path duplication would

Re: Understanding the CouchDB file format

2011-12-20 Thread Robert Dionne

Robert Dionne
Computer Programmer
dio...@dionne-associates.com
203.231.9961




On Dec 20, 2011, at 3:27 PM, Riyad Kalla wrote:

 Filipe,
 
 Thank you for the reply.
 
 Maybe I am misunderstanding exactly what couch is writing out; the docs
 I've read say that it rewrites the root node -- I can't tell if the docs
 mean the parent node of the child doc that was changed (as one of the b+
 leaves) or if it means the direct path, from the root, to that individual
 doc... or if it means the *entire* index...
 
 In the case of even rewriting the single parent, with such a shallow tree,
 each internal leaf will have a huge fan of nodes; let's say 1-10k in a
 decent sized data set.
 
 If you are seeing a few K of extra written out after each changed doc then
 that cannot be right... I almost assumed my understanding was wrong because
 the sheer volume of data would make performance abysmal if it was true.
 
 Given that... is it just the changed path, from the root to the new leaf
 that is rewritten?

Hi Riyad,

You are correct, it's only the changed path. Interestingly I've just started to 
document all these internals[1] along with links to the code and other 
references available. 

Cheers,

Bob


[1] http://bdionne.github.com/couchdb/

 That makes me all sorts of curious as to how Couch
 updates/searches the new modified index with the small diff that is written
 out.
 
 Any pointers to reading that will help me dig down on this (even early bugs
 in JIRA?) would be appreciated. I've tried skimming back in 2007/08 on
 Damien's blog to see if it wrote about it in depth and so far haven't found
 anything as detailed as I am hoping for on this architecture.
 
 Best,
 Riyad
 
 On Tue, Dec 20, 2011 at 1:07 PM, Filipe David Manana 
 fdman...@apache.orgwrote:
 
 On Tue, Dec 20, 2011 at 6:24 PM, Riyad Kalla rka...@gmail.com wrote:
 I've been reading everything I can find on the CouchDB file format[1] and
 am getting bits and pieces here and there, but not a great, concrete,
 step-by-step explanation of the process.
 
 I'm clear on the use of B+ trees and after reading a few papers on the
 benefits of log-structured file formats, I understand the benefits of
 inlining the B+ tree indices directly into the data file as well
 (locality
 + sequential I/O)... what I'm flummoxed about is how much of the B+
 tree's
 index is rewritten after every modified document.
 
 Consider a CouchDB file that looks more or less like this:
 
 [idx/header][doc1, rev1][idx/header][doc1, rev2]
 
 After each revised doc is written and the b-tree root is rewritten
 after
 that, is that just a modified root node of the B+ tree or the entire B+
 tree?
 
 The reason I ask is because regardless of the answer to my previous
 question, for a *huge* database with millions of records, that seems like
 an enormous amount of data to rewrite after every modification. Say the
 root node had a fanning factor of 133; that would still be a lot of data
 to
 rewrite.
 
 Hi Riyad,
 
 Have you observed that in practice?
 
 Typically the depth of database btrees is not that high even for
 millions of documents. For example I have one around with about 10
 million documents which doesn't have more than 5 or 6 levels if I
 recall correctly.
 
 So updating a doc, for that particular case, means rewriting 5 or 6
 new nodes plus the document itself. Each node is normally not much
 bigger than 1.2Kb.
 
 I once wrote a tool to analyze database files which reports btree
 depths; however, it's not updated to work with recent changes on
 master/1.2.x such as snappy compression and btree sizes:
 
 https://github.com/fdmanana/couchfoo
 
 It should work with CouchDB 1.1 (and older) database files.
 
 
 I am certain I am missing the boat on this; if anyone can pull me out of
 the water and point me to dry land I'd appreciate it.
 
 Best,
 R
 
 
 
 [1]
 -- http://jchrisa.net/drl/_design/sofa/_list/post/post-page?startkey=%5B%22CouchDB-Implements-a-Fundamental-Algorithm%22%5D
 -- http://horicky.blogspot.com/2008/10/couchdb-implementation.html
 -- http://blog.kodekabuki.com/post/132952897/couchdb-naked
 -- http://guide.couchdb.org/editions/1/en/btree.html
 -- http://ayende.com/blog/* (Over my head)
 
 
 
 --
 Filipe David Manana,
 
 Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.
 



Re: The perfect logger for development

2011-12-07 Thread Robert Dionne
One of the things we do in BigCouch is attach a unique identifier to the 
request, so that we can correlate a given request with other log messages 
that may appear from other internal components. We call it an X-Request-ID or 
some such thing, and users can curl -v and tell us what it is. It's great 
for debugging.
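
Something like this, from the user's side (header name and value purely illustrative):

$ curl -sv http://localhost:5984/db 2>&1 | grep -i request-id
< X-Request-ID: 2a6e2b9f31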




On Dec 6, 2011, at 11:24 PM, Jason Smith wrote:

 Brilliant, thanks!
 
 I think this is possible if the Req object is part of the log object.
 Then a formatter can access it there. Off the top of my head, it would
 have access to the source IP address, the HTTP method, the path and
 query string, and headers.
 
 On Wed, Dec 7, 2011 at 10:44 AM, kowsik kow...@gmail.com wrote:
 As a CouchDB administrator, I would want *all* exception dumps to be
 prefaced by the inbound request URL with the query parameters
 (assuming it's a web request that caused the exception). There are
 cases where I've seen a stack trace but couldn't tell which inbound
 request caused the problem.
 
 K.
 ---
 http://blitz.io
 @pcapr
 
 On Tue, Dec 6, 2011 at 5:51 PM, Jason Smith j...@iriscouch.com wrote:
 Hi, all. Iris Couch urgently needs improved logging facilities in
 CouchDB. My goal is to make something we all love and get it accepted
 upstream, God willing. Or committers willing. But I repeat myself!
 
 This is the brainstorming and requirements gathering phase. In the
 CouchDB of your dreams, the logging system fits you like an old pair of
 sneakers. It's perfect. Now, what characteristics does that system
 exhibit? I will compile feedback into a spec on the wiki.
 
 I hope to avoid bikeshedding. Seriously, please don't even mention a
 product or project by name. At this phase I hope to stick to
 descriptions of functionality, goals, and non-goals. I want to
 evaluate tools later.
 
 To start the discussion: logging is viewed differently based on your
 relationship with CouchDB:
 
 1. Developers of CouchDB
 2. Administrators of a couch
 3. Application developers
 
 My roles are administration, and a little bit of development.
 Requirements, in no order.
 
 * Idiomatic Erlang
 
 * Is a compelling place for new people to contribute. Miguel de Icaza
 talks about this. It's not enough that the source code is public. You
 have to provide a smooth on-ramp, where people get great bang
 for their buck. They write a modest feature, and are rewarded by
 seeing its effects immediately. In other words: plugins. Or maybe a
 behaviour. Or some way to swap in formatters and data sinks. I don't
 want to write a Loggly target (http://loggly.com). Loggly should be
 begging me to merge their module.
 
 * 1st cut, no change to the default behavior. You still get that
 peculiar log file you know and love. People are parsing their log
 files, and might expect version 1.x not to change.
 
 * Existing code still works. No sweeping changes hitting every
 ?LOG_INFO and ?LOG_DEBUG.
 
 (Filipe, would you please share your thoughts on these? I think you
 struggled with the conflict between them recently.)
 * No performance impact (non-blocking)...
 * ... but also, impossible or difficult to overwhelm or crash or lose logs.
 
 (The next few points sort of fit together)
 
 * Logs are not strings, but data structures, with data (the log
 message) and metadata (severity, line number, maybe the call stack,
 etc.)
 
 * More log levels. Roughly: trace, debug, info, warn, error, fatal
 
 * Maybe automatic trace logs upon function entry/exit with return
 values. Not sure if this is doable. Maybe a compile option, for `make
 dev`
 
 * When you log something, your module, line number, and maybe PID are known
 
 * Components or categories, or tags, where multiple .erl files or
 individual log events can share a common property (http, views, ...)
 
 * A policy decides what to do with logs based on level, module, or
 component. You can set the policy either via configuration or
 programmatically.
 
 * There is a formatter API to serialize log data. Built-in formatters
 include the legacy format, and Jan's Apache httpd combined format.
 
 * There is a transport API to receive logs and DTRT.
 
 * I know this is insane, but kill -HUP pid should make couch reopen
 its log files. Okay, I'll settle down now.
 
 = Non Goals =
 
 * Log rotation. I have never seen a rotation feature in an application
 which was better than the OS tools. And you can get problems where both
 the server and the OS are rotating the same logs. I have seen that
 happen, twice. Or was it once? Of course, people could write a
 rotating file transport.
 
 --
 Iris Couch
 
 
 
 -- 
 Iris Couch



Re: [VOTE] Apache CouchDB new docs proposal

2011-11-26 Thread Robert Dionne



On Nov 26, 2011, at 2:47 PM, Dave Cottlehuber wrote:

 On 26 November 2011 14:25, Robert Dionne dio...@dionne-associates.com wrote:
 +1 for LaTeX
 
 Hi Robert, all,
 
 Thanks for taking the time to read all that!
 
 Specific design & tools aside, are you willing to support at least the
 principle of upgrading/improvement of the documentation?

Yes of course, I'm not fundamentally opposed to any technology. I was only 
seconding the suggestion of the use of LaTeX, as I find it superior to docbook 
in every way. It seems to me that if we want the programmers to contribute a 
lot of this content then something like Markdown, Org, or even just plain text 
would be the easiest.

 
 Or are you fundamentally against docbook? Personally, I am agnostic on
 the tool but I would like to know that I can contribute something that
 won't require rework in future, and I'll happily learn tool X to
 support that.
 
 I see little point in counting a few +1 votes and then making
 wholesale changes; this should be a consensus otherwise I'd rather
 revert to incremental changes to the wiki.
 
 A+
 Dave



Re: Proposal for Intro to CouchDB Coding class

2011-11-23 Thread Robert Dionne
+5  - excellent idea!

Right now the couchdb code is pretty much "self-documenting", which makes it 
pretty hard for new adopters, and there's no independent documentation of the 
critical pieces outside the code (file formats, etc.). So every bug is a new 
adventure.

Perhaps as part of this course, these guided tours might result in more 
documentation in the code. I find having as much as possible with the code 
increases the likelihood that it stays current.

With respect to Erlang I found the Armstrong book very readable.

Anyway I'm happy to help with the "ask the developer" forum.


On Nov 23, 2011, at 1:24 AM, Joan Touzet wrote:

 Hello CouchDB Developers,
 
 Based on an informal survey of CouchDB users who are interested in
 contributing to the project, two key items tend to hold people back:
 
  1. Knowing Erlang (and the CouchDB coding style)
  2. Knowing the CouchDB code base
 
 So I decided to further my own grad research in Education, and
 contribute back to CouchDB, by volunteering to coordinate a class for
 6-20 students.
 
 ** I'd like to propose an Introduction to CouchDB Programming course,
 kicking off January 5, 2012, and ask for support from the current devs
 on this list.
 
 This won't be a traditional classroom course! Students themselves will
 be shaping the direction of the course, the topics covered, and will be
 expected to lead at least one week of online discussion. (I'll be
 providing the pedagogical framework for this Collaborative Learning
 model. This is my area of active research.)
 
 The idea is that, by the end of the course (10 weeks or so), participants
 will have learned enough Erlang to have basic competency, and enough
 about the CouchDB code base to contribute. The final exam would be
 completing and submitting some number of patches from the outstanding
 bin of bugs in JIRA.
 
 ** I NEED YOUR HELP in two ways:
 
  A. Suggestions for good reference material (e.g. learnyousomeerlang)
  B. Volunteers from the current devs to conduct a guided tour of
 1 or more parts of the code
 
 The guided tours are the essential bits for this class to be
 successful, and I'd like them as much as possible to be accurate and
 accessible to newbs. These tours could take many forms:
 
  * A screencast of you talking about some code, e.g. ScreenFlow
  * A live walkthrough over Adobe Connect video (time donated by my
University dep't for the class)
  * IRC-based runthrough
  * Ask the developer - respond to questions about code on the class
forum
  * You fly everyone out to your house for dinner :) Etc.
 
 ** If you're willing to help out, please reply on or off list and let me
 know. Let's grow the contributor community!
 
 All the best,
 -- 
 Joan Touzet  |  jo...@atypical.net  |  wohali most other places



Re: [jira] [Commented] (COUCHDB-1342) Asynchronous file writes

2011-11-18 Thread Robert Dionne

On Nov 17, 2011, at 10:06 PM, Damien Katz (Commented) (JIRA) wrote:

 
[ 
 https://issues.apache.org/jira/browse/COUCHDB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152608#comment-13152608
  ] 
 
 Damien Katz commented on COUCHDB-1342:
 --
 
 Paul, what I mean by "Apache users concerns" is that #3 isn't something that 
 vanilla Apache CouchDB users deal with, but third parties who modify the code 
 or embed it in interesting ways might (I suppose Cloudant has to deal with this). 
 Perhaps I'm mistaken about that. I do think patches should only be concerned 
 with the vanilla use cases in order to be considered check-in quality.
 
 #4 is a style issue, not a correctness issue, or at least you haven't made a 
 case that it's a correctness issue. I have no problems with you changing it 
 to a style you prefer, but we should not expect that submitters of patches 
 conform to an undocumented style.
 
 There is no urgency around this patch, at Couchbase we can keep adding 
 performance enhancements and drift our codebase further and further from 
 Apache. I don't want to see that happen, as it only hurts the Apache project.

Damien,

I agree with both these points: your codebase at Couchbase is drifting, but 
you're not alone in that, and we do need a culture where more correct, fast 
code is checked in. I've only had a couple of days to look at this and I've not had the 
time to read your Couchbase work. As I look at this patch almost every concern 
Paul is raising is technically valid. We do have to consider more than the 
vanilla CouchDB as it gets embedded in BigCouch for example, and CouchDB is 
designed to be distributed, right? I first ran a simple test, adding 10K empty 
docs, and noticed a 40K difference in the db file size. Probably harmless, but I 
don't know why. There's no real way to independently verify if this patch 
changes the db layout other than via the semantics of the code.

Databases are hard, as you mention, very hard. Without good performance they 
are next to useless, but a lack of correctness is also problematic, certainly in 
some domains. I share others' frustration with patches languishing. The patches 
to date I've submitted have all been small and have often had to be refactored 
as the code migrated away (I think I have 3 now, 2 of them bugs). COUCHDB-911 
for example is a real bug, involving both couch_db and couch_db_updater, and as 
Adam notes is not just a bulk docs issue. It reports a conflict but adds data 
to the db anyway. Can you believe that? I tried a couple of fixes to minimize 
the surface area touched but there was no real way to solve it correctly 
without adding to the data structures. When I saw this patch my first reaction 
was wow, but now I'll have to rework 911 again as your patch also touches the 
same files. It's totally orthogonal so no big deal.

I mention this only to point out that the review process is awesome and when 
taken seriously makes for a better result. This isn't just people's pet 
concerns. It takes time to do this. Fortunately it's not rocket science, it's 
just databases. The solution to the culture problem is best practices. Best 
practices have to be practiced, and someone (Jan as the project lead I'm 
looking at you :) needs to crack the whip and set the tone. Of course I'm 
assuming that we're talking about a process to produce "production quality" 
code. I quote "production" as that phrase has evolved considerably over the 
years. If master is deemed acceptable for prototypes, proofs of concept, etc. 
then fine, but otherwise I'd suggest we follow Randall's lead and work this 
patch on a branch first. Anyway, I'm sure you know these things, I don't mean to 
prattle on.

Best Regards,

Bob

 And I do see we have some culture problems in the Apache project. We need a 
 culture where useful, correct, fast code is verified and checked in, and then 
 is improved incrementally. Right now we have a culture of "everyone's pet 
 concerns must be addressed" before code gets checked in, which is demoralizing 
 and slows things down, which is a very big problem the project has right now. 
 I want your help in trying to change that.
 
 Asynchronous file writes
 
 
        Key: COUCHDB-1342
        URL: https://issues.apache.org/jira/browse/COUCHDB-1342
    Project: CouchDB
 Issue Type: Improvement
 Components: Database Core
   Reporter: Jan Lehnardt
    Fix For: 1.3

Attachments: COUCHDB-1342.patch
 
 
 This change updates the file module so that it can do
 asynchronous writes. Basically it replies immediately
 to the process asking to write something to the file, with
 the position where the chunks will be written to the
 file, while a dedicated child process keeps collecting
 chunks and writing them to the file (and batching them
 when possible). After issuing a series of write requests
 to the file 

Re: [VOTE] Apache CouchDB 1.1.1 Release, Round 2

2011-10-21 Thread Robert Dionne
+0

OS X 10.7.2
Erlang R14B

make distcheck is fine

only two tests fail this time, changes and cookie_auth





On Oct 20, 2011, at 1:44 PM, Robert Newson wrote:

 This is the second release vote for Apache CouchDB 1.1.1
 
 Changes since round 1;
 
 * Fix object sealing with SpiderMonkey 1.7.0
 * Update CHANGES/NEWS to reflect COUCHDB-1129
 * Fix JavaScript CLI test runner
 
 We encourage the whole community to download and test these release
 artifacts so that any critical issues can be resolved before the release
 is made. Everyone is free to vote on this release. Please report your
 results and vote to this thread.
 
 We are voting on the following release artifacts:
 
 http://people.apache.org/~rnewson/dist/1.1.1/
 
 Instructions for validating the release tarball can be found here:
 
 http://people.apache.org/~rnewson/dist/
 
 Instructions for testing the build artefacts can be found here:
 
 http://wiki.apache.org/couchdb/Test_procedure
 
 These artifacts have been built from the 1.1.1 tag in Git:
 
 apache-couchdb-1.1.1.tar.gz
 apache-couchdb-1.1.1.tar.gz.md5
 apache-couchdb-1.1.1.tar.gz.asc
 apache-couchdb-1.1.1.tar.gz.sha
 
 Test ALL the things.
 
 B.



Re: Tweaking the release procedure

2011-10-21 Thread Robert Dionne


On Oct 21, 2011, at 12:33 PM, Paul Davis wrote:

 On Fri, Oct 21, 2011 at 4:28 AM, Jukka Zitting jukka.zitt...@gmail.com 
 wrote:
 Hi,
 
 My 2c from the gallery. I'm not involved in CouchDB, so just making
 general observations from the perspective of other Apache projects
 interested in using Git.
 
 On Fri, Oct 21, 2011 at 5:51 AM, Paul Davis paul.joseph.da...@gmail.com 
 wrote:
 As Noah points out, there are ASF procedural issues that affect this
 discussion. Part of making a release involves getting community input
 on whether the release is a valid artefact. As such we need to be able
 to refer to these not-release sets of bytes.
 
 I'd say that's a perfectly valid use of tags. An official release
 should be backed by a tag, but there's no requirement for the reverse.
 Using tags for release candidates or other milestones should also be
 fine. It should be up to each project to decide how they want to name
 and manage tags.
 
 I also find the idea of renaming a release tag after the vote
 completes a bit troublesome. The way I see it, a release manager will
 tag a given state of the source tree and use that tag to build the
 release candidate. After that no repository changes should be required
 regardless of the result of the release vote. If the vote passes, the
 candidate packages are pushed to www.apache.org/dist as-is. Otherwise
 the release candidate is just dropped and the next one made.
 
 This kind of a workflow also solves the 1.1.1 vs. 1.1.1-rc1 problem.
 If each release candidate is given a separate new tag and version
 number (i.e. 1.1.1 vs 1.1.2), then there can be no confusion about
 which build is being tested. Version numbers are cheap.
 
 BR,
 
 Jukka Zitting
 
 
 Are there projects that do this version incrementing when a vote
 fails? That's an idea I haven't heard before.

I think this is pretty common





Re: [VOTE] Apache CouchDB 1.1.1 Release

2011-10-20 Thread Robert Dionne
Interesting, this patch seems like a worthwhile thing to do regardless of the 
tests, if I understand it correctly. If a restart causes the response to not be 
sent, then sending a 202 first will at least let the caller know the 
restart was initiated.
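
From the client's side the patched behavior would look something like this (a sketch; the exact headers may differ):

$ curl -si -X POST http://localhost:5984/_restart -H 'Content-Type: application/json'
HTTP/1.1 202 Accepted
...
{"ok":true}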




On Oct 20, 2011, at 1:54 PM, J. Lee Coltrane wrote:

 
 FWIW, the patch attached to COUCHDB-1310 
  (https://issues.apache.org/jira/browse/COUCHDB-1310)
 
 will fix a great many (all, afaik) of the futon test hangs (the cases where
 the tests get stuck, and never complete).  Without this patch, I was 
 never able to get a complete run through the browser tests in 1.1.1 RC1.
 With the patch, I still get test failures, but at least I can get through 
 all the tests without restarting the browser.
 
 The patch is tiny -- it just swaps the order of two lines of code, in the 
 '/_restart' handler, so that the http response gets written *before* the 
 server 
 is restarted (rather than after).
 
 As test instability continues to be a hot topic, maybe this patch is worth 
 considering for inclusion in the next 1.1.1 RC?  
 
 -Lee
 
 
 
 On Oct 20, 2011, at 12:25 PM, Benoit Chesneau wrote:
 
 On Thu, Oct 20, 2011 at 6:23 PM, Robert Newson rnew...@apache.org wrote:
 Hi All,
 
 Thanks for all the responses so far. Unfortunately I am aborting this round.
 
 It turns out there is a serious bug in this 1.1.1 candidate when using
 SpiderMonkey 1.7.0. Instead of sealing the 'doc' parameter to views,
 we seal the object that defines the seal function, which then causes
 all kinds of 'X is read-only' events.
 
 It's a one word fix, so a new 1.1.1 candidate will be out very soon,
 and it should not invalidate any of these results.
 
 B.
 
 
 :(
 
 It would be worth looking at this erlang warning too imo. Hopefully I
 will have some wifi at the hotel tonight. I will see if I can make it.
 
 - benoit
 



Re: [VOTE] Apache CouchDB 1.1.1 Release

2011-10-19 Thread Robert Dionne
+0

make distcheck runs fine, all etaps pass

Futon tests fail in FF -- server admin gets out of whack at the replicator test 
and all tests thereafter to the end fail.
Chrome -- same problem, this time the failures start at cookie_auth -- but it 
appears to be the same issue.

All the usual remedies -- clearing the cache, wiping the filesystem and starting 
over -- fail.

I'm sure it's the usual browser problems, I do notice though that it's been a 
while since I've seen them all pass







On Oct 19, 2011, at 10:27 AM, Robert Newson wrote:

 This is the release vote for Apache CouchDB 1.1.1
 
 Changes in this release:
 
 * Support SpiderMonkey 1.8.5
 * Add configurable maximum to the number of bytes returned by _log.
 * Allow CommonJS modules to be an empty string.
 * Bump minimum Erlang version to R13B02.
 * Do not run deleted validate_doc_update functions.
 * ETags for views include current sequence if include_docs=true.
 * Fix bug where duplicates can appear in _changes feed.
 * Fix bug where update handlers break after conflict resolution.
 * Fix bug with _replicator where include filter could crash couch.
 * Fix crashes when compacting large views.
 * Fix file descriptor leak in _log
 * Fix missing revisions in _changes?style=all_docs.
 * Improve handling of compaction at max_dbs_open limit.
 * JSONP responses now send text/javascript for Content-Type.
 * Link to ICU 4.2 on Windows.
 * Permit forward slashes in path to update functions.
 * Reap couchjs processes that hit reduce_overflow error.
 * Status code can be specified in update handlers.
 * Support provides() in show functions.
 * _view_cleanup when ddoc has no views now removes all index files.
 * max_replication_retry_count now supports infinity.
 * Fix replication crash when source database has a document with empty ID.
 * Fix deadlock when assigning couchjs processes to serve requests.
 * Fixes to the document multipart PUT API.
 
 We encourage the whole community to download and test these release
 artifacts so that any critical issues can be resolved before the release
 is made. Everyone is free to vote on this release. Please report your
 results and vote to this thread.
 
 We are voting on the following release artifacts:
 
 http://people.apache.org/~rnewson/dist/1.1.1/
 
 Instructions for validating the release tarball can be found here:
 
 http://people.apache.org/~rnewson/dist/
 
 Instructions for testing the build artefacts can be found here:
 
 http://wiki.apache.org/couchdb/Test_procedure
 
 These artifacts have been built from the 1.1.1 tag in Git:
 
 apache-couchdb-1.1.1.tar.gz
 apache-couchdb-1.1.1.tar.gz.md5
 apache-couchdb-1.1.1.tar.gz.asc
 apache-couchdb-1.1.1.tar.gz.sha
 
 Since you have read this far, you MUST vote.



Re: Starting the Git Experiment

2011-09-23 Thread Robert Dionne
+1


On Sep 23, 2011, at 1:52 PM, Paul J. Davis wrote:

 Dear committers, 
 
 We now have a green light from infrastructure to switch to using Git as our 
 writable VCS. This is to be considered a live experiment. If something breaks 
 it's possible we'll have to revert back to SVN. But nothing will break and 
 everyone will forgive me for any bugs that may crop up.
 
 If there are no objections I would like to switch over soonish. Normally I 
 would say Monday to give people a chance to respond to this email but we've 
 had quite a few discussions on switching to Git already and no one has voiced 
 opposition. Seeing as that's the case, if I get a majority of +1's from the 
 committers I'll start disabling SVN access as soon as I see the majority vote.
 
 Paul Davis 
 



Re: The replicator needs a superuser mode

2011-08-16 Thread Robert Dionne
No objection, just the question of why the need for a new role, why not use 
admin?



On Aug 16, 2011, at 2:10 PM, Adam Kocoloski wrote:

 Wow, this thread got hijacked a bit :)  Anyone object to the special role 
 that has the skip validation superpower?
 
 Adam
 
 On Aug 16, 2011, at 1:51 PM, Jan Lehnardt wrote:
 
 Both rsync and scp won't allow me to do curl http://couch/db/_dump | curl 
 http://couch/db/_restore.
 
 I acknowledge that similar solutions exist, but using the http transport 
 allows for more fun things down the road.
 
 See what we are doing with _changes today where DbUpdateNotifications nearly 
 do the same thing.
 
 Cheers
 Jan
 --
 
 On 16.08.2011, at 19:13, Nathan Vander Wilt nate-li...@calftrail.com wrote:
 
 We've already got replication, _all_docs and some really robust on-disk 
 consistency properties. For shuttling raw database files between servers, 
 wouldn't rsync be more efficient (and fit better within existing sysadmin 
 security/deployment structures)?
 -nvw
 
 
 On Aug 16, 2011, at 9:55 AM, Paul Davis wrote:
 Me and Adam were just mulling over a similar endpoint the other night
 that could be used to generate plain-text backups similar to what
 couchdb-dump and couchdb-load were doing. With the idea that there
 would be some special sauce to pipe from one _dump endpoint directly
 into a different _load handler. Obvious downfall was incremental-ness
 of this. Seems like it'd be doable, but I'm not entirely certain on
 the best method.
 
 I was also considering this as our fool-proof 100% reliable method for
 migrating data between different CouchDB versions which we seem to
 screw up fairly regularly.
 
 +1 on the idea. Not sure about raw couch files as it limits the wider
 usefulness (and we already have scp).
 
 On Tue, Aug 16, 2011 at 10:24 AM, Jan Lehnardt j...@apache.org wrote:
 This is only slightly related, but I'm dreaming of /db/_dump and 
 /db/_restore endpoints (the names don't matter, could be one with GET / 
 PUT) that just ship verbatim .couch files over HTTP. It would be for 
 admins only, it would not be incremental (although we might be able to 
 add that), and I haven't yet thought through all the concurrency and 
 error case implications. The above solves more than the proposed problem 
 and in a very different way, but I thought I'd throw it in the mix.
 
 Cheers
 Jan
 --
 
 On Aug 16, 2011, at 5:08 PM, Robert Newson wrote:
 
 +1 on the intention but we'll need to be careful. The use case is
 specifically to allow verbatim migration of databases between servers.
 A separate role makes sense as I'm not sure of the consequences of
 explicitly granting this ability to the existing _admin role.
 
 B.
 
 On 16 August 2011 15:26, Adam Kocoloski kocol...@apache.org wrote:
 One of the principal uses of the replicator is to "make this database 
 look like that one".  We're unable to do that in the general case today 
 because of the combination of validation functions and out-of-order 
 document transfers.  It's entirely possible for a document to be saved 
 in the source DB prior to the installation of a ddoc containing a 
 validation function that would have rejected the document, for the 
 replicator to install the ddoc in the target DB before replicating the 
 other document, and for the other document to then be rejected by the 
 target DB.
 
 I propose we add a role which allows a user to bypass validation, or 
 else extend that privilege to the _admin role.  We should still 
 validate updates by default and add a way (a new qs param, for 
 instance) to indicate that validation should be skipped for a 
 particular update.  Thoughts?
 
 Adam
 
 
 
 



Re: Futon Test Suite

2011-08-14 Thread Robert Dionne
Paul,

  This is interesting, and if you're willing to put together the new 
infrastructure I can help with writing tests. I would suggest a more 
incremental approach that's less of a rewrite (rewrites often just get you back 
to 0 from a user's perspective). 

   The existing CouchDB JS object seems to work ok in terms of the http 
interface, and the Futon tests more or less all ran using couchjs until very 
recently. I would suggest getting these all running first, reusing copies of 
the existing CouchDB objects and such so we can hack them as needed. Then we 
would review and throw out all the tests that are not part of the core APIs, 
like the coffee stuff (I don't know why we decided to bundle coffee in there) 
and any tests that are for specific internals.

   At some point, when something like BigCouch or MobileCouch is integrated in, we 
might have different make targets for the different deployments. Perhaps in 
that case we'd have different sets of tests. There needs to be a set of tests 
that can verify that the semantics of API calls is the same in CouchDB and 
BigCouch.

  So I'd say let's work backwards from what we have. Also I'm not a big fan of 
etap, preferring eunit, mainly because it's one less moving part. For JS we 
already have these T(...) and TEquals() funs which seem to do the trick.

   All that said, I have a few hours to hack on this today; if you want 
some help just ping me on #couchdb

Bob




On Aug 12, 2011, at 11:46 AM, Paul Davis wrote:

 Here's a bit of a brain dump on the sort of environment I'd like to
 see our CLI JS tests have. If anyone has any thoughts I'd like to hear
 them. Otherwise I'll start hacking on this at some point over the
 weekend.
 
 https://gist.github.com/1142306



Re: Futon Test Suite

2011-08-14 Thread Robert Dionne




On Aug 14, 2011, at 12:55 PM, Paul Davis wrote:

 My plan was to rewrite couch.js to use the new request/response
 classes internally and then when we need closer HTTP access we'd be
 able to have it. Same for T and TEquals, and what not. There is at
 least one test that we just can't make work in our current couchjs
 based test runner because it needs to use async HTTP requests, so at a
 certain point we have to at least add some of this stuff.
 
 I quite like using etap over eunit as it seems more better. Also, now
 that we're going to a second language for make check tests, it seems
 like an even better approach. Though I'm not at all married to it by
 any means. Also, I do understand your concerns about moving parts and

That's fine with me, I'm not impressed with etap but it seems to have worked 
out well so far. By moving parts I mean the usual thing: the more moving parts, 
third party libs, etc., the more things to get right and make work on various 
machines. Eunit comes bundled with OTP.


 unnecessary dependencies. I should get around to updating the build
 system to use the single file etap distribution but its never really
 been a concern.
 
 Another thing I've been contemplating is if it'd be beneficial to
 remove libcurl and replace it with node.js's parser or with the ragel
 parser from Mongrel. Anyway, food for thought. I'll be around this
 afternoon to hack.
 
 On Sun, Aug 14, 2011 at 7:50 AM, Robert Dionne
 dio...@dionne-associates.com wrote:
 Paul,
 
  This is interesting, and if you're willing to put together the new 
 infrastructure I can help with writing tests. I would suggest a more 
 incremental approach that's less of a rewrite (rewrites often just get you 
 back to 0 from a user's perspective).
 
   The existing CouchDB JS object seems to work ok in terms of the http 
 interface, and the Futon tests more or less all ran using couchjs until very 
 recently. I would suggest getting these all running first, reusing copies of 
 the existing CouchDB objects and such so we can hack them as needed. Then we 
 would review and throw out all the tests that are not part of the core APIs, 
 like the coffee stuff (I don't know why we decided to bundle coffee in 
 there) and any tests that are for specific internals.
 
    At some point, when something like BigCouch or MobileCouch is integrated in, we 
 might have different make targets for the different deployments. Perhaps 
 in that case we'd have different sets of tests. There needs to be a set of 
 tests that can verify that the semantics of API calls is the same in CouchDB 
 and BigCouch.
 
   So I'd say let's work backwards from what we have. Also I'm not a big fan 
 of etap, preferring eunit, mainly because it's one less moving part. For JS 
 we already have these T(...) and TEquals() funs which seem to do the 
 trick.
 
    All that said, I have a few hours to hack on this today; if you want 
 some help just ping me on #couchdb
 
 Bob
 
 
 
 
 On Aug 12, 2011, at 11:46 AM, Paul Davis wrote:
 
 Here's a bit of a brain dump on the sort of environment I'd like to
 see our CLI JS tests have. If anyone has any thoughts I'd like to hear
 them. Otherwise I'll start hacking on this at some point over the
 weekend.
 
 https://gist.github.com/1142306
 
 



Re: Futon Test Suite

2011-08-09 Thread Robert Dionne
 
 
 Also, I've been thinking more and more about beefing up the JavaScript
 test suite runner and moving more of our browser tests over to
 dedicated code in those tests. If anyone's interested in hacking on
 some C and JavaScript against an HTTP API, let me know.


Paul,

  Jan and I talked about this a few times and I started a branch[1] along that 
idea. So far all I did was make a copy of the then-current Futon tests into
test/javascript/test and started looking at the small handful that fail. 

   The browser tests are great (any test is good) but they have too many 
browser-dependent quirks, or at least I assume that because of the pleasant 
surprise one gets when they all run. So I think the goal of these runner tests 
would be some sort of official HTTP API suite that's part of make check. Would 
you agree? If so I'm happy to take this on.
If so I'm happy to take this on.

   Also I've found eunit to be helpful in BigCouch and am wondering how hard it 
would be to support eunit in couchdb. Having tests in the modules is very good 
not only for testing but also for reading and understanding what the code does.
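
A minimal sketch of what that looks like (a toy module, nothing from the tree):

    -module(couch_demo).
    -export([double/1]).
    -include_lib("eunit/include/eunit.hrl").

    double(X) -> 2 * X.

    %% eunit auto-exports and runs any function ending in _test
    double_test() ->
        ?assertEqual(4, double(2)).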

Bob   


[1] https://github.com/bdionne/couchdb/tree/cli-tests



Re: Moving CouchDB to Git

2011-08-01 Thread Robert Dionne
+1 




On Jul 31, 2011, at 12:29 PM, Paul Davis wrote:

 Dearest Devs,
 
 A few months ago I did some work in preparing a solution to using Git
 as a primary VCS at the ASF. Now that we have released 1.1.0 and 1.0.3
 there's a bit of a lull in large events dealing with the code base. As
 such I thought now would be a good time to propose the idea of moving
 CouchDB to Git.
 
 A few things on what this would mean for the community:
 
 1. The SVN repository would no longer be the primary source for
 CouchDB source code. It'll still exist for housekeeping things like
 the website and other bits.
 
 2. For the time being there is no fancy integration with anything like
 Gerrit. The initial phase of moving to Git will be to just test the
 infrastructure aspects of the system to make sure it's all configured
 correctly and works reliably. This also applies to GitHub. There's no
 magical Pull request turns into JIRA ticket or similar. GitHub will
 remain as it is currently: a read-only mirror in the GitHub
 ecosystem.
 
 3. There are a couple minor restrictions on our Git usage as required
 by ASF policy. First, rewriting Git commits on master is prohibited. I
 also added a feature that allows us to make branches that can't be
 rewritten either, in the interest of protecting release branches.
 Currently, this is just a regular expression that matches
 (master)|(rel/*) in the branch name. The second issue is that
 there's always a possibility we have to revert to SVN if things break.
 In this interest I've disabled inserting merge commits into those same
 branches.
 
 4. Before making the complete switch I'll end up making a handful of
 Git clones to check that our history is preserved. I plan on writing a
 script to make Graphviz images of the branch history and so on, but
 having people volunteer to look back at the history to spot errors
 would be helpful as well.
 
 5. There are probably other things, but this is mostly to just kick
 off serious discussion on making the switch.
 
 Thoughts?
 
 Paul



Re: [VOTE] Apache CouchDB 1.0.3 Release

2011-06-28 Thread Robert Dionne
I don't think they are officially part of the test procedure; if so, then 
we've shipped a lot of releases with them failing. In fact they never run 
completely, except once in a blue moon. Perhaps I shouldn't have mentioned it in my +1 
vote. I run them almost every time I do a build from trunk; almost always the 
issues are related to the two different JS execution environments. I've started 
a topic branch[1] to separate out these tests from the browser-based Futon 
tests. Ideally these would be part of make check.

SHIP IT!!!


[1] https://github.com/bdionne/couchdb/tree/cli-tests




On Jun 28, 2011, at 12:11 PM, Noah Slater wrote:

 
 On 28 Jun 2011, at 17:09, Sebastian Cohnen wrote:
 
 Perhaps there is some good reason for them failing?
 
 I was not aware that the javascript CLI tests are officially ready to use 
 yet. In that case it's definitely not okay if they are failing and since 
 others have seen the same failing tests, I'd change my vote to −1.
 
 If they are not ready to use yet, that is a good reason for them failing. If 
 they should be part of the test procedure, then they should a) pass and b) be 
 added to the wiki. Sorry for being so ARGH about it, but I like to run a 
 tight ship, so we should get clarification on this either way. Thanks!
 



Re: [VOTE] Apache CouchDB 1.0.3 Release

2011-06-25 Thread Robert Dionne
OS X
make check passes
Futon tests pass in FF
JS tests in CLI pass *except* the numbers 3, 26, and 45

+1


On Jun 24, 2011, at 7:54 PM, Paul Davis wrote:

 This is the release vote for Apache CouchDB 1.0.3
 
 Changes in this release:
 
 * Fixed compatibility issues with Erlang R14B02.
 * Fix bug that allows invalid UTF-8 after valid escapes.
 * The query parameter `include_docs` now honors the parameter `conflicts`.
   This applies to queries against map views, _all_docs and _changes.
 * Added support for inclusive_end with reduce views.
 * More performant queries against _changes and _all_docs when using the
  `include_docs` parameter.
 * Enabled replication over IPv6.
 * Fixed crashes in continuous and filtered changes feeds.
 * Fixed error when restarting replications in OTP R14B02.
 * Upgrade ibrowse to version 2.2.0.
 * Fixed bug when using a filter and a limit of 1.
 * Fixed OAuth signature computation in OTP R14B02.
 * Handle passwords with : in them.
 * Made compatible with jQuery 1.5.x.
 * Etap tests no longer require use of port 5984. They now use a randomly
   selected port so they won't clash with a running CouchDB.
 * Windows builds now require ICU >= 4.4.0 and Erlang >= R14B03. See
   COUCHDB-1152, and COUCHDB-963 + OTP-9139 for more information.
 
 We encourage the whole community to download and test these release
 artifacts so that any critical issues can be resolved before the release
 is made. Everyone is free to vote on this release. Please report your
 results and vote to this thread.
 
 We are voting on the following release artifacts:
 
 http://people.apache.org/~davisp/dist/1.0.3-rc1/
 
 Instructions for validating the release tarball can be found here:
 
  http://people.apache.org/~davisp/dist/
 
 Instructions for testing the build artefacts can be found here:
 
  http://wiki.apache.org/couchdb/Test_procedure
 
 These artifacts have been built from the 1.0.3 tag in Subversion:
 
 http://svn.apache.org/repos/asf/couchdb/tags/1.0.3/
 
 At some point this weekend you will be bored with nothing to do for
 ten to fifteen minutes; this is when you should vote.



Re: New write performance optimizations coming

2011-06-24 Thread Robert Dionne
This is interesting work; I notice some substantial changes to couch_btree, a 
new query_modify_raw, etc.

I'm wondering though if we'd be better off to base these changes on the 
refactored version of couch_btree that davisp has[1]. I haven't looked at it too 
closely or tested with it but if I recall the goal was first to achieve
a more readable version with identical semantics so that we could then move 
forward with improvements.


[1] 
https://github.com/davisp/couchdb/commit/37c1c9b4b90f6c0f3c22b75dfb2ae55c8b708ab1




On Jun 24, 2011, at 6:06 AM, Filipe David Manana wrote:

 Thanks Adam.
 
 Don't get too scared :) Ignore the commit history and just look at
 github's Files changed tab, the modification summary is:
 
 Showing 19 changed files with 730 additions and 402 deletions.
 
 More than half of those commits were merges with trunk, many snappy
 refactorings (before it was added to trunk) and other experiments that
 were reverted after.
 We'll try to break this into 2 or 3 patches.
 
 So the single patch is something relatively small:
 https://github.com/fdmanana/couchdb/compare/async_file_writes_no_test.diff
 
 On Fri, Jun 24, 2011 at 4:05 AM, Adam Kocoloski kocol...@apache.org wrote:
 Hi Damien, I'd like to see these 220 commits rebased into a set of logical 
 patches against trunk.  It'll make the review easier and will help future 
 devs track down any bugs that are introduced.  Best,
 
 Adam
 
 On Jun 23, 2011, at 6:49 PM, Damien Katz wrote:
 
 Hi everyone,
 
 As it’s known by many of you, Filipe and I have been working on improving 
 performance, especially write performance [1]. This work has been public in 
 the Couchbase github account since the beginning, and the non Couchbase 
 specific changes are now isolated in [2] and [3].
 In [3] there’s an Erlang module that is used to test the performance when 
 writing and updating batches of documents with concurrency, which was used, 
 amongst other tools, to measure the performance gains. This module bypasses 
 the network stack and the JSON parsing, so that basically it allows us to 
 see more easily how significant the changes in couch_file, couch_db and 
 couch_db_updater are.
 
 The main and most important change is asynchronous writes. The file module 
 no longer blocks callers until the write calls complete. Instead they 
 immediately reply to the caller with the position in the file where the 
 data is going to be written to. The data is then sent to a dedicated loop 
 process that is continuously writing the data it receives, from the 
 couch_file gen_server, to disk (and batching when possible). This allows 
 callers (such as the db updater, e.g.) to issue write calls and keep 
 doing other work (preparing documents, etc) while the writes are being done 
 in parallel. After issuing all the writes, callers simply call the new 
 ‘flush’ function in the couch_file gen_server, which will block the caller 
 until everything was effectively written to disk - normally this flush call 
 ends up not blocking the caller or it blocks it for a very small period.
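
 (A sketch of the call pattern being described, with approximate function
 names and return shapes, not the actual branch code:)

     {ok, Pos1} = couch_file:append_term(Fd, Term1),  % replies with the file
     {ok, Pos2} = couch_file:append_term(Fd, Term2),  % position immediately
     %% ... keep preparing documents while the writer drains its queue ...
     ok = couch_file:flush(Fd).  % blocks only until queued writes hit disk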
 
 There are other changes such as avoiding 2 btree lookups per document ID 
 (COUCHDB-1084 [4]), faster sorting in the updater (O(n log n) vs O(n^2)) 
 and avoid sorting already sorted lists in the updater.
 
 Checking if attachments are compressible was also moved into a new 
 module/process. We verified this took a lot of CPU time when all or most of the 
 documents to write/update have attachments - building the regexps and 
 matching against them for every single attachment is surprisingly expensive.
 
 There’s also a new couch_db:update_doc/s flag named ‘optimistic’ which 
 basically changes the behaviour to write the document bodies before 
 entering the updater and skip some attachment-related checks (duplicated 
 names, e.g.). This flag is not yet exposed to the HTTP API, but it could 
 be via an X-Optimistic-Write header in the doc PUT/POST requests and 
 _bulk_docs, e.g. We’ve seen this as good when the client knows that the 
 documents to write don’t exist yet in the database and we aren’t already IO 
 bound, such as when SSDs are used.
 
 We used relaximation, Filipe’s basho bench based tests [5] and the Erlang 
 test module mentioned before [6, 7], exposed via HTTP. Here follow 
 some benchmark results.
 
 
 # Using the Erlang test module (test output)
 
 ## 1Kb documents, 10 concurrent writers, batches of 500 docs
 
 trunk before snappy was added:
 
 {"db":"load_test","total":10,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":270071}
 
 trunk:
 
 {"db":"load_test","total":10,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":157328}
 
 trunk + async writes (and snappy):
 
 {"db":"load_test","total":10,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":121518}
 
 ## 2.5Kb documents, 10 concurrent writers, batches of 500 docs
 
 trunk before snappy was 

Re: make couchdb more otpish

2011-06-21 Thread Robert Dionne
Thanks Paul, was just going to respond about /rel

My two cents:

I think what would be nice is to enable the use of rebar in downstream 
projects that are built on top of couchdb. I've been able to keep my 
bitstore[1] hack pretty much in sync with a given 
couchdb version with some simple tweaks in the Makefile.

What would be ideal is to add couchdb as a dependency in rebar.config and type 
./config and have it pull couchdb 1.0 or 1.5.6 or whatever. 
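
Something along these lines, say (a hypothetical rebar.config entry; the repo 
URL and tag are made up for illustration):

    {deps, [
        {couchdb, ".*",
         {git, "git://git.apache.org/repos/asf/couchdb.git", {tag, "1.0.2"}}}
    ]}.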

If this can be done so that couchdb is still usable and buildable as a 
standalone tool, using the existing autotools, without turning it into a 
hairball, then that would be real sweet; kudos
to the brave soul that pulls that off.

In theory bigcouch could also work that way, though bigcouch is technically a 
fork of couchdb as it requires a few tweaks to make couchdb a distributed 
database.



[1] https://github.com/bdionne/bitstore




On Jun 21, 2011, at 10:36 AM, Paul Davis wrote:

 On Tue, Jun 21, 2011 at 10:30 AM, Noah Slater nsla...@apache.org wrote:
 
 On 21 Jun 2011, at 15:25, Paul Davis wrote:
 
 The problem with 'doing it once' is that it's not entirely that
 straightforward unless you want to have a single absolutely massive
 commit. And that's something I wanted to avoid.
 
 Can we break this work down into logical chunks?
 
 We could transition the source over in that way.
 
 I'm not at all certain what you mean by this. There should not be a
 c_src directory at the top level of the source tree. Nor libs or priv
 or include. As we went over previously Noah had some pretty general
 constraints for what the source tree should look like.
 
 Glad you are on board with this, Paul.
 
 I'm not against a rel folder somewhere but I doubt it'd go at the top
 level of the source tree. Maybe in share?
 
 What does this directory even do pls??!?!??!?!11
 
 
 
 It's used to make releases (Erlang world, not to be confused with our
 releases) that can be distributed to users. This is also required for
 the machinery that does hot code swapping and other things.
 
 Here's an example one from BigCouch:
 
 https://github.com/cloudant/bigcouch/tree/master/rel



Re: make couchdb more otpish

2011-06-21 Thread Robert Dionne


On Jun 21, 2011, at 11:24 AM, Paul Davis wrote:

 On Tue, Jun 21, 2011 at 10:54 AM, Robert Dionne
 dio...@dionne-associates.com wrote:
 Thanks Paul, was just going to respond about /rel
 
 My two cents:
 
 I think what would be nice is to enable the use of rebar in downstream 
 projects, that are built on top of couchdb. I've been able to keep my 
 bitstore[1] hack pretty much in sync with a given
 couchdb version with some simple tweaks in the Makefile.
 
 What would be ideal is to add couchdb as a dependency in rebar.config and 
 type ./config and have it pull couchdb 1.0 or 1.5.6 or whatever.
 
 
 You mean ./configure in the downstream project, yeah?

yes, basically ./rebar get-deps


 
 If this can be done so that couchdb is still usable and build-able as a 
 standalone tool, using the  existing autotools, without turning it into a 
 hairball, then that would be real sweet, kudos
 to the brave soul that pulls that off.
 
 In theory bigcouch could also work that way, though bigcouch is technically 
 a fork of couchdb as it requires a few tweaks to make couchdb a distributed 
 database.
 
 
 I'm confused. Are you advocating a full on switch to rebar?

No, sorry to be vague here, not at all. For standalone couchdb the autotools 
do a bit more in terms of dealing with platforms, though if I recall it was a 
royal pain to get all the dependencies down. I'm merely advocating making life 
easier for downstream users. What I meant by referring to bigcouch is that it 
does a bit more than just embedding couchdb (a few places make clustered calls 
to fabric, e.g.).

If couchdb followed the ebin/src/include conventions, that would probably be 
half the battle.
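
That is, the standard OTP application layout, roughly:

    couchdb/
        src/       %% .erl sources
        include/   %% public .hrl headers (couch_db.hrl would live here)
        priv/      %% NIFs, port drivers and other private assets
        ebin/      %% compiled .beam files plus the .app file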





 
 
 
 [1] https://github.com/bdionne/bitstore
 
 
 
 
 On Jun 21, 2011, at 10:36 AM, Paul Davis wrote:
 
 On Tue, Jun 21, 2011 at 10:30 AM, Noah Slater nsla...@apache.org wrote:
 
 On 21 Jun 2011, at 15:25, Paul Davis wrote:
 
 The problem with 'doing it once' is that it's not entirely that
 straightforward unless you want to have a single absolutely massive
 commit. And that's something I wanted to avoid.
 
 Can we break this work down into logical chunks?
 
 We could transition the source over in that way.
 
 I'm not at all certain what you mean by this. There should not be a
 c_src directory at the top level of the source tree. Nor libs or priv
 or include. As we went over previously Noah had some pretty general
 constraints for what the source tree should look like.
 
 Glad you are on board with this, Paul.
 
 I'm not against a rel folder somewhere but I doubt it'd go at the top
 level of the source tree. Maybe in share?
 
 What does this directory even do pls??!?!??!?!11
 
 
 
 It's used to make releases (Erlang world, not to be confused with our
 releases) that can be distributed to users. This is also required for
 the machinery that does hot code swapping and other things.
 
 Here's an example one from BigCouch:
 
 https://github.com/cloudant/bigcouch/tree/master/rel
 
 



Re: [Couchdb Wiki] Trivial Update of CouchDB_in_the_wild by wentforgold

2011-06-14 Thread Robert Dionne
anyone thought of using edoc?





On Jun 14, 2011, at 1:03 PM, Benoit Chesneau wrote:

 On Tue, Jun 14, 2011 at 9:51 AM, Robert Newson rnew...@apache.org wrote:
 +1 for docs in the same place as the code. One of the main reasons is
 that a single commit adds the feature, the tests that confirm the
 feature works, and the doc changes that let folks know about it. It's
 just sane.
 
 And -1 on keeping them floating around externally and sucking them in
 somehow at release time.
 
 B.
 
 The same.
 
 - benoît



Re: [Couchdb Wiki] Trivial Update of CouchDB_in_the_wild by wentforgold

2011-06-13 Thread Robert Dionne




On Jun 13, 2011, at 9:13 AM, Noah Slater wrote:

 
 On 13 Jun 2011, at 13:55, Paul Davis wrote:
 
 On Mon, Jun 13, 2011 at 8:49 AM, Noah Slater nsla...@apache.org wrote:
 What percentage of useful wiki edits were made by committers vs 
 non-committers?
 
 
 http://en.wikipedia.org/wiki/Toilet_paper_orientation
 
 Not sure how this is relevant.
 
 If we decommission the wiki we are putting up barriers to contribution from 
 non-committers. So it is arguably worth while understanding exactly what that 
 means for us. How many people have contributed in the past who could not have 
 contributed if this had been the case?


I think the WIKI is useful. The API docs are pretty good; everything is there, 
though it takes a while to get used to how to find it all. 

 



Re: [Couchdb Wiki] Trivial Update of CouchDB_in_the_wild by wentforgold

2011-06-13 Thread Robert Dionne
++1





On Jun 13, 2011, at 2:05 PM, Robert Newson wrote:

 It's not the wiki per se that bothers me, it's that it's the primary,
 often only, source of documentation.
 
 I propose that future releases of CouchDB include at least a full
 description of all public APIs. Improvements above that base level
 would be a manual and/or simple tutorials.
 
 This documentation would be maintained in the same source tree as the
 code and it would be a release requirement for this documentation to
 be updated to include all new features.
 
 This documentation is then the primary source, the wiki can serve as a
 supplement.
 
 b.
 
 On 13 June 2011 18:16, Peter Nolan peterwno...@gmail.com wrote:
 Any documentation is good.
 
 What is this 'spam'?  Haven't personally encountered anything on the wiki
 that would be 'considered' spam (perhaps not stumbled upon that portion?)
 
 But it's inevitable that the wiki will be attacked by unscrupulous people
 and as such, the wiki should prepare for this.  The wiki is going to need
 gatekeepers/admins to maintain it.
 
 It would be nice if any edits were archived so users can see previous
 states of the page if they so choose.
 
 
 If a noted jerk keeps editing the wiki, we should have a system that only
 applies his edits to his account.  The common user would not see his edits,
 only he would, which would hopefully convince him that his edit has gone
 through.
 
 +1 top hats.
 



Re: [Couchdb Wiki] Trivial Update of CouchDB_in_the_wild by wentforgold

2011-06-13 Thread Robert Dionne
++1++

On Jun 13, 2011, at 2:08 PM, Paul Davis wrote:

 On Mon, Jun 13, 2011 at 2:05 PM, Robert Newson robert.new...@gmail.com 
 wrote:
 It's not the wiki per se that bothers me, it's that it's the primary,
 often only, source of documentation.
 
 I propose that future releases of CouchDB include at least a full
 description of all public APIs. Improvements above that base level
 would be a manual and/or simple tutorials.
 
 This documentation would be maintained in the same source tree as the
 code and it would be a release requirement for this documentation to
 be updated to include all new features.
 
 
 You had me until you said release requirement. I would upgrade that
 to commit requirement if we're serious about having such
 documentation. If we don't force people to make sure docs reflect
 changes at commit time then it's probably going to be a lost cause.
 
 This documentation is then the primary source, the wiki can serve as a
 supplement.
 
 b.
 
 On 13 June 2011 18:16, Peter Nolan peterwno...@gmail.com wrote:
 Any documentation is good.
 
 What is this 'spam'?  Haven't personally encountered anything on the wiki
 that would be 'considered' spam (perhaps not stumbled upon that portion?)
 
 But it's inevitable that the wiki will be attacked by unscrupulous people
 and as such, the wiki should prepare for this.  The wiki is going to need
 gatekeepers/admins to maintain it.
 
 It would be nice if any edits were archived so users can see previous
 states of the page if they so choose.
 
 
 If a noted jerk keeps editing the wiki, we should have a system that only
 applies his edits to his account.  The common user would not see his edits,
 only he would, which would hopefully convince him that his edit has gone
 through.
 
 +1 top hats.
 
 



Re: svn commit: r1133319 - /couchdb/trunk/src/ejson/Makefile.am

2011-06-08 Thread Robert Dionne
well it breaks my build :)





On Jun 8, 2011, at 10:15 AM, Paul Davis wrote:

 On Wed, Jun 8, 2011 at 5:55 AM,  rand...@apache.org wrote:
 Author: randall
 Date: Wed Jun  8 09:55:00 2011
 New Revision: 1133319
 
 URL: http://svn.apache.org/viewvc?rev=1133319view=rev
 Log:
 include $(ERLANG_FLAGS) when building ejson nif
 
 Modified:
couchdb/trunk/src/ejson/Makefile.am
 
 Modified: couchdb/trunk/src/ejson/Makefile.am
 URL: 
 http://svn.apache.org/viewvc/couchdb/trunk/src/ejson/Makefile.am?rev=1133319r1=1133318r2=1133319view=diff
 ==
 --- couchdb/trunk/src/ejson/Makefile.am (original)
 +++ couchdb/trunk/src/ejson/Makefile.am Wed Jun  8 09:55:00 2011
 @@ -65,6 +65,7 @@ if USE_OTP_NIFS
  ejsonpriv_LTLIBRARIES = ejson.la
 
  ejson_la_SOURCES = $(EJSON_C_SRCS)
 +ejson_la_CFLAGS = $(ERLANG_FLAGS)
  ejson_la_LDFLAGS = -module -avoid-version
 
  if WINDOWS
 
 
 
 
 Is this right?



Re: curl dependency

2011-06-05 Thread Robert Dionne
I like the idea of having two sets of JS tests, those run as part of make 
check, and a smaller suite invocable from browsers that validates the install. 
Towards that end, I created a branch[1], copied all the tests to 
test/javascript/test and tweaked the relevant bits in run.tpl. Most are 
unchanged, and there are only two tests that fail now, attachment_names, which 
looks like it might be a real bug, and attachments_multipart which requires the 
use of the XMLHttpRequest object. This one might possibly be fixed with some 
enhancements to couch_http.js or couchjs, will look more into it.

Next I'll start auditing them all to see if there's other browser specific code 
that can be removed. We can then do the same for the browser tests in 
www/script/test or just replace all that completely with Jan's new stuff as the 
lion's share of it is moving to test/javascript/test

[1]  
https://github.com/bdionne/couchdb/commit/adcde6c07ae3586ee779b20a24a608337f9b5957



On Jun 4, 2011, at 11:29 PM, Paul Davis wrote:

 Of course! Granted one of the two requires that we maintain a table of
 data that is updated continuously and has many significant global
 variants and the other is a fairly static definition that can be
 traced back to a (relatively) deterministic set of bit patterns.
 
 On Sat, Jun 4, 2011 at 9:42 PM, Jan Lehnardt j...@apache.org wrote:
 
 On 5 Jun 2011, at 03:19, Paul Davis wrote:
 
 Or we write our own code to make HTTP requests (Which isn't out of the
 question) and remove the curl dependency altogether.
 
 Can we also write our own unicode collation code? :)
 
 Cheers
 Jan
 --
 
 
 
 On Sat, Jun 4, 2011 at 9:10 PM, Jan Lehnardt j...@apache.org wrote:
 
 On 4 Jun 2011, at 22:10, Paul Davis wrote:
 
 Its already a soft dependency. If curl isn't found all that happens is
 that you can't run ./test/javascript/run after you build it.
 
 Which means curl is a hard dependency if you want to run the JS tests
 that we intend to move from the browser into CLI JS tests, right?
 
 Cheers
 Jan
 --
 
 
 
 On Sat, Jun 4, 2011 at 4:02 PM, Chris Anderson jch...@apache.org wrote:
 This sparks another thought that we could have a ./configure directive
 that says --no-curl or something. Since we only need curl for purposes
 of running the developer test suite.
 
 Chris
 
 On Wed, Jun 1, 2011 at 2:33 PM, Arash Bizhan zadeh aras...@gmail.com 
 wrote:
 Hi,
 I am trying to compile couch on RH enterprise. The official RH package 
 for
 curl is 7.15, but Couch needs 7.18. I would like to know what is the 
 best
 way to handle this? Is there any specific reason that Couch depends on
 7.18? Can the dependency be downgraded to 7.15?
 Can anybody advise me on how to handle this specific dependency and 
 other
 dependencies (i.e. Erlang) please?
 
 thanks,
 -arash
 
 
 
 
 --
 Chris Anderson
 http://jchrisa.net
 http://couchbase.com
 
 
 
 
 



Re: [VOTE] Apache CouchDB 1.1.0 release, round 3.

2011-06-03 Thread Robert Dionne
Chris,

  This is an excellent idea. Currently the entire suite of browser tests is 
also run from the command line, where we overload the CouchDB definition and 
use couchjs. We could break the suite out with a subset staying in 
share/www/script/test to be used as you suggest, and the lion's share of them 
moving to test/javascript to be run as part of make check. They serve a 
great role in testing end-to-end but go a little too far in making use of the 
browser. 

   I'll take a closer look at this over the weekend.

Best,

Bob





On Jun 2, 2011, at 11:35 PM, Chris Anderson wrote:

 I agree, the browser tests should move to the command line, and a
 small subset (30 seconds tops) of tests should be in the browser
 (useful for debugging proxy config, installation, spidermonkey
 version, or whatever). I'd rather not block 1.1 on rewriting the test
 suite, even though I agree the browser suite has started to outgrow
 itself.
 
 I am happy to report that all tests pass on my machine (basically
 stock macbook air).
 
 +1 from me.
 
 Thanks to everyone who helped with 1.1.
 
 Chris
 
 On Wed, Jun 1, 2011 at 12:45 PM, Noah Slater nsla...@apache.org wrote:
 
 On 1 Jun 2011, at 20:41, Paul Davis wrote:
 
 On Wed, Jun 1, 2011 at 3:37 PM, Noah Slater nsla...@apache.org wrote:
 Considering that the tests work with Chrome, I'm going to change my vote 
 to +1 now.
 
 I am also suggesting that we change our recommended test browser to Chrome.
 
 Firefox 4 seems to have a lot of trouble with it.
 
 Also, our documented test browser is FF3.5. I wonder if the update to
 4 is also most of the issue.
 
 Could be. :)
 
 DOWN WITH ALL NON-DETERMINISTIC BROWSER TESTS, I SAY!
 
 (I, for one, welcome our new etap overlords.)
 
 
 
 
 
 -- 
 Chris Anderson
 http://jchrisa.net
 http://couchbase.com



Re: [VOTE] Apache CouchDB 1.1.0 release, round 3.

2011-06-01 Thread Robert Dionne
Noah,

  Does make check run with all the etaps passing? 

Bob



On Jun 1, 2011, at 2:13 PM, Noah Slater wrote:

 Reinstalled Erlang and the weird SSL problems went away.
 
 Unit tests fail for me.
 
 -
 
 replicator_db
 error
 3001ms
 Run with debugger
   • Exception raised: {}
 
 rev_stemming
 error
 7ms
 Run with debugger
   • Exception raised: {}
 
 rewrite
 error
 10ms
 Run with debugger
   • Exception raised: {}
 
 security_validation
 error
 6ms
 Run with debugger
   • Exception raised: {}
 
 show_documents
 error
 11ms
 Run with debugger
   • Exception raised: {}
 
 stats
 error
 7ms
 Run with debugger
   • Exception raised: {}
 
 update_documents
 error
 5ms
 Run with debugger
   • Exception raised: {}
 
 users_db
 error
 4ms
 Run with debugger
   • Exception raised: {}
 
 -
 
 Tried to re-run these tests in with debug on, but Firefox locked up.
 
 The tests lock up on me all the time. And I don't mean, while they're running 
 through, as expected. When they fail, or I try to restart one, basically, the 
 whole thing just freezes. I never seem to be able to do much more, other than 
 restart Firefox. I'm not sure why this happens, but I really wish it 
 wouldn't. Makes testing harder than it could be.
 
 -
 
 After restarting Firefox all these tests pass.
 
 Very weird.
 
 Going to run the whole test suite again.
 
 —
 
 auth_cache fails
 
 assertion misses_after == misses_before +1 failed
 
 assertion hits_after == hits_before failed
 
 cookie_auth fails
 
 error: file_exists
 
 The test suite then hung on replicator_db again.
 
 Killing Firefox and trying again.
 
 —
 
 auth_cache fails
 
 assertion misses_after == misses_before +1 failed
 
 assertion hits_after == hits_before failed
 
 The test suite then hung on replicator_db again.
 
 Killing Firefox and trying again, manually.
 
 Note I was not able to get debug output because the test suite hangs Firefox.
 
 I don't see anything happening in the CouchDB log either.
 
 So my best guess is that Firefox is just looping over something mindlessly.
 
 —
 
 Running auth_cache manually lots of times produces no errors.
 
 Running through all the tests manually from the start.
 
 cookie_auth fails again for the same reason.
 
 Ran with debug:
 
 Exception raised: {message:ddoc is 
 null,fileName:http://127.0.0.1:5985/_utils/script/test/cookie_auth.js,lineNumber:41,stack:;()@http://127.0.0.1:5985/_utils/script/test/cookie_auth.js:41\u000arun_on_modified_server([object
  Array],(function () {try {var usersDb = new CouchDB(\test_suite_users\, 
 {'X-Couch-Full-Commit': \false\});usersDb.deleteDb();usersDb.createDb();var 
 ddoc = usersDb.open(\_design/_auth\);T(ddoc.validate_doc_update);var 
 password = \3.141592653589\;var jasonUserDoc = 
 CouchDB.prepareUserDoc({name: \Jason Davies\, roles: [\dev\]}, 
 password);T(usersDb.save(jasonUserDoc).ok);var checkDoc = 
 usersDb.open(jasonUserDoc._id);T(checkDoc.name == \Jason Davies\);var 
 jchrisUserDoc = CouchDB.prepareUserDoc({name: \jch...@apache.org\}, 
 \funnybone\);T(usersDb.save(jchrisUserDoc).ok);var duplicateJchrisDoc = 
 CouchDB.prepareUserDoc({name: \jch...@apache.org\}, \eh, Boo-Boo?\);try 
 {usersDb.save(duplicateJchrisDoc);T(false  \Can't create duplicate user 
 names. Should have thrown an error.\);} catch (e) {T(e.error == 
 \conflict\);T(usersDb.last_req.status == 409);}var underscoreUserDoc = 
 CouchDB.prepareUserDoc({name: \_why\}, \copperfield\);try 
 {usersDb.save(underscoreUserDoc);T(false  \Can't create underscore user 
 names. Should have thrown an error.\);} catch (e) {T(e.error == 
 \forbidden\);T(usersDb.last_req.status == 403);}var badIdDoc = 
 CouchDB.prepareUserDoc({name: \foo\},  \bar\);badIdDoc._id = 
 \org.apache.couchdb:w00x\;try {usersDb.save(badIdDoc);T(false  \Can't 
 create malformed docids. Should have thrown an error.\);} catch (e) 
 {T(e.error == \forbidden\);T(usersDb.last_req.status == 
 403);}T(CouchDB.login(\Jason Davies\, 
 password).ok);T(CouchDB.session().userCtx.name == \Jason Davies\);var xhr = 
 CouchDB.request(\POST\, \/_session\, {headers: {'Content-Type': 
 \application/json\}, body: JSON.stringify({name: \Jason Davies\, 
 password: 
 password})});T(JSON.parse(xhr.responseText).ok);T(CouchDB.session().userCtx.name
  == \Jason Davies\);jasonUserDoc.foo = 
 2;T(usersDb.save(jasonUserDoc).ok);T(CouchDB.session().userCtx.roles.indexOf(\_admin\)
  == -1);try {usersDb.deleteDoc(jchrisUserDoc);T(false  \Can't delete other 
 users docs. Should have thrown an error.\);} catch (e) {T(e.error == 
 \forbidden\);T(usersDb.last_req.status == 403);}T(!CouchDB.login(\Jason 
 Davies\, \2.71828\).ok);T(!CouchDB.login(\Robert Allen Zimmerman\, 
 \d00d\).ok);T(CouchDB.session().userCtx.name != \Jason Davies\);xhr = 
 CouchDB.request(\POST\, \/_session?next=/\, {headers: {'Content-Type': 
 \application/x-www-form-urlencoded\}, body: 
 \name=Jason%20Daviespassword=\ + encodeURIComponent(password)});if 
 (xhr.status == 200) 

Re: [VOTE] Apache CouchDB 1.1.0 release, round 3.

2011-05-31 Thread Robert Dionne
+1

OS X 10.6 
Erlang 14B01
All tests pass



On May 30, 2011, at 6:25 PM, Robert Newson wrote:

 Hello,
 
 I would like call a vote for the Apache CouchDB 1.1.0 release, round 3.
 
 Two further issues have been resolved since round 2;
 
 1) Compatibility with erlang R14B03.
 2) Release tarball now works on Windows (with Cygwin).
 
 We encourage the whole community to download and test these release artifacts 
 so
 that any critical issues can be resolved before the release is made. Everyone 
 is
 free to vote on this release, so get stuck in!
 
 We are voting on the following release artifacts:
 
 http://people.apache.org/~rnewson/dist/1.1.0/
 
 These artifacts have been built from the 1.1.0 tag in Subversion:
 
 http://svn.apache.org/repos/asf/couchdb/tags/1.1.0
 
 Please follow our test procedure:
 
 http://wiki.apache.org/couchdb/Test_procedure
 
 Happy voting,
 
 B.



Re: Helping out with releases

2011-05-10 Thread Robert Dionne
Paul,

  I'll try to take a look at 090 and 140 tonight after work; I know I've seen 
140 randomly failing. 

Bob


On May 10, 2011, at 9:21 AM, Paul Davis wrote:

 On Tue, May 10, 2011 at 8:29 AM, Dirkjan Ochtman dirk...@ochtman.nl wrote:
 On Tue, May 10, 2011 at 14:08, Paul Davis paul.joseph.da...@gmail.com 
 wrote:
 Like Jan says, it'd be awesome to have more people familiar with the
 release procedure. Although if you're interested in speeding up
 releases the best place to start would be by learning some internals.
 The issues that usually keep things from shipping is that a test is
 randomly failing or there's a bug waiting to be fixed.
 
 Right, but that's not the case now, is it? So I would like to help out
 with all the non-internals and chasing after other committers to fix
 up the bugs, as that seems an area that's currently understaffed.
 
 Which doesn't mean that maybe I won't get into the internals at some
 point, but I think doing the other things could be valuable too, and
 the project needs more of it.
 
 Cheers,
 
 Dirkjan
 
 
 I haven't run through prepping the 1.1.x branch for a release, but
 1.0.3 is being held up because I've seen the 090 and 140 etap tests
 fail and no one (me included) has felt like fixing them yet.



Re: Development environment

2011-04-29 Thread Robert Dionne
Hi Andrey,

  I use Distel[1] (Distributed emacs lisp for Erlang), a set of emacs 
extensions that create a full development environment for Erlang. It connects 
to a running node so one gets full access to the syntax_tools and source code 
of Erlang, all at run time. As this brief white paper points out it goes 
further than typical elisp hacks as it imports the Erlang distributed model 
into emacs. I keep hoping to find some time to push it a little further and 
provide better support for working with BigCouch, our clustered CouchDB at 
Cloudant.

  I keep up my own fork of it as there aren't too many of us out there and I 
needed to fix a few things. I also include in that project some tweaks to the 
Erlang mode that ships with Erlang to accommodate the CouchDB format 
conventions. It provides a full symbolic debugging environment. Though it's 
useful and I think I've found a few CouchDB bugs with it, given the nature of 
OTP programming it's a little harder when you have breakpoints that hit 50 
processes. It was great for stepping thru the btree code.

  The most useful features are the navigation (M-. M-,) and who_calls (C-c C-d 
w). The lack of use of who_calls, I believe, is the major reason we often discover 
dead code that's been there forever. As an aside, the use of type specs and 
dialyzer goes a long way towards finding errors at compile time.
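
The sort of thing I mean (a toy spec, not from the couchdb tree):

    %% dialyzer checks both the callers and the body against this contract
    -spec rev_to_str({non_neg_integer(), binary()}) -> binary().
    rev_to_str({Pos, Id}) ->
        iolist_to_binary([integer_to_list(Pos), "-", Id]).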

Regards,

Bob

[1] https://github.com/bdionne/distel/raw/master/doc/gorrie02distel.pdf
[2] https://github.com/bdionne/distel

On Apr 28, 2011, at 9:03 AM, Andrey Somov wrote:

 Hi all,
 in order to understand how CouchDB works I want to be able to run the
 application under a debugger. Unfortunately it does not look like an easy
 task.
 The information provided on the wiki (
 http://wiki.apache.org/couchdb/Running%20CouchDB%20in%20Dev%20Mode) may be
 enough for a professional Erlang developer but it is not enough for anyone
 who learns
 Erlang together with CouchDB.
 I could not find any resource which gives step-by-step instructions on how
 to organise an effective development environment for CouchDB.
 
 Can someone point me to such a guide/tutorial/manual/screencast?
 
 Thanks,
 Andrey



Re: Full text search - is it coming? If yes, approx when.

2011-03-28 Thread Robert Dionne
Yes this would be a great feature. I've made some modest progress[1], based on 
the examples in the Erlang book, on native FTI. I'm very keen on it as my
use case isn't handled by Lucene. I've used both bitcask[2] and couchdb for 
storage of the inverted indices but neither seems well suited for something this 
write-heavy without lots of compaction.

With all due respect to Lucene, I'm not at all interested in a port of it. I 
think a native solution ought to try and move forward and leverage the features 
of couchdb.

I also don't think there is any official position and/or roadmap, other than 
what's in JIRA.

YMMV,

Bob

[1] http://dionne.posterous.com/relaxed-searching-in-couchdb
[2] https://github.com/basho/bitcask




On Mar 28, 2011, at 5:51 AM, Andrew Stuart (SuperCoders) wrote:

 Although it may be a huge amount of work, it would still seem to be a 
 necessary feature for the core developers to build?
 
 It seems a fairly reasonable thing to expect a database could return records 
 that contain java OR C# in a subject line field, without resorting to 
 external software. It's not ideal to require installing additional 
 software to get basic functionality going.
 
 My primary question is  - what is the official position on full text search 
 in CouchDB - is it coming, sooner, later or never? Is there an up to date 
 development roadmap for couchdb?  Maybe its on there.
 
 as
 
 On 28/03/2011, at 8:24 PM, Robert Newson wrote:
 
 I have to dispute There does not seem to be much understanding that
 this could be a killer feature.
 
 Obviously full-text search is a killer feature, but it's trivially
 available now via couchdb-lucene or elasticsearch.
 
 What people are asking for is native full-text search which, to me, is
 essentially asking for an Erlang port of Lucene. We'd love this, but
 it's a huge amount of work. Continually asking others to do
 significant amounts of work is also wearying.
 
 To replace a Lucene-based solution and match its quality and breadth
 is a huge chunk of work and is only necessary to satisfy people who,
 for various reasons, don't want to use Java.
 
 To answer the original post, there are no publicly known plans to
 build a native full-text indexing feature for CouchDB.
 
 B.
 
 On 28 March 2011 10:15, Olafur Arason olaf...@olafura.com wrote:
 There does not seem to be much understanding that this could be a killer
 feature. People are now relying on Lucene which monitors the _changes
 feed.
 
 Cloudant has done its own implementation which, I gather from the
 information they have published, makes a view out of all your words;
 they recommend a Java view because you can then reuse the lexer from
 Lucene. Then I think they are reusing the reader of the view to make
 their query. They have a similar syntax to Lucene for the query interface.
 They are still working on this and I think they don't have that much
 incentive to opensource it right away. But they have in the past
 opensourced their technology, like BigCouch, so I think it's more a
 matter of when rather than if.
 
 I think this is a good solution for fulltext search. But I think the
 Java view does not have direct access to the data, so it could be
 slow. But Cloudant does clustering on view generation so that helps.
 
 But there is also general problem with the current view system where
 search technology could be used.
 
 The views are really good at sorting but people are using them to
 do key matches, which they are not designed for. The startkey and
 endkey are for sorting ranges and are not good for matching, despite what
 most resources online suggest.
 
 For example when you do:
 startkey = [key11, key21]
 endkey = [key19, key21]
 
 You get [key11,key22], [key11, key23] ... [key12,key21],
 [key12,key22]...
 which makes sense when looking up sorting ranges but not when using it to
 match keys. But you can have a range match lookup, though only on the
 last key and never on two keys. So this would work:
 
 startkey = [key21, key11]
 endkey = [key21, key19]
 
 The current view interface could be augmented to accept queries,
 which could make them much more powerful than they currently are,
 with the keys just used for sorting and selecting which values you
 want shown, which is what they are designed to do and do really well.
 
 This would be a killer feature and could use the new infrastructure
 from Cloudant search.
 
 And don't tell me the Elastic or Lucene interface could do anything
 close to this :)
 
 Regards,
 Olafur Arason
 
 On Mon, Mar 28, 2011 at 04:31, Andrew Stuart (SuperCoders)
 andrew.stu...@supercoders.com.au wrote:
 It would be good to know if full text search is coming as a core feature and
 if yes, approximately when - does anyone know?
 
 Even an approximate timeframe would be good.
 
 thanks
 
 

Re: [jira] Commented: (COUCHDB-1092) Storing documents bodies as raw JSON binaries instead of serialized JSON terms

2011-03-18 Thread Robert Dionne




On Mar 18, 2011, at 2:08 PM, Randall Leeds (JIRA) wrote:

 
[ 
 https://issues.apache.org/jira/browse/COUCHDB-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008555#comment-13008555
  ] 
 
 Randall Leeds commented on COUCHDB-1092:
 
 
 I love a good bike shed more than most, but I've stayed pretty quiet since my 
 first comment because I wanted to think hard about what Paul was saying.
 In the end, I agree with the last comment. I would be happy to trust the md5 
 and not validate on the way out _only_ so long as we close the API for 
 manipulating docs and validate on the way in. Paul, if I understand 
 correctly, this sort of change should make you rest easy.

I've also been watching this thread with no comment, but would +1 your proposal 
if I understand it correctly. I think the main concern is summarized in Paul's 
last post (Paul tell me to shut up if I'm wrong):

The concern I want to see addressed is avoiding the requirement that we rely 
on JSON data being specifically formatted while exposing that value as editable 
to client code.  -- davisp

Essentially the code isn't architected properly to support this change without 
adding the risk of data corruption, and any amount of that is bad. Your 
proposal Randall is to go forward with it subject to the constraint that more 
refactoring is done to clean up the APIs before it's published. If so then I'd 
say go for it. More frequent releases and more progress would be valuable. I've 
seen a lot of forks and good ideas on github and would love to see more of it 
on trunk, e.g. Paul's btree cleanup.




 
 The internal API change would mean more code refactoring, but we shouldn't be 
 afraid of that.
 The agile way forward, if people agree that this solution is prudent, would 
 be to commit to trunk and open a blocking ticket to close down the document 
 body API before release.
 
 Trunk is trunk, lets iterate on it. We haven't even shipped 1.1 yet! We could 
 even branch a feature frozen trunk for 1.2 and drop this on trunk targeted 
 for 1.3.
 I'd love to see the 1.2 cycle stay short and in general to have more frequent 
 releases. It's something I feel we talk about a lot but then we sit around 
 and comment on tickets like this without taking the dive and committing. I 
 don't mean that to sound like a rant. <3
 
 Storing documents bodies as raw JSON binaries instead of serialized JSON 
 terms
 --
 
Key: COUCHDB-1092
URL: https://issues.apache.org/jira/browse/COUCHDB-1092
Project: CouchDB
 Issue Type: Improvement
 Components: Database Core
   Reporter: Filipe Manana
   Assignee: Filipe Manana
 
 Currently we store documents as Erlang serialized (via the term_to_binary/1 
 BIF) EJSON.
 The proposed patch changes the database file format so that instead of 
 storing serialized
 EJSON document bodies, it stores raw JSON binaries.
 The github branch is at:  
 https://github.com/fdmanana/couchdb/tree/raw_json_docs
 Advantages:
 * what we write to disk is much smaller - a raw JSON binary can easily get 
 up to 50% smaller
  (at least according to the tests I did)
 * when serving documents to a client we no longer need to JSON encode the 
 document body
  read from the disk - this applies to individual document requests, view 
 queries with
  ?include_docs=true, pull and push replications, and possibly other use 
 cases.
  We just grab its body and prepend the _id, _rev and all the necessary 
 metadata fields
  (this is via simple Erlang binary operations)
 * we avoid the EJSON term copying between request handlers and the db 
 updater processes,
  between the work queues and the view updater process, between replicator 
 processes, etc
 * before sending a document to the JavaScript view server, we no longer need 
 to convert it
  from EJSON to JSON
 The changes done to the document write workflow are minimalist - after JSON 
 decoding the
 document's JSON into EJSON and removing the metadata top level fields (_id, 
 _rev, etc), it
 JSON encodes the resulting EJSON body into a binary - this consumes CPU of 
 course but it
 brings 2 advantages:
 1) we avoid the EJSON copy between the request process and the database 
 updater process -
   for any realistic document size (4kb or more) this can be very expensive, 
 specially
   when there are many nested structures (lists inside objects inside lists, 
 etc)
 2) before writing anything to the file, we do a term_to_binary([Len, Md5, 
 TheThingToWrite])
   and then write the result to the file. A term_to_binary call with a binary 
 as the input
   is very fast compared to a term_to_binary call with EJSON as input (or 
 some other nested
   structure)
 I think both compensate the JSON encoding after the separation of meta data 
 fields and non-meta data fields.
 The 
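
 (A sketch of the framing described above, simplified and purely illustrative:)

     %% a pre-encoded JSON body, hashed and framed as in the description;
     %% term_to_binary over a flat binary is cheap compared to nested EJSON
     Body = <<"{\"type\":\"point\",\"x\":1,\"y\":2}">>,
     Md5  = erlang:md5(Body),
     Blob = term_to_binary([byte_size(Body), Md5, Body]).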

Re: btree refactoring

2011-03-08 Thread Robert Dionne
+1 

I'd definitely have a hard look at it. 

I'm wondering if it makes sense to first revisit davisp's refactoring. Not the 
second one but the first one he recently did which was just a cleanup and 
simplification of the code. It may have broken something but if I recall it was 
more readable than the original. You might find it a better starting point.
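
If I follow the arity-4 idea Randall describes below, the fold fun would look 
roughly like this (my own sketch against an imagined API, not working code):

    %% the first argument tags the node type; returning {skip, Acc} on a
    %% kp_node prunes that whole subtree, using its reduction to decide
    FoldFun = fun
        (kp_node, _Key, {_Count, 0}, Acc) -> {skip, Acc};  % no conflicts below
        (kp_node, _Key, _Reds, Acc)       -> {ok, Acc};    % descend as usual
        (kv_node, KV, _Reds, Acc)         -> {ok, [KV | Acc]}
    end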


Regards,

Bob


On Mar 8, 2011, at 2:28 PM, Randall Leeds wrote:

 When I started hacking on COUCHDB-1076 a couple nights ago I found that the
 cleanest way I could see for allowing kp_nodes to be skipped during
 traversal was for the fold function passed to couch_btree:foldl to be arity 4,
 with kp_node | kv_node as the first argument.
 
 Evaluating that function for every node (instead of just kv_nodes) lets us
 return {skip, Acc} to skip whole subtrees.
 Diving back into the couch_btree code to read over Damien's patch for
 COUCHDB-1084, it hit me that ModifyFun could instead be a function that was
 called for all the actions. We wouldn't need to return the query results
 because this action function could send them back to the client as they're
 found. Then we just keep the actions list as part of the accumulator and
 query_modify collapses into a btree:foldl and we no longer need so many
 similar-looking recursive functions. Off the cuff I envision modify_node
 being exported and simplified to be non-recursive and query_modify being a
 helper function to generate a fold function that uses modify_node (or
 something like this).
 
 Is this similar to anything you've done already, Paul? Would you all be
 interested if I took a crack at doing this kind of refactor?



Re: [jira] Commented: (COUCHDB-902) Attachments that have recovered from conflict do not accept attachments.

2011-01-31 Thread Robert Dionne
Thanks Adam,

  I also had broken up 988 to move some into 902 and the rest into 462. I'll 
rearrange things based on this commit.

  One question I have is about a re-factor I did in 988 involving multiple 
assignments to a variable[1]. This re-factor does nothing to change the 
behavior of the code. Dialyzer does throw a warning about it, which is what 
motivated it, but I also think the re-factor is clearer and slightly more 
readable as Conflict is assigned in one place. I've seen this before so I'm 
wondering what's the preference on this.
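
To illustrate the shape of it (a toy, not the actual code from the commit):

    %% before: Conflict is bound separately in each branch
    %%     case Found of
    %%         true  -> Conflict = conflict;
    %%         false -> Conflict = no_conflict
    %%     end,
    %% after: Conflict is bound in exactly one place
    Conflict = case Found of
        true  -> conflict;
        false -> no_conflict
    end,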

Cheers,

Bob


[1] 
https://github.com/bdionne/couchdb/commit/49bcb6df05dfefdcee40fea3d0fcded2859b6bf1#L0L30



On Jan 30, 2011, at 7:50 PM, Adam Kocoloski wrote:

 I'm tired of waiting for the JIRA maintenance window to end so I'm just going 
 to comment here.  I've combined pieces of Bob Dionne's various patches from 
 this ticket and COUCHDB-988 into a single commit here:
 
 https://github.com/kocolosk/couchdb/commit/1efd87
 
 It focuses on the merge code/tests/docs.  The actual change to the merge code 
 is small and has been discussed before; I only replaced "or" with "orelse". 
 The rest of the commit involves reorganizing and augmenting the tests and 
 providing a version of Paul's description of the key tree as the module @doc. 
  I think it's high time we get this into trunk.  Best,
 
 Adam
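
(For anyone following along, the distinction in a toy example: orelse 
short-circuits, while plain or always evaluates both operands first.)

    %% safe: hd/1 is never called on [] because orelse short-circuits
    empty_or_x(L) -> (L =:= []) orelse (hd(L) =:= x).
    %% crashes on []: with plain "or" both sides are evaluated up front
    %% empty_or_x(L) -> (L =:= []) or (hd(L) =:= x).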



Re: [jira] Commented: (COUCHDB-462) track conflict count in db_info (was built-in conflicts view)

2011-01-31 Thread Robert Dionne
Still a good idea; I think I have a version that does this short-circuiting. 
Makes you want Scheme's call/cc :)
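
The usual Erlang idiom, for the record (a toy sketch, nothing to do with the 
actual patch): bail out of the traversal with a throw.

    %% short-circuit a fold with throw/catch; has_conflict/1 is hypothetical
    first_conflicted(Docs) ->
        try
            lists:foreach(fun(Doc) ->
                case has_conflict(Doc) of
                    true  -> throw({found, Doc});
                    false -> ok
                end
            end, Docs),
            not_found
        catch
            throw:{found, FoundDoc} -> {ok, FoundDoc}
        end.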




On Jan 30, 2011, at 2:01 PM, Adam Kocoloski (JIRA) wrote:

 
[ 
 https://issues.apache.org/jira/browse/COUCHDB-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988635#action_12988635
  ] 
 
 Adam Kocoloski commented on COUCHDB-462:
 
 
 Ah, scratch that - we need to check if the leaf of the conflict edit branch 
 has been deleted, of course.  Oh well.
 
 track conflict count in db_info (was built-in conflicts view)
 -
 
Key: COUCHDB-462
URL: https://issues.apache.org/jira/browse/COUCHDB-462
Project: CouchDB
 Issue Type: Improvement
 Components: HTTP Interface
   Reporter: Adam Kocoloski
Fix For: 1.2
 
Attachments: 462-jan-2.patch, conflicts_in_db_info.diff, 
 conflicts_in_db_info2.diff, conflicts_view.diff, 
 COUCHDB-462-adam-updated.patch, COUCHDB-462-jan.patch, whitespace.diff
 
 
 This patch adds a built-in _conflicts view indexed by document ID that looks 
 like
 GET /dbname/_conflicts
 {"rows":[
 {"id":"foo", "rev":"1-1aa8851c9bb2777e11ba56e0bf768649", 
 "conflicts":["1-bdc15320c0850d4ee90ff43d1d298d5d"]}
 ]}
 GET /dbname/_conflicts?deleted=true
 {"rows":[
 {"id":"bar", "rev":"5-dd31186f5aa11ebd47eb664fb342f1b1", 
 "conflicts":["5-a0efbb1990c961a078dc5308d03b7044"], 
 "deleted_conflicts":["3-bdc15320c0850d4ee90ff43d1d298d5d","2-cce334eeeb02d04870e37dac6d33198a"]},
 {"id":"baz", "rev":"2-eec205a9d413992850a6e32678485900", "deleted":true, 
 "deleted_conflicts":["2-10009b36e28478b213e04e71c1e08beb"]}
 ]}
 As the HTTPd and view layers are a bit outside my specialty I figured I 
 should ask for a Review before Commit.
 
 -- 
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.
 



Re: [VOTE-RESULT] (was: [VOTE] Apache CouchDB 1.0.2 Release, Round 3)

2011-01-26 Thread Robert Dionne
Thanks davisp.




On Jan 26, 2011, at 5:47 PM, Paul Davis wrote:

 The final tally of the vote is:
 
14 +1 votes
 
 This exceeds the required minimum three +1 votes and the proposal passes.
 
 I shall prepare the release as soon as possible.
 
 The individual votes are as follows:
 
+1 Robert Dionne

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3cf5c66349-8576-4e96-b2ab-787a94515...@dionne-associates.com%3E
 
+1 Robert Newson

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3caanlktimxq+obdjawl+2qaqvz1x+qb8gaemqa73a0m...@mail.gmail.com%3E
 
+1 Jan Lehnardt

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3c7a193722-825b-4a34-8ad5-86cbd577b...@apache.org%3E
 
+1 Jeff Zellner

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3caanlktin3rvqeezjh182ppse+4h0jdflybp1eqaedc...@mail.gmail.com%3E
 
+1 Till t...@php.net

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3caanlktinhdqmto60ivyia+sqa_fcbwhbqnzkgnxtlh...@mail.gmail.com%3E
 
+1 Fedor Indutny

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3CAANLkTi=ipdt3rff7zw8difw83yea9ydcxfebbngd8...@mail.gmail.com%3E
 
+1 Randall Leeds

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3caanlktikg61ms9tg8zjw6z9xcafya+_rjwu7zczvxo...@mail.gmail.com%3E
 
+1 Benoit Chesneau

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3caanlktikxr9fzmn-rrosem64hkgrcmbde_j1pdt7h5...@mail.gmail.com%3E
 
+1 Dave Cottlehuber

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3caanlktinb_w+qdvegxoyqgxnhzq2adsxdhut9d0jux...@mail.gmail.com%3E
 
+1 Klaus Trainer

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3C1295703848.7124.48.camel@devil%3E
 
+1 Sebastian Cohnen

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3c685ff108-f203-4aed-bc7c-07ef2320c...@googlemail.com%3E
 
+1 Filipe David Manana

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3caanlktinegzludbfnwpgrmy8qh-jhp92ywn3xwk8d2...@mail.gmail.com%3E
 
+1 Noah Slater

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3c435fe3d3-3813-4a77-9b4f-8f201d126...@apache.org%3E
 
+1 Adam Kocoloski

 http://mail-archives.apache.org/mod_mbox/couchdb-dev/201101.mbox/%3c29bbcc74-2418-485d-bc09-bc3c83ff2...@apache.org%3E
 
 Thanks to everyone who voted.



Re: Idea: Piggyback doc on conflict

2011-01-23 Thread Robert Dionne
+1 

this sounds like an excellent idea.


On Jan 23, 2011, at 12:21 AM, kowsik wrote:

 I've been spending a fair bit of time on profiling the performance
 aspects of Couch. One common recurring theme is updating documents on
 a write-heavy site. This is currently what happens:
 
 PUT /db/doc_id
 -> 409 indicating conflict
 
 loop do
GET /db/doc_id
 -> 200
 
PUT /db/doc_id
 -> 201 (successful and returns the new _rev)
 end until we get a 201
 
 What would be beneficial is if I can request the current doc during
 PUT like so:
 
 PUT /db/doc_id?include_doc=true
 -> 409 conflict (but the 'doc' at the current _rev is returned)
 
 This would allow the caller to simply take the doc that was returned,
 update it and try PUT again (eliminate the extra GET). This is
 especially valuable when the app is on one geo and the db is in yet
 another (think couchone or cloudant).
 
 2 cents,
 
 K.
 ---
 http://twitter.com/pcapr
 http://labs.mudynamics.com



Re: Idea: Piggyback doc on conflict

2011-01-23 Thread Robert Dionne
These are also interesting ideas, but I don't think they adequately satisfy 
this particular write-heavy scenario. The client receiving the 409 has in hand 
the doc they wished to write and may just to add a 
field or update one. A general resolve_conflict function is a good idea for 
certain collaborative environments but I don't think would handle this specific 
case.

Having the conflict causing update return the doc that caused it seems really 
ideal. I'm still +1 on it






On Jan 23, 2011, at 7:51 AM, Robert Newson wrote:

 Oooh, crosspost.
 
 Had a similar chat on IRC last night.
 
 I'm -0 on returning the doc during a 409 PUT just because I think
 there are other options that might be preferred.
 
 For example, allowing a read_repair function in ddocs, that would take
 all conflicting revisions as input and return the resolved document as
 output. Or allowing a resolve_conflict function that is called at the
 moment of conflict creation, allowing it to be downgraded to a
 non-conflicting update.
 
 With either, or both, of those mechanisms, the proposed one here is 
 unnecessary.
 
 B.
 
 On Sun, Jan 23, 2011 at 12:04 PM, Robert Dionne
 dio...@dionne-associates.com wrote:
 +1
 
 this sounds like an excellent idea.
 
 
 On Jan 23, 2011, at 12:21 AM, kowsik wrote:
 
 I've been spending a fair bit of time on profiling the performance
 aspects of Couch. One common recurring theme is updating documents on
 a write-heavy site. This is currently what happens:
 
 PUT /db/doc_id
-> 409 indicating conflict
 
 loop do
GET /db/doc_id
-> 200
 
PUT /db/doc_id
-> 201 (successful and returns the new _rev)
 end until we get a 201
 
 What would be beneficial is if I can request the current doc during
 PUT like so:
 
 PUT /db/doc_id?include_doc=true
-> 409 conflict (but the 'doc' at the current _rev is returned)
 
 This would allow the caller to simply take the doc that was returned,
 update it and try PUT again (eliminate the extra GET). This is
 especially valuable when the app is on one geo and the db is in yet
 another (think couchone or cloudant).
 
 2 cents,
 
 K.
 ---
 http://twitter.com/pcapr
 http://labs.mudynamics.com
 
 



Re: code style

2011-01-20 Thread Robert Dionne


On Jan 20, 2011, at 9:26 AM, Jan Lehnardt wrote:

 
 On 20 Jan 2011, at 14:57, Adam Kocoloski wrote:
 
 I'd go a little further.  I think CouchDB should have two include files:
 
 include/couch_db.hrl (I'd prefer couch.hrl but I think we might be stuck w/ 
 this)
 src/couch_int.hrl (name is not important)
 
 The first one would contain all record definitions needed to interact with 
 CouchDB from Erlang.  The second would contain macro definitions and records 
 that are not supposed to be exported.  Moving couch_db.hrl to include/ would 
 eventually allow other applications to point to couch_db.hrl using the 
 -include_lib directive instead of specifying the absolute path to the 
 header.  Regards,
 
 I like that approach best.
 
 This is all part of a bigger discussion: what does a CouchDB plugin system 
 look like. While technically, you can have plugins today, it is a fairly 
 fragile endeavour.
 
 The srvmv (tip hat Paul) will give us more foundations to make the technical 
 part of this more solid. Fully fledged plugin support that I'd be comfortable 
 supporting would also include a defined internal API for plugins to use that 
 we give certain guarantees to not break. I know that's a bit off, but we 
 should get there eventually.

+1


 
 I would like to see, before getting started on any of this, an RFC-style 
 document / wiki page that defines what a CouchDB plugins system looks like 
 that we agree on implementing.

+3


 
 Cheers
 Jan
 -- 
 
 
 
 Adam
 
 On Jan 20, 2011, at 8:29 AM, Benoit Chesneau wrote:
 
 Actually we are using ?b2l/?l2b and some other macros to make the code
 shorter and ease our development. All these macros are in the main
 include file couch_db.hrl used everywhere in the code.
 
 Since this include will be likely used in CouchDB plugins created by
 users, I would like to have these kind of macros separated in their
 own include file. Something common in the C world. The main reason is to
 not pollute namespacing in external plugins and let them import only
 what they need, i.e. couchdb types/records.
 
 What do you think about it? Also, not related, but maybe it would be a
 good practice to enforce the use of these macros in all the couchdb
 codebase, as Filipe suggested.
 
 Any thoughts ?
 
 - benoît
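
For concreteness, a sketch of the split under discussion (couch_int.hrl is
the name proposed above, not a committed file; the macro bodies mirror the
current couch_db.hrl definitions):

%% src/couch_int.hrl -- internal convenience macros, kept out of the
%% header that plugins compile against
-define(b2l(V), binary_to_list(V)).
-define(l2b(V), list_to_binary(V)).

%% a plugin would then pull in only the public records, e.g.:
%% -include_lib("couch/include/couch_db.hrl").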
 
 



Re: [VOTE] Apache CouchDB 1.0.2 Release, Round 3

2011-01-20 Thread Robert Dionne
+1


On Jan 20, 2011, at 10:06 AM, Paul Davis wrote:

 This is the third release vote for Apache CouchDB 1.0.2
 
 Changes since the last round:
 
 * Fix raw view document link due to overzealous URI encoding in
   Futon.
 * Spell javascript correctly in loadScript(uri).
 * Preserve purge metadata during compaction to avoid spurious
   view regeneration.
 * Fix spurious conflicts during attachment uploads after a document
   has had a conflict. See COUCHDB-902 for details.
 * Fix multipart GET APIs to always send attachments in compressed
   form when they are compressed on disk. This fixes a bug for
   attachments created via local-local replication. See COUCHDB-1022
   for details.
 
 We encourage the whole community to download and test these release
 artifacts so that any critical issues can be resolved before the release
 is made. Everyone is free to vote on this release. Please report your
 results and vote to this thread.
 
 We are voting on the following release artifacts:
 
 http://people.apache.org/~davisp/dist/1.0.2/
 
 These artifacts have been built from the 1.0.2 tag in Subversion:
 
 http://svn.apache.org/repos/asf/couchdb/tags/1.0.2/
 
 Happy voting!



Re: [VOTE] Apache CouchDB 1.0.2 Release, Round 2

2011-01-11 Thread Robert Dionne
+1

OS X
make check is fine
All Futon tests pass in Chrome


Thanks Paul


On Jan 10, 2011, at 9:01 PM, Paul Davis wrote:

 This is the second release vote for Apache CouchDB 1.0.2
 
 Changes since the last round:
 
  * Fix share/www/image/spinner.gif
  * Fix OOM error when compacting documents with many conflicts
  * Upgraded ibrowse to 2.1.2 to fix more replicator bugs
  * Fix attachment compression for MIME types with parameters
  * Fix replicator to respect HTTP settings in the configuration
  * Fixed spurious replicator bug due to ibrowse dropping connections
  * Fix for frequently edited documents in multi-master deployments
being duplicated in _changes and _all_docs [See COUCHDB-968].
  * Detect corrupt views due to document duplication bug and warn
that the view should be rebuilt [See COUCHDB-999].
 
 We apologize for the delay due to fixing COUCHDB-968 but we felt it was
 sufficiently serious that it warranted an immediate fix.
 
 We encourage the whole community to download and test these release
 artifacts so that any critical issues can be resolved before the release
 is made. Everyone is free to vote on this release. Please report your
 results and vote to this thread.
 
 We are voting on the following release artifacts:
 
  http://people.apache.org/~davisp/dist/1.0.2/
 
 These artifacts have been built from the 1.0.2 tag in Subversion:
 
  http://svn.apache.org/repos/asf/couchdb/tags/1.0.2/
 
 Release the voters!



Re: [jira] Created: (COUCHDB-1004) list_to_existing_atom is too restrictive as used by couch_rep

2011-01-02 Thread Robert Dionne
Klaus,

   perhaps I just heard wrong or misinterpreted what was said in the chat room. 
It did seem unusual that calling list_to_atom("foo") twice would add more than 
one atom. So just reverting the call back in couch_rep:dbinfo should suffice to 
fix this, as it's internal. Thanks,

Bob



On Jan 2, 2011, at 8:26 AM, Klaus Trainer wrote:

 As far as I can remember, the motivation behind list_to_existing_atom
 was not that list_to_atom pollutes the atoms table during normal
 operation; rather, it's that list_to_atom can't prevent atom table pollution
 when something goes wrong or somebody goes malicious (i.e., a DoS attack).
 
 I've just looked it up for you, the exact description is here:
 https://issues.apache.org/jira/browse/COUCHDB-829
 
 
 - Klaus
 
 
 On Sun, 2011-01-02 at 08:06 -0500, Bob Dionne (JIRA) wrote:
 list_to_existing_atom is too restrictive as used by couch_rep
 -
 
 Key: COUCHDB-1004
 URL: https://issues.apache.org/jira/browse/COUCHDB-1004
 Project: CouchDB
  Issue Type: Bug
  Components: Replication
 Environment: erlang
Reporter: Bob Dionne
Priority: Minor
 
 
 We'd like to add additional information to db_info in BigCouch, such as the Q 
 and N constants for a given database. This causes replication to fail when 
 replicating from BigCouch to CouchDB, due to the use of list_to_existing_atom 
 in couch_rep:dbinfo(...
 
 The claim is that list_to_atom pollutes the atoms table, however superficial 
 testing indicates this is not the case, list_to_atom when called repeatedly 
 seems to work fine. If this is true then consider reverting 
 list_to_existing_atom back to list_to_atom.
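
An erl shell session (illustrative) shows the distinction at issue:
repeating list_to_atom on the same string is harmless because atoms are
interned, but each distinct string grows the atom table, which is the DoS
concern behind COUCHDB-829, and list_to_existing_atom simply refuses
strings that are not atoms yet:

1> list_to_atom("foo") =:= list_to_atom("foo").
true
2> list_to_existing_atom("not_an_atom_yet").
** exception error: bad argument
     in function  list_to_existing_atom/1
        called as list_to_existing_atom("not_an_atom_yet")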
 
 
 



Re: CouchDB partitioning proposal

2010-12-19 Thread Robert Dionne

On Dec 18, 2010, at 8:00 PM, Klaus Trainer wrote:

 Hi guys!
 
 
 My two cents:
 
 
 If I had a few months to do some research in the area of Distributed
 Programming and CouchDB, I'd take the thread How fast do CouchDB
 propagate changes to other nodes on the user mailing list as an
 inspiration (which I've just read).
 
 For instance, one could do some research about the challenges of having
 updates propagated in soft real time through a system of many loosely
 connected CouchDB nodes that get a lot of updates independently of each
 other. Maybe there's some room for optimizing CouchDB's replication, in
 particular for such scenarios.

Great suggestion. This is a challenging area, even within clusters of couchdb 
nodes that aren't loosely coupled. There is information that needs to be 
maintained globally, i.e. at all nodes, for a healthy cluster, and this needs 
to be kept in sync. As Klaus mentioned earlier, BigCouch addresses a lot of 
the needs of distribution (it puts the C back in CouchDB), and there are areas 
that need work, e.g. splitting/merging of partitions dynamically while keeping 
the cluster up[1]. BigCouch has a well-defined architecture and layered 
approach that makes exploration and experimentation easier[1,2,3]. The 
inter-node communication component[2] was built to be standalone and geared 
towards use with CouchDB.

Cheers,

Bob


[1] https://github.com/cloudant/mem3
[2] https://github.com/cloudant/rexi
[3] https://github.com/cloudant/fabric



 
 At first, in order to find out about different possible tradeoffs, one
 would have to start comparing and evaluating different concepts.
 
 For instance, one could find out about how replication things work, e.g.
 in CouchDB and in Riak. In terms of finding common ancestors of subtrees
 and detecting conflicts, there might be even a few things one could
 learn from Git...
 
 
 Anyway, you are welcomed to present new ideas! Or if not, some paper
 that gives an in-depth description of an existing feature of CouchDB
 (e.g. replication) would be great as well, as that provided insights for
 people who are not familiar with that particular codebase.
 
 
 Cheers,
 Klaus
 
 
 On Tue, 2010-12-14 at 22:54 +, Iago Abal wrote:
 Hi all,
 
 Well, to be more specific, we are a group of classmates that have decided to
 work on CouchDB as MSc coursework (Tiago might want to be brief...). We have
 the task of studying CouchDB until February, and then the idea is to spend 4-5
 months working on a contribution for CouchDB. Our main problem seems to be
 that the wiki stuff is very out-of-date: when we read that CouchDB lacks feature
 A and decide to focus on this problem, we finally find that it is already
 solved. We have spent some time learning the very basics of CouchDB but we
 are having trouble properly defining the project, so we would appreciate
 commentary about what kind of contribution (related to the distributed
 systems topic) is of interest to the CouchDB community.
 
 Thanks in advance,
 
 On Tue, Dec 14, 2010 at 10:03 PM, Klaus Trainer klaus.trai...@web.dewrote:
 
 Hi Tiago,
 
 check out BigCouch: https://github.com/cloudant/bigcouch. Most of it has
 been done by the guys at Cloudant. They're building a scalable CouchDB
 hosting platform (https://cloudant.com), in which BigCouch is more or
 less the core of it. If you've any questions regarding Cloudant or
 BigCouch, you maybe can find some help in the #cloudant IRC room at
 Freenode.
 
 For a (quick) introduction to BigCouch you can check out e.g.:
 
 - Dynamo and CouchDB Clusters:
 http://blog.cloudant.com/dynamo-and-couchdb-clusters
 - Scaling Out with BigCouch—Webcast: http://is.gd/iKLwM
 
 Cheers,
 Klaus
 
 
 On Tue, 2010-12-14 at 21:34 +, Tiago Silva wrote:
 Hi,
 
 I want to contribute on CouchDB partitioning proposal (
 http://wiki.apache.org/couchdb/Partitioning_proposal) and I would like
 to
 know if anyone can help me to find the issues on this topic. Please tell
 me
 what issues are being developed currently, the ones that are already
 closed
 and what you suggest to start to develop now.
 
 
 Thank you,
 Tiago Silva
 
 P.S. Please reply to all recipients of this mail message in order to
 better
 filter this conversation in CouchDB dev mailing list.
 
 
 
 
 
 
 



Re: [jira] Commented: (COUCHDB-975) non-portable sh in configure.ac (breaks on Solaris)

2010-12-03 Thread Robert Dionne



On Dec 3, 2010, at 5:13 AM, Noah Slater (JIRA) wrote:

 
[ 
 https://issues.apache.org/jira/browse/COUCHDB-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966471#action_12966471
  ] 
 
 Noah Slater commented on COUCHDB-975:
 -
 
 I'd be more comfortable with:
 
 -version=`${ERL} -version 2>&1 | ${SED} "s/[[^0-9]]/ /g"` 
 +version=`${ERL} -version 2>&1 | ${SED} 's/[[^0-9]]/ /g'` 

wow, my new glasses work! I can see the diff between these two lines :)


 
 Can you confirm this would work?
 
 non-portable sh in configure.ac (breaks on Solaris)
 ---
 
Key: COUCHDB-975
URL: https://issues.apache.org/jira/browse/COUCHDB-975
Project: CouchDB
 Issue Type: Bug
 Components: Build System
   Affects Versions: 0.8, 0.8.1, 0.9, 0.9.1, 0.9.2, 0.10, 0.10.1, 0.10.2, 
 0.11, 0.11.1, 0.11.2, 1.0, 1.0.1
Environment: OpenSolaris (will affect other Solaris versions, too)
 SunOS osol-x86 5.11 snv_111b i86pc i386 i86pc Solaris
   Reporter: Timothy Smith
   Priority: Minor
  Original Estimate: 0.08h
 Remaining Estimate: 0.08h
 
 Get this when running configure:
 ...
 checking for erl... /export/home/tim/c/build-couchdb/build/bin/erl
 ./configure[12123]: : cannot execute [Is a directory]
 /opt/csw/bin/gsed: -e expression #1, char 9: unterminated `s' command
 ./configure[12123]: /g: not found [No such file or directory]
 ./configure[12125]: test: argument expected
 ./configure[12129]: test: argument expected
 ./configure[12133]: test: argument expected
 checking for erlc... /export/home/tim/c/build-couchdb/build/bin/erlc
 ...
 A patch to fix it is:
 commit 6b018d087ba8ddaf3789e106ade9b74488de5136
 Author: Timothy Smith t...@couchone.com
 Date:   Thu Dec 2 23:13:10 2010 -0700
Fix syntax error with /bin/sh on Solaris
 
The RHS of an assignment is implicitly quoted in Bourne shell. Not
all shells (in particular, not /bin/sh in Solaris) can handle nested
double-quotes like foo="`bar "baz"`", and it's always safe to not
use the outer set.
 diff --git a/configure.ac b/configure.ac
 index c609a08..73ea9fe 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -243,7 +243,7 @@ fi
 
 erlang_version_error=The installed Erlang version is less than 5.6.5 
 (R12B05).
 
 -version="`${ERL} -version 2>&1 | ${SED} "s/[[^0-9]]/ /g"`"
 +version=`${ERL} -version 2>&1 | ${SED} "s/[[^0-9]]/ /g"`
 
 if test `echo $version | ${AWK} "{print \\$1}"` -lt 5; then
 AC_MSG_ERROR([$erlang_version_error])
 
 -- 
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.
 



Re: add plugin handling to the build system

2010-12-01 Thread Robert Dionne
I think this would be really neat




On Dec 1, 2010, at 9:49 AM, Benoit Chesneau wrote:

 On Wed, Dec 1, 2010 at 1:04 PM, Noah Slater nsla...@apache.org wrote:
 I've read the whole thread, and I still don't understand what anyone is 
 talking about.
 
 The goal is to provide an easy way to handle plugins in couchdb:
 
 - how to build them easily against couchdb
 - how they can be handled directly in couchdb
 
 Both are possible today, except the build is really not intuitive. And
 having to launch a plugin by setting the ERL_FLAGS environment variable
 manually isn't the best way. So this discussion is about finding a way
 to register a plugin in couch and make couchdb launch it automatically if
 enabled. The other part of the problem is to have a way to find couchdb
 includes on the system to build the plugin against them. That part
 may be handled by pkg-config.
 
 - benoit



test email

2010-11-30 Thread Robert Dionne
my posts to the dev list appear to be bouncing





Fwd: failure notice

2010-11-30 Thread Robert Dionne

well here's the reply to your post



Begin forwarded message:

 From: mailer-dae...@cpoproxy3-pub.bluehost.com
 Date: November 30, 2010 6:35:58 AM EST
 To: dio...@dionne-associates.com
 Subject: failure notice
 
 Hi. This is the qmail-send program at cpoproxy3-pub.bluehost.com.
 I'm afraid I wasn't able to deliver your message to the following addresses.
 This is a permanent error; I've given up. Sorry it didn't work out.
 
 dev@couchdb.apache.org:
 140.211.11.136 does not like recipient.
 Remote host said: 550 Dynamic IP Addresses See: 
 http://www.sorbs.net/lookup.shtml?67.222.54.6
 Giving up on 140.211.11.136.
 
 --- Enclosed are the original headers of the message.
 
 From: Robert Dionne dio...@dionne-associates.com
 Date: November 30, 2010 6:35:57 AM EST
 To: dev@couchdb.apache.org
 Subject: Re: test email
 
 
 (Body supressed)
 
 



Fwd: failure notice

2010-11-30 Thread Robert Dionne
I also received this when I forwarded your reply to my post to dev.apache.org.




Begin forwarded message:

 From: postmas...@blackrock.com
 Date: November 30, 2010 6:37:50 AM EST
 To: Robert Dionne dio...@dionne-associates.com
 Subject: Re: Fwd: failure notice
 
 Please note that the address:
 dev@couchdb.apache.org
 will cease working on December 1, 2010.  Please update your contact address 
 to the same address @blackrock.com.
 
 Thank you.
 
 THIS MESSAGE AND ANY ATTACHMENTS ARE CONFIDENTIAL, PROPRIETARY, AND MAY BE 
 PRIVILEGED.  If this message was misdirected, BlackRock, Inc. and its 
 subsidiaries, (BlackRock) does not waive any confidentiality or privilege.  
 If you are not the intended recipient, please notify us immediately and 
 destroy the message without disclosing its contents to anyone.  Any 
 distribution, use or copying of this e-mail or the information it contains by 
 other than an intended recipient is unauthorized.  The views and opinions 
 expressed in this e-mail message are the author's own and may not reflect the 
 views and opinions of BlackRock, unless the author is authorized by BlackRock 
 to express such views or opinions on its behalf.  All email sent to or from 
 this address is subject to electronic storage and review by BlackRock.  
 Although BlackRock operates anti-virus programs, it does not accept 
 responsibility for any damage whatsoever caused by viruses being passed.
 
 



Re: tracking upstream dependencies

2010-11-27 Thread Robert Dionne
I think the problem with patches is that they can become unwieldy, e.g. which 
couch version plus which set of patches are bundled? There is also the issue of 
support and bug triage. A released version has some sort of implicit support 
offered by the publisher of the release, whereas a collection of patches is 
provided by the bundler of the software.

It's not to say patches aren't useful, they are, but they are better employed 
for exceptional circumstances and supported by the original publisher. Database 
vendors often do this.

I haven't followed all the specific dependency issues closely but it strikes me 
that the state of development of these various projects makes the canonical 
github approach mentioned earlier preferable. 







On Nov 27, 2010, at 5:27 AM, Robert Newson wrote:

 I like the Debian approach where they maintain a patchset against a
 pristine upstream tarball.
 
 I'd particularly like to see the bigcouch variant of couchdb expressed
 that way, it would allow new developers to see where bigcouch diverges
 and it would allow couchdb developers to easily incorporate generally
 useful patches.
 
 B.
 
 On Fri, Nov 26, 2010 at 9:25 PM, Dirkjan Ochtman dirk...@ochtman.nl wrote:
 On Fri, Nov 26, 2010 at 22:15, Noah Slater nsla...@apache.org wrote:
 If we have a checksum, what's the point?
 
 Why not just include the original source the checksum is taken from?
 
 The point is keeping very exact track of what the source is. And the
 point is making it easy for distributors to build without the bundled
 dependencies. It would be great to also ship an artefact without the
 bundled deps, but that might be a tad too much work...
 
 Cheers,
 
 Dirkjan
 



Re: [jira] Updated: (COUCHDB-968) Duplicated IDs in _all_docs

2010-11-27 Thread Robert Dionne
yea, they are identical, and both compaction and exceeding the revision max are 
required to reproduce.



On Nov 27, 2010, at 5:45 PM, Adam Kocoloski (JIRA) wrote:

 
 [ 
 https://issues.apache.org/jira/browse/COUCHDB-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
  ]
 
 Adam Kocoloski updated COUCHDB-968:
 ---
 
Priority: Blocker  (was: Major)
 
 Bob, in tisba's case the duplicates had the same revision.  Is that also true 
 in your case?  And you only see these duplicates after compaction?
 
 Duplicated IDs in _all_docs
 ---
 
Key: COUCHDB-968
URL: https://issues.apache.org/jira/browse/COUCHDB-968
Project: CouchDB
 Issue Type: Bug
 Components: Database Core
   Affects Versions: 1.0, 1.0.1, 1.0.2
Environment: Ubuntu 10.04.
   Reporter: Sebastian Cohnen
   Priority: Blocker
 
 We have a database, which is causing serious trouble with compaction and 
 replication (huge memory and cpu usage, often causing couchdb to crash b/c 
 all system memory is exhausted). Yesterday we discovered that db/_all_docs 
 is reporting duplicated IDs (see [1]). Until a few minutes ago we thought 
 that there were only a few duplicates, but today I took a closer look and I 
 found 10 IDs which sum up to a total of 922 duplicates. Some of them have 
 only 1 duplicate, others have hundreds.
 Some facts about the database in question:
 * ~13k documents, with 3-5k revs each
 * all duplicated documents are in conflict (with 1 up to 14 conflicts)
 * compaction is run on a daily basis
 * several thousands updates per hour
 * multi-master setup with pull replication from each other
 * delayed_commits=false on all nodes
 * used couchdb versions 1.0.0 and 1.0.x (*)
 Unfortunately the database's contents are confidential and I'm not allowed 
 to publish it.
 [1]: Part of http://localhost:5984/DBNAME/_all_docs
 ...
 {id:9997,key:9997,value:{rev:6096-603c68c1fa90ac3f56cf53771337ac9f}},
 {id:,key:,value:{rev:6097-3c873ccf6875ff3c4e2c6fa264c6a180}},
 {id:,key:,value:{rev:6097-3c873ccf6875ff3c4e2c6fa264c6a180}},
 ...
 [*]
 There were two (old) servers (1.0.0) in production (already having the 
 replication and compaction issues). Then two servers (1.0.x) were added and 
 replication was set up to bring them in sync with the old production servers 
 since the two new servers were meant to replace the old ones (to update 
 node.js application code among other things).
 
 -- 
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.
 



Re: [VOTE] Apache CouchDB 1.0.2 release, Round 1

2010-11-25 Thread Robert Dionne
+1

OS X 10.6.5 Erlang R14B

everything works except the futon test auth_cache in Chrome



On Nov 25, 2010, at 1:43 PM, Paul Davis wrote:

 Hello,
 
 I would like call a vote for the Apache CouchDB 1.0.2 release, round 1.
 
 Changes since 1.0.1:
 
 Futon:
 
 * Make test suite work with Safari and Chrome.
 
 Storage System:
 
 * Fix leaking file handles after compacting databases and views.
 * Fix databases forgetting their validation function after compaction.
 * Fix occasional timeout errors after successfully compacting large databases.
 * Fix occasional error when writing to a database that has just been 
 compacted.
 * Fix occasional timeout errors on systems with slow or heavily loaded IO.
 
 Log System:
 
 * Reduce lengthy stack traces.
 * Allow logging of native <xml> types.
 
 HTTP Interface:
 
 * Allow reduce=false parameter in map-only views.
 * Fix parsing of Accept headers.
 
 Replicator:
 
 * Updated ibrowse library to 2.1.0 fixing numerous replication issues.
 * Fix authenticated replication (with HTTP basic auth) of design documents
   with attachments.
 * Various fixes to make replication more resilient for edge-cases.
 
 View Server:
 
 * Don't trigger view updates when requesting `_design/doc/_info`.
 * Fix for circular references in CommonJS requires.
 * Made isArray() function available to functions executed in the query server.
 * Documents are now sealed before being passed to map functions.
 
 We encourage the whole community to download and test these release artifacts 
 so
 that any critical issues can be resolved before the release is made. Everyone 
 is
 free to vote on this release, so get stuck in!
 
 We are voting on the following release artifacts:
 
  http://people.apache.org/~davisp/dist/1.0.2/
 
 These artifacts have been built from the 1.0.2 tag in Subversion:
 
  http://svn.apache.org/repos/asf/couchdb/tags/1.0.2/
 
 Happy voting,
 Paul Davis



Re: Using rebar to install couchdb

2010-10-14 Thread Robert Dionne
+1 also

I think the convention is

./apps/couch_core/ebin
./apps/couch_core/src
./apps/couch_core/include
./apps/couch_core/priv
./apps/couch_http/ebin


rather than ./src/

I like the idea of still using the existing build, which is awesome, and having 
it feed into rebar so we can make use of reltool, etc., and templates for 
parameterizing the various .ini files

doing it after the next release will be a good time to break everything
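
For concreteness, a sketch of a top-level rebar.config for such a layout
(the app names follow the thread; the deps and git URLs are illustrative
assumptions, nothing here was agreed):

%% rebar.config
{sub_dirs, ["apps/couch_core", "apps/couch_http"]}.
{erl_opts, [debug_info]}.
{deps, [
    {ibrowse, ".*",
     {git, "git://github.com/cmullaparthi/ibrowse.git", "master"}},
    {mochiweb, ".*",
     {git, "git://github.com/mochi/mochiweb.git", "master"}}
]}.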

On Oct 14, 2010, at 4:03 PM, Robert Newson wrote:

 Paul,
 
 Brilliant writeup and proposal. I'd like to see all those things
 happen pretty much as you said. Cleaning the cycles out will be much
 easier once things are broken out in that style.
 
 +1
 
 B.
 
 On Thu, Oct 14, 2010 at 7:54 PM, Paul Davis paul.joseph.da...@gmail.com 
 wrote:
 On Wed, Oct 13, 2010 at 5:23 PM, Benoit Chesneau bchesn...@gmail.com wrote:
 In an attempt to start some merging with cloudant I would like to
 start by using rebar in our install process.
 
 As I see it, we could continue to use autotools to create the
 rebar.config files and other templates, and then rebar for the final
 build and dependency management. These changes, as noticed by @davisp,
 also imply we make our tree a little more OTP compliant. I would like
 to start this work asap.
 
 Thoughts ?
 
 - benoit
 
 
 So there's a couple issues at hand here which seem to be motivated by
 the desire to start using tools like rebar.
 
 Our current source tree is not compliant with some of the basic
 Erlang/OTP conventions. This is both bad technically and socially.
 Technically, it prevents us from easily integrating tools like rebar
 that would help advanced users with things like making Erlang reltools
 packages. Socially, it doesn't reflect well on us to members of the
 Erlang community that may have otherwise become contributors. All
 languages have a standard package layout and Erlang is no different.
 
 The current CouchDB Erlang app has grown considerably. There's been
 general consensus that we need to start splitting it up into smaller
 applications that encompass specific functionality. There's been a bit
 of effort in this direction, but its such a major change to source
 file location it needs to have a community consensus to really start
 working on seriously.
 
 I don't think we should focus directly on the issue of integrating
 rebar. It should definitely be a goal, but not at the cost of our
 current situation. Noah Slater has maintained an excellent build
 system for us as is shown by the number of people building CouchDB
 from source and the number of packages available. While I have argued
 with him on numerous occasions about details, I have come to the
 conclusion that it is not possible for him to be wrong. I personally
 attribute this to the fact that he's most likely an advanced robot
 from the future. That said, Noah has voiced concerns to various ideas
 and we should make sure that any of his concerns are fully addressed.
 
 We should attempt to make sure that any tool support doesn't morph
 into tool requirement. For instance, I think we should make sure that
 its possible to keep compiling CouchDB without rebar and not come to
 rely on it.
 
 While I'd be more than happy to start in on this and handle all of the
 build system refactoring to make this happen, I'm not going to start
 until there's a community consensus on what needs to be done. There
 are a couple paths that I could see us taking to make this happen. We
 could just make the current source tree be rebar compatible and figure
 out the build system to do the optional rebar build or we could also
 take this chance to split the source code into multiple applications.
 Personally, I'd prefer to take this opportunity to organize the code
 with multiple erlang apps.
 
 Too get the conversation rolling here's a first pass at a new app proposal:
 
 etap:
 
Nick Gerakines now releases etap as a single .erl file that can be
 dropped into the test directory. This app should be removed in favor
 of that method.
 
 erlang-oauth:
 
 Should be renamed to just oauth. That erlang- prefix has bugged me
 for entirely too long.
 
 mochiweb, ibrowse, oauth:
 
Refactored to use standard src, include, ebin, priv directories to
 be OTP compliant. This results in directories like
 
./src/$APP/ebin
./src/$APP/include
./src/$APP/priv
./src/$APP/src
 
 couchdb:
 
Each proposed app will be structured as described above. Proposed apps:
 
couch_core: The core Erlang modules for storing docs and managing
 internal infrastructure
couch_view: The view engine as well as the holder for managing OS 
 processes.
couch_rep: couch_rep*.erl
couch_externals: couch_external*.erl
couch_httpd: couch_http*.erl
 
 At the bottom of this email I made an initial pass through the
 ./src/couchdb tree to classify file by file into the described apps.
 There are also some minor warts in this split. Things like the core
 couchdb 

Re: Using rebar to install couchdb

2010-10-13 Thread Robert Dionne
+1



On Oct 13, 2010, at 5:23 PM, Benoit Chesneau wrote:

 In an attempt to start some merging with cloudant I would like to
 start by using rebar in our install process.
 
 As I see it, we could continue to use autotools to create the
 rebar.config files and other templates, and then rebar for the final
 build and dependency management. These changes, as noticed by @davisp,
 also imply we make our tree a little more OTP compliant. I would like
 to start this work asap.
 
 Thoughts ?
 
 - benoit



Re: multiview using bloom filters

2010-09-25 Thread Robert Dionne
Norman,

   Basho also has a bloom filter implementation packaged as a separate 
project[1], that you might find useful. It's used in Bitcask.

Cheers,

Bob



[1] http://github.com/basho/ebloom
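
ebloom's API is pleasantly small; a sketch along the lines of its README
(treat the exact signatures as an assumption and check them against the
version you build):

{ok, B} = ebloom:new(100000, 0.01, 42),  %% predicted count, error rate, seed
ebloom:insert(B, <<"doc_id_1">>),
true  = ebloom:contains(B, <<"doc_id_1">>),
false = ebloom:contains(B, <<"some_other_id">>).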




On Sep 24, 2010, at 11:21 PM, Norman Barker wrote:

 Paul,
 
 yes, performance is actually much better (for some of our harder
 queries, e.g. all docs over time with field X (two views), 10x faster).
 I am testing with docs that in total emit ~100K of keys (following the
 raindrop megaview).
 
 Some of the scalable bloom filter project files contained EPL headers,
 others didn't; googling for the source code I had seen other projects
 add the EPL headers to the bit array module, so I did the same. I will contact the
 author as he seems active on the erlang mailing lists and if not I
 will write a bloom filter from scratch, the theory is well documented,
 though I like his code!
 
 thanks for your help, let me know any suggestions you may have.
 
 thanks,
 
 Norman
 
 
 
 On Fri, Sep 24, 2010 at 7:16 PM, Paul Davis paul.joseph.da...@gmail.com 
 wrote:
 Norman,
 
 Just glanced through. Looks better. Any feeling for a performance 
 differences?
 
 Also, I glanced at the original files that you linked to. The bit
 array files didn't have a license, but what you've got there does have
 EPL headers. We need to make sure we have permission to do so. I would
 assume as much, but we have to be careful about such things in the
 ASF. You only need to get an email from the original author saying it's
 ok.
 
 I'm a bit caught up with some other code at the moment, I'll give a
 more thorough combing over tomorrow.
 
 Paul
 
 On Fri, Sep 24, 2010 at 7:54 PM, Norman Barker norman.bar...@gmail.com 
 wrote:
 Hi,
 
 thanks to Paul's excellent suggestion I have rewritten the multiview
 to use bloom filters. I had a concern that a bloom filter per view
 would use too much memory, but thanks in the main to the excellent
 implementation of bloom filters in erlang
 (http://sites.google.com/site/scalablebloomfilters/) they seem to be
 very space efficient.
 
 New code is here
 
 http://github.com/normanb/couchdb/
 
 The code is simple, all one process; once we have agreed on the approach
 we can decide if there is any benefit in making the bloom filter
 generation occur in a separate process (using a gen_server).
 
 Comments as always appreciated, I will continue adding to the test suite.
 
 thanks for the help,
 
 Norman
 
 



Re: CouchDb not releasing files

2010-09-25 Thread Robert Dionne
Filipe,

  Won't terminate be called only if the gen_server is stopped for a reason?

Bob



On Sep 25, 2010, at 7:30 AM, Filipe David Manana wrote:

 Stephen,
 
 I committed something to trunk (
 http://svn.apache.org/viewvc?view=revisionrevision=1001196 ) that
 might be the cause for your issue.
 Can you test it with trunk?
 
 I was not yet able to reproduce the issue.
 
 cheers
 
 On Wed, Sep 22, 2010 at 2:40 PM, [mRg] emar...@gmail.com wrote:
 Yes I did, issued it a couple of times as I was trying to see if it was
 some kind of race condition.
 
 Have a site due to launch on Friday so have got crazy busy but ill post more
 info when I have it, unless anyone else can try and help replicate this ?
 
 On 20 September 2010 21:42, Paul Davis paul.joseph.da...@gmail.com wrote:
 
 Do you issue two compact commands simultaneously or wait for the first
 one to complete and then run the second?
 
 
 On Mon, Sep 20, 2010 at 4:07 PM, [mRg] emar...@gmail.com wrote:
 Well I can certainly reproduce the issue, but am having trouble finding
 the
 exact sequence (annoyingly of course)
 
 1 I started with a blank VM of Ubuntu 10.10 (ext3) running on virtualbox
 with latest CouchDB (apt-get install couchdb)
 2 Began adding lots of blank docs ..  (curl -X POST -H
 "Content-Type:application/json" http://localhost:5984/test -d "{}")
 3 Created a simple view (function(doc) {  emit(null, doc); }) and ran
 to
 ensure it wrote to disk.
 4 I issued 2 compact commands (curl -X POST
 http://localhost:5984/test/_compact/ -H "Content-Type:application/json")
 5 And began adding more documents again as before ..
 
 Repeating this, essentially creating view data and compacting it about 10
 times resulted in 4 'snagged' files
 
 r...@ubuntu:~# lsof | grep -P 'COMMAND|/var/lib/couchdb/1.0.1'
 COMMANDPIDUSER   FD  TYPE DEVICE SIZE/OFF   NODE NAME
 beam  1083 couchdb   13u  REG  252,0 4185271
 /var/lib/couchdb/1.0.1/_users.couch
 beam  1083 couchdb   14u  REG  252,0   331865275
 /var/lib/couchdb/1.0.1/.delete/4e000e0ae5cce3d275f09b4542396e85 (deleted)
 beam  1083 couchdb   16u  REG  252,0   602210279
 /var/lib/couchdb/1.0.1/.delete/a9ee2b96c12591455ea795fa5324dc9c (deleted)
 beam  1083 couchdb   17u  REG  252,0   270434283
 /var/lib/couchdb/1.0.1/.test_design/07ca32cf9b0de9c915c5d9ce653cdca3.view
 beam  1083 couchdb   18u  REG  252,0   204898284
 /var/lib/couchdb/1.0.1/.test_design/540958c4124af3925fe467afb96f4906.view
 beam  1083 couchdb   20u  REG  252,0   405602286
 /var/lib/couchdb/1.0.1/.test_design/f26a8fcc3d2ce226a9e652338882c3db.view
 beam  1083 couchdb   21u  REG  252,0   299106287
 /var/lib/couchdb/1.0.1/.test_design/40614f8c8e1b4bab9d093881e914729d.view
 beam  1083 couchdb   22u  REG  252,0   106594285
 /var/lib/couchdb/1.0.1/.delete/31909e936ce94db7a6cede72827b18b2 (deleted)
 beam  1083 couchdb   24r  REG  252,0   233570288
 /var/lib/couchdb/1.0.1/.delete/83d09ec6dab90ca6078c0310085b97cc (deleted)
 beam  1083 couchdb   26u  REG  252,0   163932281
 /var/lib/couchdb/1.0.1/.test_design/3486e8de398e27b8767afd4975691360.view
 beam  1083 couchdb   27w  REG  252,0   114786289
 /var/lib/couchdb/1.0.1/test.couch
 beam  1083 couchdb   28u  REG  252,0   217186280
 /var/lib/couchdb/1.0.1/.test_design/b450740e3245f89ab902db10d767a397.view
 
 .. while the bottom 2 marked with (deleted) hung around for about 20
 minutes
 they eventually disappeared ..
 
 r...@ubuntu:~# lsof | grep -P 'COMMAND|/var/lib/couchdb/1.0.1'
 COMMANDPIDUSER   FD  TYPE DEVICE SIZE/OFF   NODE NAME
 beam  1083 couchdb   13u  REG  252,0 4185271
 /var/lib/couchdb/1.0.1/_users.couch
 beam  1083 couchdb   14u  REG  252,0   331865275
 /var/lib/couchdb/1.0.1/.delete/4e000e0ae5cce3d275f09b4542396e85 (deleted)
 beam  1083 couchdb   16u  REG  252,0   602210279
 /var/lib/couchdb/1.0.1/.delete/a9ee2b96c12591455ea795fa5324dc9c (deleted)
 beam  1083 couchdb   17u  REG  252,0   270434283
 /var/lib/couchdb/1.0.1/.test_design/07ca32cf9b0de9c915c5d9ce653cdca3.view
 beam  1083 couchdb   18u  REG  252,0   204898284
 /var/lib/couchdb/1.0.1/.test_design/540958c4124af3925fe467afb96f4906.view
 beam  1083 couchdb   20u  REG  252,0   405602286
 /var/lib/couchdb/1.0.1/.test_design/f26a8fcc3d2ce226a9e652338882c3db.view
 beam  1083 couchdb   21u  REG  252,0   299106287
 /var/lib/couchdb/1.0.1/.test_design/40614f8c8e1b4bab9d093881e914729d.view
 beam  1083 couchdb   23u  REG  252,0   114786288
 /var/lib/couchdb/1.0.1/test.couch
 beam  1083 couchdb   26u  REG  252,0   163932281
 

Re: CouchDb not releasing files

2010-09-25 Thread Robert Dionne
http://erldocs.com/otp_src_R13B/stdlib/gen_server.html

If the function returns {stop,Reason,Reply,NewState}, Reply will be given back 
to From. If the function returns {stop,Reason,NewState}, any reply to From must 
be given explicitly using gen_server:reply/2. The gen_server will then call 
Module:terminate(Reason,NewState) and terminate.
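
A minimal self-contained gen_server (illustrative, not couch code) that
exercises the quoted clause; the flip side, which is the worry here, is
that terminate/2 is never run if the process is killed with
exit(Pid, kill) or crashes while not trapping exits:

-module(stopper).
-behaviour(gen_server).
-export([start_link/0, stop/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

stop() ->
    gen_server:call(?MODULE, stop).   %% returns ok after terminate/2 runs

init([]) ->
    {ok, no_state}.

handle_call(stop, _From, State) ->
    {stop, normal, ok, State}.        %% the {stop,...} return quoted above

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.

terminate(_Reason, _State) ->
    io:format("terminate/2 called; close files here~n"),
    ok.

code_change(_OldVsn, State, _Extra) -> {ok, State}.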



On Sep 25, 2010, at 10:53 AM, Paul Davis wrote:

 I'm not aware of a case where a gen_server will stop without calling
 terminate except if its a hard VM shutdown. Though, I get the feeling
 that Kocoloski is going to remind me of a case I've seen before and
 I'll say Oh duh! and then move on with my life.
 
 HTH,
 Paul Davis
 
 On Sat, Sep 25, 2010 at 8:35 AM, Robert Dionne
 dio...@dionne-associates.com wrote:
 Filipe,
 
  Won't terminate be called only if the gen_server is stopped for a reason?
 
 Bob
 
 
 
 On Sep 25, 2010, at 7:30 AM, Filipe David Manana wrote:
 
 Stephen,
 
 I committed something to trunk (
 http://svn.apache.org/viewvc?view=revisionrevision=1001196 ) that
 might be the cause for your issue.
 Can you test it with trunk?
 
 I was not yet able to reproduce the issue.
 
 cheers
 
 On Wed, Sep 22, 2010 at 2:40 PM, [mRg] emar...@gmail.com wrote:
 Yes I did, issued it a couple of times as I was trying to see if it was
 some kind of race condition.
 
 Have a site due to launch on Friday so have got crazy busy but ill post 
 more
 info when I have it, unless anyone else can try and help replicate this ?
 
 On 20 September 2010 21:42, Paul Davis paul.joseph.da...@gmail.com wrote:
 
 Do you issue two compact commands simultaneously or wait for the first
 one to complete and then run the second?
 
 
 On Mon, Sep 20, 2010 at 4:07 PM, [mRg] emar...@gmail.com wrote:
 Well I can certainly reproduce the issue, but am having trouble finding
 the
 exact sequence (annoyingly of course)
 
 1 I started with a blank VM of Ubuntu 10.10 (ext3) running on virtualbox
 with latest CouchDB (apt-get install couchdb)
 2 Began adding lots of blank docs ..  (curl -X POST -H
 "Content-Type:application/json" http://localhost:5984/test -d "{}")
 3 Created a simple view (function(doc) {  emit(null, doc); }) and ran
 to
 ensure it wrote to disk.
 4 I issued 2 compact commands (curl -X POST
 http://localhost:5984/test/_compact/ -H "Content-Type:application/json")
 5 And began adding more documents again as before ..
 
 Repeating this, essentially creating view data and compacting it about 10
 times resulted in 4 'snagged' files
 
 r...@ubuntu:~# lsof | grep -P 'COMMAND|/var/lib/couchdb/1.0.1'
 COMMANDPIDUSER   FD  TYPE DEVICE SIZE/OFF   NODE NAME
 beam  1083 couchdb   13u  REG  252,0 4185271
 /var/lib/couchdb/1.0.1/_users.couch
 beam  1083 couchdb   14u  REG  252,0   331865275
 /var/lib/couchdb/1.0.1/.delete/4e000e0ae5cce3d275f09b4542396e85 (deleted)
 beam  1083 couchdb   16u  REG  252,0   602210279
 /var/lib/couchdb/1.0.1/.delete/a9ee2b96c12591455ea795fa5324dc9c (deleted)
 beam  1083 couchdb   17u  REG  252,0   270434283
 /var/lib/couchdb/1.0.1/.test_design/07ca32cf9b0de9c915c5d9ce653cdca3.view
 beam  1083 couchdb   18u  REG  252,0   204898284
 /var/lib/couchdb/1.0.1/.test_design/540958c4124af3925fe467afb96f4906.view
 beam  1083 couchdb   20u  REG  252,0   405602286
 /var/lib/couchdb/1.0.1/.test_design/f26a8fcc3d2ce226a9e652338882c3db.view
 beam  1083 couchdb   21u  REG  252,0   299106287
 /var/lib/couchdb/1.0.1/.test_design/40614f8c8e1b4bab9d093881e914729d.view
 beam  1083 couchdb   22u  REG  252,0   106594285
 /var/lib/couchdb/1.0.1/.delete/31909e936ce94db7a6cede72827b18b2 (deleted)
 beam  1083 couchdb   24r  REG  252,0   233570288
 /var/lib/couchdb/1.0.1/.delete/83d09ec6dab90ca6078c0310085b97cc (deleted)
 beam  1083 couchdb   26u  REG  252,0   163932281
 /var/lib/couchdb/1.0.1/.test_design/3486e8de398e27b8767afd4975691360.view
 beam  1083 couchdb   27w  REG  252,0   114786289
 /var/lib/couchdb/1.0.1/test.couch
 beam  1083 couchdb   28u  REG  252,0   217186280
 /var/lib/couchdb/1.0.1/.test_design/b450740e3245f89ab902db10d767a397.view
 
 .. while the bottom 2 marked with (deleted) hung around for about 20
 minutes
 they eventually disappeared ..
 
 r...@ubuntu:~# lsof | grep -P 'COMMAND|/var/lib/couchdb/1.0.1'
 COMMANDPIDUSER   FD  TYPE DEVICE SIZE/OFF   NODE NAME
 beam  1083 couchdb   13u  REG  252,0 4185271
 /var/lib/couchdb/1.0.1/_users.couch
 beam  1083 couchdb   14u  REG  252,0   331865275
 /var/lib/couchdb/1.0.1/.delete/4e000e0ae5cce3d275f09b4542396e85 (deleted)
 beam  1083 couchdb   16u  REG  252,0   602210279
 /var/lib/couchdb/1.0.1/.delete/a9ee2b96c12591455ea795fa5324dc9c (deleted)
 beam  1083 couchdb   17u  REG  252,0

Re: CouchDb not releasing files

2010-09-25 Thread Robert Dionne
couch_file has a close function which presumably does the right thing, but it's 
only called from couch_db_updater:terminate 



On Sep 25, 2010, at 11:02 AM, Randall Leeds wrote:

 What about if an exit() is called or something? I think then terminate
 is still called, but worth checking.
 
 On Sat, Sep 25, 2010 at 16:57, Robert Dionne
 dio...@dionne-associates.com wrote:
 http://erldocs.com/otp_src_R13B/stdlib/gen_server.html
 
 If the function returns {stop,Reason,Reply,NewState}, Reply will be given 
 back to From. If the function returns {stop,Reason,NewState}, any reply to 
 From must be given explicitly using gen_server:reply/2. The gen_server will 
 then call Module:terminate(Reason,NewState) and terminate.
 
 
 
 On Sep 25, 2010, at 10:53 AM, Paul Davis wrote:
 
 I'm not aware of a case where a gen_server will stop without calling
 terminate except if its a hard VM shutdown. Though, I get the feeling
 that Kocoloski is going to remind me of a case I've seen before and
 I'll say Oh duh! and then move on with my life.
 
 HTH,
 Paul Davis
 
 On Sat, Sep 25, 2010 at 8:35 AM, Robert Dionne
 dio...@dionne-associates.com wrote:
 Filipe,
 
  Won't terminate be called only if the gen_server is stopped for a reason?
 
 Bob
 
 
 
 On Sep 25, 2010, at 7:30 AM, Filipe David Manana wrote:
 
 Stephen,
 
 I committed something to trunk (
 http://svn.apache.org/viewvc?view=revisionrevision=1001196 ) that
 might be the cause for your issue.
 Can you test it with trunk?
 
 I was not yet able to reproduce the issue.
 
 cheers
 
 On Wed, Sep 22, 2010 at 2:40 PM, [mRg] emar...@gmail.com wrote:
 Yes I did, issued it a couple of times as I was trying to see if it was
 some kind of race condition.
 
 Have a site due to launch on Friday so have got crazy busy but ill post 
 more
 info when I have it, unless anyone else can try and help replicate this ?
 
 On 20 September 2010 21:42, Paul Davis paul.joseph.da...@gmail.com 
 wrote:
 
 Do you issue two compact commands simultaneously or wait for the first
 one to complete and then run the second?
 
 
 On Mon, Sep 20, 2010 at 4:07 PM, [mRg] emar...@gmail.com wrote:
 Well I can certainly reproduce the issue, but am having trouble finding
 the
 exact sequence (annoyingly of course)
 
 1 I started with a blank VM of Ubuntu 10.10 (ext3) running on 
 virtualbox
 with latest CouchDB (apt-get install couchdb)
 2 Began adding lots of blank docs ..  (curl -X POST -H
 "Content-Type:application/json" http://localhost:5984/test -d "{}")
 3 Created a simple view (function(doc) {  emit(null, doc); }) and 
 ran
 to
 ensure it wrote to disk.
 4 I issued 2 compact commands (curl -X POST
 http://localhost:5984/test/_compact/ -H 
 "Content-Type:application/json")
 5 And began adding more documents again as before ..
 
 Repeating this, essentially creating view data and compacting it about 
 10
 times resulted in 4 'snagged' files
 
 r...@ubuntu:~# lsof | grep -P 'COMMAND|/var/lib/couchdb/1.0.1'
 COMMANDPIDUSER   FD  TYPE DEVICE SIZE/OFF   NODE 
 NAME
 beam  1083 couchdb   13u  REG  252,0 4185271
 /var/lib/couchdb/1.0.1/_users.couch
 beam  1083 couchdb   14u  REG  252,0   331865275
 /var/lib/couchdb/1.0.1/.delete/4e000e0ae5cce3d275f09b4542396e85 
 (deleted)
 beam  1083 couchdb   16u  REG  252,0   602210279
 /var/lib/couchdb/1.0.1/.delete/a9ee2b96c12591455ea795fa5324dc9c 
 (deleted)
 beam  1083 couchdb   17u  REG  252,0   270434283
 /var/lib/couchdb/1.0.1/.test_design/07ca32cf9b0de9c915c5d9ce653cdca3.view
 beam  1083 couchdb   18u  REG  252,0   204898284
 /var/lib/couchdb/1.0.1/.test_design/540958c4124af3925fe467afb96f4906.view
 beam  1083 couchdb   20u  REG  252,0   405602286
 /var/lib/couchdb/1.0.1/.test_design/f26a8fcc3d2ce226a9e652338882c3db.view
 beam  1083 couchdb   21u  REG  252,0   299106287
 /var/lib/couchdb/1.0.1/.test_design/40614f8c8e1b4bab9d093881e914729d.view
 beam  1083 couchdb   22u  REG  252,0   106594285
 /var/lib/couchdb/1.0.1/.delete/31909e936ce94db7a6cede72827b18b2 
 (deleted)
 beam  1083 couchdb   24r  REG  252,0   233570288
 /var/lib/couchdb/1.0.1/.delete/83d09ec6dab90ca6078c0310085b97cc 
 (deleted)
 beam  1083 couchdb   26u  REG  252,0   163932281
 /var/lib/couchdb/1.0.1/.test_design/3486e8de398e27b8767afd4975691360.view
 beam  1083 couchdb   27w  REG  252,0   114786289
 /var/lib/couchdb/1.0.1/test.couch
 beam  1083 couchdb   28u  REG  252,0   217186280
 /var/lib/couchdb/1.0.1/.test_design/b450740e3245f89ab902db10d767a397.view
 
 .. while the bottom 2 marked with (deleted) hung around for about 20
 minutes
 they eventually disappeared ..
 
 r...@ubuntu:~# lsof | grep -P 'COMMAND|/var/lib/couchdb/1.0.1'
 COMMANDPIDUSER   FD  TYPE DEVICE SIZE/OFF   NODE 
 NAME
 beam  1083 couchdb   13u  REG

Re: CouchDb not releasing files

2010-09-25 Thread Robert Dionne



On Sep 25, 2010, at 11:15 AM, Randall Leeds wrote:

 On Sat, Sep 25, 2010 at 17:07, Robert Dionne
 dio...@dionne-associates.com wrote:
 couch_file has a close function which presumably does the right thing, but 
 it's only called from couch_db_updater:terminate
 
 
 It just shuts down the process.

right, I'm just wondering how couch_db_updater:terminate is ever called? Does it 
receive an EXIT message? 


 The question on the table is whether
 that will close the raw file opened by the process as well. Stephen,
 can you reproduce the issue with .couch files if you set max_open_dbs
 really low (like 2) and repeatedly access three or four databases? If
 not, then I suspect it's a ref counting issue with the view index and
 not directly related to couch_file at all.
 
 -Randall



Re: CouchDb not releasing files

2010-09-25 Thread Robert Dionne
I see, cool. I just saw the patch, was wondering how all these terminate 
functions were being called, and started refreshing my memory on these bits. 
They aren't all that intuitive.

Well it's a good idea to try it, as it's harmless and might fix the problem.

Thanks Filipe, Randall, and Paul, for the schooling


On Sep 25, 2010, at 11:57 AM, Paul Davis wrote:

 Bob,
 
 If memory serves, this thread doesn't have anything to do with the
 couch_db_updater not functioning properly. I thought all the
 unlreased files were related to views, and more specifically, view
 compaction. Ie, its quite possible that calling couch_file:stop in the
 proper place would've fixed this. So, my thought was perhaps with
 Filipe's spotting this and adding the patch, it might fix the view
 compaction leaking file descriptors issue.
 
 HTH,
 Paul Davis
 
 On Sat, Sep 25, 2010 at 11:47 AM, Robert Dionne
 dio...@dionne-associates.com wrote:
 
 
 
 On Sep 25, 2010, at 11:15 AM, Randall Leeds wrote:
 
 On Sat, Sep 25, 2010 at 17:07, Robert Dionne
 dio...@dionne-associates.com wrote:
 couch_file has a close function which presumably does the right thing, but 
 it's only called from couch_db_updater:terminate
 
 
 It just shuts down the process.
 
 right, I'm just wondering how couch_db_updater:terminate is ever called? Does 
 it receive an EXIT message?
 
 
 The question on the table is whether
 that will close the raw file opened by the process as well. Stephen,
 can you reproduce the issue with .couch files if you set max_open_dbs
 really low (like 2) and repeatedly access three or four databases? If
 not, then I suspect it's a ref counting issue with the view index and
 not directly related to couch_file at all.
 
 -Randall
 
 



Re: question about how write_header works

2010-09-23 Thread Robert Dionne



On Sep 23, 2010, at 12:25 PM, Robert Newson wrote:

 The idea also doesn't account for the waste in obsolete b+tree nodes.
 Basically, it's more complicated than that.
 
 Compaction is unavoidable with an append-only strategy. One idea I've
 pitched (and frankly stolen from Berkeley JE) is for the database file
 to be a series of files instead of a single one.

Bitcask takes a somewhat similar approach to JE in the use of multiple files




 If we track the used
 space in each file, we can compact any file that drops below a
 threshold (by copying the extant data to the new tail and deleting the
 old file). This is still compaction but it's no longer a wholesale
 rewrite of the database.
 
 All that said, with enough databases and some scheduling, the current
 scheme is still pretty good.
 
 B.
 
 On Thu, Sep 23, 2010 at 5:11 PM, Paul Davis paul.joseph.da...@gmail.com 
 wrote:
 On Thu, Sep 23, 2010 at 12:00 PM, chongqing xiao cqx...@gmail.com wrote:
 Hi, Paul:
 
 Thanks for the clarification.
 
 I am not sure why this is designed this way but here is one approach I
 think might work better
 
 Instead of appending the header to the data file, why not just move
 the header to a different file? The header file can be implemented as
 before - 2 duplicate header blocks to keep it
 corruption free. For performance reasons, the header file can be cached
 (say using a memory mapped file).
 
 The reason I like this approach better is that for the application I
 am interested in - archiving data from a relational database - the saved
 data never change. So if there is no wasted space for the old header,
 there is no need to compact the database file.
 
 Chong
 
 
 Writing the header to the data file means that the header is where the
 data is. Ie, if the header is there and intact, we can be reasonably
 sure that the data the header refers to is also there (barring weirdo
 filesystems like xfs). Using a second file descriptor per database is
 an increase of 100% in the number of file descriptors. This would very
 much affect people that have lots of active databases on a single
 node. I'm sure there are other reasons but I've not had anything to
 eat yet.
 
 Paul
 
 
 
 On Thu, Sep 23, 2010 at 8:44 AM, Paul Davis paul.joseph.da...@gmail.com 
 wrote:
 It's not appended each time data is written, necessarily. There are
 optimizations to batch as many writes to the database together as
 possible as well as delayed commits which will write the header out
 every N seconds.
 
 Remember that *any* write to the database is going to look like wasted
 space. Even document deletes make the database file grow larger.
 
 When a header is written, it contains checksums of its contents and
 when reading we check that nothing has changed. There's an fsync
 before and after writing the header which also help to ensure that
 writes succeed.
 
 As to the header2 or header1 problem, if header2 appears to be
 corrupted or is otherwise discarded, the header search just continues
 through the file looking for the next valid header. In this case that
 would mean that newData2 would not be considered valid data and
 ignored.
 
 HTH,
 Paul Davis
 
 On Wed, Sep 22, 2010 at 11:51 PM, chongqing xiao cqx...@gmail.com wrote:
 Hi, Adam:
 
 Thanks for the answer.
 
 If that is how it works, that seems to create a lot of wasted space,
 assuming a new header has to be appended each time new data is saved.
 
 Also, assuming here is the data layout
 
 newData1   <- start
 header1
 newData2
 header2    <- end
 
 If header2 is partially written, I am assuming newData2 will also be
 discarded. If that is the case, I am assuming there is a special flag
 in header1 so the code can skip newData2 and find header1?
 
 I am very interested in couchdb and I think it might be a very good
 choice for archiving relational data with some minor changes.
 
 Thanks
 Chong
 
 On Wed, Sep 22, 2010 at 10:36 PM, Adam Kocoloski kocol...@apache.org 
 wrote:
 Hi Chong, that's exactly right.  Regards,
 
 Adam
 
 On Sep 22, 2010, at 10:18 PM, chongqing xiao wrote:
 
 Hi,
 
 Could anyone explain how write_header (or header) in works in couchdb?
 
 When appending new header, I am assuming the new header will be
 appended to the end of the DB file and the old header will be kept
 around?
 
 If that is the case, what will happen if the header is partially
 written? I am assuming the code will loop back and find the previous
 old header and recover from there?
 
 Thanks
 
 Chong
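
To make the recovery story concrete, a simplified sketch of such a
backward header scan (the block size, length prefix, and md5 placement are
assumptions for illustration; couch_file's actual on-disk layout differs):

-define(BLOCK_SIZE, 4096).

%% Walk backwards block by block from the end of the file and return
%% the most recent header whose checksum verifies.
find_last_header(_Fd, Block) when Block < 0 ->
    no_valid_header;
find_last_header(Fd, Block) ->
    case read_and_verify(Fd, Block * ?BLOCK_SIZE) of
        {ok, Header} -> {ok, Header};
        corrupt      -> find_last_header(Fd, Block - 1)
    end.

%% A candidate header is <<Len:32, Md5:16/binary, TermBin:Len/binary>>;
%% a partially written header fails the md5 check and is skipped, which
%% is how a torn header2 (and the newData2 it covers) gets ignored.
read_and_verify(Fd, Offset) ->
    case file:pread(Fd, Offset, ?BLOCK_SIZE) of
        {ok, <<Len:32, Md5:16/binary, Rest/binary>>} when byte_size(Rest) >= Len ->
            <<TermBin:Len/binary, _/binary>> = Rest,
            case erlang:md5(TermBin) of
                Md5 -> {ok, binary_to_term(TermBin)};
                _   -> corrupt
            end;
        _ ->
            corrupt
    end.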
 
 
 
 
 
 



Re: multiview on github

2010-09-20 Thread Robert Dionne
I see, neat. 

I ask because you might treat disjunction and conjunction  differently in terms 
of whether you run around the ring or broadcast to all the nodes. For 
conjunctions you need all to succeed so broadcast might fare better whereas for 
disjunctions only one need succeed. I suppose it would depend largely on the 
number of views and the amount of each computation.
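
In set terms the two compositions look like this (a toy sketch over
materialised id lists; the real multiview streams ids through a ring of
processes precisely to avoid building full sets):

%% each element of IdLists is the list of doc ids emitted by one view
conjunction([First | Rest]) ->
    lists:foldl(fun(Ids, Acc) ->
                    sets:intersection(sets:from_list(Ids), Acc)
                end, sets:from_list(First), Rest).

disjunction(IdLists) ->
    sets:union([sets:from_list(Ids) || Ids <- IdLists]).

For a single id, a conjunction needs an answer from every view before it
can win, while a disjunction can stop at the first view that claims it,
which is the broadcast-versus-ring trade-off above.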

Anyway I guess I have mixed feelings about seeing this in core. I see a lot of 
folks already struggling to get their arms around working with map/reduce. It 
would make a good plugin for advanced users. Actually the ability to have 
plugins is almost there now. I have an indexer that only requires some ini file 
mods and getting the code on the classpath. I think all that's needed at this 
point is:

1. conventions for a plugins directory

2. way of specing gen_servers in order to supervise them

3. some apis around some of the internals.

I'm oversimplifying it for sure; the devil's in the details, and it's the kind of 
thing programmers love to argue about ad nauseam but no one wants to do it 
(myself included :)

Best,

Bob



On Sep 19, 2010, at 10:22 AM, Norman Barker wrote:

 Bob,
 
 it is just checking that a given id participates in a view, if it
 makes it around the ring then it wins and gets streamed to the client,
 adding disjoints would be fairly simple. Currently the only way I can
 check if an id is in a view is to loop over the results of each view,
 hence each node in the ring is in its own process to keep things
 moving.
 
 A use case is two views, one that emits datetime (numeric) and another
 view that emits values, e.g. A, B, C ..., the query would then be to
 find the all documents with value A between start time and end time.
 
 Norman
 
 On Sun, Sep 19, 2010 at 5:21 AM, Robert Dionne
 dio...@dionne-associates.com wrote:
 I took another peek at this and I'm curious as to what it's doing. Is it 
 just checking that a given id participates in a view? So if it makes it 
 around the ring it wins? Or is it actually computing the result of passing 
 the doc thru all the views?
 
 If the answer is the former then would disjunction also be something one 
 might want? I'm just curious, I don't have a use case and I forget the 
 original discussion around this. I sort of think of views as a functional 
 mapping from the database to some subset. That's not entirely accurate given 
 there's this reduce phase also. So I could imagine composing views in a 
 functional way, but the same thing can be had with just a different map 
 function that is the composition.
 
 Anyway if you have a brief description of this, with a use case,  it would 
 help.
 
 Cheers,
 
 Bob
 
 
 
 
 On Sep 17, 2010, at 11:32 PM, Norman Barker wrote:
 
 Chris, James
 
 thanks for bumping this, we are using this internally at 'scale'
 (million+ keys). I want this to work for couchdb as we want to give
 back for such a great product and support this going forward, so any
 suggestions welcomed and we will test and add them to the local github
 account with the aim of getting this into trunk.
 
 Norman
 
 On Fri, Sep 17, 2010 at 7:00 PM, James Hayton theb...@purplebulldog.com 
 wrote:
 I want to use it!  I just haven't gotten around to it.  I was going to try
 and test it out this weekend and if I am able, I will certainly report back
 what I find.
 
 James
 
 On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson jch...@apache.org wrote:
 
 On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker norman.bar...@gmail.com
 wrote:
 Bob,
 
 I can and have been testing the multiview at this scale, it is ok
 (fast enough), but I think being able to test inclusion of a document
 id in a view without having to loop would be a considerable speed
 improvement. If you have any ideas let me know.
 
 
 I just want to bump this thread, as I think this is a useful feature.
 I don't expect to be able to test it in the coming weeks, but if I did
 I would. Is anyone besides Norman using this? Has anyone used it at
 scale?
 
 Cheers,
 Chris
 
 thanks,
 
 Norman
 
 On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson robert.new...@gmail.com
 wrote:
 I'm sorry, I've had no time to play with this at scale.
 
 On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker norman.bar...@gmail.com
 wrote:
 Hi,
 
 are there any more comments on this, if not can you describe the
 process (in particular how to obtain a wiki and jira account for
 couchdb which I have been unable to do) and I will start documenting
 this so we can put this into the trunk.
 
 Bob, were you able to do any more testing with large views, are there
 any suggestions on how to speed up the document id inclusion test as
 described below?
 
 thanks,
 
 Norman
 
 On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker 
 norman.bar...@gmail.com wrote:
 Bob,
 
 thanks for the feedback and for taking a look at the code. Guidelines
 on when to use a supervisor within couchdb with a gen_server would be
 appreciated, currently I have a supervisor and a gen_server

Re: multiview on github

2010-09-20 Thread Robert Dionne
Norman,

  Actually ontylog is GPL, and I wouldn't wish that code on anyone just yet. 
Think of it as the contents of my /etc directory.

  The indexer I'm chipping away at is just a proof of concept hacked up from 
Joe Armstrong's Erlang book (with his permission). Anyone is welcome to use it 
as they see fit, though it does have restrictions from Armstrong press. It's 
been great for me to learn erlang and explore the couch internals. It's also 
nice to have something light running in couch.

  My thoughts about plugins have nothing to do with licenses. I like the fact that 
couchdb is simple and lean, and I'd like it to stay rock solid. I'm not sure multiview, 
geocouch, fti, or any other indexers belong in the core. With multiview I think 
there's perhaps something more general that might be part of core but I haven't 
given it a lot of thought yet.

Cheers,

Bob




On Sep 20, 2010, at 7:02 PM, Norman Barker wrote:

 Bob,
 
 I can see why plugins might work for you since your ontology /
 indexing code is GPL, however I am more than happy for the multiview
 to be apache licensed and would like to see it in trunk.
 
 I like the concept of plugins as it creates a stable API for third
 parties, but I think a multiview is a core feature of CouchDB.
 
 Norman
 
 On Mon, Sep 20, 2010 at 4:19 AM, Robert Dionne
 dio...@dionne-associates.com wrote:
 I see, neat.
 
 I ask because you might treat disjunction and conjunction  differently in 
 terms of whether you run around the ring or broadcast to all the nodes. For 
 conjunctions you need all to succeed so broadcast might fare better whereas 
 for disjunctions only one need succeed. I suppose it would depend largely on 
 the number of views and the amount of each computation.
 
 Anyway I guess I have mixed feelings about seeing this in core. I see a lot 
 of folks already struggling to get their arms around working with 
 map/reduce. It would make a good plugin for advanced users. Actually the 
 ability to have plugins is almost there now. I have an indexer that only 
 requires some ini file mods and getting the code on the classpath. I think 
 all that's needed at this point is:
 
 1. conventions for a plugins directory
 
 2. way of specing gen_servers in order to supervise them
 
 3. some apis around some of the internals.
 
 I'm oversimplifying it for sure; the devil's in the details and it's the kind 
 of thing programmers love to argue about ad nauseam but no one wants to do 
 it (myself included :)
 
 Best,
 
 Bob
 
 
 
 On Sep 19, 2010, at 10:22 AM, Norman Barker wrote:
 
 Bob,
 
 it is just checking that a given id participates in a view, if it
 makes it around the ring then it wins and gets streamed to the client,
 adding disjunctions would be fairly simple. Currently the only way I can
 check if an id is in a view is to loop over the results of each view,
 hence each node in the ring is in its own process to keep things
 moving.
 
 A use case is two views, one that emits datetime (numeric) and another
 view that emits values, e.g. A, B, C ..., the query would then be to
 find all documents with value A between the start time and the end time.
 
 Norman
 
 On Sun, Sep 19, 2010 at 5:21 AM, Robert Dionne
 dio...@dionne-associates.com wrote:
 I took another peek at this and I'm curious as to what it's doing. Is it 
 just checking that a given id participates in a view? So if it makes it 
 around the ring it wins? Or is it actually computing the result of passing 
 the doc through all the views?
 
 If the answer is the former then would disjunction also be something one 
 might want? I'm just curious, I don't have a use case and I forget the 
 original discussion around this. I sort of think of views as a functional 
 mapping from the database to some subset. That's not entirely accurate 
 given there's this reduce phase also. So I could imagine composing views 
 in a functional way, but the same thing can be had with just a different 
 map function that is the composition.
 
 Anyway if you have a brief description of this, with a use case,  it would 
 help.
 
 Cheers,
 
 Bob
 
 
 
 
 On Sep 17, 2010, at 11:32 PM, Norman Barker wrote:
 
 Chris, James
 
 thanks for bumping this, we are using this internally at 'scale'
 (million+ keys). I want this to work for couchdb as we want to give
 back for such a great product and support this going forward, so any
 suggestions welcomed and we will test and add them to the local github
 account with the aim of getting this into trunk.
 
 Norman
 
 On Fri, Sep 17, 2010 at 7:00 PM, James Hayton theb...@purplebulldog.com 
 wrote:
 I want to use it!  I just haven't gotten around to it.  I was going to 
 try
 and test it out this weekend and if I am able, I will certainly report 
 back
 what I find.
 
 James
 
 On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson jch...@apache.org 
 wrote:
 
 On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker 
 norman.bar...@gmail.com
 wrote:
 Bob,
 
 I can and have been testing the multiview at this scale

Re: multiview on github

2010-08-23 Thread Robert Dionne
Hi Norman,

  I took a peek at multiview. I haven't followed this too closely on the 
mailing list but this is *view intersection*? Is there a 5 line summary of what 
this does somewhere? 

  I'm curious as to why the daemon needs to be a supervisor; most if not all of 
the other daemons are gen_servers. OTP allows this but I think this is a good 
area where some CouchDB guidelines on plugins would apply.

  It strikes me that views, the use of map/reduce, etc. are one of the trickier 
aspects of using CouchDB, particularly for new users coming from the SQL world. 
People are also reporting issues with performance of views, I guess often 
because reduce functions go out of control.

  I think the project would be better served if features like this were 
available as plugins. I would put GeoCouch in the same category. It's very neat 
and timely (given everyone wants to know where everyone else is using their 
telephone but without talking other than asynchronously), but a server plugin 
architecture that would allow this to be done cleanly should come first.

  This is just my opinion. I'd love to see some of the project founders and 
committers weigh in on this and set some direction.

Best regards,

Bob


 


On Aug 22, 2010, at 5:45 PM, Norman Barker wrote:

 I would like to take this multiview code and have it added to trunk if
 possible, what are the next steps?
 
 thanks,
 
 Norman
 
 On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker norman.bar...@gmail.com 
 wrote:
 I have made
 
 http://github.com/normanb/couchdb
 
 which is a fork of the latest couchdb trunk with the multiview code
 and tests added.
 
 If geocouch is available then it can still be used.
 
 There are a couple of questions about the multiview on the user /dev
 list so I will be adding some more test cases during today.
 
 thanks,
 
 Norman
 
 On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker norman.bar...@gmail.com 
 wrote:
 this is possible, I forked geocouch since I use it, but I have already
 separated the geocouch dependencies from the trunk.
 
 I can do this tomorrow, certainly be interested in any feedback.
 
 thanks,
 
 Norman
 
 
 
 On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische volker.mis...@gmail.com 
 wrote:
 On 08/18/2010 03:26 AM, J Chris Anderson wrote:
 
 On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
 
 Hi,
 
 I have made the changes as recommended, adding a test case
 multiview.js and also adding the userCtx to open the db.
 
 I have also forked geocouch and this is available here
 
 
 this patch seems important (especially as people are already asking for
 help using it on user@)
 
 to get it committed, it either must remove the dependency on GeoCouch, or
 become part of CouchDB when (and if) GeoCouch becomes part of CouchDB.
 
 Is it possible / useful to make a version that doesn't use GeoCouch? And
 then to make the GeoCouch capabilities part of GeoCouch for now?
 
 Chris
 
 
 Hi Norman,
 
 if the patch is ready for trunk, I'd be happy to move the GeoCouch bits to
 GeoCouch itself (as GeoCouch isn't ready for trunk yet).
 
 Lately I haven't been that responsive when it comes to GeoCouch, but that
 will change (in about a month) after holidays and FOSS4G.
 
 Cheers,
  Volker
 
 
 



Re: splitting the code in different apps or rewrite httpd layer

2010-08-23 Thread Robert Dionne



On Aug 22, 2010, at 4:58 PM, Mikeal Rogers wrote:

 One idea that was floated at least once was to replace all the code we currently 
 have on top of mochiweb directly with webmachine.

If I recall, Paul Davis did some prototyping work on this at one point.



 
 This would make extensions and improvements follow already well defined 
 patterns provided by webmachine.
 
 -Mikeal
 
 Sent from my iPhone
 
 On Aug 20, 2010, at 2:09 AM, Benoit Chesneau bchesn...@gmail.com wrote:
 
 Hi all,
 
 I work a lot these days around the httpd code, and the more I work on
 it the more I think we should refactor it to make it easier to hack and
 extend. There is indeed a lot of code in one module (couch_httpd_db),
 and recent issues like vhost and location rewriting could be easier to
 solve if we had a more organized http layer, in my opinion.
 
 Actually we do (in 1.0.1 or trunk):
 
 request -> couch_httpd loop -> request_handler -> check vhost and
 eventually rewrite url -> request_int -> request_db -> request
 doc | request _design | request attachment | request global handler |
 request misc handler
 
 with an extra level: request_design -> rewrite handler |
 show | lists | update | view ... and request_int, which catches all errors and
 has the responsibility to send errors if anything happened and wasn't
 caught on other layers.
 
 It could be easier. We could make it more resource oriented than it is,
 for example: 1 module, 1 resource. Refactoring the httpd code would also
 allow us to reuse more code than we actually do, maybe by wrapping APIs.
 
 How :
 
 - Some time ago we started to port it to webmachine with davisp,
 but we didn't finish. Maybe it's a good time? Or do we want to follow
 another way?
 
 - If we go on with this refactoring it could also be a good time to split
 couchdb into different apps: couchdb-core and couchdb for example
 (maybe couchdb-appengine?) so we could develop each level independently
 and make the code history cleaner.
 
 
 Thoughts ?
 
 
 - benoit



Re: splitting the code in different apps or rewrite httpd layer

2010-08-20 Thread Robert Dionne
+1

I would change the or in the subject line to and, i.e. do both :)

I think this is an excellent idea and a good time to start this. At a 
conceptual level CouchDB is dirt simple internally. This fact and its use of 
Erlang should, in my opinion, be seen as its main advantage. One way to leverage 
that advantage is to enable programmers who want to extend couch. I know of at 
least three projects [1,2,3] that have done this. A good measure of a 
successful refactor would be how much code these projects could throw away. 

In my terminology prototype [3] I'm currently using bitcask for persistence so 
I basically only extend the HTTP front end piece and need programmatic access 
to the b-tree storage layer. All this needs to be is some sort of mapping that 
lets one run a function over the b-tree, support for ranges, and access to 
changes.
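
The shape I mean is roughly this (a self-contained sketch over a plain ordered
list; all names are made up, and the real thing would fold the b-tree itself):

    %% Fold a function over the key range [Start, End] of an ordered
    %% {Key, Value} list; Fun returns {ok, Acc} to continue or {stop, Acc}
    %% to halt early.
    -module(range_fold).
    -export([fold_range/5]).

    fold_range([], _Start, _End, _Fun, Acc) ->
        {ok, Acc};
    fold_range([{K, _} | Rest], Start, End, Fun, Acc) when K < Start ->
        fold_range(Rest, Start, End, Fun, Acc);     % before the range: skip
    fold_range([{K, _} | _Rest], _Start, End, _Fun, Acc) when K > End ->
        {ok, Acc};                                  % past the range: done
    fold_range([KV | Rest], Start, End, Fun, Acc) ->
        case Fun(KV, Acc) of
            {ok, Acc2}   -> fold_range(Rest, Start, End, Fun, Acc2);
            {stop, Acc2} -> {stop, Acc2}
        end.

    %% range_fold:fold_range([{1,a},{2,b},{3,c}], 2, 3,
    %%     fun(KV, Acc) -> {ok, [KV | Acc]} end, []).
    %% -> {ok, [{3,c},{2,b}]}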

Doing this is a thankless task, anyone already deeply familiar with the 
internals would likely have little *interest* (academic, financial, etc..) in 
it. CouchDB runs on phones now and in the cloud which is awesome and of course 
a strong argument to maintain the simple design. As the complexity of the code 
base increases however, the use of Erlang becomes a barrier to entry. 

Best,

Bob

[1] http://github.com/normanb/couchdb-multiview
[2] http://github.com/vmx/couchdb
[3] http://github.com/bdionne/bitstore




On Aug 20, 2010, at 5:09 AM, Benoit Chesneau wrote:

 Hi all,
 
 I work a lot these days around the httpd code, and the more I work on
 it the more I think we should refactor it to make it easier to hack and
 extend. There is indeed a lot of code in one module (couch_httpd_db),
 and recent issues like vhost and location rewriting could be easier to
 solve if we had a more organized http layer, in my opinion.
 
 Actually we do (in 1.0.1 or trunk):
 
 request -> couch_httpd loop -> request_handler -> check vhost and
 eventually rewrite url -> request_int -> request_db -> request
 doc | request _design | request attachment | request global handler |
 request misc handler
 
 with an extra level: request_design -> rewrite handler |
 show | lists | update | view ... and request_int, which catches all errors and
 has the responsibility to send errors if anything happened and wasn't
 caught on other layers.
 
 It could be easier. We could make it more resource oriented than it is,
 for example: 1 module, 1 resource. Refactoring the httpd code would also
 allow us to reuse more code than we actually do, maybe by wrapping APIs.
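 
 For example, a resource module might look like this (webmachine-style
 callbacks; just a sketch, the module name and the details are assumptions):
 
     %% One module, one resource: a db-info resource, sketched with
     %% webmachine-like callbacks instead of clauses buried in one big
     %% couch_httpd_db.
     -module(db_info_resource).
     -export([init/1, resource_exists/2, content_types_provided/2,
              to_json/2]).
 
     init([]) ->
         {ok, undefined}.
 
     resource_exists(ReqData, State) ->
         %% here we would look the db up; hardcoded for the sketch
         {true, ReqData, State}.
 
     content_types_provided(ReqData, State) ->
         {[{"application/json", to_json}], ReqData, State}.
 
     to_json(ReqData, State) ->
         {"{\"db_name\":\"testdb\",\"doc_count\":0}", ReqData, State}.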
 
 How :
 
 - Some time ago we started to port it to webmachine with davisp,
 but we didn't finish. Maybe it's a good time? Or do we want to follow
 another way?
 
 - If we go on with this refactoring it could also be a good time to split
 couchdb into different apps: couchdb-core and couchdb for example
 (maybe couchdb-appengine?) so we could develop each level independently
 and make the code history cleaner.
 
 
 Thoughts ?
 
 
 - benoit



Re: 160-* etap test failure from time to time

2010-08-18 Thread Robert Dionne
The vhosts refactoring made this issue go away. The underlying problem still 
exists in couch_config. It's a race condition




On Aug 18, 2010, at 2:01 PM, Jan Lehnardt wrote:

 
 On 16 Aug 2010, at 13:00, Benoit Chesneau wrote:
 
 So I've found why 160- test fails from time to time:
 
 Vhosts are loaded at startup when creating the mochiweb loop. So,
 depending on how fast your machine is, it only gets 1 result, and maybe 0 on
 really fast machines, since the ini isn't defined yet.
 
 this sounds like a race condition that should be fixed in the test.
 
 Which also makes it
 impossible to change vhosts on the fly apparently (via POST on
 /_config).
 
 I do that all the time and it works. I'm not sure how you're drawing
 your conclusion here :)
 
 Cheers
 Jan
 -- 
 
 
 To solve that, and to allow us to change/add vhosts on the fly, I would
 like to change the code and put all vhost handling in a gen_server
 keeping state etc., so it would be possible to change vhosts in memory
 depending on ini changes. It would also make vhost handling pluggable,
 so someone could imagine creating them via docs posted in a db.
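 
 A skeleton of what I mean (illustrative only; the module and function
 names are made up):
 
     %% vhosts kept in gen_server state: an ini change or a posted doc
     %% just casts a new mapping in, no couch_httpd restart needed.
     -module(couch_vhost_srv).
     -behaviour(gen_server).
     -export([start_link/0, set_vhost/2, lookup/1]).
     -export([init/1, handle_call/3, handle_cast/2, handle_info/2,
              terminate/2, code_change/3]).
 
     start_link() ->
         gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
 
     set_vhost(Host, Target) ->
         gen_server:cast(?MODULE, {set, Host, Target}).
 
     lookup(Host) ->
         gen_server:call(?MODULE, {lookup, Host}).
 
     init([]) ->
         {ok, dict:new()}.                 % Host -> Target
 
     handle_call({lookup, Host}, _From, Vhosts) ->
         {reply, dict:find(Host, Vhosts), Vhosts}.
 
     handle_cast({set, Host, Target}, Vhosts) ->
         {noreply, dict:store(Host, Target, Vhosts)}.
 
     handle_info(_Msg, Vhosts) -> {noreply, Vhosts}.
     terminate(_Reason, _Vhosts) -> ok.
     code_change(_OldVsn, Vhosts, _Extra) -> {ok, Vhosts}.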
 
 What do you think ?
 
 - benoit
 



Re: 160-* etap test failure from time to time

2010-08-18 Thread Robert Dionne
actually as things currently stand, it no longer involves 160-* at all.

the issue as I see it is here:

http://github.com/bdionne/couchdb/blob/master/src/couchdb/couch_config.erl#L124

every call to couch_config:set will cause registered notify_funs to be 
invoked, *but* spawned in their own pids. This implies that when they execute 
the original couch_config:set may have already returned and another set 
processed. So there's no guarantee that the registered notify_funs see the 
config state that triggered their invocation.
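
In miniature the hazard looks like this (a toy demo, not the actual
couch_config code):

    %% Each set spawns its notify fun, so by the time a notify fun reads
    %% the table a later set may already have landed: both notifications
    %% below can observe v2.
    -module(config_race).
    -export([demo/0]).

    demo() ->
        Tab = ets:new(cfg, [public, set]),
        Notify = fun(Key) ->
            [{Key, Val}] = ets:lookup(Tab, Key),
            io:format("notified for ~p, observed ~p~n", [Key, Val])
        end,
        Set = fun(Key, Val) ->
            true = ets:insert(Tab, {Key, Val}),
            spawn(fun() -> Notify(Key) end)   % fire-and-forget, like notify_funs
        end,
        Set(vhosts, v1),
        Set(vhosts, v2),
        timer:sleep(100).                     % let the spawned funs run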

In this case it meant that couch_httpd was not being restarted for each config 
change to vhosts. The first would trigger a restart, but by the time it got 
to registering itself again the state had already changed. The mochiweb upgrade 
likely slowed things enough to expose it.

Pulling the vhost stuff into its own gen_server made the issue go away.

A similar issue happened once before with the hash_password function in 
couch_server and that's the only other place where this problem exists that I 
can see.

Perhaps a distinction needs to be made between config changes that require a 
restart and those that don't.



On Aug 18, 2010, at 3:35 PM, Benoit Chesneau wrote:

 On Wed, Aug 18, 2010 at 9:29 PM, Jan Lehnardt j...@apache.org wrote:
 
 On 18 Aug 2010, at 20:17, Robert Dionne wrote:
 
 The vhosts refactoring made this issue go away. The underlying problem 
 still exists in couch_config. It's a race condition
 
 The refactoring also added a whole lot of things that are separate from this 
 issue.
 
 I reckon the test could start couch_http and only issue requests once it is 
 fully up*.
 
 *I haven't looked at any code here, just handwaving.
 
 Cheers
 Jan
 --
 
 
 On *slow* machines there are other tests failing for the same reason.
 140- for example. 160- .
 
 Even if the ini was just set after the couch_server_sup start, the problem
 happened. More likely it's due to the fact that couch_httpd is loaded at
 the same time, and the Loop is created at that time with the values currently
 in the ini.
 
 imo, the configuration needs some love, or maybe just the way we
 reload couch_httpd after a change in the conf.
 
 - benoit



Re: Migrating Wiki to a CouchApp

2010-08-13 Thread Robert Dionne
nice, it is very snappy. I could see this would encourage more use.

The home link is out of sync with the FrontPage. 

Oddly I'm not able to clone the github source without first cloning it. Anyway 
the issue with the home link seems to be in profileReady/mustache.html; it 
points to index, which is a WIKI page not yet created.

I can certainly live without email notifications. I don't need any more email.





On Aug 13, 2010, at 1:48 AM, J Chris Anderson wrote:

 Devs,
 
 With the help of code from Sebastian Cohnen and Paul Davis, I've imported the 
 wiki currently at http://wiki.apache.org/couchdb to a CouchApp.
 
 Here it is:
 
 http://wiki.couchdb.couchdb.org/page/FrontPage
 
 The work is still preliminary. I haven't vetted all the content, and the wiki 
 software itself still needs to be polished. But I think in the long run we 
 will be better off as a project to host our wiki on CouchDB.
 
 First of all, the response time when you click a link will be faster (yay not 
 being a slow cgi script!) Second, the code is a CouchApp, so not only will we 
 all be able to help improve it, we can easily replicate the wiki offline for 
 editing, etc.
 
 In the long run it would make sense to ship a copy of the wiki with CouchDB 
 (or at least make replicating a local instance of it super-simple).
 
 There are some missing features. The most notable one that I don't plan to 
 implement, is email notifications of changes to pages. I haven't added Atom 
 feeds of recent-changes yet, but I think that can make up for the missing 
 email feature. What do you think? If email is crucial to migrating away from 
 MoinMoin, it is possible. 
 
 The other missing feature that I think is critical, is some built-in way to 
 revert to old points in the history of a page. Currently history is stored 
 but to revert would require writing some more code.
 
 Code is here for those who want to hack:
 
 http://github.com/couchone/pages
 
 Hope you are as excited about this as me!
 
 Chris
 



Re: Migrating Wiki to a CouchApp

2010-08-13 Thread Robert Dionne



On Aug 13, 2010, at 6:44 AM, Mikhail A. Pokidko wrote:

 On Fri, Aug 13, 2010 at 9:48 AM, J Chris Anderson jch...@apache.org wrote:
 Devs,
 
 With the help of code from Sebastian Cohnen and Paul Davis, I've imported 
 the wiki currently at http://wiki.apache.org/couchdb to a CouchApp.
 
 That is a kind of code nationalism - you your own code )))
 
 The other missing feature that I think is critical, is some built-in way to 
 revert to old points in the history of a page. Currently history is stored 
 but to revert would require writing some more code.
 
 Do you want exact reverting? What about just moving the HEAD version?

I suppose the new Github wikis readily support that as they are git backed.




 
 Hope you are as excited about this as me!
 
 Definitely yes!
 
 
 -- 
 xmpp: pma AT altlinux DOT org



Re: Migrating Wiki to a CouchApp

2010-08-13 Thread Robert Dionne




On Aug 13, 2010, at 8:00 AM, Noah Slater wrote:

 
 
 (or at least make replicating a local instance of it super-simple).
 
 
 Bingo.
 
 We could form it like a tutorial even. When you first install CouchDB, you go 
 to Futon, and there's a documentation section. It explains to you that the 
 project wiki is hosted on CouchDB, and you can replicate from it. It explains 
 some core concepts, and gives you a big fat button to press. You press it, 
 the official wiki is replicated to your local machine. You could even make 
 edits and replicate back. A perfect way to show off some of CouchDB's 
 strengths.
 

+1 - great idea

Re: Migrating Wiki to a CouchApp

2010-08-13 Thread Robert Dionne
 
 
 The home link is out of sync with the FrontPage. 
 
 Thanks. I'll think about how to fix that. I'd like to avoid deploying the 
 CouchDB version of the wiki as a fork of the basic Pages codebase, so maybe 
 it's worth it to rename FrontPage to index, and put a pointer (or redirect) 
 to index on the FrontPage.
 
 
 Oddly I'm not able to clone the github source without first cloning it.
 
 You broke my brain. Come again?

sorry, I should have said forking. When I tried to git clone your repo locally 
it kept failing, so I forked it on github and cloned that.

I have a background in logic so I tend to use a lot of Yogi Berra-isms :)



Re: [VOTE] Apache CouchDB 1.0.1 release, second round

2010-08-11 Thread Robert Dionne
+1

all etaps, futon tests pass, make distcheck

OS X 10.6.4, Erlang R13B04



On Aug 11, 2010, at 3:15 PM, Noah Slater wrote:

 Hello,
 
 I would like call a vote for the Apache CouchDB 1.0.1 release, second round.
 
 Changes since the last round:
 
  * Fix data corruption bug COUCHDB-844. Please see 
 http://couchdb.apache.org/notice/1.0.1.html for details.
 
 We have added a regression test to prevent this problem from happening again.
 
 We encourage the whole community to download and test these release artifacts 
 so that any critical issues can be resolved before the release is made. 
 Everyone is free to vote on this release, so get stuck in!
 
 We are voting on the following release artifacts:
 
  http://people.apache.org/~nslater/dist/1.0.1/
 
 These artifacts have been built from the 1.0.1 tag in Subversion:
 
  http://svn.apache.org/repos/asf/couchdb/tags/1.0.1/
 
 Happy voting,
 
 N
 



Re: [VOTE] Apache CouchDB 0.11.2 release, first round

2010-08-08 Thread Robert Dionne
+1

OS X 10.6
R13B04

all tests, make check, Futon, make distcheck --- pass




On Aug 8, 2010, at 9:43 AM, Noah Slater wrote:

 Can someone else test this release? We only have three votes so far.
 
 On 7 Aug 2010, at 11:47, Jan Lehnardt wrote:
 
 
 On 6 Aug 2010, at 13:36, Noah Slater wrote:
 
 Hello,
 
 I would like call a vote for the Apache CouchDB 0.11.2 release, first round.
 
 We encourage the whole community to download and test these release 
 artifacts so that any critical issues can be resolved before the release is 
 made. Everyone is free to vote on this release, so get stuck in!
 
 We are voting on the following release artifacts:
 
 http://people.apache.org/~nslater/dist/0.11.2/
 
 These artifacts have been built from the 0.11.2 tag in Subversion:
 
 http://svn.apache.org/repos/asf/couchdb/tags/0.11.2/
 
 Happy voting,
 
 +1
 
 Cheers
 Jan
 -- 
 
 



Re: Data loss

2010-08-08 Thread Robert Dionne
I would also consider removing the download link for 1.0.0 and not depend on 
users patching it. It's broken.

I have to believe there are users who won't patch and who won't read the red 
sign. There's a good probability these are the kinds of users who will also be 
the most upset by data loss.




On Aug 8, 2010, at 3:06 PM, Jan Lehnardt wrote:

 
 On 8 Aug 2010, at 18:37, J Chris Anderson wrote:
 
 Devs,
 
 I have started a document which we will use when announcing the bug. I plan 
 to move the document from this wiki location to the 
 http://couchdb.apache.org site before the end of the day. Please review and 
 edit the document before then.
 
 http://wiki.couchone.com/page/post-mortem
 
 I have a section called The Bug which needs a technical description of the 
 error and the fix. I'm hoping Adam or Randall can write this, as they are 
 most familiar with the issues.
 
 Once it is ready, we should do our best to make sure our users get a chance 
 to read it.
 
 I made a few more minor adjustments (see page history when you are logged in) 
 and have nothing more to add myself, but I'd appreciate if Adam or Randall 
 could add a few more tech bits.
 
 --
 
 In the meantime, I've put up a BIG FAT WARNING on the CouchDB downloads page: 
  
 
  http://couchdb.apache.org/downloads.html
 
 I plan to update the warning with a link to the post-mortem once that is done.
 
 --
 
 Thanks everybody for being on top of this!
 
 Cheers
 Jan
 -- 
 
 
 
 
 Thanks,
 Chris
 
 On Aug 8, 2010, at 5:16 AM, Robert Newson wrote:
 
 That was also Adam's conclusion (data loss bug confined to 1.0.0).
 
 B.
 
 On Sun, Aug 8, 2010 at 1:10 PM, Jan Lehnardt j...@apache.org wrote:
 
 On 8 Aug 2010, at 13:48, Noah Slater wrote:
 
 Do we need to abort 0.11.2 as well?
 
 0.11.x does not have this commit as far as I can see.
 
 Cheers
 Jan
 --
 
 
 On 8 Aug 2010, at 11:45, Jan Lehnardt wrote:
 
 
 On 8 Aug 2010, at 06:35, J Chris Anderson wrote:
 
 
 On Aug 7, 2010, at 8:45 PM, Dave Cottlehuber wrote:
 
 is this serious enough to justify pulling current 1.0.0 release
 binaries to avoid further installs putting data at risk?
 
 
 I'm not sure what Apache policy is about altering a release after the 
 fact. It's probably up to use to decide what to do.
 
 Altering releases are a no-no. The only real procedure is to release a 
 new version and deprecate the old one, while optionally keeping it 
 around for posterity.
 
 
 Probably as soon as 1.0.1 is available we should pull the 1.0.0 release 
 off of the downloads page, etc.
 
 +1.
 
 I also think we should do a post-mortem blog post announcing the issue 
 and the remedy, as well as digging into how we can prevent this sort of 
 thing in the future.
 
 We should make an official announcement before the end of the weekend, 
 with very clear steps to remedy it. (Eg: config delayed_commits to 
 false *without restarting the server* etc)
 
 I think so, too.
 
 Cheers
 Jan
 --
 
 
 
 On 8 August 2010 15:08, Randall Leeds randall.le...@gmail.com wrote:
 Yes. Adam already back ported it.
 
 Sent from my interstellar unicorn.
 
 On Aug 7, 2010 8:03 PM, Noah Slater nsla...@apache.org wrote:
 
 Time to abort the vote then?
 
 I'd like to get this fix into 1.0.1 if possible.
 
 
 On 8 Aug 2010, at 02:28, Damien Katz wrote:
 
 Thanks.
 
 Anyone up to create a repair tool for w...
 
 
 
 
 
 
 
 



Re: [jira] Created: (COUCHDB-831) badarity

2010-07-22 Thread Robert Dionne
looks like you're missing the view bar?




On Jul 22, 2010, at 9:29 AM, Harry Vangberg (JIRA) wrote:

 badarity
 
 
 Key: COUCHDB-831
 URL: https://issues.apache.org/jira/browse/COUCHDB-831
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 1.0
 Environment: mac os x
Reporter: Harry Vangberg
 
 
 I have an empty database with nothing but this design document:
 
  {
    "_id": "_design/foo",
    "_rev": "1-19b6ac05cd5e878bbe8193c3fbce57bb",
    "language": "javascript",
    "views": {
      "foo": {
        "map": "function(doc) {emit(1,2);}"
      }
    }
  }
 
 Which fails miserably. 
 
 $ curl http://127.0.0.1:5984/argh/_design/foo/_views/bar
 [Thu, 22 Jul 2010 13:27:53 GMT] [error] [0.1015.0] Uncaught error in HTTP 
 request: {error,
 {badarity,
  {#Funcouch_httpd_db.5.100501499,
   [{httpd,
 {mochiweb_request,#Port0.2266,'GET',
  /argh/_design/foo/_views/bar,
  {1,1},
  {3,
   {user-agent,
{'User-Agent',
 curl/7.19.7 
 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3},
{host,
 {'Host',127.0.0.1:5984},
 {accept,{'Accept',*/*},nil,nil},
 nil},
nil}}},
 127.0.0.1,'GET',
 [argh,_design,foo,
  _views,bar],
 {dict,5,16,16,8,80,48,
  {[],[],[],[],[],[],[],[],[],[],[],[],[],
   [],[],[]},
  {{[[_design|
  #Funcouch_httpd.8.61263750]],
[],
[[_view_cleanup|
  #Funcouch_httpd.8.61263750]],
[],[],[],[],[],
[[_compact|
  #Funcouch_httpd.8.61263750]],
[],[],
[[_temp_view|
  #Funcouch_httpd.8.61263750]],
[[_changes|
  #Funcouch_httpd.8.61263750]],
[],[],[]}}},
 {user_ctx,null,
  [_admin],
  {couch_httpd_auth, 
 default_authentication_handler}},
 undefined,
 {dict,6,16,16,8,80,48,
  {[],[],[],[],[],[],[],[],[],[],[],[],[],
   [],[],[]},
  {{[],
[[_show|
  #Funcouch_httpd.10.132977763]],
[[_info|
  #Funcouch_httpd.10.132977763],
 [_list|
  #Funcouch_httpd.10.132977763]],
[[_update|
  #Funcouch_httpd.10.132977763]],
[],[],[],[],[],
[[_rewrite|
  #Funcouch_httpd.10.132977763]],
[],[],[],
[[_view|
  #Funcouch_httpd.10.132977763]],
[],[]}}},
 undefined,#Funcouch_httpd.6.96187723,
 {dict,13,16,16,8,80,48,
  {[],[],[],[],[],[],[],[],[],[],[],[],[],
   [],[],[]},
  {{[[_restart|
  #Funcouch_httpd.6.96187723],
 [_replicate|
  #Funcouch_httpd.6.96187723]],
[[_active_tasks|
  #Funcouch_httpd.6.96187723]],
[],[],

Re: [VOTE] Apache CouchDB 1.0.0 release, second round

2010-07-09 Thread Robert Dionne
+1 

OS X 10.6
Erlang R13B04
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.10) 
Gecko/20100504 Firefox/3.5.10

make distcheck is ok

all Futon tests pass. Note: my Firefox is slightly below Sebastian's

all tests pass in ./test/javascript/run *except*

not ok 25 form_submit false
not ok 43 replication ReferenceError: $ is not defined


On Jul 9, 2010, at 12:55 PM, Noah Slater wrote:

 Hello,
 
 I would like call a vote for the Apache CouchDB 1.0.0 release, second round.
 
 Changes in this round:
 
   * Fixed various leftovers from internal refactoring
 
 We encourage the whole community to download and test these release artifacts 
 so
 that any critical issues can be resolved before the release is made. Everyone 
 is
 free to vote on this release, so get stuck in!
 
 We are voting on the following release artifacts:
 
 http://people.apache.org/~nslater/dist/1.0.0/
 
 These artifacts have been built from the 1.0.0 tag in Subversion:
 
 http://svn.apache.org/repos/asf/couchdb/tags/1.0.0/
 
 Happy voting,
 
 N



Re: [VOTE] Apache CouchDB 1.0.0 release, first round

2010-07-08 Thread Robert Dionne
+1 

OS X 10.6
Erlang R13B04

make distcheck fine
Futon tests pass in FF
Futon tests hang in attachments on Safari5 and Chrome

ship it!




On Jul 7, 2010, at 7:38 PM, Noah Slater wrote:

 Hello,
 
 I would like call a vote for the Apache CouchDB 1.0.0 release, first round.
 
 We encourage the whole community to download and test these release artifacts 
 so
 that any critical issues can be resolved before the release is made. Everyone 
 is
 free to vote on this release, so get stuck in!
 
 We are voting on the following release artifacts:
 
 http://people.apache.org/~nslater/dist/1.0.0/
 
 These artifacts have been built from the 1.0.0 tag in Subversion:
 
 http://svn.apache.org/repos/asf/couchdb/tags/1.0.0/
 
 Happy voting,
 
 N



Re: 1.0 Vote

2010-07-02 Thread Robert Dionne
OS X 10.6.4
Erlang 13B04

make distcheck is fine


On Jul 2, 2010, at 7:13 PM, Jan Lehnardt wrote:

 
 On 29 Jun 2010, at 16:38, Noah Slater wrote:
 
 
 On 29 Jun 2010, at 15:20, J Chris Anderson wrote:
 
 So I went through both trunk and 0.11.x looking for things that are out of 
 place. I fixed one small thing in 0.11.x, and as far as I'm concerned it is 
 ready for release.
 
 For trunk, I think there are a couple of small patches that Adam wants to 
 hold back for 1.1. There is also the Windows stuff, which looks like we 
 should wait for, before cutting 1.0.
 
 I am waiting for a go command, so just let me know.
 
 Please can everyone check that make distcheck is working for them.
 
 Let's try to avoid the test failures again for this release.
 
 Works for me on 0.11.x and trunk and Mac OS X 10.6.4 and Ubuntu Karmic.
 
 Would love to see more people sending in results here :)
 
 Cheers
 Jan
 -- 
 



Re: replicator test hanging

2010-06-10 Thread Robert Dionne
same here, I can reproduce it every time on OS X with chrome.

Oddly for me, it works when I do a run all.




On Jun 10, 2010, at 5:41 AM, Filipe David Manana wrote:

 I have the problem in non-SSD machines, both Linux and OS X
 
 On Thu, Jun 10, 2010 at 10:39 AM, Jan Lehnardt j...@apache.org wrote:
 
 Hi Paul,
 
 thanks for the report. Out of curiosity, are you running an SSD drive in
 the box that reproduces the hangs?
 
 And anyone: Can you reproduce this on non-SSD machines?
 
 Cheers
 Jan
 --
 
 On 10 Jun 2010, at 02:26, Paul Bonser wrote:
 
 Oh, I should also mention that I got the exact same error in multiple
 freezes. Twice it was in the same exact order, and once it was in this
 order:
 
 [info] [0.95.0] starting replication 15c25eda4ea6308af6bea9864d5319ef
 at
 0.1845.0
 [debug] [0.1207.0] OAuth Params: [{att_encoding_info,true}]
 [info] [0.1207.0] 127.0.0.1 - - 'GET'
 /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
 [debug] [0.1207.0] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
 Headers: [{'Accept',application/json},
 {'Accept-Encoding',gzip},
 {'Content-Length',167},
 {'Host',localhost:5985},
 {'User-Agent',CouchDB/0.12.0a953193},
 {X-Couch-Full-Commit,false}]
 [debug] [0.1207.0] OAuth Params: []
 [info] [0.1207.0] 127.0.0.1 - - 'POST'
 /test_suite_rep_docs_db_b/_bulk_docs 201
 [debug] [0.1076.0] 'GET'
 /test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
 Headers: [{'Accept',application/json},
 {'Accept-Encoding',gzip},
 {'Host',localhost:5985},
 {'User-Agent',CouchDB/0.12.0a953193}]
 [debug] [0.1076.0] OAuth Params: [{att_encoding_info,true}]
 [debug] [0.1076.0] Minor error in HTTP request: {not_found,missing}
 [debug] [0.1076.0] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
{couch_httpd_db,db_doc_req,3},
{couch_httpd_db,do_db_req,2},
{couch_httpd,handle_request_int,5},
{mochiweb_http,headers,5},
{proc_lib,init_p_do_apply,3}]
 [info] [0.1076.0] 127.0.0.1 - - 'GET'
 /test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
 [debug] [0.1076.0] httpd 404 error response:
 {error:not_found,reason:missing}
 
 
 Could it be some sort of race condition?
 
 
 
 On Wed, Jun 9, 2010 at 8:22 PM, Paul Bonser mister...@gmail.com wrote:
 
 
 
 On Wed, Jun 9, 2010 at 7:33 PM, J Chris Anderson jch...@apache.org
 wrote:
 
 Devs,
 
 Is anyone else seeing the replicator test hang and never finish?
 
 It never hangs the first few runs, but after running ten or so times,
 I'll
 end up with the test suite waiting for a replication that never
 finishes.
 It's the same story on 0.11.0, 0.11.x, and trunk.
 
 Is anyone else able to reproduce this? Am I crazy?
 
 
 It just froze for me on the first try, using 0.12.0a935298, then ran
 successfully 3 times, then froze again. The last thing logged the first
 time
 was a _bulk_docs request; the last thing logged this time was a PUT to
 /test_suite_db_b/_local%2F6598a76aa55cd39645e4730b4cb28c00
 
 I'm running a Firefox 3.6 nightly build on Linux. For me, it froze the
 first time when I did a run all and the second time when just directly
 running the replication test.
 
 After svn up-ing to the latest in trunk, it froze on the first try,
 directly running the replication test.
 
 Here's the debug output for the last _replicate request where it
 freezes.
 It's requesting a document that isn't there.
 
 
 [info] [0.95.0] starting new replication
 15c25eda4ea6308af6bea9864d5319ef at 0.848.0
 [debug] [0.191.0] 'GET'
 /test_suite_rep_docs_db_a/foo2?att_encoding_info=true {1,1}
 Headers: [{'Accept',application/json},
 {'Accept-Encoding',gzip},
 {'Host',localhost:5985},
 {'User-Agent',CouchDB/0.12.0a953193}]
 [debug] [0.191.0] OAuth Params: [{att_encoding_info,true}]
 [info] [0.191.0] 127.0.0.1 - - 'GET'
 /test_suite_rep_docs_db_a/foo2?att_encoding_info=true 200
 [debug] [0.189.0] 'GET'
 /test_suite_rep_docs_db_a/foo666?att_encoding_info=true {1,1}
 Headers: [{'Accept',application/json},
 {'Accept-Encoding',gzip},
 {'Host',localhost:5985},
 {'User-Agent',CouchDB/0.12.0a953193}]
 [debug] [0.189.0] OAuth Params: [{att_encoding_info,true}]
 [debug] [0.189.0] Minor error in HTTP request: {not_found,missing}
 [debug] [0.189.0] Stacktrace: [{couch_httpd_db,couch_doc_open,4},
{couch_httpd_db,db_doc_req,3},
{couch_httpd_db,do_db_req,2},
{couch_httpd,handle_request_int,5},
{mochiweb_http,headers,5},
{proc_lib,init_p_do_apply,3}]
 [info] [0.189.0] 127.0.0.1 - - 'GET'
 /test_suite_rep_docs_db_a/foo666?att_encoding_info=true 404
 [debug] [0.189.0] httpd 404 error response:
 {error:not_found,reason:missing}
 
 [debug] [0.191.0] 'POST' /test_suite_rep_docs_db_b/_bulk_docs {1,1}
 Headers: [{'Accept',application/json},
 {'Accept-Encoding',gzip},
 {'Content-Length',167},
 {'Host',localhost:5985},
 

Re: replicator test hanging

2010-06-10 Thread Robert Dionne





On Jun 10, 2010, at 1:29 PM, J Chris Anderson wrote:

 
 On Jun 10, 2010, at 10:27 AM, Adam Kocoloski wrote:
 
 Thanks Paul!  Good sleuthing.  We'll get it fixed,
 
 
 I believe Filipe Manana has a fix for the replicator hang. He's told me he's 
 having trouble with his emails getting rejects as spam by d...@.

must be all those patches he's been attaching :)


 
 Chris
 
 



Re: _replicator DB

2010-05-19 Thread Robert Dionne
This sounds like a good approach; if I get the gist of it, it makes the 
replication state persistent. We also have a _users db now; is this a good time 
to think about consolidating and having one _system database?

Good stuff,

Bob




On May 19, 2010, at 5:31 AM, Filipe David Manana wrote:

 Dear all,
 
 I've been working on the _replicator DB along with Chris. Some of you have
 already heard about this DB in the mailing list, IRC, or whatever. Its
 purpose:
 
 - replications can be started by adding a replication document to the
 replicator DB _replicator (its name can be configured in the .ini files)
 
 - replication documents are basically the same JSON structures that we
 currently use when POSTing to _replicate/  (and we can give them an
 arbitrary id; see the example document below)
 
 - to cancel a replication, we simply delete the replication document
 
 - after the replication is started, the replicator adds the field state to
 the replication document with value triggered
 
 - when the replication finishes (for non-continuous replications), the
 replicator sets the doc's state field to completed
 
 - if an error occurs during a replication, the corresponding replication
 document will have the state field set to error
 
 - after an error is detected, the replication is restarted
 after some time (10s for now, but maybe it should be configurable)
 
 - after a server restart/crash, CouchDB will remember replications and will
 restart them (this is especially useful for continuous replications)
 
 - in the replication document we can define a user_ctx property, which
 defines the user name and/or role(s) under which the replication will
 execute
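 
 For example, a replication document might look like this (the id, the
 endpoints and the user_ctx values are made up; state is the field the
 replicator adds):
 
     {
       "_id": "my_rep",
       "source": "http://example.org/source_db",
       "target": "target_db",
       "continuous": true,
       "user_ctx": {"name": "joe", "roles": ["admin"]},
       "state": "triggered"
     }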
 
 
 
 Some restrictions regarding the _replicator DB:
 
 - only server admins can add and delete replication documents
 
 - only the replicator itself can update replication documents - this is to
 avoid having race conditions between the replicator and server admins trying
 to update replication documents
 
 - the above point implies that to change a replication you have to add a new
 replication document
 
 All these restrictions are in the replicator DB design doc -
 http://github.com/fdmanana/couchdb/blob/replicator_db/src/couchdb/couch_def_js_funs.hrl
 
 
 The code is fully working and is located at:
 http://github.com/fdmanana/couchdb/tree/replicator_db
 
 It includes a comprehensive JavaScript test case.
 
 Feel free to try it and give your feedback. There are still some TODOs as
 comments in the code, so it's still subject to changes.
 
 
 For people more involved with CouchDB internals and development:
 
 That branch breaks the stats.js test and, occasionally, the
 delayed_commits.js tests.
 
 It breaks stats.js because:
 
 - internally CouchDB uses the _changes API to be aware of the
 addition/update/deletion of replication documents to/from the _replicator
 DB. The _changes implementation constantly opens and closes the DB (opens
 are triggered by a gen_event). This affects the stats open_databases and
 open_os_files.
 
 It breaks delayed_commits.js  occasionally because:
 
 - by listening to _replicator DB changes an  extra file descriptor is used
 which affects the max_open_dbs config parameter. This parameter is related
 to the max number of user opened DBs. This causes the error {error,
 all_dbs_active} (from couch_server.erl) during the execution of
 delayed_commits.js (as well as stats.js).
 
 I also have another branch that fixes these issues in a dirty way:
 http://github.com/fdmanana/couchdb/tree/_replicator_db  (has a big comment
 in couch_server.erl explaining the hack)
 
 Basically it doesn't increment stats for the _replicator DB, bypasses
 max_open_dbs when opening the _replicator DB, and doesn't allow it to be
 closed in favour of a user-requested DB (as if it assigned a +infinite LRU
 time to this DB).
 
 Sometimes (although very rarely) I also get the all_dbs_active error when
 the authentication handlers are executing (because they open the _users DB).
 This is not caused by my _replicator DB code at all, since I get it with
 trunk as well.
 
 I would also like to collect feedback about what to do regarding these 2
 issues, especially max_open_dbs. Somehow I feel that no matter how many user
 DBs are open, it should always be possible to open the _replicator DB
 internally (and the _users DB).
 
 
 cheers
 
 
 -- 
 Filipe David Manana,
 fdman...@gmail.com
 
 Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.



Re: [VOTE] Apache CouchDB 0.10.2 release, first round

2010-04-10 Thread Robert Dionne
+1

OS X 10.6.3
Erlang R13B03
FF : all tests pass



On Apr 7, 2010, at 11:39 AM, Noah Slater wrote:

 Hello,
 
 I would like call a vote for the Apache CouchDB 0.10.2 release, first round.
 
 We encourage the whole community to download and test these release artifacts 
 so that any critical issues can be resolved before the release is made. 
 Everyone is free to vote on this release, so get stuck in!
 
 We are voting on the following release artifacts:
 
   http://people.apache.org/~nslater/dist/0.10.2/
 
 These artifacts have been built from the 0.10.2 tag in Subversion:
 
   http://svn.apache.org/repos/asf/couchdb/tags/0.10.2/
 
 Happy voting,
 
 N



Re: (lack of) couchdb windows binaries

2010-04-01 Thread Robert Dionne
I tend to agree with you on this. If, for example, you look at Eclipse you can 
see it's capable of using multiple versions of Java that might be installed on 
the same box. Many installers bundle their own JRE precisely to ensure they get 
things right. However, at the current level of Erlang and CouchDB I'd go with the 
latter approach for now. It sounds like you have larger fish to fry still.

Best,

Bob




On Apr 1, 2010, at 7:51 AM, Mark Hammond wrote:

 Just to follow up on a bit of this:
 
 On 1/04/2010 10:09 PM, Carl McDade wrote:
 Wampserver, XAMPP etc. But it does not appear to do this yet. So while YAWS,
 ejabberd and other software would be running on a single instance of Erlang,
 guaranteeing use of a single version, CouchDB might be running on a different
 version.
 
 It appears the ejabberd installer for windows takes the same approach as us - 
 the webpage says The installers contain all the libraries and dependencies 
 needed to run ejabberd and indeed, a copy of the erlang runtime and binaries 
 are installed directly in the ejabberd directory - ie, it appears to not 
 offer installing into an already installed erlang binary distribution either.
 
 Cheers,
 
 Mark



Re: [jira] Commented: (COUCHDB-716) CouchDB fails to start, just hangs.

2010-03-30 Thread Robert Dionne
thanks for thinking of me; I'm getting new glasses next week anyway so I should 
be ok.




On Mar 30, 2010, at 7:18 AM, Benoit Chesneau wrote:

 On Tue, Mar 30, 2010 at 12:45 PM, Robert Dionne
 dio...@dionne-associates.com wrote:
 I'm curious if it's starting the dependent apps. It might be a bad build. 
 Can you try:
 
 couch_app:start(_,[]).
 
 That's an underscore and a period at the end
 
 
 you should change your font if you need to explain ;)



Re: [VOTE] Apache CouchDB 0.11.0 release, second round

2010-03-23 Thread Robert Dionne
+1

OSX 10.6.2
Erlang R13B03

make check all pass
FF all pass
Safari -- couple of glitches





On Mar 23, 2010, at 2:49 PM, Noah Slater wrote:

 Hello,
 
 I would like call a vote for the Apache CouchDB 0.11.0 release, second round.
 
 Changes since the last round:
 
   * Build system now supports OS X for release preparation. 
 
 We encourage the whole community to download and test these release artifacts 
 so that any critical issues can be resolved before the release is made. 
 Everyone is free to vote on this release, so get stuck in!
 
 We are voting on the following release artifacts:
 
   http://people.apache.org/~nslater/dist/0.11.0/
 
 These artifacts have been built from the 0.11.0 tag in Subversion:
 
   http://svn.apache.org/repos/asf/couchdb/tags/0.11.0/
 
 Happy voting,
 
 N



Re: [VOTE] Apache CouchDB 0.11.0 release, first round

2010-03-22 Thread Robert Dionne
+1

OS X 10.6.2
Erlang: R13B03
all test pass in make check
All Futon tests pass in FF and Safari and ./test/javascript/run

changes.js fails in Chrome with known browser detection issue





On Mar 22, 2010, at 11:54 AM, Noah Slater wrote:

 Hello,
 
 I would like call a vote for the Apache CouchDB 0.11.0 release, first round.
 
 We encourage the whole community to download and test these release artifacts 
 so that any critical issues can be resolved before the release is made. 
 Everyone is free to vote on this release, so get stuck in!
 
 We are voting on the following release artifacts:
 
   http://people.apache.org/~nslater/dist/0.11.0/
 
 These artifacts have been built from the 0.11.0 tag in Subversion:
 
   http://svn.apache.org/repos/asf/couchdb/tags/0.11.0/
 
 Happy voting,
 
 N



Re: Test suite blocking release

2010-03-21 Thread Robert Dionne
On Mar 21, 2010, at 4:00 AM, Jan Lehnardt wrote:

 
 On 20 Mar 2010, at 20:06, Paul Davis wrote:
 
 On Sat, Mar 20, 2010 at 2:31 PM, Noah Slater nsla...@me.com wrote:
 I think faulty test case should block the release, if I am to have any
 future sanity preparing releases. I don't want to delay and longer, so if
 you guys are absolutely sure this is a test error and not code error, then I
 propose that the test be commented out. Our tests form a contract between
 us, internally, and our users. If that contract has a bug, it should be
 removed or fixed - or it simply dilutes the importance of contract. If some
 one comments out the test, and we agree it is not indicative of an important
 bug, I can call the vote within hours.
 
 
 I'd have to agree on this. From the point of view of a release, if a
 test reports a failure then it should be made to not report a failure.
 If that's accomplished by disabling it, then there will be a commit
 with a message that explains why it was disabled and etc and such on
 and so forth.
 
 I'd do that if the test was failing for me :)

it's not failing for you when you run changes.js with the CLI ?  Fails for me 
every time. 

Anyway I poked at this a bit yesterday and am not 100% sure the issue is in the 
test. I tried putting a sleep in with no luck. If my understanding of the JS is 
correct, CouchDB is supposed to be synchronous so it's not timing.

If someone could comment on the test itself it would be helpful. The section of 
the code that fails:

// changes get all_docs style with deleted docs
  var doc = {a:1};
  db.save(doc);
  db.deleteDoc(doc);
  var req = CouchDB.request("GET",
    "/test_suite_db/_changes?filter=changes_filter/bop&style=all_docs");
  var resp = JSON.parse(req.responseText);
  TEquals(3, resp.results.length, "should return matching rows");


seems odd to me. all_docs as I read the code will return docs with deletes and 
conflicts but in this call the filter bop will not apply to the doc {a:1} so 
I'm not sure what this delete prior to the call is about. Anyway I can make it 
fail in the debugger so perhaps I can find the root cause.




 
 Cheers
 Jan
 --
 



Re: Test suite blocking release

2010-03-21 Thread Robert Dionne




On Mar 21, 2010, at 1:16 PM, Jan Lehnardt wrote:

 
 On 21 Mar 2010, at 12:10, Noah Slater wrote:
 
 What are the CLI tests, if not the etap tests? Are they integrated into the 
 build system?
 
 The CLI tests are the same as the browser tests, just run through our couchjs 
 binary
 that has custom HTTP extensions to make the xhr work. At this point I don't 
 think it
 is reliable enough to mimic browser behaviour and that we shouldn't use it 
 for vetting
 the state of the code.

This is likely true, but in this particular case I think there's a bug in the 
changes code (that I'm trying to dig out). It's nice that it works on your 
machine but on my machine, using FF, it fails often enough. Moreover it's been 
around for a long time now so I figure it's worth figuring out. 

I don't have a dog in this fight (i.e. a paying customer) so this failure 
doesn't bother me. With respect to policy, given the various bogosities of 
browsers, I'd recommend that something like these CLI tests plus the etaps ought to 
be the official tests for vetting, and part of the build.


 
 It is very useful when developing new code to not have to switch to and 
 reload the
 browser over and over again.
 
 Cheers
 Jan
 --
 
 
 
 
 
 On 21 Mar 2010, at 17:05, Jan Lehnardt wrote:
 
 
 On 21 Mar 2010, at 06:04, Robert Dionne wrote:
 
 On Mar 21, 2010, at 4:00 AM, Jan Lehnardt wrote:
 
 
 On 20 Mar 2010, at 20:06, Paul Davis wrote:
 
 On Sat, Mar 20, 2010 at 2:31 PM, Noah Slater nsla...@me.com wrote:
 I think faulty test case should block the release, if I am to have any
 future sanity preparing releases. I don't want to delay and longer, so 
 if
 you guys are absolutely sure this is a test error and not code error, 
 then I
 propose that the test be commented out. Our tests form a contract 
 between
 us, internally, and our users. If that contract has a bug, it should be
 removed or fixed - or it simply dilutes the importance of contract. If 
 some
 one comments out the test, and we agree it is not indicative of an 
 important
 bug, I can call the vote within hours.
 
 
 I'd have to agree on this. From the point of view of a release, if a
 test reports a failure then it should be made to not report a failure.
 If that's accomplished by disabling it, then there will be a commit
 with a message that explains why it was disabled and etc and such on
 and so forth.
 
 I'd do that if the test was failing for me :)
 
 it's not failing for you when you run changes.js with the CLI ?  Fails for 
 me every time. 
 
 I don't consider the CLI tests as part of the official test suite just yet.
 
 Cheers
 Jan
 --
 
 
 Anyway I poked at this a bit yesterday and am not 100% sure the issue is 
 in the test. I tried putting a sleep in with no luck. If my understanding 
 of the JS is correct, CouchDB is supposed to be synchronous so it's not 
 timing.
 
 If someone could comment on the test itself it would be helpful. The 
 section of the code that fails:
 
 // changes get all_docs style with deleted docs
 var doc = {a:1};
 db.save(doc);
 db.deleteDoc(doc);
 var req = CouchDB.request("GET",
   "/test_suite_db/_changes?filter=changes_filter/bop&style=all_docs");
 var resp = JSON.parse(req.responseText);
 TEquals(3, resp.results.length, "should return matching rows");
 
 
 seems odd to me. all_docs as I read the code will return docs with deletes 
 and conflicts but in this call the filter bop will not apply to the doc 
 {a:1} so I'm not sure what this delete prior to the call is about. Anyway 
 I can make it fail in the debugger so perhaps I can find the root cause.
 
 
 
 
 
 Cheers
 Jan
 --
 
 
 
 
 



Re: Test suite blocking release

2010-03-21 Thread Robert Dionne
Ok Noah, this is only a test case issue, and not in the changes code as I 
thought. Jan found the issue and it works fine for me now in both FF and CLI. -- 
Bob


On Mar 21, 2010, at 1:30 PM, Jan Lehnardt wrote:

 
 On 21 Mar 2010, at 12:24, Robert Dionne wrote:
 
 
 
 
 
 On Mar 21, 2010, at 1:16 PM, Jan Lehnardt wrote:
 
 
 On 21 Mar 2010, at 12:10, Noah Slater wrote:
 
 What are the CLI tests, if not the etap tests? Are they integrated into 
 the build system?
 
 The CLI tests are the same as the browser tests, just run through our 
 couchjs binary
 that has custom HTTP extensions to make the xhr work. At this point I don't 
 think it
 is reliable enough to mimic browser behaviour and that we shouldn't use it 
 for vetting
 the state of the code.
 
 This is likely true, but in this particular case I think there's a bug in 
 the changes code (that I'm trying to dig out). It's nice that it works on 
 your machine but on my machine, using FF, it fails often enough. Moreover 
 it's been around for a long time now so I figure it's worth figuring out. 
 
 I don't have a dog in this fight (i.e. a paying customer) so this failure 
 doesn't bother me. With respect to policy, given the various bogosities of 
 browsers, I'd recommend that something like these CLI tests plus the etaps ought 
 to be the official tests for vetting, and part of the build.
 
 Not that I disagree, but part (most?) of the appeal of the browser based 
 tests are that they run in a real-world client instead of the lab that is 
 couchjs+http :)
 
 Cheers
 Jan
 --
 
 
 
 
 It is very useful when developing new code to not have to switch to and 
 reload the
 browser over and over again.
 
 Cheers
 Jan
 --
 
 
 
 
 
 On 21 Mar 2010, at 17:05, Jan Lehnardt wrote:
 
 
 On 21 Mar 2010, at 06:04, Robert Dionne wrote:
 
 On Mar 21, 2010, at 4:00 AM, Jan Lehnardt wrote:
 
 
 On 20 Mar 2010, at 20:06, Paul Davis wrote:
 
 On Sat, Mar 20, 2010 at 2:31 PM, Noah Slater nsla...@me.com wrote:
 I think faulty test case should block the release, if I am to have any
 future sanity preparing releases. I don't want to delay and longer, 
 so if
 you guys are absolutely sure this is a test error and not code error, 
 then I
 propose that the test be commented out. Our tests form a contract 
 between
 us, internally, and our users. If that contract has a bug, it should 
 be
 removed or fixed - or it simply dilutes the importance of contract. 
 If some
 one comments out the test, and we agree it is not indicative of an 
 important
 bug, I can call the vote within hours.
 
 
 I'd have to agree on this. From the point of view of a release, if a
 test reports a failure then it should be made to not report a failure.
 If that's accomplished by disabling it, then there will be a commit
 with a message that explains why it was disabled and etc and such on
 and so forth.
 
 I'd do that if the test was failing for me :)
 
 it's not failing for you when you run changes.js with the CLI? Fails 
 for me every time. 
 
 I don't consider the CLI tests as part of the official test suite just 
 yet.
 
 Cheers
 Jan
 --
 
 
 Anyway I poked at this a bit yesterday and am not 100% sure the issue is 
 in the test. I tried putting a sleep in, with no luck. If my 
 understanding of the JS is correct, CouchDB is supposed to be 
 synchronous, so it's not timing.
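 
 The sleep was a crude busy-wait along these lines (a sketch; I'm not 
 assuming the harness has its own helper):
 
 // spin for ms milliseconds before the _changes request, purely to
 // rule out a timing window; it made no difference here
 function sleep(ms) {
   var t0 = new Date().getTime();
   while (new Date().getTime() - t0 < ms) { /* busy-wait */ }
 }
 sleep(1000);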
 
 If someone could comment on the test itself it would be helpful. The 
 section of the code that fails:
 
 // changes get all_docs style with deleted docs
 var doc = {a:1};
 db.save(doc);
 db.deleteDoc(doc);
 var req = CouchDB.request("GET", 
   "/test_suite_db/_changes?filter=changes_filter/bop&style=all_docs");
 var resp = JSON.parse(req.responseText);
 TEquals(3, resp.results.length, "should return matching rows");
 
 
 seems odd to me. all_docs, as I read the code, will return docs with 
 deletes and conflicts, but in this call the filter bop will not apply to 
 the doc {a:1}, so I'm not sure what this delete prior to the call is 
 about. Anyway, I can make it fail in the debugger, so perhaps I can find 
 the root cause.
 
 
 
 
 
 Cheers
 Jan
 --
 
 
 
 
 
 
 



Re: Test suite blocking release

2010-03-20 Thread Robert Dionne



On Mar 19, 2010, at 7:25 PM, Jan Lehnardt wrote:

 
 On 19 Mar 2010, at 18:07, J Chris Anderson wrote:
 
 
 On Mar 19, 2010, at 11:43 AM, Paul Davis wrote:
 
 On Fri, Mar 19, 2010 at 2:02 PM, Jan Lehnardt j...@apache.org wrote:
 
 On 19 Mar 2010, at 12:50, Noah Slater wrote:
 
 
 On 19 Mar 2010, at 17:11, Jan Lehnardt wrote:
 
 We want to test the CouchDB code, not the browser's HTTP handling.
 
 Sure, but as one of CouchDB's primary interfaces is the browser, it seems 
 to make sense that we would want to test how this works. Testing from 
 the browser allows us to test for and catch problems introduced by 
 caching, etc - which is what our real-world users would be running into.
 
 Unless I'm missing something?
 
 I fully agree, but we should have a separate browser interaction
 suite for that. The test suite is a very untypical browser client and
 doesn't really test real-world browser use-cases.
 
 Cheers
 Jan
 --
 
 +a bajillion.
 
 
 I prefer the browser tests because I'm much happier with JavaScript.
 
 I'm not saying we should get rid of the browser tests. But intermittent errors
 in the current test suite are not worth blocking a release over.

I agree with all the comments about separation of tests and so forth. This 
particular changes test is not intermittent; it consistently fails (on my 
machine), enough that it's a pleasant surprise when it succeeds. When running 
from the CLI I get the following:

not ok 10 changes expected '3', got '1'

When running in FF I also get the message above and occasionally:

• Exception raised: 
{"message":"JSON.parse","fileName":"http://127.0.0.1:5984/_utils/script/couch_test_runner.js?0.11.0","lineNumber":154,"stack":"(false)@http://127.0.0.1:5984/_utils/script/couch_test_runner.js?0.11.0:154\u000arun(-2)@http://127.0.0.1:5984/_utils/script/couch_test_runner.js?0.11.0:83\u000a"}

I haven't looked into it closely to find the root cause; it might just be the 
test, but it's definitely not intermittent. From the CLI it happens almost 
always.



 
 If we want proper browser client testing, we'd need an additional test suite
 that covers common and uncommon use-cases. I believe the current test
 suite is as untypical as a browser client can be.
 
 Cheers
 Jan
 --
 
 
 
 But maybe I'm crazy
 
 
 I think it's important to maintain *some* tests in the browser to test
 its ability to use CouchDB as a client, but we should put more work
 into separating API tests and core tests.
 
 Also, Zed Shaw has a very informative (and colorful) description of
 confounding factors [1]. It's about two thirds of the way down under a
 heading of Confounding, Confounding, Confounding.
 
 http://www.zedshaw.com/essays/programmer_stats.html
 
 



Re: Test suite blocking release

2010-03-20 Thread Robert Dionne
This is the call in the Futon test that fails consistently:

var req = CouchDB.request("GET", 
  "/test_suite_db/_changes?filter=changes_filter/bop&style=all_docs");
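
To see what actually comes back, a quick inspection sketch using the suite's 
CouchDB.request helper (print() assumes the couchjs CLI runner; in the 
browser you'd log instead):

var req = CouchDB.request("GET",
  "/test_suite_db/_changes?filter=changes_filter/bop&style=all_docs");
var resp = JSON.parse(req.responseText);
// the test expects 3 rows; dumping each row shows which seqs/revs the
// filter actually let through on a failing run
for (var i = 0; i < resp.results.length; i++) {
  print(JSON.stringify(resp.results[i]));
}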




On Mar 20, 2010, at 2:19 PM, Benoit Chesneau wrote:

 On Sat, Mar 20, 2010 at 7:17 PM, Jan Lehnardt j...@apache.org wrote:
 
 On 20 Mar 2010, at 10:26, Noah Slater wrote:
 
 Jan, should this block the release? From what I can tell, it should.
 
 I don't think a faulty test case should block the release.
 
 Cheers
 Jan
 are we sure it's a faulty test? For what it's worth, I reproduce it in
 Python in the couchdbkit unittests.
 
 - benoit



Re: Test suite errors

2010-03-19 Thread Robert Dionne
I see similar issues, though never with 100-ref-counter. It looks like a race 
condition but should be checked, because the place where it's used, 
couch_db:is_idle, depends on that value being right.

make check is much faster than make cover.

I think it's ok for tests to take a long time to run, and I suspect most users 
are used to it. It's a measure of how solid the code is. Perhaps there could be 
two levels of testing: one that's quick, superficial, and sufficient to 
verify the build, so you can run it repeatedly in reasonable time, and another 
for users at install time that includes long-running performance tests, tests 
that run a server, and so forth. At build time you'd only need to run this once 
at the end.



On Mar 19, 2010, at 7:13 AM, Noah Slater wrote:

 Some of the test suites rely on timing delays, and these are unpredictable, 
 resulting in non-deterministic test failures. The full tag/build/test cycle 
 is long enough as it is - but having to start again from scratch when the 
 last, and second, run of the test suite fails adds a significant amount of 
 friction for me. I would like to ask that this issue be addressed as soon as 
 possible. It is entirely my fault that this release has been delayed as much 
 as it has, but my job would be made significantly easier if the test suite 
 behaved consistently.
 
 I got the error included below this morning, and when I ran it again, there 
 was no error. I am going to ignore this for now, and just call a vote on the 
 release. But doing so is risky. I have no idea why this failed once, and as 
 release manager, it is my duty to understand the bugs we're shipping with. I 
 don't like being in a position where I am ignoring them for convenience. They 
 exist as warning beacons, primarily for me, and when I start having to ignore 
 them, they have utterly failed to do their job properly.
 
 Apologies if this email sounds frustrated. I am frustrated.
 
 I'm not finger pointing, just trying to illustrate the reasons for my belief 
 that this problem should be addressed as soon as possible.
 
 ./test/etap/run
 /tmp/couchdb/0.11.0/test/etap/001-loadok 
 /tmp/couchdb/0.11.0/test/etap/002-icu-driver..ok 
 /tmp/couchdb/0.11.0/test/etap/010-file-basics.ok 
 /tmp/couchdb/0.11.0/test/etap/011-file-headersok 
 /tmp/couchdb/0.11.0/test/etap/020-btree-basicsok 
 /tmp/couchdb/0.11.0/test/etap/021-btree-reductionsok 
 /tmp/couchdb/0.11.0/test/etap/030-doc-from-json...ok 
 /tmp/couchdb/0.11.0/test/etap/031-doc-to-json.ok 
 /tmp/couchdb/0.11.0/test/etap/040-utilok 
 /tmp/couchdb/0.11.0/test/etap/041-uuid-genok 
 /tmp/couchdb/0.11.0/test/etap/050-stream..ok 
 /tmp/couchdb/0.11.0/test/etap/060-kt-merging..ok 
 /tmp/couchdb/0.11.0/test/etap/061-kt-missing-leaves...ok 
 /tmp/couchdb/0.11.0/test/etap/062-kt-remove-leavesok 
 /tmp/couchdb/0.11.0/test/etap/063-kt-get-leaves...ok 
 /tmp/couchdb/0.11.0/test/etap/064-kt-counting.ok 
 /tmp/couchdb/0.11.0/test/etap/065-kt-stemming.ok 
 /tmp/couchdb/0.11.0/test/etap/070-couch-dbok 
 /tmp/couchdb/0.11.0/test/etap/080-config-get-set..ok 
 /tmp/couchdb/0.11.0/test/etap/081-config-override.ok 
 /tmp/couchdb/0.11.0/test/etap/082-config-register.ok 
 /tmp/couchdb/0.11.0/test/etap/083-config-no-files.ok 
 /tmp/couchdb/0.11.0/test/etap/090-task-status.ok 
 /tmp/couchdb/0.11.0/test/etap/100-ref-counter.FAILED test 8  
   Failed 1/8 tests, 87.50% okay
 /tmp/couchdb/0.11.0/test/etap/110-replication-httpc...ok 
 /tmp/couchdb/0.11.0/test/etap/111-replication-changes-feedok 
 /tmp/couchdb/0.11.0/test/etap/112-replication-missing-revsok 
 /tmp/couchdb/0.11.0/test/etap/120-stats-collect...ok 
 /tmp/couchdb/0.11.0/test/etap/121-stats-aggregatesok 
 /tmp/couchdb/0.11.0/test/etap/130-attachments-md5.ok 
 /tmp/couchdb/0.11.0/test/etap/140-attachment-comp.ok 
 /tmp/couchdb/0.11.0/test/etap/150-invalid-view-seqok 
 /tmp/couchdb/0.11.0/test/etap/160-vhosts..ok 
 Failed Test   Stat Wstat Total Fail  List of Failed
 ---
 /tmp/couchdb/0.11.0/test/etap/100-ref-cou81  8
 Failed 

Re: Test suite errors

2010-03-19 Thread Robert Dionne




On Mar 19, 2010, at 7:48 AM, Noah Slater wrote:

 I have absolutely no problem with the time taken for the tests.

cool, I misunderstood part of your issue

 
 My only issue is that they intermittently fail. Because of that, I am now 
 suspicious of any results I get.
 
 Is it really an error, or is it a timing issue or a race condition?
 
 Suspicious tests are next to useless.

not really. In any event, looking at the code a bit, couch_db:is_idle defines 
"idle" in part as "there are now only two processes referring to this db", so 
it looks like it would err on the side of concluding a db was not idle when in 
fact it was.

Regardless, I would not release code with a failing test suite.

 
 On 19 Mar 2010, at 11:46, Robert Dionne wrote:
 
 I see similar issues, though never with 100-ref-counter. It looks like a 
 race condition but should be checked, because the place where it's used, 
 couch_db:is_idle, depends on that value being right.
 
 make check is much faster than make cover.
 
 I think it's ok for tests to take a long time to run, and I suspect most 
 users are used to it. It's a measure of how solid the code is. Perhaps there 
 could be two levels of testing: one that's quick, superficial, and 
 sufficient to verify the build, so you can run it repeatedly in reasonable 
 time, and another for users at install time that includes long-running 
 performance tests, tests that run a server, and so forth. At build time 
 you'd only need to run this once at the end.
 
 
 
 On Mar 19, 2010, at 7:13 AM, Noah Slater wrote:
 
 Some of the test suites rely on timing delays, and these are unpredictable, 
 resulting in non-deterministic test failures. The full tag/build/test cycle 
 is long enough as it is - but having to start again from scratch when the 
 last, and second, run of the test suite fails adds a significant amount of 
 friction for me. I would like to ask that this issue be addressed as soon as 
 possible. It is entirely my fault that this release has been delayed as 
 much as it has, but my job would be made significantly easier if the test 
 suite behaved consistently.
 
 I got the error included below this morning, and when I ran it again, there 
 was no error. I am going to ignore this for now, and just call a vote on 
 the release. But doing so is risky. I have no idea why this failed once, 
 and as release manager, it is my duty to understand the bugs we're shipping 
 with. I don't like being in a position where I am ignoring them for 
 convenience. They exist as warning beacons, primarily for me, and when I 
 start having to ignore them, they have utterly failed to do their job 
 properly.
 
 Apologies if this email sounds frustrated. I am frustrated.
 
 I'm not finger pointing, just trying to illustrate the reasons for my 
 belief that this problem should be addressed as soon as possible.
 
 ./test/etap/run
 /tmp/couchdb/0.11.0/test/etap/001-loadok
 /tmp/couchdb/0.11.0/test/etap/002-icu-driver..ok
 /tmp/couchdb/0.11.0/test/etap/010-file-basics.ok
 /tmp/couchdb/0.11.0/test/etap/011-file-headersok
 /tmp/couchdb/0.11.0/test/etap/020-btree-basicsok
 /tmp/couchdb/0.11.0/test/etap/021-btree-reductionsok
 /tmp/couchdb/0.11.0/test/etap/030-doc-from-json...ok
 /tmp/couchdb/0.11.0/test/etap/031-doc-to-json.ok
 /tmp/couchdb/0.11.0/test/etap/040-utilok
 /tmp/couchdb/0.11.0/test/etap/041-uuid-genok
 /tmp/couchdb/0.11.0/test/etap/050-stream..ok
 /tmp/couchdb/0.11.0/test/etap/060-kt-merging..ok
 /tmp/couchdb/0.11.0/test/etap/061-kt-missing-leaves...ok
 /tmp/couchdb/0.11.0/test/etap/062-kt-remove-leavesok
 /tmp/couchdb/0.11.0/test/etap/063-kt-get-leaves...ok
 /tmp/couchdb/0.11.0/test/etap/064-kt-counting.ok
 /tmp/couchdb/0.11.0/test/etap/065-kt-stemming.ok
 /tmp/couchdb/0.11.0/test/etap/070-couch-dbok
 /tmp/couchdb/0.11.0/test/etap/080-config-get-set..ok
 /tmp/couchdb/0.11.0/test/etap/081-config-override.ok
 /tmp/couchdb/0.11.0/test/etap/082-config-register.ok
 /tmp/couchdb/0.11.0/test/etap/083-config-no-files.ok
 /tmp/couchdb/0.11.0/test/etap/090-task-status.ok
 /tmp/couchdb/0.11.0/test/etap/100-ref-counter.FAILED test 8
 Failed 1/8 tests, 87.50% okay
 /tmp/couchdb/0.11.0/test/etap/110-replication-httpc...ok
 /tmp/couchdb/0.11.0/test/etap/111-replication-changes-feed

Re: Test suite errors

2010-03-19 Thread Robert Dionne

 
 I got the error included below this morning, and when I ran it again, there 
 was no error. 

 /tmp/couchdb/0.11.0/test/etap/090-task-status.ok 
 /tmp/couchdb/0.11.0/test/etap/100-ref-counter.FAILED test 8  

I looked into this random fail a bit and it may be a real issue to consider. 
couch_ref_counter uses process_info in the count function to compute the number 
of pids referring to the given one, rather than interrogating the number of 
referrers maintained in the state of the gen_server. This difference could 
explain the apparent race condition that caused this fail (which doesn't 
reproduce on my box). From the who_calls trace it looks ok, and I surmise the 
reason process_info is used is to handle the case where a Pid dies in the 
forest and no one accounts for it, throwing off the count, whereas process_info 
presumably never lies. 



Re: CouchDB Wiki / Documentation

2010-03-07 Thread Robert Dionne


On Mar 7, 2010, at 3:11 AM, J Chris Anderson wrote:

 
 On Mar 6, 2010, at 4:47 PM, Noah Slater wrote:
 
 
 On 7 Mar 2010, at 00:38, Mikeal Rogers wrote:
 
 We can provide the same provenance and copyright assurances outside of
 JIRA. It's a checkbox, it's not hard.
 
 I agree. I don't want to be misunderstood as fighting against the initiative 
 of improving our documentation. I just want to make sure we do it in a way 
 that embraces our parent organisation. And as far as I know, and I am 
 incorrect rather frequently, that involves hosting whatever we use on ASF 
 infrastructure, and taking the proper measures to make sure that what is 
 produced is free for the community to use, where free is the ASF's 
 definition of free. If we can satisfy that, and improve our documentation, 
 it will have my full support. However, just creating a new repository on 
 Github and linking to it from our site is not satisfactory.
 
 I think in the short run, putting together an informal Markdown repository of 
 docs or proto-docs would be cool. I don't care where it lives, but we'll need 
 to get an ASF Zone for a project CouchDB instance. I guess this means Damien 
 has to send an email to infra.
 
 It's simple to have CouchDB render Markdown. After 0.11 is out and we launch 
 the project CouchDB instance, I'm sure there are a ton of CouchApps we could 
 run it on that can handle Markdown.

I really like the idea of this dogfood exercise. With respect to Markdown, I'd 
like to also mention Org for those who are emacs users. It can be used for 
everything from project management to publishing web pages, journal articles, 
you name it. One can even embed LaTeX in it. For those who already know Org, 
markdown-mode[1] enables editing of Markdown with Org-style cycling, etc.; it 
can be published to any format.
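
A minimal sketch of the kind of CouchApp rendering Chris describes 
(hypothetical names throughout; it assumes a markdown-to-HTML converter 
stored in the design doc as a CommonJS-style module exporting toHTML()):

var ddoc = {
  _id: "_design/docs",
  lib: { markdown: "/* e.g. a bundled markdown.js exporting toHTML() */" },
  shows: {
    page: (function(doc, req) {
      var md = require("lib/markdown"); // CommonJS module from the ddoc
      return { headers: {"Content-Type": "text/html"},
               body: md.toHTML(doc.body) }; // render the markdown field
    }).toString()
  }
};
db.save(ddoc);
// GET /some_db/_design/docs/_show/page/some_doc would then serve that
// document's 'body' field rendered as HTML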

I think the closer the doc stays to the code, the more apt it is to be read, 
written, and useful. I'm wondering how useful it might be to have some of these 
Markdown files reside with the code in the repository.


[1] http://jblevins.org/projects/markdown-mode/





 
 Under ASF guidelines I'm pretty sure we don't need to worry about CLAs for 
 documentation contributors. Let's get the barrier to entry for new 
 documenters as low as possible.
 
 Chris
 
 


