[jira] Commented: (COUCHDB-514) Redirect from _list using view rows
[ https://issues.apache.org/jira/browse/COUCHDB-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804920#action_12804920 ]

Chris Anderson commented on COUCHDB-514:
----------------------------------------

Joscha,

If you can provide just the bug fix (not stylistic changes) I'll be glad to help you finish it. I do think this will require an Erlang fix. I won't let it slip past 1.0, but I don't have time to write it before 0.11.

Chris

> Redirect from _list using view rows
> -----------------------------------
>
>                 Key: COUCHDB-514
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-514
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: JavaScript View Server
>    Affects Versions: 0.10
>            Reporter: Zachary Zolton
>        Attachments: list-redir.diff, list_views.diff, render.diff
>
>
> There is no way to redirect from a _list function after calling the getRow()
> API function.
> Here's a link to the discussion on the dev mailing list:
> http://is.gd/3KZRg

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
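The limitation behind this issue comes from how list functions stream: once getRow() has been called, the response status and headers have effectively been committed, so a later start({code: 302, ...}) has nothing to act on. A minimal sketch of that control flow, with CouchDB's list API (start/getRow/send) replaced by hypothetical in-memory stubs so it can run standalone:

```javascript
// Hypothetical stand-ins for CouchDB's _list API (start, getRow, send),
// stubbed so the control flow can be exercised outside CouchDB.
var output = { started: false, code: null, chunks: [] };
var rows = [{ id: "a" }, { id: "b" }];

function start(resp) {           // sends status + headers; only honored once
  if (!output.started) { output.started = true; output.code = resp.code; }
}
function getRow() {              // fetching a row commits the response
  if (!output.started) start({ code: 200 });
  return rows.shift() || null;
}
function send(chunk) { output.chunks.push(chunk); }

// A list function that decides to redirect *after* inspecting the first row:
function listWithRedirect(head, req) {
  var row = getRow();            // headers are now committed (code 200)
  if (row && row.id === "a") {
    // too late: ignored, because start() already ran implicitly
    start({ code: 302, headers: { Location: "/somewhere" } });
    return;
  }
  while (row) { send(row.id); row = getRow(); }
}

listWithRedirect({}, {});
console.log(output.code);  // 200, not 302 -- the redirect is silently lost
```

The stub names mirror the real list API but the internals are invented; the point is only that the 302 issued after getRow() cannot win.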
[jira] Assigned: (COUCHDB-514) Redirect from _list using view rows
[ https://issues.apache.org/jira/browse/COUCHDB-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Anderson reassigned COUCHDB-514:
--------------------------------------

    Assignee: Chris Anderson
[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client
[ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804919#action_12804919 ]

Chris Anderson commented on COUCHDB-583:
----------------------------------------

I haven't really been tuned into the full discussion of this patch -- I think the biggest questions for something that digs this deep into the file format are:

How does it impact stability? (looks fine at my cursory glance, aside from cross compatibility with older versions of the file format, which I'd have to look more closely at)

What is the payoff? How much space does this save in practice? (say, with email messages as attachments, vs with pngs or minified js)

I'm not asking you to do all that work, just think that real numbers are a selling point. If it's a big payoff then this becomes a priority.

We might also want to add options for compressing the views.

> storing attachments in compressed form and serving them in compressed form if
> accepted by the client
> -----------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>        Environment: CouchDB trunk
>           Reporter: Filipe Manana
>        Attachments: couchdb-583-trunk-10th-try.patch,
> couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch,
> couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch,
> couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch,
> couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch,
> couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch,
> couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch,
> jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip compress attachments as they are being
> received and store them in compressed form.
> When a client asks for downloading an attachment (e.g. GET
> somedb/somedoc/attachment.txt), the attachment is sent in compressed form if
> the client's http request has gzip specified as a valid transfer encoding for
> the response (using the http header "Accept-Encoding"). Otherwise couch
> decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those
> listed in a separate config file. Compression level is also configurable in
> the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or
> non-compressable files (would probably be too big for the regular ini file).
> What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk,
> and serve the compressed bytes directly to clients that can handle it, and
> decompressed for those that can't. For compressable types, it's a win for
> both disk IO for reads and writes, and CPU on read."
> Patch attached.
[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client
[ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804917#action_12804917 ]

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

Filipe,

Sorry, got distracted by a weekend project. I'll try and do a thorough review tomorrow before the big news day on Wednesday.
[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client
[ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804915#action_12804915 ]

Filipe Manana commented on COUCHDB-583:
---------------------------------------

@Paul any news on this?
Re: svn commit: r903023 - in /couchdb/trunk/share: Makefile.am server/json2.js server/util.js www/script/json2.js www/script/test/show_documents.js www/script/test/update_documents.js www/script/tes
On Mon, Jan 25, 2010 at 8:04 PM, Jan Lehnardt wrote:
> Hey Chris,
>
> great work, thanks. Can you update
> http://wiki.apache.org/couchdb/Breaking_changes? :)
>

I wouldn't mind. But with that old wiki it'll take me 15 minutes to
figure out what username I'm shooting for.

... signing up for a new account:

http://wiki.apache.org/couchdb/FrontPage?action=newaccount

gives a 500 error. try again, and OK I seem to have fixed it. Thanks
for the prodding. The wiki wasn't that bad but it could be more fun.

http://wiki.apache.org/couchdb/Breaking_changes

Chris

> Cheers
> Jan
> --
>
> On 25 Jan 2010, at 16:12, jch...@apache.org wrote:
>
>> Author: jchris
>> Date: Tue Jan 26 00:11:59 2010
>> New Revision: 903023
>>
>> URL: http://svn.apache.org/viewvc?rev=903023&view=rev
>> Log:
>> Replace the old JavaScript query server JSON library with json2.js
>>
>> This change makes us interoperate better with other JSON
>> implementations. It also means we can use the native JSON handlers in
>> JavaScript runtimes that support them. Should be faster right away on
>> new Spidermonkeys.
>>
>> There are some potential breaking changes for apps that depend on
>> Couch blowing up on 'undefined'. json2.js serializes undefined as
>> 'null' instead of crashing.
>>
>> This change will also affect people using E4X, as you can't just
>> return an XML object and have it serialized to a string for you.
>> Calling .toXMLString() on these is all you need to do.
>>
>> Added:
>>     couchdb/trunk/share/server/json2.js
>>       - copied, changed from r902422, couchdb/trunk/share/www/script/json2.js
>> Modified:
>>     couchdb/trunk/share/Makefile.am
>>     couchdb/trunk/share/server/util.js
>>     couchdb/trunk/share/www/script/json2.js
>>     couchdb/trunk/share/www/script/test/show_documents.js
>>     couchdb/trunk/share/www/script/test/update_documents.js
>>     couchdb/trunk/share/www/script/test/view_errors.js
>>
>> Modified: couchdb/trunk/share/Makefile.am
>> URL: http://svn.apache.org/viewvc/couchdb/trunk/share/Makefile.am?rev=903023&r1=903022&r2=903023&view=diff
>> ==============================================================================
>> --- couchdb/trunk/share/Makefile.am (original)
>> +++ couchdb/trunk/share/Makefile.am Tue Jan 26 00:11:59 2010
>> @@ -13,6 +13,7 @@
>>  JS_FILE = server/main.js
>>
>>  JS_FILE_COMPONENTS = \
>> +    server/json2.js \
>>      server/filter.js \
>>      server/mimeparse.js \
>>      server/render.js \
>>
>> Copied: couchdb/trunk/share/server/json2.js (from r902422, couchdb/trunk/share/www/script/json2.js)
>> URL: http://svn.apache.org/viewvc/couchdb/trunk/share/server/json2.js?p2=couchdb/trunk/share/server/json2.js&p1=couchdb/trunk/share/www/script/json2.js&r1=902422&r2=903023&rev=903023&view=diff
>> ==============================================================================
>> --- couchdb/trunk/share/www/script/json2.js [utf-8] (original)
>> +++ couchdb/trunk/share/server/json2.js [utf-8] Tue Jan 26 00:11:59 2010
>> @@ -1,6 +1,6 @@
>>  /*
>>      http://www.JSON.org/json2.js
>> -    2009-08-17
>> +    2009-09-29
>>
>>      Public Domain.
>>
>> @@ -8,6 +8,14 @@
>>
>>      See http://www.JSON.org/js.html
>>
>> +
>> +    This code should be minified before deployment.
>> +    See http://javascript.crockford.com/jsmin.html
>> +
>> +    USE YOUR OWN COPY. IT IS EXTREMELY UNWISE TO LOAD CODE FROM SERVERS YOU DO
>> +    NOT CONTROL.
>> +
>> +
>>      This file creates a global JSON object containing two methods: stringify
>>      and parse.
>>
>> @@ -136,15 +144,9 @@
>>
>>      This is a reference implementation. You are free to copy, modify, or
>>      redistribute.
>> -
>> -    This code should be minified before deployment.
>> -    See http://javascript.crockford.com/jsmin.html
>> -
>> -    USE YOUR OWN COPY. IT IS EXTREMELY UNWISE TO LOAD CODE FROM SERVERS YOU DO
>> -    NOT CONTROL.
>>  */
>>
>> -/*jslint evil: true */
>> +/*jslint evil: true, strict: false */
>>
>>  /*members "", "\b", "\t", "\n", "\f", "\r", "\"", JSON, "\\", apply,
>>      call, charCodeAt, getUTCDate, getUTCFullYear, getUTCHours,
>> @@ -153,7 +155,6 @@
>>      test, toJSON, toString, valueOf
>>  */
>>
>> -"use strict";
>>
>>  // Create a JSON object only if one does not already exist. We create the
>>  // methods in a closure to avoid creating global variables.
>>
>> Modified: couchdb/trunk/share/server/util.js
>> URL: http://svn.apache.org/viewvc/couchdb/trunk/share/server/util.js?rev=903023&r1=903022&r2=903023&view=diff
>> ==============================================================================
>> --- couchdb/trunk/share/server/util.js (original)
>> +++ couchdb/trunk/share/server/util.js Tue Jan 26 00:11:59 2010
>> @@ -13,14 +13,7 @@
>>  var Couch = {
>>    // moving this away from global so we can move to json2.js later
>>    toJSON : function (val) {
>> -    if (typeof(val) == "undefined") {
>> -      throw "Cannot encode 'undefined' value as JSON";
>> -    }
>> -    if (typeof(val) == "xml") { // E4X support
JavaScript bcrypt (was Re: authentication cleanup)
On Tue, Jan 5, 2010 at 10:21 PM, Benoit Chesneau wrote:
> There is a blowfish encryption implementation available in javascript.
> Doesn't bcrypt stand for "blowfish crypt"?
>
> http://www.openbsd.org/cgi-bin/man.cgi?query=bcrypt&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html
>
> from where it has been created.
>
> - benoît
>

Is anyone up to replace our salted hashes with a JS bcrypt
implementation? If we can start supporting bcrypt for 0.11 we're less
likely to have salted hash passwords hanging around *forever* from
people who create user docs before 1.0.

If no one else picks this up soon I'll look at it again for 1.0.

Thanks,
Chris

--
Chris Anderson
http://jchrisa.net
http://couch.io
Re: upgrading to json2.js
On Mon, Jan 25, 2010 at 11:08 PM, Chris Anderson wrote:
> On Tue, Dec 22, 2009 at 10:26 AM, Chris Anderson wrote:
>> On Sat, Dec 19, 2009 at 5:07 PM, Chris Anderson wrote:
>>> It's well known that in order to take advantage of native JSON
>>> libraries in the newest Mozilla JavaScript VMs, we'll need to change
>>> our handling of 'undefined' in the toJSON() routine.
>>>
>>> I propose we make this change now, by replacing our current JSON
>>> handling with json2.js, the current reference implementation.
>>>
>>> I've started the work here:
>>>
>>> http://github.com/jchris/couchdb/tree/json2
>>
>> I've updated my json2 branch to reflect my latest commits to trunk.
>>
>
> I've committed this change to CouchDB. It will appear in the 0.11
> release. From the commit message:
>
> Replace the old JavaScript query server JSON library with json2.js
>
> This change makes us interoperate better with other JSON
> implementations. It also means we can use the native JSON handlers in
> JavaScript runtimes that support them.
>
> There are some potential breaking changes for apps that depend on
> Couch blowing up on 'undefined'. json2.js serializes undefined as
> 'null' instead of crashing.

The change is that undefined in an array gets serialized as null. Thus:

    $ JSON.stringify([undefined])
    -> "[null]"

plus the XML stuff. No idea how JSON.stringify(undefined) behaves but
we wrap all results in an array before passing to Erlang so it
shouldn't be a huge deal.

HTH,
Paul Davis

> This change will also affect people using E4X, as you can't just
> return an XML object and have it serialized to a string for you.
> Calling .toXMLString() on these is what you need to do here.
>
> Best,
> Chris
>
>> Benoit has fixed the E4X issues. There are a few other test failures
>> which I believe have to do with the changed behavior. If anyone wants
>> to take a look at these and consider changing the tests where
>> appropriate, that'd be super helpful.
>>
>> Chris
>>
>>>
>>> Everything works except E4X. When I run the view_xml tests, I see this
>>> error in the logs:
>>>
>>> OS Process :: function raised exception (TypeError:
>>> String.prototype.toJSON called on incompatible XML) with doc._id
>>> 43840f81289e03fec4e9f620b2c03799
>>>
>>> In our old implementation of toJSON, we run value.toXMLString() to
>>> convert XML to strings. json2.js takes a callback parameter to allow
>>> modification of results, but the TypeError is triggered before the
>>> callback, it seems.
>>>
>>> If any of you JavaScript ninjas wanna give this a shot, please help me
>>> finish it.
>>>
>>> Chris
>>>
>>> --
>>> Chris Anderson
>>> http://jchrisa.net
>>> http://couch.io
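Paul's examples are easy to check against any ES5-style JSON implementation (json2.js follows the same semantics, which is the point of the switch). All three cases of undefined behave differently:

```javascript
// How json2.js / native JSON treats undefined, versus the old query
// server which threw "Cannot encode 'undefined' value as JSON":
console.log(JSON.stringify([undefined]));      // "[null]"  -- Paul's example
console.log(JSON.stringify({ a: undefined })); // "{}"      -- the key is dropped
console.log(JSON.stringify(undefined));        // undefined -- not a string at all
```

The last case answers Paul's open question: top-level undefined yields no string, which is why wrapping results in an array before handing them to Erlang sidesteps the problem.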
Re: buildbot failure in ASF Buildbot on couchdb-trunk
No more emails. license.skip is gonna rely on paths in the ignore
patterns though, so in copying it I would expect things to get
triggered again.

Paul Davis

On Mon, Jan 25, 2010 at 11:06 PM, Chris Anderson wrote:
> On Mon, Jan 25, 2010 at 4:42 PM, Paul Davis wrote:
>> Looks like you forgot to add json2.js to license.skip
>>
>
> Thanks.
>
> Since json2.js has been in _utils for a long time I figured licenses
> would be taken care of.
>
> Fixed. (I think)
>
>> On Mon, Jan 25, 2010 at 7:57 PM, wrote:
>>> The Buildbot has detected a new failure of couchdb-trunk on ASF Buildbot.
>>> Full details are available at:
>>> http://ci.apache.org/builders/couchdb-trunk/builds/170
>>>
>>> Buildbot URL: http://ci.apache.org/
>>>
>>> Buildslave for this Build: bb-vm_ubuntu
>>>
>>> Build Reason:
>>> Build Source Stamp: [branch couchdb/trunk] 903023
>>> Blamelist: jchris
>>>
>>> BUILD FAILED: failed compile_5
>>>
>>> sincerely,
>>> -The ASF Buildbot
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
Re: upgrading to json2.js
On Tue, Dec 22, 2009 at 10:26 AM, Chris Anderson wrote:
> On Sat, Dec 19, 2009 at 5:07 PM, Chris Anderson wrote:
>> It's well known that in order to take advantage of native JSON
>> libraries in the newest Mozilla JavaScript VMs, we'll need to change
>> our handling of 'undefined' in the toJSON() routine.
>>
>> I propose we make this change now, by replacing our current JSON
>> handling with json2.js, the current reference implementation.
>>
>> I've started the work here:
>>
>> http://github.com/jchris/couchdb/tree/json2
>
> I've updated my json2 branch to reflect my latest commits to trunk.
>

I've committed this change to CouchDB. It will appear in the 0.11
release. From the commit message:

Replace the old JavaScript query server JSON library with json2.js

This change makes us interoperate better with other JSON
implementations. It also means we can use the native JSON handlers in
JavaScript runtimes that support them.

There are some potential breaking changes for apps that depend on
Couch blowing up on 'undefined'. json2.js serializes undefined as
'null' instead of crashing.

This change will also affect people using E4X, as you can't just
return an XML object and have it serialized to a string for you.
Calling .toXMLString() on these is what you need to do here.

Best,
Chris

> Benoit has fixed the E4X issues. There are a few other test failures
> which I believe have to do with the changed behavior. If anyone wants
> to take a look at these and consider changing the tests where
> appropriate, that'd be super helpful.
>
> Chris
>
>>
>> Everything works except E4X. When I run the view_xml tests, I see this
>> error in the logs:
>>
>> OS Process :: function raised exception (TypeError:
>> String.prototype.toJSON called on incompatible XML) with doc._id
>> 43840f81289e03fec4e9f620b2c03799
>>
>> In our old implementation of toJSON, we run value.toXMLString() to
>> convert XML to strings. json2.js takes a callback parameter to allow
>> modification of results, but the TypeError is triggered before the
>> callback, it seems.
>>
>> If any of you JavaScript ninjas wanna give this a shot, please help me
>> finish it.
>>
>> Chris

--
Chris Anderson
http://jchrisa.net
http://couch.io
Re: svn commit: r903023 - in /couchdb/trunk/share: Makefile.am server/json2.js server/util.js www/script/json2.js www/script/test/show_documents.js www/script/test/update_documents.js www/script/test/
Hey Chris,

great work, thanks. Can you update
http://wiki.apache.org/couchdb/Breaking_changes? :)

Cheers
Jan
--

On 25 Jan 2010, at 16:12, jch...@apache.org wrote:

> Author: jchris
> Date: Tue Jan 26 00:11:59 2010
> New Revision: 903023
>
> URL: http://svn.apache.org/viewvc?rev=903023&view=rev
> Log:
> Replace the old JavaScript query server JSON library with json2.js
>
> This change makes us interoperate better with other JSON implementations. It
> also means we can use the native JSON handlers in JavaScript runtimes that
> support them. Should be faster right away on new Spidermonkeys.
>
> There are some potential breaking changes for apps that depend on Couch
> blowing up on 'undefined'. json2.js serializes undefined as 'null' instead of
> crashing.
>
> This change will also affect people using E4X, as you can't just return an
> XML object and have it serialized to a string for you. Calling .toXMLString()
> on these is all you need to do.
>
> Added:
>     couchdb/trunk/share/server/json2.js
>       - copied, changed from r902422, couchdb/trunk/share/www/script/json2.js
> Modified:
>     couchdb/trunk/share/Makefile.am
>     couchdb/trunk/share/server/util.js
>     couchdb/trunk/share/www/script/json2.js
>     couchdb/trunk/share/www/script/test/show_documents.js
>     couchdb/trunk/share/www/script/test/update_documents.js
>     couchdb/trunk/share/www/script/test/view_errors.js
>
> Modified: couchdb/trunk/share/Makefile.am
> URL: http://svn.apache.org/viewvc/couchdb/trunk/share/Makefile.am?rev=903023&r1=903022&r2=903023&view=diff
> ==============================================================================
> --- couchdb/trunk/share/Makefile.am (original)
> +++ couchdb/trunk/share/Makefile.am Tue Jan 26 00:11:59 2010
> @@ -13,6 +13,7 @@
>  JS_FILE = server/main.js
>
>  JS_FILE_COMPONENTS = \
> +    server/json2.js \
>      server/filter.js \
>      server/mimeparse.js \
>      server/render.js \
>
> Copied: couchdb/trunk/share/server/json2.js (from r902422, couchdb/trunk/share/www/script/json2.js)
> URL: http://svn.apache.org/viewvc/couchdb/trunk/share/server/json2.js?p2=couchdb/trunk/share/server/json2.js&p1=couchdb/trunk/share/www/script/json2.js&r1=902422&r2=903023&rev=903023&view=diff
> ==============================================================================
> --- couchdb/trunk/share/www/script/json2.js [utf-8] (original)
> +++ couchdb/trunk/share/server/json2.js [utf-8] Tue Jan 26 00:11:59 2010
> @@ -1,6 +1,6 @@
>  /*
>      http://www.JSON.org/json2.js
> -    2009-08-17
> +    2009-09-29
>
>      Public Domain.
>
> @@ -8,6 +8,14 @@
>
>      See http://www.JSON.org/js.html
>
> +
> +    This code should be minified before deployment.
> +    See http://javascript.crockford.com/jsmin.html
> +
> +    USE YOUR OWN COPY. IT IS EXTREMELY UNWISE TO LOAD CODE FROM SERVERS YOU DO
> +    NOT CONTROL.
> +
> +
>      This file creates a global JSON object containing two methods: stringify
>      and parse.
>
> @@ -136,15 +144,9 @@
>
>      This is a reference implementation. You are free to copy, modify, or
>      redistribute.
> -
> -    This code should be minified before deployment.
> -    See http://javascript.crockford.com/jsmin.html
> -
> -    USE YOUR OWN COPY. IT IS EXTREMELY UNWISE TO LOAD CODE FROM SERVERS YOU DO
> -    NOT CONTROL.
>  */
>
> -/*jslint evil: true */
> +/*jslint evil: true, strict: false */
>
>  /*members "", "\b", "\t", "\n", "\f", "\r", "\"", JSON, "\\", apply,
>      call, charCodeAt, getUTCDate, getUTCFullYear, getUTCHours,
> @@ -153,7 +155,6 @@
>      test, toJSON, toString, valueOf
>  */
>
> -"use strict";
>
>  // Create a JSON object only if one does not already exist. We create the
>  // methods in a closure to avoid creating global variables.
>
> Modified: couchdb/trunk/share/server/util.js
> URL: http://svn.apache.org/viewvc/couchdb/trunk/share/server/util.js?rev=903023&r1=903022&r2=903023&view=diff
> ==============================================================================
> --- couchdb/trunk/share/server/util.js (original)
> +++ couchdb/trunk/share/server/util.js Tue Jan 26 00:11:59 2010
> @@ -13,14 +13,7 @@
>  var Couch = {
>    // moving this away from global so we can move to json2.js later
>    toJSON : function (val) {
> -    if (typeof(val) == "undefined") {
> -      throw "Cannot encode 'undefined' value as JSON";
> -    }
> -    if (typeof(val) == "xml") { // E4X support
> -      val = val.toXMLString();
> -    }
> -    if (val === null) { return "null"; }
> -    return (Couch.toJSON.dispatcher[val.constructor.name])(val);
> +    return JSON.stringify(val);
>    },
>    compileFunction : function(source) {
>      if (!source) throw(["error","not_found","missing function"]);
> @@ -47,55 +40,6 @@
>      }
>    }
>
> -Couch.toJSON.subs = {'\b': '\\b', '\t': '\\t', '\n': '\\n', '\f': '\\f',
> -    '\r': '\\r', '"' : '\\"', '\\': '\\\\'};
> -Couch.toJSON.dispatcher = {
> -    "Array": function(v) {
> -      var buf = [];
> -      for (var
Re: buildbot failure in ASF Buildbot on couchdb-trunk
On Mon, Jan 25, 2010 at 4:42 PM, Paul Davis wrote:
> Looks like you forgot to add json2.js to license.skip
>

Thanks.

Since json2.js has been in _utils for a long time I figured licenses
would be taken care of.

Fixed. (I think)

> On Mon, Jan 25, 2010 at 7:57 PM, wrote:
>> The Buildbot has detected a new failure of couchdb-trunk on ASF Buildbot.
>> Full details are available at:
>> http://ci.apache.org/builders/couchdb-trunk/builds/170
>>
>> Buildbot URL: http://ci.apache.org/
>>
>> Buildslave for this Build: bb-vm_ubuntu
>>
>> Build Reason:
>> Build Source Stamp: [branch couchdb/trunk] 903023
>> Blamelist: jchris
>>
>> BUILD FAILED: failed compile_5
>>
>> sincerely,
>> -The ASF Buildbot

--
Chris Anderson
http://jchrisa.net
http://couch.io
Re: buildbot failure in ASF Buildbot on couchdb-trunk
And 15M to team Gavin for giving us buildbot + notifications.

On Mon, Jan 25, 2010 at 7:48 PM, Noah Slater wrote:
> Score 1 for team Noah.
>
> On 26 Jan 2010, at 00:42, Paul Davis wrote:
>
>> Looks like you forgot to add json2.js to license.skip
>>
>> On Mon, Jan 25, 2010 at 7:57 PM, wrote:
>>> The Buildbot has detected a new failure of couchdb-trunk on ASF Buildbot.
>>> Full details are available at:
>>> http://ci.apache.org/builders/couchdb-trunk/builds/170
>>>
>>> Buildbot URL: http://ci.apache.org/
>>>
>>> Buildslave for this Build: bb-vm_ubuntu
>>>
>>> Build Reason:
>>> Build Source Stamp: [branch couchdb/trunk] 903023
>>> Blamelist: jchris
>>>
>>> BUILD FAILED: failed compile_5
>>>
>>> sincerely,
>>> -The ASF Buildbot
Re: buildbot failure in ASF Buildbot on couchdb-trunk
Score 1 for team Noah.

On 26 Jan 2010, at 00:42, Paul Davis wrote:
> Looks like you forgot to add json2.js to license.skip
>
> On Mon, Jan 25, 2010 at 7:57 PM, wrote:
>> The Buildbot has detected a new failure of couchdb-trunk on ASF Buildbot.
>> Full details are available at:
>> http://ci.apache.org/builders/couchdb-trunk/builds/170
>>
>> Buildbot URL: http://ci.apache.org/
>>
>> Buildslave for this Build: bb-vm_ubuntu
>>
>> Build Reason:
>> Build Source Stamp: [branch couchdb/trunk] 903023
>> Blamelist: jchris
>>
>> BUILD FAILED: failed compile_5
>>
>> sincerely,
>> -The ASF Buildbot
Re: buildbot failure in ASF Buildbot on couchdb-trunk
Looks like you forgot to add json2.js to license.skip

On Mon, Jan 25, 2010 at 7:57 PM, wrote:
> The Buildbot has detected a new failure of couchdb-trunk on ASF Buildbot.
> Full details are available at:
> http://ci.apache.org/builders/couchdb-trunk/builds/170
>
> Buildbot URL: http://ci.apache.org/
>
> Buildslave for this Build: bb-vm_ubuntu
>
> Build Reason:
> Build Source Stamp: [branch couchdb/trunk] 903023
> Blamelist: jchris
>
> BUILD FAILED: failed compile_5
>
> sincerely,
> -The ASF Buildbot
[jira] Commented: (COUCHDB-632) Generic _changes listener added to jquery.couch.js
[ https://issues.apache.org/jira/browse/COUCHDB-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804819#action_12804819 ]

Benoit Chesneau commented on COUCHDB-632:
-----------------------------------------

If couchdb stops sending changes, we receive undefined results and the oldest id:

    Got 79ca040d0a6d784619c61b28e5ff
    Got undefined
    Got 79ca040d0a6d784619c61b28e5ff

Here is a quick test.html to reproduce:

    var db = $.couch.db("test");
    var changes = db.changes({seq:15});
    changes.addListener(function(data) {
      console.log(data);
      $("#lines").append("Got " + data.id + " ");
    });
    changes.start();

> Generic _changes listener added to jquery.couch.js
> --------------------------------------------------
>
>                 Key: COUCHDB-632
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-632
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Futon
>        Environment: the Browser!
>           Reporter: mikeal
>           Priority: Minor
>        Attachments: changes.diff, changes1.diff, jquery.couch.js
>
>  Original Estimate: 0.02h
>  Remaining Estimate: 0.02h
>
> I've written a Generic _changes listener and added it to jquery.couch.js
> taken from Futon.
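Not part of the patch under review, but one hypothetical way a caller could guard against the "Got undefined" rows Benoit reproduces is to wrap the listener and drop malformed or re-delivered results (makeSafeListener is an invented helper, sketched here so it runs standalone):

```javascript
// Hypothetical defensive wrapper, not jquery.couch.js code: skip rows
// with no id and rows whose id was already delivered.
function makeSafeListener(listener) {
  var seen = {};
  return function (data) {
    if (!data || data.id === undefined) return; // drop undefined results
    if (seen[data.id]) return;                  // drop re-delivered ids
    seen[data.id] = true;
    listener(data);
  };
}

var got = [];
var onChange = makeSafeListener(function (d) { got.push(d.id); });

// Replay the sequence from the bug report:
onChange({ id: "79ca040d0a6d784619c61b28e5ff" });
onChange(undefined);
onChange({ id: "79ca040d0a6d784619c61b28e5ff" });

console.log(got);  // [ '79ca040d0a6d784619c61b28e5ff' ]
```

This only masks the symptom on the client side; the actual fix belongs in the feed handling, as the attached diffs address.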
[jira] Updated: (COUCHDB-632) Generic _changes listener added to jquery.couch.js
[ https://issues.apache.org/jira/browse/COUCHDB-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoit Chesneau updated COUCHDB-632:
------------------------------------

    Attachment: changes1.diff

Updated diff. Fixes an unclosed try/catch:
http://github.com/benoitc/couchdb/commit/41eafc56799b1516c9a8d2207fa53366787be0bf
[jira] Updated: (COUCHDB-632) Generic _changes listener added to jquery.couch.js
[ https://issues.apache.org/jira/browse/COUCHDB-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Anderson updated COUCHDB-632: --- Attachment: changes.diff here's a diff merged to work with latest trunk > Generic _changes listener added to jquery.couch.js > -- > > Key: COUCHDB-632 > URL: https://issues.apache.org/jira/browse/COUCHDB-632 > Project: CouchDB > Issue Type: Improvement > Components: Futon > Environment: the Browser! >Reporter: mikeal >Priority: Minor > Attachments: changes.diff, jquery.couch.js > > Original Estimate: 0.02h > Remaining Estimate: 0.02h > > I've written a Generic _changes listener and added it to jquery.couch.js > taken from Futon. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (COUCHDB-632) Generic _changes listener added to jquery.couch.js
[ https://issues.apache.org/jira/browse/COUCHDB-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mikeal updated COUCHDB-632:

Attachment: jquery.couch.js

Modified jquery.couch.js; jchris said I should just attach the whole file instead of a diff.

> Generic _changes listener added to jquery.couch.js
> --
>
> Key: COUCHDB-632
> URL: https://issues.apache.org/jira/browse/COUCHDB-632
> Project: CouchDB
> Issue Type: Improvement
> Components: Futon
> Environment: the Browser!
> Reporter: mikeal
> Priority: Minor
> Attachments: jquery.couch.js
>
> Original Estimate: 0.02h
> Remaining Estimate: 0.02h
>
> I've written a generic _changes listener and added it to jquery.couch.js, taken from Futon.
[jira] Created: (COUCHDB-632) Generic _changes listener added to jquery.couch.js
Generic _changes listener added to jquery.couch.js -- Key: COUCHDB-632 URL: https://issues.apache.org/jira/browse/COUCHDB-632 Project: CouchDB Issue Type: Improvement Components: Futon Environment: the Browser! Reporter: mikeal Priority: Minor I've written a Generic _changes listener and added it to jquery.couch.js taken from Futon. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
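The API exercised in Benoit's test snippet earlier in this thread (db.changes({seq: 15}), changes.addListener(...), changes.start()) suggests a small listener object. The sketch below shows only that control flow and is not the attached jquery.couch.js code: fetchChanges is a hypothetical stand-in for the jQuery ajax request to /db/_changes, so the pattern can be shown without a browser.

```javascript
// Minimal sketch of the _changes listener pattern discussed in this
// ticket. `fetchChanges(since, cb)` stands in for the real ajax request
// to /db/_changes?since=N; the actual jquery.couch.js code differs.
function makeChangesFeed(fetchChanges, since) {
  var listeners = [];
  return {
    // Register a callback that receives each change row.
    addListener: function (cb) { listeners.push(cb); },
    // Ask for changes after `since`, dispatch every row to every
    // listener, then remember the new sequence number.
    start: function () {
      fetchChanges(since, function (resp) {
        resp.results.forEach(function (row) {
          listeners.forEach(function (cb) { cb(row); });
        });
        since = resp.last_seq;
      });
    }
  };
}
```

Resuming from last_seq on each request is what keeps a poller from re-reading rows it has already dispatched.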
[jira] Updated: (COUCHDB-631) Replication by doc Ids
[ https://issues.apache.org/jira/browse/COUCHDB-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Filipe Manana updated COUCHDB-631: -- Attachment: replication-by-doc-ids_trunk.patch The following patch adds support for the optional "doc_ids" attribute (array of strings) of a JSON replication object. The idea was suggested recently by Chris Anderson in the dev mailing list. > Replication by doc Ids > -- > > Key: COUCHDB-631 > URL: https://issues.apache.org/jira/browse/COUCHDB-631 > Project: CouchDB > Issue Type: New Feature > Components: Replication > Environment: trunk >Reporter: Filipe Manana >Priority: Minor > Attachments: replication-by-doc-ids_trunk.patch > > > The following patch adds support for the optional "doc_ids" attribute (array > of strings) of a JSON replication object. > The idea was suggested recently by Chris Anderson in the dev mailing list. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (COUCHDB-631) Replication by doc Ids
Replication by doc Ids -- Key: COUCHDB-631 URL: https://issues.apache.org/jira/browse/COUCHDB-631 Project: CouchDB Issue Type: New Feature Components: Replication Environment: trunk Reporter: Filipe Manana Priority: Minor Attachments: replication-by-doc-ids_trunk.patch The following patch adds support for the optional "doc_ids" attribute (array of strings) of a JSON replication object. The idea was suggested recently by Chris Anderson in the dev mailing list. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
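Going by the issue description, a replication request carrying the proposed attribute would be an ordinary JSON replication object plus a "doc_ids" array of strings. The database names and ids below are made up for illustration; the exact behaviour is defined by the attached patch:

```json
{
  "source": "http://127.0.0.1:5984/source_db",
  "target": "target_db",
  "doc_ids": ["doc1", "doc42"]
}
```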
[jira] Updated: (COUCHDB-514) Redirect from _list using view rows
[ https://issues.apache.org/jira/browse/COUCHDB-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joscha Feth updated COUCHDB-514:

Attachment: render.diff

Added render.diff, which enables the list render function to disable/enable automatic flushing and fixes the problem of start() being sent automatically when getRow() is called. This depends on readline() being fixed first, so that it also returns when no header has been sent yet.

> Redirect from _list using view rows
> ---
>
> Key: COUCHDB-514
> URL: https://issues.apache.org/jira/browse/COUCHDB-514
> Project: CouchDB
> Issue Type: Improvement
> Components: JavaScript View Server
> Affects Versions: 0.10
> Reporter: Zachary Zolton
> Attachments: list-redir.diff, list_views.diff, render.diff
>
> There is no way to redirect from a _list function after calling the getRow() API function.
> Here's a link to the discussion on the dev mailing list: http://is.gd/3KZRg
Re: Updating the CouchDB roadmap
> Thanks for reminding me that I should set _all_dbs to hide dbs the
> current user can't read if that doesn't incur much additional
> overhead.

I think it will incur a huge overhead if there are a large number of databases and the reader rights are stored within the databases themselves. CouchDB would have to open and read every single database file on disk, even if the user only has access to one. Storing the rights within the user record avoids this problem completely.
Re: Pinning revs
On Mon, Jan 25, 2010 at 02:38:22PM +, Robert Newson wrote:
> when you PUT a new document over an existing one you are implicitly
> removing the document that was previously there. A concurrent read
> operation might see the b+tree before or after that change; either
> answer is consistent with some historical version of the database and
> no locking is required.
>
> If, instead, you really wanted to make a new version (from your
> applications point of view) you should insert a brand new document and
> add a view (or a naming convention) that lets you find the version
> history.
>
> A simple idea would be to append the version to the _id.
> (i.e, to 'update' doc1-v1, you would PUT doc1-v2).

That's what I thought of first. Given 1000 revisions of one document, stored as 1000 separate documents, you can (as you say) make a view to find the most recent one. However, you can't apply a view to a view, so it's then impossible to write a view which makes use of only the most recent version of a document. It becomes a bit of a mess.

So I think I need to store all the revisions within a single document. Options might be:

1. Store all the revisions nested within the JSON document, or store the previous revisions as attachments. Unfortunately, I need to version the binary attachments too.

2. Store each attachment with a special naming convention, e.g. blob:r1, blob:r2, etc.

3. Store each rev's attachments in a single .zip file attachment.

4. Store each attachment with a name equal to its sha1, and the revisions as nested JSON, each containing an "attachments" member that points to the sha1s. Probably the cleanest, and it also saves duplicating identical content, but still something of a PITA.

I guess that, as you say, it could be layered on top of couchdb as some sort of middleware, or else the client would have to take responsibility for doing the versioning properly.
(An _update handler could update the JSON part of a multi-rev document, but I don't think it can do clever stuff with attachments) Regards, Brian.
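The doc1-v1 / doc1-v2 naming convention suggested in the quoted message can be surfaced through a view along these lines. This is only a sketch of that convention (the -vN suffix and the function name are the thread's example, not a CouchDB feature); keying rows by [base id, version number] makes versions of the same document sort together, newest last:

```javascript
// Map function for documents versioned by id suffix ("doc1-v1",
// "doc1-v2", ...). Emits [base id, numeric version] as the key.
// `emit` is supplied by CouchDB's view server when this runs inside
// a design document.
function mapLatestVersion(doc) {
  var m = /^(.*)-v(\d+)$/.exec(doc._id);
  if (m) {
    emit([m[1], parseInt(m[2], 10)], null);
  }
}
```

Querying with startkey=["doc1",{}]&descending=true&limit=1 would then return doc1's highest version first; Brian's objection still stands, though, since no further view can be built on top of this one.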
Re: replicator options
On Mon, Jan 25, 2010 at 8:28 AM, Zachary Zolton wrote: > Having the replicator handle chaining views would really help people > who are already hacking this together with scripts. So, I'd definitely > +1 the idea. Isn't view size and indexing time a separate problem from > designing this replicator API? Yes. The big missing piece in this view-copy API is: What to do if the "replication" dies in the middle. Currently with real replication, you just pick it up where you left off, with the sequence index. For something like a group reduce query, I guess you'd just have to pick up where you left off in the key range. The problem is that someone may have made updates to the db since you started, and you get an inconsistent copy of the view. To properly support this, we'd need an API that allows you to specify a db-update sequence in your view request. As long as the view haven't been compacted (and that seq # actually exists as a snapshot point in the index) then you could pick up with the same index and avoid inconsistencies. Chris > > On Sun, Jan 24, 2010 at 9:47 PM, Chris Anderson wrote: >> On Sun, Jan 24, 2010 at 5:16 PM, Glenn Rempe wrote: >>> On Sun, Jan 24, 2010 at 2:11 PM, Chris Anderson wrote: >>> On Sun, Jan 24, 2010 at 2:04 PM, Glenn Rempe wrote: > On Sun, Jan 24, 2010 at 12:09 AM, Chris Anderson wrote: > >> Devs, >> >> I've been thinking there are a few simple options that would magnify >> the power of the replicator a lot. >> >> ... >> The fun one is chained map reduce. It occurred to me the other night >> that simplest way to present a chainable map reduce abstraction to >> users is through the replicator. The action "copy these view rows to a >> new db" is a natural fit for the replicator. I imagine this would be >> super useful to people doing big messy data munging, and it wouldn't >> be too hard for the replicator to handle. >> >> > I like this idea as well, as chainable map/reduce has been something I think > a lot of people would like to use. 
The thing I am concerned about, and > which is related to another ongoing thread, is the size of views on disk and > the slowness of generating them. I fear that we would end up ballooning > views on disk to a size that is unmanageable if we chained them. I have an > app in production with 50m rows, whose DB has grown to >100GB, and the views > take up approx 800GB (!). I don't think I could afford the disk space to > even consider using this especially when you consider that in order to > compact a DB or view you need roughly 2x the disk space of the files on > disk. > > I also worry about the time to generate chained views, when the time needed > for generating views currently is already a major weak point of CouchDB > (Generating my views took more than a week). > > In practice, I think only those with relatively small DB's would be able to > take advantage of this feature. > For large data, you'll want a cluster. The same holds true for other Map Reduce frameworks like Hadoop or Google's stuff. >>> >>> That would not resolve the issue I mentioned where views can be a multiple >>> in size of the original data DB. I have about 9 views in a design doc, and >>> my resultant view files on disk are about 9x the size of the original DB >>> data. >>> >>> How would sharding this across multiple DBs in a cluster resolve this? You >>> would still end up with views that are some multiple in size of their >>> original sharded DB. Compounded by how many replicas you have of that view >>> data for chained M/R. >>> >>> I'd be interested if anyone with partitioned CouchDB query experience (Lounger or otherwise) can comment on view generation time when parallelized across multiple machines. >>> I would also be interested in seeing any architectures that make use of this >>> to parallelize view generation. I'm not sure your example of Hadoop or >>> Google M/R are really valid because they provide file system abstractions >>> (e.g. 
Hadoop FS) for automatically streaming a single copy of the data to >>> where it is needed to be Mapped/Reduced and CouchDB has nothing similar. >>> >>> http://hadoop.apache.org/common/docs/current/hdfs_design.html >>> >>> Don't get me wrong, I would love to see these things happen, I just wonder >>> if there are other issues that need to be resolved first before this is >>> practical for anything but a small dataset. >>> >> >> I know Hadoop and Couch are dissimilar, but the way to parallelize >> CouchDB view generation is with a partitioned cluster like >> CouchDB-Lounge or the Cloudant stuff. >> >> It doesn't help much with the size inefficiencies but will help with >> generation time. >> >> Chris >> >> >> -- >> Chris Anderson >> http://jchrisa.net >> http://couch.i
Re: Parallel view generation (was Re: replicator options)
On Mon, Jan 25, 2010 at 3:10 AM, Simon Metson wrote: > Hi, > This is OT for the original discussion imho > > On 25 Jan 2010, at 01:16, Glenn Rempe wrote: > >>> I'd be interested if anyone with partitioned CouchDB query experience >>> (Lounger or otherwise) can comment on view generation time when >>> parallelized across multiple machines. >>> >>> >> I would also be interested in seeing any architectures that make use of >> this >> to parallelize view generation. I'm not sure your example of Hadoop or >> Google M/R are really valid because they provide file system abstractions >> (e.g. Hadoop FS) for automatically streaming a single copy of the data to >> where it is needed to be Mapped/Reduced and CouchDB has nothing similar. > > IMHO something like HDFS isn't needed, since there's already a simple, > scalable way of getting at the data. What I'd like (to have time to work > on...) is the following: > > 1. be able to configure a pipeline of documents that are sent to the view > server > 1a. be able to set the size of that pipeline to 0, which just sends a > sane header (there are N documents in the database) > 2. view server spawns off child processes (I'm thinking Disco, but Hadoop > would be able to do the same) on the various worker nodes > 3. each worker is given a range of documents to process, pulls these in from > _all_docs > 4. worker processes its portion of the database > 5. worker returns its results to the view server which aggregates them up > into the final view > > The main issue here is how good your view server is; can it take getting > 1000's of responses at once? An HTTP view response would be nice... I'm > pretty sure that CouchDB could handle getting all the requests from workers. > I think this could also allow for view of view processing, without going > through/maintaining an intermediate database. 
The reason we haven't implemented something like this yet is that it assumes your bottleneck is CPU time, and that it's worth it to move docs across a cluster to be processed, then return the rows to the original Couch for indexing. This might help a little in cases where your map function is very CPU intensive, but you aren't going to get 8x faster by using 8 boxes, because the bottleneck will quickly become updating the view index file on the original Couch. Partitioning (a cluster of, say, 8 Couches where each Couch has 1/8th of the data) will speed your view generation up by roughly 8x (at the expense of slightly higher HTTP-query overhead). In this approach, the map reduce model on any individual one of those Couches isn't any different than it is today. The functions are run close to the data (no map reduce network overhead), and the rows are stored in one index per Couch, which is what allows the 8x speedup.

Does this make sense?

Chris

--
Chris Anderson
http://jchrisa.net
http://couch.io
Re: Updating the CouchDB roadmap
On Mon, Jan 25, 2010 at 1:14 AM, Brian Candler wrote:
> On Sun, Jan 24, 2010 at 09:33:02PM -0800, Chris Anderson wrote:
>> To round out this list, I think
>>
>> * Reader ACLs
> ...
>>
>> look like they will make it into 0.11.
>
> That's the jchris/readeracl branch presumably?
>
> I was hoping to turn my counter-proposal(*) into code, but I've not had any
> time to do so unfortunately.
>
> Regards,
>
> Brian.
>
> (*) which was, in summary:
>
> 1. user record has roles like "foo:_reader" or ["foo","_reader"]
>
> 2. _anon user has roles of ":_reader" for all public databases
>
> 3. you can read database foo only if you have one of
> "foo:_reader", "foo:_admin", "_reader" or "_admin" roles
>
> 4. /_all_dbs lists only those databases to which you or _anon have read access
> (but shows every database if you have _reader or _admin roles)

Thanks for reminding me that I should set _all_dbs to hide dbs the current user can't read, if that doesn't incur much additional overhead.

Also, I plan to put a Futon interface on the reader and admin lists. And the security object still needs work, to round out the capability set to be something like what you describe here.

> 5. userdb validate_doc_update allows someone with "foo:_admin" to add and
> remove roles foo:*. Also "foo:_manager" to add and remove roles foo:*
> apart from foo:_admin

--
Chris Anderson
http://jchrisa.net
http://couch.io
Re: replicator options
Having the replicator handle chaining views would really help people who are already hacking this together with scripts. So, I'd definitely +1 the idea. Isn't view size and indexing time a separate problem from designing this replicator API? On Sun, Jan 24, 2010 at 9:47 PM, Chris Anderson wrote: > On Sun, Jan 24, 2010 at 5:16 PM, Glenn Rempe wrote: >> On Sun, Jan 24, 2010 at 2:11 PM, Chris Anderson wrote: >> >>> On Sun, Jan 24, 2010 at 2:04 PM, Glenn Rempe wrote: >>> > On Sun, Jan 24, 2010 at 12:09 AM, Chris Anderson >>> wrote: >>> > >>> >> Devs, >>> >> >>> >> I've been thinking there are a few simple options that would magnify >>> >> the power of the replicator a lot. >>> >> >>> >> ... >>> >> The fun one is chained map reduce. It occurred to me the other night >>> >> that simplest way to present a chainable map reduce abstraction to >>> >> users is through the replicator. The action "copy these view rows to a >>> >> new db" is a natural fit for the replicator. I imagine this would be >>> >> super useful to people doing big messy data munging, and it wouldn't >>> >> be too hard for the replicator to handle. >>> >> >>> >> >>> > I like this idea as well, as chainable map/reduce has been something I >>> think >>> > a lot of people would like to use. The thing I am concerned about, and >>> > which is related to another ongoing thread, is the size of views on disk >>> and >>> > the slowness of generating them. I fear that we would end up ballooning >>> > views on disk to a size that is unmanageable if we chained them. I have >>> an >>> > app in production with 50m rows, whose DB has grown to >100GB, and the >>> views >>> > take up approx 800GB (!). I don't think I could afford the disk space to >>> > even consider using this especially when you consider that in order to >>> > compact a DB or view you need roughly 2x the disk space of the files on >>> > disk. 
>>> > >>> > I also worry about the time to generate chained views, when the time >>> needed >>> > for generating views currently is already a major weak point of CouchDB >>> > (Generating my views took more than a week). >>> > >>> > In practice, I think only those with relatively small DB's would be able >>> to >>> > take advantage of this feature. >>> > >>> >>> For large data, you'll want a cluster. The same holds true for other >>> Map Reduce frameworks like Hadoop or Google's stuff. >>> >>> >> >> That would not resolve the issue I mentioned where views can be a multiple >> in size of the original data DB. I have about 9 views in a design doc, and >> my resultant view files on disk are about 9x the size of the original DB >> data. >> >> How would sharding this across multiple DBs in a cluster resolve this? You >> would still end up with views that are some multiple in size of their >> original sharded DB. Compounded by how many replicas you have of that view >> data for chained M/R. >> >> >>> I'd be interested if anyone with partitioned CouchDB query experience >>> (Lounger or otherwise) can comment on view generation time when >>> parallelized across multiple machines. >>> >>> >> I would also be interested in seeing any architectures that make use of this >> to parallelize view generation. I'm not sure your example of Hadoop or >> Google M/R are really valid because they provide file system abstractions >> (e.g. Hadoop FS) for automatically streaming a single copy of the data to >> where it is needed to be Mapped/Reduced and CouchDB has nothing similar. >> >> http://hadoop.apache.org/common/docs/current/hdfs_design.html >> >> Don't get me wrong, I would love to see these things happen, I just wonder >> if there are other issues that need to be resolved first before this is >> practical for anything but a small dataset. 
>> > > I know Hadoop and Couch are dissimilar, but the way to parallelize > CouchDB view generation is with a partitioned cluster like > CouchDB-Lounge or the Cloudant stuff. > > It doesn't help much with the size inefficiencies but will help with > generation time. > > Chris > > > -- > Chris Anderson > http://jchrisa.net > http://couch.io >
Re: Pinning revs
Robert Newson wrote:
> It's not clear if any of that belongs inside couchdb but clearly
> something like it would be useful to a lot of folks. Perhaps it's
> another tool outside of couchdb that, like couchapp, adds some finesse
> over a fundamental concept?

But isn't there the chance that an external program might miss some revisions, if old revisions are discarded before being picked up by the external process fetching them? Let's say we have a scenario like this, with a program fetching revisions from a cronjob which runs at a predefined interval:

A has rev-1
fetching revisions
Update on A. Revision gets bumped to rev-2
Compaction run on database
Update on A. Revision gets bumped to rev-3
fetching revisions <-- no way to fetch rev-2, as it got deleted already

So an external program cannot do the revision work at a predefined interval, unless compaction can be suppressed until the program runs again -- and as far as I understand, the current design might not guarantee this. The only other option would be a program which listens on _changes and uses the ?since parameter. What happens to the scenario above then? Are events on _changes preserved even over compaction?

regards,
Joscha
--
Re: Pinning revs
On Mon, Jan 25, 2010 at 9:38 AM, Robert Newson wrote: > fwiw, I have the same hinky feeling about this proposal. If > implemented, it would be the case that revisions are a history > mechanism under user control, when couch has always, and rightly, said > that it is not. > > when you PUT a new document over an existing one you are implicitly > removing the document that was previously there. A concurrent read > operation might see the b+tree before or after that change; either > answer is consistent with some historical version of the database and > no locking is required. > > If, instead, you really wanted to make a new version (from your > applications point of view) you should insert a brand new document and > add a view (or a naming convention) that lets you find the version > history. A simple idea would be to append the version to the _id. > (i.e, to 'update' doc1-v1, you would PUT doc1-v2). Purging some or all > history would then be a sequence of DELETE's up to, and exclusive of, > the latest version. This approach will work correctly through all > compaction, replication, multi-master and offline scenarios. > > It's not clear if any of that belongs inside couchdb but clearly > something like it would be useful to a lot of folks. Perhaps it's > another tool outside of couchdb that, like couchapp, adds some finesse > over a fundamental concept? > > B. > > > > On Mon, Jan 25, 2010 at 1:08 PM, Robert Dionne > wrote: >> I gave this some more thought over the weekend and don't think it's a good >> idea. Admittedly I'm not as deep in the code as the core devs but this >> strikes me as non-trivial to get right. There has also been a lot of effort >> put into telling folks to not think of _rev as a history mechanism. 
It seems >> very doable but as you point out it needs a strategy for retention, which >> would likely need to be configurable for different scenarios and there would >> need to be a strategy for replication also, how much history to carry along >> and so forth. >> >> If this is not best done by clients I think something in the server that was >> entirely orthogonal, .eg. based on some changes notification and using a log >> or different store, would be better. This would keep the design simpler and >> enable users to leave it out if not needed. >> >> Just my two cents but I'd be -0 on it >> >> Regards, >> >> Bob >> >> >> >> On Jan 24, 2010, at 5:20 AM, Brian Candler wrote: >> >>> Have there been any more thoughts about being able to use _rev as a history >>> mechanism? >>> >>> I think this just means that certain older _revs can survive compaction, and >>> ISTM that the simplest way to achieve this would be to have a bit which >>> marks a particular revision as "pinned" (cannot be discareded). This would >>> be very flexible. For example, you could prune this bit so that you keep >>> one revision per day for the last week, one revision per month before that, >>> and so on. When making an update in a wiki, you could pin the previous >>> revision only if it's more than 1 hour old, allowing multiple updates within >>> this window to be coalesced. >>> >>> I think this would be a very convenient mechanism, and much moreso than >>> building a document with all the previous versions of interest within the >>> document itself, or as attachments. >>> >>> I've even considered introducing artificial conflicts into the database >>> purely as a way to retain previous revs, but that's pretty messy. >>> >>> Regards, >>> >>> Brian. >> >> > As Bob and Rob point out, it may seem easy at first blush, but replication ends up getting fairly complicated. What happens when you're trying to replicate between two servers that have different sets of revisions pinned? HTH, Paul Davis
Re: Pinning revs
fwiw, I have the same hinky feeling about this proposal. If implemented, it would be the case that revisions are a history mechanism under user control, when couch has always, and rightly, said that it is not. when you PUT a new document over an existing one you are implicitly removing the document that was previously there. A concurrent read operation might see the b+tree before or after that change; either answer is consistent with some historical version of the database and no locking is required. If, instead, you really wanted to make a new version (from your applications point of view) you should insert a brand new document and add a view (or a naming convention) that lets you find the version history. A simple idea would be to append the version to the _id. (i.e, to 'update' doc1-v1, you would PUT doc1-v2). Purging some or all history would then be a sequence of DELETE's up to, and exclusive of, the latest version. This approach will work correctly through all compaction, replication, multi-master and offline scenarios. It's not clear if any of that belongs inside couchdb but clearly something like it would be useful to a lot of folks. Perhaps it's another tool outside of couchdb that, like couchapp, adds some finesse over a fundamental concept? B. On Mon, Jan 25, 2010 at 1:08 PM, Robert Dionne wrote: > I gave this some more thought over the weekend and don't think it's a good > idea. Admittedly I'm not as deep in the code as the core devs but this > strikes me as non-trivial to get right. There has also been a lot of effort > put into telling folks to not think of _rev as a history mechanism. It seems > very doable but as you point out it needs a strategy for retention, which > would likely need to be configurable for different scenarios and there would > need to be a strategy for replication also, how much history to carry along > and so forth. > > If this is not best done by clients I think something in the server that was > entirely orthogonal, .eg. 
based on some changes notification and using a log > or different store, would be better. This would keep the design simpler and > enable users to leave it out if not needed. > > Just my two cents but I'd be -0 on it > > Regards, > > Bob > > > > On Jan 24, 2010, at 5:20 AM, Brian Candler wrote: > >> Have there been any more thoughts about being able to use _rev as a history >> mechanism? >> >> I think this just means that certain older _revs can survive compaction, and >> ISTM that the simplest way to achieve this would be to have a bit which >> marks a particular revision as "pinned" (cannot be discareded). This would >> be very flexible. For example, you could prune this bit so that you keep >> one revision per day for the last week, one revision per month before that, >> and so on. When making an update in a wiki, you could pin the previous >> revision only if it's more than 1 hour old, allowing multiple updates within >> this window to be coalesced. >> >> I think this would be a very convenient mechanism, and much moreso than >> building a document with all the previous versions of interest within the >> document itself, or as attachments. >> >> I've even considered introducing artificial conflicts into the database >> purely as a way to retain previous revs, but that's pretty messy. >> >> Regards, >> >> Brian. > >
Re: Nightly and binary builds
Lincoln Stoll wrote: > FWIW I've setup a nightly builder for CouchDBX - it can be found here: > > http://couch.lstoll.net/nightly/ great, I added a link to this site in the wiki! regards, Joscha --
Re: Pinning revs
I gave this some more thought over the weekend and don't think it's a good idea. Admittedly I'm not as deep in the code as the core devs, but this strikes me as non-trivial to get right. There has also been a lot of effort put into telling folks not to think of _rev as a history mechanism. It seems very doable, but as you point out it needs a strategy for retention, which would likely need to be configurable for different scenarios, and there would need to be a strategy for replication also: how much history to carry along and so forth.

If this is not best done by clients, I think something in the server that was entirely orthogonal, e.g. based on some changes notification and using a log or a different store, would be better. This would keep the design simpler and enable users to leave it out if not needed.

Just my two cents, but I'd be -0 on it

Regards,

Bob

On Jan 24, 2010, at 5:20 AM, Brian Candler wrote:

> Have there been any more thoughts about being able to use _rev as a history
> mechanism?
>
> I think this just means that certain older _revs can survive compaction, and
> ISTM that the simplest way to achieve this would be to have a bit which
> marks a particular revision as "pinned" (cannot be discarded). This would
> be very flexible. For example, you could prune this bit so that you keep
> one revision per day for the last week, one revision per month before that,
> and so on. When making an update in a wiki, you could pin the previous
> revision only if it's more than 1 hour old, allowing multiple updates within
> this window to be coalesced.
>
> I think this would be a very convenient mechanism, and much more so than
> building a document with all the previous versions of interest within the
> document itself, or as attachments.
>
> I've even considered introducing artificial conflicts into the database
> purely as a way to retain previous revs, but that's pretty messy.
>
> Regards,
>
> Brian.
Re: Pinning revs
Brian Candler wrote: > Have there been any more thoughts about being able to use _rev as a > history mechanism? +1 from here - this would make the revision scenario for my current project incredibly easy! regards, Joscha --
Parallel view generation (was Re: replicator options)
Hi,

This is OT for the original discussion imho

On 25 Jan 2010, at 01:16, Glenn Rempe wrote:

> I'd be interested if anyone with partitioned CouchDB query experience
> (Lounger or otherwise) can comment on view generation time when
> parallelized across multiple machines.
>
> I would also be interested in seeing any architectures that make use of this
> to parallelize view generation. I'm not sure your example of Hadoop or
> Google M/R are really valid because they provide file system abstractions
> (e.g. Hadoop FS) for automatically streaming a single copy of the data to
> where it is needed to be Mapped/Reduced and CouchDB has nothing similar.

IMHO something like HDFS isn't needed, since there's already a simple, scalable way of getting at the data. What I'd like (to have time to work on...) is the following:

1. be able to configure a pipeline of documents that are sent to the view server
1a. be able to set the size of that pipeline to 0, which just sends a sane header (there are N documents in the database)
2. view server spawns off child processes (I'm thinking Disco, but Hadoop would be able to do the same) on the various worker nodes
3. each worker is given a range of documents to process, pulls these in from _all_docs
4. worker processes its portion of the database
5. worker returns its results to the view server which aggregates them up into the final view

The main issue here is how good your view server is; can it take getting 1000's of responses at once? An HTTP view response would be nice... I'm pretty sure that CouchDB could handle getting all the requests from workers. I think this could also allow for view of view processing, without going through/maintaining an intermediate database.

Cheers
Simon
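Step 3 of Simon's pipeline hands each worker a contiguous slice of _all_docs. A toy sketch of that partitioning step (the function name and the [start, end) range shape are illustrative, not part of any proposed API):

```javascript
// Split `totalDocs` documents among `workers` nodes as contiguous
// [start, end) ranges over _all_docs, spreading the remainder so no
// worker gets more than one extra document.
function docRanges(totalDocs, workers) {
  var ranges = [];
  var base = Math.floor(totalDocs / workers);
  var extra = totalDocs % workers;
  var start = 0;
  for (var i = 0; i < workers; i++) {
    var size = base + (i < extra ? 1 : 0);
    ranges.push([start, start + size]);
    start += size;
  }
  return ranges;
}
```

Each worker would then fetch its slice with something like _all_docs?skip=start&limit=size (or, more efficiently, startkey/endkey on doc ids).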
Re: couchdb rewrite handler
On Mon, Jan 25, 2010 at 6:06 AM, Chris Anderson wrote:
> On Sun, Jan 24, 2010 at 11:00 AM, Benoit Chesneau wrote:
>> Hi,
>>
>> Following @jchris's suggestion, I revisited my rewrite handler. This
>> time, instead of using a JavaScript function to handle rewriting, it
>> uses pattern matching in Erlang. The rewriting root is the design doc:
>>
>> /yourdb/_design/ddocname/_rewrite/
>>
>> ddocname should contain a "rewrites" member, which is a list of
>> rewriting rules. If not, it will return a 404.
>>
>> e.g.:
>>
>> {
>>   "rewrites": [
>>     {
>>       "from": "",
>>       "to": "index.html",
>>       "method": "GET",
>>       "query": {}
>>     }
>>   ]
>> }
>>
>> URLs are relative to the db if they start with "/", otherwise to the
>> current path.
>>
>> Rewriting can use variables. Variables in the path are prefixed by ":".
>> For example, the following rule:
>>
>> { "from": "show/:id", "to": "_show/mydoc/:id" }
>>
>> will rewrite
>> "/mydb/_design/test/_rewrite/show/someid" to
>> "/mydb/_design/test/_rewrite/_show/someid".
>
> do you mean?
>
> "/mydb/_design/test/_show/someid"

Yes, sorry.

>> or { "from": "view/:type", "to": "_list/types/by_types", "query": { "key": "type" } }
>> will rewrite
>> "/mydb/_design/test/_rewrite/view/sometype" to
>> "/mydb/_design/test/_rewrite/_list/types/by_types?key=sometype".
>
> do you mean?
>
> "/mydb/_design/test/_list/types/by_types?key=sometype"

And yes. Lack of sleep, I guess.
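The proposed matching and ":var" substitution can be sketched as follows, assuming rules shaped like the examples above. The actual handler is Erlang; this is only an in-process JavaScript model of the behaviour the emails describe, and the `rewrite` function name is illustrative.

```javascript
// Match a request path against a list of { from, to, query } rules.
// ":name" segments in "from" bind path segments; bound names may then
// appear in "to" segments or as values in "query".
function rewrite(rules, path) {
  const parts = path.split("/").filter(Boolean);
  for (const rule of rules) {
    const pattern = rule.from.split("/").filter(Boolean);
    if (pattern.length !== parts.length) continue;
    const bindings = {};
    let matched = true;
    for (let i = 0; i < pattern.length; i++) {
      if (pattern[i].startsWith(":")) {
        bindings[pattern[i].slice(1)] = parts[i]; // bind the variable
      } else if (pattern[i] !== parts[i]) {
        matched = false; // literal segment mismatch
        break;
      }
    }
    if (!matched) continue;
    const to = rule.to
      .split("/")
      .map(seg => (seg.startsWith(":") ? bindings[seg.slice(1)] : seg))
      .join("/");
    const query = {};
    for (const [k, v] of Object.entries(rule.query || {})) {
      query[k] = bindings[v] !== undefined ? bindings[v] : v;
    }
    return { to, query };
  }
  return null; // no rule matched: 404
}
```

With the two rules from the thread, `"show/someid"` rewrites to `_show/mydoc/someid`, and `"view/sometype"` rewrites to `_list/types/by_types` with `key=sometype` in the query.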
Re: Updating the CouchDB roadmap
On Sun, Jan 24, 2010 at 09:33:02PM -0800, Chris Anderson wrote:
> To round out this list, I think
>
> * Reader ACLs
> ...
>
> look like they will make it into 0.11.

That's the jchris/readeracl branch, presumably? I was hoping to turn my counter-proposal (*) into code, but unfortunately I've not had any time to do so.

Regards,
Brian.

(*) which was, in summary:

1. a user record has roles like "foo:_reader" or ["foo", "_reader"]
2. the _anon user has ":_reader" roles for all public databases
3. you can read database foo only if you have one of the "foo:_reader", "foo:_admin", "_reader" or "_admin" roles
4. /_all_dbs lists only those databases to which you or _anon have read access (but shows every database if you have the _reader or _admin role)
5. the userdb's validate_doc_update allows someone with "foo:_admin" to add and remove foo:* roles; it also allows "foo:_manager" to add and remove foo:* roles apart from foo:_admin
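Rules 3 and 4 of the counter-proposal can be sketched like this; role names follow the "db:_role" convention from the list above, and the function names are illustrative rather than anything in CouchDB.

```javascript
// Rule 3: reading database `db` requires a matching scoped role
// ("db:_reader" / "db:_admin") or a global one ("_reader" / "_admin").
function canRead(db, roles) {
  const allowed = [db + ":_reader", db + ":_admin", "_reader", "_admin"];
  return roles.some(r => allowed.includes(r));
}

// Rule 4: _all_dbs shows only databases readable by the user or by _anon.
function visibleDbs(allDbs, userRoles, anonRoles) {
  return allDbs.filter(db => canRead(db, userRoles) || canRead(db, anonRoles));
}
```

Note how a global "_reader" or "_admin" role short-circuits the per-database check, which is what makes every database visible to admins under rule 4.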
Re: Document validation involving other documents
On Sun, Jan 24, 2010 at 12:21:25PM -0800, Chris Anderson wrote:
> The problem with this approach is that validation is run during
> replication as well, so any multi-doc data dependencies become
> problematic in ad-hoc clusters.

But not every application makes sense as an ad-hoc cluster. In a tight-knit cluster the databases trust each other and you want the data to be as coherent as possible, so you'd run replication as a user that has permit-everything rights in validate_doc_update. In these models you're more interested in validating the data once, at its point of entry, not at every point of replication.

It would be horrendous to have a document in instance 1 but not in instance 2 just because it was accepted initially according to some set of rules, but failed to replicate because the rules had changed in the meantime. This is especially true if the rules themselves are documents, and hence may be a bit stale. At worst you may accept an update which would be invalid if you had the most up-to-date rules, or reject one which would be valid; but the fact that you *did* accept or reject it should be consistent throughout the cluster.
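The "permit-everything replication user" idea could look like this in a validate_doc_update function. This is only a hedged sketch: the "replicator" role name and the author check are illustrative assumptions, not anything from the thread or from CouchDB itself.

```javascript
// Waive all checks for the trusted replication user, so documents that
// were valid at their original point of entry always replicate cleanly.
function validate_doc_update(newDoc, oldDoc, userCtx) {
  if (userCtx.roles.indexOf("replicator") !== -1) {
    return; // trusted cluster peer: accept whatever the source accepted
  }
  // Normal rules apply only at the original point of entry.
  if (!newDoc.author) {
    throw { forbidden: "documents must name an author" };
  }
}
```

Run intra-cluster replication as a user carrying the trusted role and the stale-rules problem described above disappears, because the rules are only ever evaluated once per document.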
Re: Pinning revs
Absolutely! I'm voting on this one. That would ease my work immensely.

On Sun, Jan 24, 2010 at 12:20 PM, Brian Candler wrote:
> Have there been any more thoughts about being able to use _rev as a history
> mechanism?
>
> I think this just means that certain older _revs can survive compaction, and
> ISTM that the simplest way to achieve this would be to have a bit which
> marks a particular revision as "pinned" (cannot be discarded). This would
> be very flexible. For example, you could set this bit so that you keep
> one revision per day for the last week, one revision per month before that,
> and so on. When making an update in a wiki, you could pin the previous
> revision only if it's more than 1 hour old, allowing multiple updates within
> this window to be coalesced.
>
> I think this would be a very convenient mechanism, and much more so than
> building a document with all the previous versions of interest within the
> document itself, or as attachments.
>
> I've even considered introducing artificial conflicts into the database
> purely as a way to retain previous revs, but that's pretty messy.
>
> Regards,
>
> Brian.
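The day/month pruning policy Brian describes can be sketched as follows, assuming the application attaches a timestamp to each revision (CouchDB revs themselves carry none); the `pinnedRevs` name and the 30-day "month" bucket are illustrative assumptions.

```javascript
// Keep one pinned rev per day for the last week, and one per ~month
// before that, by bucketing revs by age and pinning the first in each bucket.
function pinnedRevs(revs, now) {
  // revs: [{rev, time}] newest-first; times in milliseconds.
  const DAY = 24 * 60 * 60 * 1000;
  const seen = new Set();
  const pinned = [];
  for (const r of revs) {
    const age = now - r.time;
    const bucket = age < 7 * DAY
      ? "d" + Math.floor(age / DAY)          // one per day for the last week
      : "m" + Math.floor(age / (30 * DAY));  // one per ~month before that
    if (!seen.has(bucket)) {
      seen.add(bucket);
      pinned.push(r.rev);
    }
  }
  return pinned;
}
```

Compaction would then discard any revision whose rev is not in the returned list, which is exactly the coalescing behaviour the email asks for: several updates within one bucket collapse to a single surviving revision.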