Re: Addition of modify-on-document-write hooks

2010-09-19 Thread Benoit Chesneau
On Mon, Sep 20, 2010 at 12:34 AM, Randall Leeds  wrote:
> On Thu, Sep 9, 2010 at 12:19, James Jackson  wrote:
>> Hi all,
>>
>> Moving this from the users forum, as it appears what I'm after isn't 
>> currently available. For the security model I wish to implement in a 
>> production CouchDB cluster, I would like to be able to force a field to be 
>> written to all docs based on the user context. The _update functionality is 
>> not what I am after as it requires the user to actually call it when writing 
>> a document (means security could be got-around by not calling this, and 
>> setting the required field in the passed document to something arbitrary, 
>> which would then not get caught by a validation function), and can't modify 
>> a document which is passed to it (as far as I can tell it can only modify 
>> existing documents, or create new ones).
>
> Is the rewrite handler powerful enough to force normal PUT operations
> to go through an _update function? Would this break replication? Just
> a quick, off-the-cuff thought.
>
A _rewrite rule can have a `method` property, so you can redirect
differently based on the request method (GET, POST, PUT, ...). So yes,
it is potentially possible to mimic the CouchDB API behind a _rewrite/ .
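
As a rough sketch of what such rules could look like: the design document name, the `stamp` update function and the `/docs/:id` path are made up for illustration, and the `targetFor` helper is only a toy matcher, not how CouchDB resolves rules internally.

```javascript
// Hypothetical design document: names and paths are illustrative only.
const ddoc = {
  _id: "_design/app",
  rewrites: [
    // Writes are routed through the (hypothetical) "stamp" update function.
    { from: "/docs/:id", to: "_update/stamp/:id", method: "PUT" },
    // Reads go straight to the document in the database.
    { from: "/docs/:id", to: "../../:id", method: "GET" }
  ]
};

// Toy helper, not CouchDB code: shows that the same "from" pattern can map
// to different targets depending on the request method.
function targetFor(rules, method, from) {
  const rule = rules.find(r => r.from === from && r.method === method);
  return rule ? rule.to : null;
}

console.log(targetFor(ddoc.rewrites, "PUT", "/docs/:id"));
console.log(targetFor(ddoc.rewrites, "GET", "/docs/:id"));
```

Whether replication can live with this is a separate question, since the replicator talks to the database URL directly rather than to _rewrite/.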

- benoit


[jira] Commented: (COUCHDB-889) improved docs for windows compile from source in INSTALL.Windows

2010-09-19 Thread Mark Hammond (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912304#action_12912304
 ] 

Mark Hammond commented on COUCHDB-889:
--

Many of the changes look good, but I have the following comments:

* I wonder if including today's current versions is a good thing - it seems it 
will just increase the bitrot in the future.  Can we just default to "latest 
available" except in the cases where we know it is not?  If not, then we 
probably can't justify the existing "latest available" references either.

* Where is nsis used?

* The inclusion of your entire PATH doesn't seem to add much value either - if 
the instructions are correct the path will be correct - so something is 
redundant.  Less is more when it comes to busy people trying to get a build up, 
and the specified "perfect path" will be incorrect if the retail version of 
MSVC is used, for example.

* Finally, using seamonkey instead of spidermonkey is a fair bit more effort 
wrt compilation - it might be reasonable to note that spidermonkey can be used 
if the reader can decipher the build instructions, or at least indicate 
something like "almost all mozilla products will build the spidermonkey we need 
(and spidermonkey can, with some difficulty, even be built stand-alone)  - 
below are instructions for seamonkey, but get a spidermonkey using whatever 
technique you like"

> improved docs for windows compile from source in INSTALL.Windows
> 
>
> Key: COUCHDB-889
> URL: https://issues.apache.org/jira/browse/COUCHDB-889
> Project: CouchDB
>  Issue Type: Improvement
>  Components: Build System, Documentation
>Affects Versions: 0.11.2, 1.0.1
> Environment: Windows only.
>Reporter: Dave Cottlehuber
> Fix For: 0.11.3, 1.0.2
>
> Attachments: windows_build_from_source_docs.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> ./INSTALL.Windows does not have enough detail to compile from source, due to 
> internet bit rot. 
> Updates include -
> - clarification on versions for 32-bit and 64-bit compile setup
> - using free Microsoft Visual Studio 2008 Express C++ compiler instead of 
> full commercial release
> - improved details on building javascript, libcurl from source

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Addition of modify-on-document-write hooks

2010-09-19 Thread Randall Leeds
On Thu, Sep 9, 2010 at 12:19, James Jackson  wrote:
> Hi all,
>
> Moving this from the users forum, as it appears what I'm after isn't 
> currently available. For the security model I wish to implement in a 
> production CouchDB cluster, I would like to be able to force a field to be 
> written to all docs based on the user context. The _update functionality is 
> not what I am after as it requires the user to actually call it when writing 
> a document (means security could be got-around by not calling this, and 
> setting the required field in the passed document to something arbitrary, 
> which would then not get caught by a validation function), and can't modify a 
> document which is passed to it (as far as I can tell it can only modify 
> existing documents, or create new ones).

Is the rewrite handler powerful enough to force normal PUT operations
to go through an _update function? Would this break replication? Just
a quick, off-the-cuff thought.


Re: CouchDb not releasing files

2010-09-19 Thread Randall Leeds
If the bug is confirmed it should be on JIRA if it is not already. If
you have a test case that reproduces it that would be fantastic
(bonus points for a JS test in Futon).

It's my opinion something this serious should block 1.1, but
ultimately that is up to the committers to determine, yes?

On Sun, Sep 19, 2010 at 22:09, [mRg]  wrote:
> Hi all,
>
> This is just a cross-post to highlight a thread on the user list. (with the
> same name as this one - all details etc are on there, happy to repeat here
> if needed). It seems this problem was discussed by some devs at CouchCamp
> and seems others are suffering this issue. I was just wondering if there was
> a JIRA issue related to this that I/we can track and if a fix for this will
> be included in any upcoming release (1.1?).
>
> Regards
>
> Stephen
>


CouchDb not releasing files

2010-09-19 Thread [mRg]
Hi all,

This is just a cross-post to highlight a thread on the user list. (with the
same name as this one - all details etc are on there, happy to repeat here
if needed). It seems this problem was discussed by some devs at CouchCamp
and seems others are suffering this issue. I was just wondering if there was
a JIRA issue related to this that I/we can track and if a fix for this will
be included in any upcoming release (1.1?).

Regards

Stephen


Re: View HTTP API extension proposal

2010-09-19 Thread Chris Anderson
On Sun, Sep 19, 2010 at 5:47 AM, Jan Lehnardt  wrote:
> Hi Tomas,
>
> this sounds like a valuable addition. Back in the day I remember skip allowed 
> for negative values to skip backwards, I'm not sure what happened to that.
>

I was just about to write in, why not use skip=-1 and limit=3?

if negative skip is no longer supported, is this intentional or an
accident? if it is intentional are the reasons good? if not we should
fix it because negative skip seems quite useful.

Chris


> `diameter` or how we want to call it would come with the same caveat that 
> `skip` comes with as in it should only be used with "small" values as its 
> access is unindexed. Other than that, it sounds useful to me.
>
> I'm sure you know this, but currently the way to get prev-next links is being 
> smart with all the view options:
>
>  http://guide.couchdb.org/draft/recipes.html#pagination
>
> with the caveat that "jump to page" isn't really possible, but see the 
> chapter for details.
>
> Cheers
> Jan
> --
>
>
> On 6 Sep 2010, at 17:04, Tomas Sedovic wrote:
>
>> Hey all,
>>
>> I'd like to propose a small addition to the HTTP View API. I can open
>> a ticket later and maybe even submit a patch, but I want to discuss it
>> with y'all first.
>>
>> This new View extension would get you a specified document (by the
>> View key) plus a few documents before and after it (with regards to
>> that View's sort order).
>>
>> Say that calling this View:
>>
>>    http://southparkelementary.edu/database/_design/students/_view/names
>>
>> gives the following response (sorted by the key):
>>
>>    {"total_rows":7,"offset":0,"rows":[
>>    {"id":"624","key":"broflovski","value":"Kyle Broflovski"},
>>    {"id":"928","key":"cartman","value":"Eric Cartman"},
>>    {"id":"848","key":"marsh","value":"Stan Marsh"},
>>    {"id":"433","key":"mccormick","value":"Kenny McCormic"},
>>    {"id":"855","key":"stotch","value":"Butters Stotch"},
>>    {"id":"489","key":"testaburger","value":"Wendy Testaburger"},
>>    {"id":"292","key":"vulmer","value":"Jimmy Vulmer"}
>>    ]}
>>
>> Now, what I'd like to have is this:
>>
>>    
>> http://southparkelementary.edu/database/_design/students/_view/names?key="mccormick"&diameter=1
>>
>> which would return:
>>
>>    {"total_rows":7,"offset":2,"rows":[
>>    {"id":"848","key":"marsh","value":"Stan Marsh"},
>>    {"id":"433","key":"mccormick","value":"Kenny McCormic"},
>>    {"id":"855","key":"stotch","value":"Butters Stotch"}
>>    ]}
>>
>> (I'm not sure what's the best word for this query argument. Here are
>> some other suggestions: vicinity, surroundings, neighborhood, nearby)
>>
>> Essentially, this combines two Views you can get by the clever use of
>> `startkey/endkey`, `limit` and `descending` arguments. The advantage
>> of this API addition is that it can be used in the CouchDB Lists.
>>
>> The obvious use case are the previous/next links between document
>> pages. Following the example, if I had a web interface where the South
>> Park elementary teachers would view the pages of the students, it
>> would be nice if every student's page had a link to the previous and
>> next student along with their names and small photos. This means that
>> the List generating the student's page must have the access to the
>> previous and next documents in the given View.
>>
>> For example getting a student's page:
>>
>>    
>> http://southparkelementary.edu/database/_design/students/_list/student_page/names?key="mccormick"&diameter=1
>>
>> would generate a similar HTML structure:
>>
>>    ...
>>    Student: Kenny McCormic
>>    ... (additional data)
>>
>>    
>>    <a href="/database/_design/students/_list/student_page/names?key="marsh"&diameter=1">previous: Stan Marsh</a>
>>
>>    
>>    <a href="/database/_design/students/_list/student_page/names?key="stotch"&diameter=1">next: Butters Stotch</a>
>>    
>>
>> As far as I can tell, there are two ways of doing that today:
>>
>> a) client-side
>> b) add the linking logic directly to the documents in the database
>>
>> The first option is not always feasible/desirable and I dislike the
>> second option because of its inflexibility. To maintain a
>> doubly-linked structure across the database means that changing a
>> single document could lead up to three separate document changes. And
>> the complexity rises if we want to have multiple ways of sorting.
>>
>> However, this extension directly plays into the strength of Views,
>> which is that you can have the same set of standalone documents sorted
>> by several different rules. You can use this in Lists to generate
>> linked pages by different sorting rules.
>>
>> It would also play nicely with the URL rewriting mechanisms, because
>> if a list can access the previous/next documents, it can use their
>> contents to generate the pretty URLs that are all the rage these days.
>>
>> I have limited knowledge of the CouchDB internals, but from what I
>> know, it doesn't look like a big problem. As the views are B+trees,
>> the leafs for

Re: Addition of modify-on-document-write hooks

2010-09-19 Thread Chris Anderson
On Sun, Sep 19, 2010 at 5:37 AM, Jan Lehnardt  wrote:
>
> On 14 Sep 2010, at 03:26, J Chris Anderson wrote:
>
>>
>> On Sep 13, 2010, at 6:23 PM, Simon Metson wrote:
>>
>>> Hi James.
>>>      I think the thing to do is require that a document has a user field, 
>>> and that the value of that field matches the userCtx in the 
>>> validate_doc_update function. This then pushes the issue client side, and 
>>> makes the servers life easier. It could also be added by the front end 
>>> apache in the case of our deployment, I think. I can see this sort of 
>>> trigger thing being a good way of giving people a loaded gun aimed at their 
>>> foot, they certainly are in Oracle if you're not careful.
>>> Cheers
>>> Simon
>>
>> The big issue is that any code which runs on normal document updates, will 
>> also run during replication, as replication is just a normal client. So this 
>> means that adding a field will happen not just on the original client PUT 
>> but also when replication happens.
>>
>> This is why _update is a separate handler.
>
> Adding a required field should be idempotent (correct me if I
> am wrong) so it doesn't matter that replication is an agent of
> the user.

people want to add timestamps. they will call it doc.created_at and
then be surprised when it behaves like doc.replicated_at

the alternative of not setting doc.created_at unless it is null leaves
open the potential for clients to spoof timestamps (just like they can
now).

so I don't see a real use case for something like this, except for
trivial things like forcing that doc.foo == "foo" always, but what's
the point in that?
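
to make the two cases concrete, here is a rough sketch. the function names, the `created_at` and `user` fields and the messages are assumptions for illustration; only the (doc, req) and (newDoc, oldDoc, userCtx) signatures and the `forbidden` convention are the usual CouchDB ones.

```javascript
// Hypothetical update handler: sets created_at only when it is missing.
// Update functions take (doc, req) and return [docToSave, response].
function stamp(doc, req) {
  var incoming = JSON.parse(req.body);
  if (incoming.created_at == null) {
    incoming.created_at = new Date().toISOString();
  }
  return [incoming, "stamped"];
}

// Hypothetical validation function along the lines Simon suggested:
// the document must carry a user field that matches the requester.
function validate(newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) return;
  if (newDoc.user !== userCtx.name) {
    throw({ forbidden: "doc.user must match the authenticated user" });
  }
}

// A client that supplies its own timestamp is not overridden:
var spoofed = stamp(null, {
  body: JSON.stringify({ _id: "a", created_at: "1999-01-01T00:00:00.000Z" })
});
console.log(spoofed[0].created_at);
```

the validation function only checks what the client sent, it cannot add anything, and it runs for replicated writes too, with whatever user context the replication uses.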

Chris

>
> In the past we talked about "blessing" a ddoc / update function
> that would magically invoke the update function on every write.
> (analog for _show and _list) and people seem to like to explore
> that idea. That said, the "magic" bit worries me a little :)
>
> Cheers
> Jan
> --
>
>
>
>>
>> Chris
>>
>>>
>>> On 9 Sep 2010, at 05:19, James Jackson wrote:
>>>
>>>> Hi all,
>>>>
>>>> Moving this from the users forum, as it appears what I'm after isn't 
>>>> currently available. For the security model I wish to implement in a 
>>>> production CouchDB cluster, I would like to be able to force a field to be 
>>>> written to all docs based on the user context. The _update functionality 
>>>> is not what I am after as it requires the user to actually call it when 
>>>> writing a document (means security could be got-around by not calling 
>>>> this, and setting the required field in the passed document to something 
>>>> arbitrary, which would then not get caught by a validation function), and 
>>>> can't modify a document which is passed to it (as far as I can tell it can 
>>>> only modify existing documents, or create new ones).
>>>>
>>>> I see this ticket:
>>>>
>>>> https://issues.apache.org/jira/browse/COUCHDB-441
>>>>
>>>> which talks about the functionality I am after, but appears to have 
>>>> morphed into what is now there.
>>>>
>>>> I am willing to implement such functionality, if it doesn't already exist, 
>>>> but wonder if this would be welcome in the trunk, or if there are killer 
>>>> pitfalls which stop this being possible. I note that in the discussion on 
>>>> that ticket there is talk of how to deal with multiple such 
>>>> modify-on-write functions, perhaps this is one area that needs discussion?
>>>>
>>>> In any case, I'll probably implement this for our CouchDB installation, 
>>>> but it would be good to make it generic and globally useful such that I 
>>>> can contribute it back. I know of a number of people who would like this 
>>>> functionality...
>>>>
>>>> Regards,
>>>> James.
>>>
>>
>
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io


Re: multiview on github

2010-09-19 Thread Norman Barker
Bob,

it is just checking that a given id participates in a view, if it
makes it around the ring then it wins and gets streamed to the client,
adding disjoints would be fairly simple. Currently the only way I can
check if an id is in a view is to loop over the results of each view,
hence each node in the ring is in its own process to keep things
moving.

A use case is two views, one that emits datetime (numeric) and another
view that emits values, e.g. A, B, C ..., the query would then be to
find all the documents with value A between start time and end time.
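
To show the idea in miniature, here is a toy model in JavaScript rather than the Erlang in couch_query_ring.erl. Each "view" is reduced to a plain Set of document ids, and the function name and sample ids are made up; the real code passes ids between processes instead of filtering in one place.

```javascript
// Toy model of the ring: each "view" is just a Set of document ids.
function intersectViews(views) {
  if (views.length === 0) return [];
  // Smallest node first, so most ids drop out early.
  const ring = [...views].sort((a, b) => a.size - b.size);
  const [start, ...rest] = ring;
  // An id "makes it around the ring" if every other node also contains it.
  return [...start].filter(id => rest.every(node => node.has(id)));
}

// ids emitted by the datetime view within the requested time range
const byTime = new Set(["d1", "d2", "d3", "d4"]);
// ids emitted by the value view with value "A"
const byValue = new Set(["d2", "d4", "d9"]);

console.log(intersectViews([byTime, byValue])); // [ 'd2', 'd4' ]
```

A disjunction would be the union of the sets instead of the filter.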

Norman

On Sun, Sep 19, 2010 at 5:21 AM, Robert Dionne
 wrote:
> I took another peek at this and I'm curious as to what it's doing. Is it just 
> checking that a given id participates in a view? So if it makes it around the 
> ring it wins? Or is it actually computing the result of passing the doc thru 
> all the views?
>
> If the answer is the former then would disjunction also be something one 
> might want? I'm just curious, I don't have a use case and I forget the 
> original discussion around this. I sort of think of views as a functional 
> mapping from the database to some subset. That's not entirely accurate given 
> there's this reduce phase also. So I could imagine composing views in a 
> functional way, but the same thing can be had with just a different map 
> function that is the composition.
>
> Anyway if you have a brief description of this, with a use case,  it would 
> help.
>
> Cheers,
>
> Bob
>
>
>
>
> On Sep 17, 2010, at 11:32 PM, Norman Barker wrote:
>
>> Chris, James
>>
>> thanks for bumping this, we are using this internally at 'scale'
>> (million+ keys). I want this to work for couchdb as we want to give
>> back for such a great product and support this going forward, so any
>> suggestions welcomed and we will test and add them to the local github
>> account with the aim of getting this into trunk.
>>
>> Norman
>>
>> On Fri, Sep 17, 2010 at 7:00 PM, James Hayton  
>> wrote:
>>> I want to use it!  I just haven't gotten around to it.  I was going to try
>>> and test it out this weekend and if I am able, I will certainly report back
>>> what I find.
>>>
>>> James
>>>
>>> On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson  wrote:
>>>
 On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker 
 wrote:
> Bob,
>
> I can and have been testing the multiview at this scale, it is ok
> (fast enough), but I think being able to test inclusion of a document
> id in a view without having to loop would be a considerable speed
> improvement. If you have any ideas let me know.
>

 I just want to bump this thread, as I think this is a useful feature.
 I don't expect to be able to test it in the coming weeks, but if I did
 I would. Is anyone besides Norman using this? Has anyone used it at
 scale?

 Cheers,
 Chris

> thanks,
>
> Norman
>
> On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson 
 wrote:
>> I'm sorry, I've had no time to play with this at scale.
>>
>> On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker 
 wrote:
>>> Hi,
>>>
>>> are there any more comments on this, if not can you describe the
>>> process (in particular how to obtain a wiki and jira account for
>>> couchdb which I have been unable to do) and I will start documenting
>>> this so we can put this into the trunk.
>>>
>>> Bob, were you able to do any more testing with large views, are there
>>> any suggestions on how to speed up the document id inclusion test as
>>> described below?
>>>
>>> thanks,
>>>
>>> Norman
>>>
>>> On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker <
 norman.bar...@gmail.com> wrote:
 Bob,

 thanks for the feedback and for taking a look at the code. Guidelines
 on when to use a supervisor within couchdb with a gen_server would be
 appreciated, currently I have a supervisor and a gen_server, but if
 couchdb has a supervision process I could remove that layer.

 I think plugins is a great idea, however intersection of views is such
 a common request, perhaps there needs to be a plugin system and if a
 plugin is rated enough it goes into trunk as a core feature.

 the four (or slightly more) summary is here


 http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl

 %
 % send an id from the start list to the next node in the ring, if the
 id is in the adjacent node then this node sends to the next ring node
 
 % if the id gets all round the ring and back to the start node then it
 has intersected all queries and should be included. The nodes in the
 ring
 % should be sorted in size from small to large for this to be
 effective
 %
 % In addition send the initial id list round in par

Re: Rep. bug in R...... 1.0.1?

2010-09-19 Thread Nikolai Teofilov
Hi Jan,

I have had a difficult time getting messages past the spam filter, so I simply 
opened a ticket:

https://issues.apache.org/jira/browse/COUCHDB-885

The ticket also includes a script that reproduces this behavior. After a short 
discussion with Klaus, I am still not sure whether this is a bug, but please 
take another look to be sure. Furthermore, if you repeat the steps manually 
from Futon, it behaves differently.
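
For readers without access to the ticket, the manual steps quoted below map roughly onto the following request sequence. This is an illustrative sketch, not the script attached to COUCHDB-885: the function name, the attachment name `file.txt` and the `<rev>` placeholders are assumptions, and the two base URLs are left as parameters so they can be filled in for a test environment.

```javascript
// Builds the list of HTTP requests for the seven steps; it does not send them.
// `remote` and `local` are the base URLs of the two CouchDB servers.
function reproPlan(remote, local) {
  return [
    // 1. make remote_db
    { method: "PUT", url: remote + "/remote_db" },
    // 2. create a doc on remote_db
    { method: "PUT", url: remote + "/remote_db/doc", body: {} },
    // 3. make local_db
    { method: "PUT", url: local + "/local_db" },
    // 4. pull, triggered on the local server: remote_db -> local_db
    { method: "POST", url: local + "/_replicate",
      body: { source: remote + "/remote_db", target: "local_db" } },
    // 5. put an attachment on local_db/doc (needs the current revision)
    { method: "PUT", url: local + "/local_db/doc/file.txt?rev=<rev>" },
    // 6. push, triggered on the local server: local_db -> remote_db
    { method: "POST", url: local + "/_replicate",
      body: { source: "local_db", target: remote + "/remote_db" } },
    // 7. try to delete remote_db/doc
    { method: "DELETE", url: remote + "/remote_db/doc?rev=<rev>" }
  ];
}

const plan = reproPlan("http://remote.example:5984", "http://127.0.0.1:5984");
plan.forEach((step, i) => console.log(i + 1, step.method, step.url));
```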

Cheers
Nikolai

On 19.09.2010, at 14:34, Jan Lehnardt wrote:

> Hi Nikolai,
> 
> sorry to be terse, but can you provide a short script that
> exercises the behaviour? Ideally with placeholders for
> the two CouchDB URLs so we can fill in values for our 
> testing environment.
> 
> Cheers
> Jan
> -- 
> 
> On 11 Sep 2010, at 20:16, Nikolai Teofilov wrote:
> 
>> Hi Adam,
>> 
>> The words "pull" in step 4 and "push" in step 6 are correct. I exchanged the 
>> places of the curl commands ...
>> 
>> The idea is a common scenario: there is a master db, and each slave server 
>> gets a local copy of the master, makes local changes (attaches new files) 
>> and sends the modified copy back to the master. The problem appears only if 
>> the documents have been updated with new attachments, and only between 
>> databases on two different servers. It looks like sending back a document 
>> updated with a new attachment affects the _rev number, and a kind of side 
>> effect appears: if you try to delete that document on the remote db, the last 
>> revision of the document before the update will still be in the database. It 
>> could be that this is correct, but I think deleting a document 
>> should remove all its revisions as well, correct?
>> 
>> 
>> 1.   -  make remote_db  (on different machine!)
>> 2.   -  create a doc  on the  remote_db
>> 3.   -  make local_db (on different machine from the remote couchdb!)
>> 4.   - (trigger from the local couchdb!)  remote_db->local_db
>> 5.   - put an attachment on local_db/doc
>> 6.  - trigger from local couchdb!   local_db -> remote_db
>> 7.  - try to delete the remote_db/doc
>>  the result should be the last _rev is deleted but a copy of the doc is 
>> still in the remote_db with the initial _rev number.
>> 
>> I am almost sure it is a bug, because if you try this on a single couchdb 
>> server there is no such problem. If you try with a document without an 
>> attachment there is no problem either, and in both of these cases the 
>> documents are deleted completely.
>> 
>> Cheers 
>> Nikolai
>> 
>> 
>> On 10.09.2010, at 01:44, Adam Kocoloski wrote:
>> 
>>> Hi Nikolai, I'm not sure I understand.  In step 4 you said "pull ..." 
>>> but what you actually did was push the local (empty?) test database to the 
>>> remote server.  After that the subsequent steps don't make sense.  Can you 
>>> try describing the steps again?  Best,
>>> 
>>> Adam
>>> 
>> 
> 



Re: replication bug

2010-09-19 Thread Jan Lehnardt

On 24 Aug 2010, at 18:26, Nathan Stott wrote:

> I tried it, didn't fix my issue.

Can you open a new JIRA issue for this so we won't forget about it?

Cheers
Jan
-- 


> 
> On Tue, Aug 24, 2010 at 9:38 AM, Adam Kocoloski  wrote:
>> Hi Nathan, did you get a chance to see if 
>> https://issues.apache.org/jira/browse/COUCHDB-868 fixed this issue?
>> 
>> Adam
>> 
>> On Aug 23, 2010, at 3:57 PM, Nathan Stott wrote:
>> 
>>> I've identified a bug in replication in couchdb.
>>> 
>>> Here are the steps to reproduce:
>>> 
>>> Create a user named "bubba"
>>> Create a database with a design document that has attachments.
>>> Make this database have "bubba" as an admin and set a reader role of 
>>> "readme"
>>> 
>>> Try to replicate this DB on another machine with credentials for bubba
>>> in the URL (http://bubba:passw...@remotemachine:port/mydb)
>>> 
>>> You will receive 401s in the log in attachments.  It does not matter
>>> whether you give bubba the "readme" role or not, the results are the
>>> same.  Remove the attachment and the design doc will replicate fine.
>>> Remove the "readers" from the security object of the DB and the design
>>> doc will replicate fine.
>>> 
>>> This is tested and reproduced on 1.0.1
>> 
>> 



Re: splitting the code in different apps or rewrite httpd layer

2010-09-19 Thread Jan Lehnardt

On 23 Aug 2010, at 13:46, Benoit Chesneau wrote:

> On Mon, Aug 23, 2010 at 1:07 PM, Robert Dionne
>  wrote:
>> 
>> 
>> 
>> On Aug 22, 2010, at 4:58 PM, Mikeal Rogers wrote:
>> 
>>> One idea that was floated at least once was to replace all the code we 
>>> currently have on top of mochiweb directly with webmachine.
>> 
>> If I recall, Paul Davis did some prototyping work on this at one point
>> 
> 
> Yes, some parts are in his repo, some others in mine. But it's 6-month-old
> work now.

Does that mean you consider it a failed experiment? If yes, why? If not,
should we get some effort going to finish the code and get it into trunk?

Cheers
Jan
-- 



Re: Rekindle discussion: `reduce=false` fails unpredictably

2010-09-19 Thread Dirkjan Ochtman
On Mon, Aug 30, 2010 at 10:01, Jason Smith  wrote:
> I propose a minor change to validation: a simple check is made to
> determine if the extra parameter would result in a no-op. If so, no
> exception is thrown. Therefore:
>
> map view, reduce=false -> Allowed
> map view, reduce=true -> query_parse_error
> map view, group or group_level -> no change to today's behavior
> map/reduce view -> no change to today's behavior
>
> (It can't be known at query time whether group and group_level no-op.
> In general they do not. Therefore the client must explicitly get it
> right.)
>
> If this is acceptable, I will submit the patch to JIRA. Thank you.

Sounds awesome. I filed
https://issues.apache.org/jira/browse/COUCHDB-845 about similar
issues.

Cheers,

Dirkjan


Re: couchdb memory issues/leaks with validators and 20MB+ json docs

2010-09-19 Thread Jan Lehnardt

On 30 Aug 2010, at 08:34, sgoto wrote:

> Hey everyone,
> 
> I'm using couchdb to store docs that are somewhat large (20MB+), but
> within the configured max size.
> 
> Storing the docs isn't a problem, couchdb seems to handle it fine. I am
> having problems when using function validators and couchdb hanging my
> machine after all the memory resources are consumed on PUTs.
> 
> Below is a quick explanation of the issue I'm seeing.
> 
> Ideas ?
> 
> sam
> 
> 
> how to reproduce:
> 
> 1) create a db called testdb
> 
> 2) create an empty javascript validator function
> 
> function(newDoc, oldDoc, user) {}
> 
> 3) create a fake 20MB doc
> 
> dd if=/dev/zero of=test.mp3 bs=1024 count=20480
> echo "{\"hello\":\"" > test.json; echo `base64 test.mp3` >> test.json;  echo
> "\"}" >> test.json;
> 
> 4) send it to couchdb
> 
> curl -X PUT http://127.0.0.1:5984/testdb/foobar21 -d @test.json
> 
> 5) open a memory/swap monitor and watch couchdb's binary consume all the memory
> (stopping when the swap memory ends)
> 
> kubuntu's system monitor (memory tab) ||
> top ||
> watch free ||
> 
> 6) remove the javascript validator
> 
> 7) repeat (5) and see how everything is fine
> 
> expected results:
> 
> (5) shouldn't happen. couchdb shouldn't leak memory or consume more memory
> than the size of the doc (20MB).

How much total memory do you have? CouchDB will consume more than the doc size
in memory (I've seen 2-3x), and using a validation function can blow this up
further, but unless you are on a really space-constrained VPS, you shouldn't
run into swap.
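
One thing to keep in mind when comparing memory use to "doc size": base64 encoding grows the payload by about a third, so a 20MB binary turns into a noticeably larger JSON body. A quick Node.js sketch of the same fake document (the field name follows the quoted steps; the sizes are my own choice):

```javascript
// Node.js sketch: 20 MiB of zero bytes, base64-encoded into a JSON string.
const rawBytes = 20 * 1024 * 1024;
const payload = Buffer.alloc(rawBytes).toString("base64");
const body = JSON.stringify({ hello: payload });

console.log("raw bytes: ", rawBytes);
console.log("json chars:", body.length);
console.log("growth:    ", (body.length / rawBytes).toFixed(2) + "x");
```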

Cheers
Jan
-- 


> 
> -- 
> f u cn rd ths u cn b a gd prgmr !



Re: Rekindle discussion: `reduce=false` fails unpredictably

2010-09-19 Thread Jan Lehnardt

On 30 Aug 2010, at 10:01, Jason Smith wrote:

> This was discussed before here:
> 
> http://mail-archives.apache.org/mod_mbox/couchdb-user/200912.mbox/%3c04c82f94-cf83-45d9-b599-47a8dd7c0...@gmail.com%3e
> 
> This is complicating my own client code. I went out of my way to make
> views that are valuable both in map and map/reduce form. But I
> discovered that client code must never send reduce=false to map views
> even though that is a no-op.
> 
> In the cited discussion, people debated how strictly CouchDB should
> validate query parameters.
> 
> You could download the ddoc and inspect it. (But CouchApps are
> approaching 1MB with all the vendor/ libraries). A _show function
> could inform the client which views are map vs. map/reduce. Finally,
> an unused reduce function (such as _count) could be used. None of
> these seem relaxing.
> 
> I propose a minor change to validation: a simple check is made to
> determine if the extra parameter would result in a no-op. If so, no
> exception is thrown. Therefore:
> 
> map view, reduce=false -> Allowed
> map view, reduce=true -> query_parse_error
> map view, group or group_level -> no change to today's behavior
> map/reduce view -> no change to today's behavior
> 
> (It can't be known at query time whether group and group_level no-op.
> In general they do not. Therefore the client must explicitly get it
> right.)
> 
> If this is acceptable, I will submit the patch to JIRA. Thank you.

yes please :)
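
Just to check we read the proposal the same way, here is the rule as a small sketch. The function name and return values are invented for illustration and this is not the actual query-parsing code; group and group_level are left out because the proposal does not change them.

```javascript
// Sketch of the proposed check for the `reduce` query parameter only.
// viewHasReduce: whether the view defines a reduce function.
// reduceParam: true, false, or undefined when the client did not send it.
function checkReduceParam(viewHasReduce, reduceParam) {
  if (viewHasReduce) return "ok";              // map/reduce view: unchanged
  if (reduceParam === undefined) return "ok";  // parameter not sent
  if (reduceParam === false) return "ok";      // no-op on a map view: allowed
  return "query_parse_error";                  // reduce=true on a map view
}

console.log(checkReduceParam(false, false)); // ok
console.log(checkReduceParam(false, true));  // query_parse_error
```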

Cheers
Jan
-- 



Re: View HTTP API extension proposal

2010-09-19 Thread Jan Lehnardt
Hi Tomas,

this sounds like a valuable addition. Back in the day I remember skip allowed 
for negative values to skip backwards, I'm not sure what happened to that.

`diameter` or how we want to call it would come with the same caveat that 
`skip` comes with as in it should only be used with "small" values as its 
access is unindexed. Other than that, it sounds useful to me.

I'm sure you know this, but currently the way to get prev-next links is being 
smart with all the view options:

  http://guide.couchdb.org/draft/recipes.html#pagination

with the caveat that "jump to page" isn't really possible, but see the chapter 
for details.
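
For the prev/next case specifically, "being smart with the view options" boils down to two queries from the same key, one in each direction. A minimal sketch, assuming unique view keys (with duplicate keys you would also need startkey_docid); the helper name is made up and the URL is the one from the example quoted below:

```javascript
// Hypothetical helper: builds the two view queries whose results, combined,
// give the row for `key` plus `n` neighbours on each side.
function neighbourQueries(viewUrl, key, n) {
  // View keys are JSON, so the key is JSON-encoded and then URL-encoded.
  const k = encodeURIComponent(JSON.stringify(key));
  return {
    // the row for `key` and the n rows after it
    forward: `${viewUrl}?startkey=${k}&limit=${n + 1}`,
    // the row for `key` and the n rows before it, in reverse order
    backward: `${viewUrl}?startkey=${k}&descending=true&limit=${n + 1}`
  };
}

const q = neighbourQueries(
  "http://southparkelementary.edu/database/_design/students/_view/names",
  "mccormick",
  1
);
console.log(q.forward);
console.log(q.backward);
```

That is two requests from the client, which is exactly what cannot be done from inside a List function today.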

Cheers
Jan
-- 


On 6 Sep 2010, at 17:04, Tomas Sedovic wrote:

> Hey all,
> 
> I'd like to propose a small addition to the HTTP View API. I can open
> a ticket later and maybe even submit a patch, but I want to discuss it
> with y'all first.
> 
> This new View extension would get you a specified document (by the
> View key) plus a few documents before and after it (with regards to
> that View's sort order).
> 
> Say that calling this View:
> 
>    http://southparkelementary.edu/database/_design/students/_view/names
> 
> gives the following response (sorted by the key):
> 
>    {"total_rows":7,"offset":0,"rows":[
>    {"id":"624","key":"broflovski","value":"Kyle Broflovski"},
>    {"id":"928","key":"cartman","value":"Eric Cartman"},
>    {"id":"848","key":"marsh","value":"Stan Marsh"},
>    {"id":"433","key":"mccormick","value":"Kenny McCormic"},
>    {"id":"855","key":"stotch","value":"Butters Stotch"},
>    {"id":"489","key":"testaburger","value":"Wendy Testaburger"},
>    {"id":"292","key":"vulmer","value":"Jimmy Vulmer"}
>    ]}
> 
> Now, what I'd like to have is this:
> 
>
> http://southparkelementary.edu/database/_design/students/_view/names?key="mccormick"&diameter=1
> 
> which would return:
> 
>    {"total_rows":7,"offset":2,"rows":[
>    {"id":"848","key":"marsh","value":"Stan Marsh"},
>    {"id":"433","key":"mccormick","value":"Kenny McCormic"},
>    {"id":"855","key":"stotch","value":"Butters Stotch"}
>    ]}
> 
> (I'm not sure what's the best word for this query argument. Here are
> some other suggestions: vicinity, surroundings, neighborhood, nearby)
> 
> Essentially, this combines two Views you can get by the clever use of
> `startkey/endkey`, `limit` and `descending` arguments. The advantage
> of this API addition is that it can be used in the CouchDB Lists.
> 
> The obvious use case are the previous/next links between document
> pages. Following the example, if I had a web interface where the South
> Park elementary teachers would view the pages of the students, it
> would be nice if every student's page had a link to the previous and
> next student along with their names and small photos. This means that
> the List generating the student's page must have the access to the
> previous and next documents in the given View.
> 
> For example getting a student's page:
> 
>
> http://southparkelementary.edu/database/_design/students/_list/student_page/names?key="mccormick"&diameter=1
> 
> would generate a similar HTML structure:
> 
>...
>Student: Kenny McCormic
>... (additional data)
> 
> <a href="/database/_design/students/_list/student_page/names?key="marsh"&diameter=1">previous: Stan Marsh</a>
> 
> <a href="/database/_design/students/_list/student_page/names?key="stotch"&diameter=1">next: Butters Stotch</a>
> 
> As far as I can tell, there are two ways of doing that today:
> 
> a) client-side
> b) add the linking logic directly to the documents in the database
> 
> The first option is not always feasible/desirable and I dislike the
> second option because of its inflexibility. To maintain a
> doubly-linked structure across the database means that changing a
> single document could lead to up to three separate document changes. And
> the complexity rises if we want to have multiple ways of sorting.
> 
> However, this extension directly plays into the strength of Views,
> which is that you can have the same set of standalone documents sorted
> by several different rules. You can use this in Lists to generate
> linked pages by different sorting rules.
> 
> It would also play nicely with the URL rewriting mechanisms, because
> if a list can access the previous/next documents, it can use their
> contents to generate the pretty URLs that are all the rage these days.
> 
> I have limited knowledge of the CouchDB internals, but from what I
> know, it doesn't look like a big problem. As the views are B+trees,
> the leaves form a linked list already. I'm also guessing that the list
> is in fact doubly-linked (the presence of the `descending` View
> argument suggests so). Therefore, this change could be just a matter
> of finding the document requested by the key and traversing the list
> in both directions.
> 
> Please let me know what you think. Suggestions about the naming and
> behaviour of the API call are welcome. In the meantime, I'll div

Re: Addition of modify-on-document-write hooks

2010-09-19 Thread Jan Lehnardt

On 14 Sep 2010, at 03:26, J Chris Anderson wrote:

> 
> On Sep 13, 2010, at 6:23 PM, Simon Metson wrote:
> 
>> Hi James.
>>  I think the thing to do is require that a document has a user field, 
>> and that the value of that field matches the userCtx in the 
>> validate_doc_update function. This then pushes the issue client side, and 
>> makes the servers life easier. It could also be added by the front end 
>> apache in the case of our deployment, I think. I can see this sort of 
>> trigger thing being a good way of giving people a loaded gun aimed at their 
>> foot, they certainly are in Oracle if you're not careful.
>> Cheers
>> Simon
> 
> The big issue is that any code which runs on normal document updates, will 
> also run during replication, as replication is just a normal client. So this 
> means that adding a field will happen not just on the original client PUT but 
> also when replication happens.
> 
> This is why _update is a separate handler.

Adding a required field should be idempotent (correct me if I
am wrong) so it doesn't matter that replication is an agent of 
the user.
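
To make the idempotence point concrete, here is a minimal sketch in plain Python. It is a model only: real CouchDB hooks are JavaScript functions in a design document. The names `stamp_user` and `check_user` are hypothetical, the field name `user` follows Simon's suggestion, and the user context is assumed to carry a `name` entry.

```python
def stamp_user(doc, user_ctx):
    """Return a copy of doc with the user field forced to the writer's name."""
    stamped = dict(doc)
    stamped["user"] = user_ctx["name"]
    return stamped


def check_user(doc, user_ctx):
    """Model of the validation check: reject a doc whose user field differs."""
    if doc.get("user") != user_ctx["name"]:
        raise PermissionError("doc.user must match the authenticated user")


ctx = {"name": "james", "roles": []}
once = stamp_user({"_id": "a", "user": "forged"}, ctx)
twice = stamp_user(once, ctx)  # e.g. the same write arriving again via replication
```

Stamping twice under the same user context yields the same document as stamping once. The result would differ only if the second write ran under a different user context, so the argument rests on replication acting as the same user.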

In the past we talked about "blessing" a ddoc / update function
that would magically be invoked on every write (and analogously
for _show and _list), and people seem keen to explore
that idea. That said, the "magic" bit worries me a little :)

Cheers
Jan
-- 



> 
> Chris
> 
>> 
>> On 9 Sep 2010, at 05:19, James Jackson wrote:
>> 
>>> Hi all,
>>> 
>>> Moving this from the users forum, as it appears what I'm after isn't 
>>> currently available. For the security model I wish to implement in a 
>>> production CouchDB cluster, I would like to be able to force a field to be 
>>> written to all docs based on the user context. The _update functionality is 
>>> not what I am after as it requires the user to actually call it when 
>>> writing a document (means security could be got-around by not calling this, 
>>> and setting the required field in the passed document to something 
>>> arbitrary, which would then not get caught by a validation function), and 
>>> can't modify a document which is passed to it (as far as I can tell it can 
>>> only modify existing documents, or create new ones).
>>> 
>>> I see this ticket:
>>> 
>>> https://issues.apache.org/jira/browse/COUCHDB-441
>>> 
>>> which talks about the functionality I am after, but appears to have morphed 
>>> into what is now there.
>>> 
>>> I am willing to implement such functionality, if it doesn't already exist, 
>>> but wonder if this would be welcome in the trunk, or if there are killer 
>>> pitfalls which stop this being possible. I note that in the discussion on 
>>> that ticket there is talk of how to deal with multiple such modify-on-write 
>>> functions, perhaps this is one area that needs discussion?
>>> 
>>> In any case, I'll probably implement this for our CouchDB installation, but 
>>> it would be good to make it generic and globally useful such that I can 
>>> contribute it back. I know of a number of people who would like this 
>>> functionality...
>>> 
>>> Regards,
>>> James.
>> 
> 



Re: Rep. bug in R...... 1.0.1?

2010-09-19 Thread Jan Lehnardt
Hi Nikolai,

sorry to be terse, but can you provide a short script that
exercises the behaviour? Ideally with placeholders for
the two CouchDB URLs so we can fill in values for our 
testing environment.

Cheers
Jan
-- 

On 11 Sep 2010, at 20:16, Nikolai Teofilov wrote:

> Hi Adam,
> 
> The words "pull" in step 4 and "push" in step 6 are correct. I exchanged the 
> places of the curl commands ...
> 
> The idea is a common scenario: there is a master db, and each slave server 
> gets a local copy of the master, makes local changes (attaches new files) 
> and sends the modified copy back to the master. The problem appears only if 
> the documents have been updated with new attachments, and only between 
> databases on two different servers. It looks like sending back a document 
> updated with a new attachment affects the _rev number, and a kind of side 
> effect appears: if you try to delete that document on the remote db, the 
> last revision of the document before the update is still in the database. 
> It could be that this is correct, but I think deleting a document should 
> remove all its revisions as well, correct?
> 
> 
> 1.   -  make remote_db  (on different machine!)
> 2.   -  create a doc  on the  remote_db
> 3.   -  make local_db (on different machine from the remote couchdb!)
> 4.   - (trigger from the local couchdb!)  remote_db->local_db
> 5.   - put an attachment on local_db/doc
> 6.  - trigger from local couchdb!   local_db -> remote_db
> 7.  - try to delete the remote_db/doc
>   the result should be the last _rev is deleted but a copy of the doc is 
> still in the remote_db with the initial _rev number.
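
The seven steps above could be written down as a request plan with placeholders for the two CouchDB URLs, roughly as sketched below in Python. This only builds the list of requests and sends nothing. The document body, the attachment name `file.txt` and the `REV` placeholders are assumptions, and a real script would have to read each `_rev` from the preceding response. The function name `repro_steps` is hypothetical.

```python
def repro_steps(local, remote, local_db="local_db", remote_db="remote_db", doc="doc"):
    """Return the (method, url, body) requests for steps 1-7, in order."""
    remote_url = f"{remote}/{remote_db}"
    return [
        ("PUT", remote_url, None),                                   # 1. make remote_db
        ("PUT", f"{remote_url}/{doc}", {"note": "initial"}),         # 2. create a doc
        ("PUT", f"{local}/{local_db}", None),                        # 3. make local_db
        ("POST", f"{local}/_replicate",                              # 4. pull, triggered locally
         {"source": remote_url, "target": local_db}),
        ("PUT", f"{local}/{local_db}/{doc}/file.txt?rev=REV",        # 5. attach a file
         "attachment bytes"),
        ("POST", f"{local}/_replicate",                              # 6. push, triggered locally
         {"source": local_db, "target": remote_url}),
        ("DELETE", f"{remote_url}/{doc}?rev=REV", None),             # 7. delete on the remote
    ]


steps = repro_steps("http://local:5984", "http://remote:5984")
```

After step 7, fetching the document from remote_db would show whether a copy with the initial _rev is still there, as reported.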
> 
> I am almost sure it is a bug, because if you try this on a single couchdb 
> server there is no such problem. If you try it with a document without an 
> attachment there is no problem either; in both of these cases the documents 
> are deleted completely.
> 
> Cheers 
> Nikolai
> 
> 
> On 10.09.2010, at 01:44, Adam Kocoloski wrote:
> 
>> Hi Nikolai, I'm not sure I understand.  In step 4 you said "pull ..." 
>> but what you actually did was push the local (empty?) test database to the 
>> remote server.  After that the subsequent steps don't make sense.  Can you 
>> try describing the steps again?  Best,
>> 
>> Adam
>> 
> 



Re: Database-level statistics/on-disk file names

2010-09-19 Thread Jan Lehnardt

On 17 Sep 2010, at 10:10, Dirkjan Ochtman wrote:

> Hi there,
> 
> I'd like to be able to determine how much disk space the index for a
> certain ddoc takes. Is there any easy way of doing that?

 curl $COUCH/db/_design/name/_info
{"name":"db","view_index":{"signature":"431517a0e7decdaa8a97d0dc9ffd7412","language":"javascript","disk_size":51,"updater_running":true,"compact_running":false,"waiting_commit":false,"waiting_clients":0,"update_seq":0,"purge_seq":0}}

The `signature` field corresponds to the .db_design/`signature`.view file.
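
As a small sketch of how that could be scripted (Python, with the response body copied from above): the helper name `index_file_and_size` is hypothetical, the database name is passed in explicitly, and the path is relative to wherever the server keeps its view indexes, which is an assumption about the local configuration.

```python
import json

# Response body copied from the _info call shown above.
INFO = ('{"name":"db","view_index":{"signature":"431517a0e7decdaa8a97d0dc9ffd7412",'
        '"language":"javascript","disk_size":51,"updater_running":true,'
        '"compact_running":false,"waiting_commit":false,"waiting_clients":0,'
        '"update_seq":0,"purge_seq":0}}')


def index_file_and_size(db_name, info_body):
    """Return (relative index file path, reported disk_size) for a design doc."""
    view_index = json.loads(info_body)["view_index"]
    path = f".{db_name}_design/{view_index['signature']}.view"
    return path, view_index["disk_size"]


path, size = index_file_and_size("db", INFO)
```

The `disk_size` value is taken as reported; comparing it with the size of the file at `path` is a quick sanity check.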

--

> Relatedly, would it be possible to rejigger the on-disk layout a
> little bit to make the filenames easier to understand? For example, we
> now have:
> 
> /foo.couch
> /.foo_design/.view
> /.foo_temp
> 
> It would seem nicer to have, e.g.
> 
> /foo/data.couch
> /foo/users-.view
> /foo/.temp

I understand that it may seem nice to have everything related to a 
database in a single directory, say for moving things around, but 
practically, the current format is just as easy to script as the proposed 
version. What other things do you see improving with 
the proposed version?

In addition, creating and deleting databases would no longer be 
atomic (mkdir & fopen), but I'm not sure that's a hard requirement. 
I know the current model includes some very finicky details about 
deleting and renaming files on Windows which would need to be 
taken into consideration when changing the filesystem structure.

> (Unless maybe you want to be able to rename design docs without having
> to re-index, but I think even that would be manageable.)

This is a required feature :)

> This would also make index size fairly transparent.

I don't see how that differs between the two layouts, or any other filesystem layout.

> Semi-relatedly, has anyone taken a swing at reflecting /_stats into a
> Futon page?

Oh I'd *love* to see that :)

Cheers
Jan
-- 



Re: multiview on github

2010-09-19 Thread Robert Dionne
I took another peek at this and I'm curious as to what it's doing. Is it just 
checking that a given id participates in a view? So if it makes it around the 
ring it wins? Or is it actually computing the result of passing the doc thru 
all the views?

If the answer is the former then would disjunction also be something one might 
want? I'm just curious, I don't have a use case and I forget the original 
discussion around this. I sort of think of views as a functional mapping from 
the database to some subset. That's not entirely accurate given there's this 
reduce phase also. So I could imagine composing views in a functional way, but 
the same thing can be had with just a different map function that is the 
composition.

Anyway if you have a brief description of this, with a use case,  it would help.

Cheers,

Bob




On Sep 17, 2010, at 11:32 PM, Norman Barker wrote:

> Chris, James
> 
> thanks for bumping this, we are using this internally at 'scale'
> (million+ keys). I want this to work for couchdb as we want to give
> back for such a great product and support this going forward, so any
> suggestions welcomed and we will test and add them to the local github
> account with the aim of getting this into trunk.
> 
> Norman
> 
> On Fri, Sep 17, 2010 at 7:00 PM, James Hayton  
> wrote:
>> I want to use it!  I just haven't gotten around to it.  I was going to try
>> and test it out this weekend and if I am able, I will certainly report back
>> what I find.
>> 
>> James
>> 
>> On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson  wrote:
>> 
>>> On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker 
>>> wrote:
>>>> Bob,
>>>> 
>>>> I can and have been testing the multiview at this scale, it is ok
>>>> (fast enough), but I think being able to test inclusion of a document
>>>> id in a view without having to loop would be a considerable speed
>>>> improvement. If you have any ideas let me know.
>>>> 
>>> 
>>> I just want to bump this thread, as I think this is a useful feature.
>>> I don't expect to be able to test it in the coming weeks, but if I did
>>> I would. Is anyone besides Norman using this? Has anyone used it at
>>> scale?
>>> 
>>> Cheers,
>>> Chris
>>> 
>>>> thanks,
>>>> 
>>>> Norman
>>>> 
>>>> On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson 
>>> wrote:
> I'm sorry, I've had no time to play with this at scale.
> 
> On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker 
>>> wrote:
>> Hi,
>> 
>> are there any more comments on this, if not can you describe the
>> process (in particular how to obtain a wiki and jira account for
>> couchdb which I have been unable to do) and I will start documenting
>> this so we can put this into the trunk.
>> 
>> Bob, were you able to do any more testing with large views, are there
>> any suggestions on how to speed up the document id inclusion test as
>> described below?
>> 
>> thanks,
>> 
>> Norman
>> 
>> On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker <
>>> norman.bar...@gmail.com> wrote:
>>> Bob,
>>> 
>>> thanks for the feedback and for taking a look at the code. Guidelines
>>> on when to use a supervisor within couchdb with a gen_server would be
>>> appreciated, currently I have a supervisor and a gen_server, but if
>>> couchdb has a supervision process I could remove that layer.
>>> 
>>> I think plugins are a great idea; however, intersection of views is such
>>> a common request that perhaps there needs to be a plugin system, and if a
>>> plugin is rated highly enough it goes into trunk as a core feature.
>>> 
>>> the four (or slightly more) summary is here
>>> 
>>> 
>>> http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl
>>> 
>>> %
>>> % send an id from the start list to the next node in the ring, if the
>>> id is in the adjacent node then this node sends it to the next ring node
>>> 
>>> % if the id gets all round the ring and back to the start node then it
>>> has intersected all queries and should be included. The nodes in the ring
>>> % should be sorted in size from small to large for this to be effective
>>> %
>>> % In addition send the initial id list round in parallel
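
The ring idea in that summary can be modelled in a few lines of Python. This is a sequential sketch under stated assumptions, not the Erlang implementation in couch_query_ring.erl: each "node" is simply a set of document ids, the function name `ring_intersection` is hypothetical, and the parallel sending of the initial id list is left out.

```python
def ring_intersection(id_sets):
    """Return the ids contained in every set, by walking each id round the ring."""
    if not id_sets:
        return []
    # Smallest node first: it supplies the candidate ids, so fewer ids start
    # the trip and most are dropped at an early, small node.
    ring = sorted(id_sets, key=len)
    start, rest = ring[0], ring[1:]
    survivors = []
    for doc_id in sorted(start):
        # all() stops at the first node that lacks the id, which models the id
        # not being forwarded any further round the ring.
        if all(doc_id in node for node in rest):
            survivors.append(doc_id)  # made it back to the start node
    return survivors


result = ring_intersection([{"a", "b", "c"}, {"b", "c"}, {"c", "b", "z"}])
```

Each membership test on a set is constant time in this model; the looping cost mentioned in the thread comes from testing inclusion against a view rather than an in-memory set.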
>>> 
>>> it really needs some eyes from the core couchdb coders to see how to
>>> speed up the inclusion testing, looping is bad even if it is done in
>>> parallel.
>>> 
>>> Multiview is usable, I am using it with some pretty big mega-views (as
>>> per the raindrop model), I am also available to add features to this
>>> as this is a core part of our work and we want to give it to couch as a
>>> contribution.
>>> thanks,
>>> 
>>> Norman
>>> 
>>> On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne
>>>  wrote:
>>>>>>>> Hi Norman,
>>>>>>>> 
>>>>>>>>   I took a peek at multiview. I haven't followed this too closely on
>>> the mailing list but this is *view intersection