[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon

2011-08-23 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089512#comment-13089512
 ] 

Damien Katz commented on COUCHDB-1153:
--

Robert, Benoit, your issues can still be addressed. You can submit patches that 
improve upon Filipe's work. But telling Filipe to code the patch your way, 
without code, is not how this community works. Filipe's work is a feature people 
care about, and any objections about correctness have been addressed. Switching 
the code to an evented model, or any other improvement, is welcome from you or 
any other community member, but users want this feature, and Filipe should not 
be expected to code it up to everyone else's expectations before any check-in can 
occur. Improvement can, and should, happen continuously.

 Database and view index compaction daemon
 -

 Key: COUCHDB-1153
 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
 Project: CouchDB
  Issue Type: New Feature
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
Priority: Minor
  Labels: compaction

 I've recently written an Erlang process to automatically compact databases 
 and their views based on some configurable parameters. These parameters can 
 be global or per database and are: minimum database fragmentation, minimum 
 view fragmentation, allowed period and strict_window (whether an ongoing 
 compaction should be canceled if it doesn't finish within the allowed 
 period). These fragmentation values are based on the recently added 
 data_size parameter to the database and view group information URIs 
 (COUCHDB-1132).
 I've documented the .ini configuration, as a comment in default.ini, which I 
 paste here:
 [compaction_daemon]
 ; The delay, in seconds, between each check for which database and view
 ; indexes need to be compacted.
 check_interval = 60
 ; If a database or view index file is smaller than this value (in bytes),
 ; compaction will not happen. Very small files always have a very high
 ; fragmentation, therefore it's not worth compacting them.
 min_file_size = 131072
 [compactions]
 ; List of compaction rules for the compaction daemon.
 ; The daemon compacts databases and their respective view groups when all the
 ; condition parameters are satisfied. Configuration can be per database or
 ; global, and it has the following format:
 ;
 ; database_name = parameter=value [, parameter=value]*
 ; _default = parameter=value [, parameter=value]*
 ;
 ; Possible parameters:
 ;
 ; * db_fragmentation - If the ratio (as an integer percentage) of the amount
 ;                      of old data (and its supporting metadata) over the
 ;                      database file size is equal to or greater than this
 ;                      value, this database compaction condition is satisfied.
 ;                      This value is computed as:
 ;
 ;                          (file_size - data_size) / file_size * 100
 ;
 ;                      The data_size and file_size values can be obtained when
 ;                      querying a database's information URI (GET /dbname/).
 ;
 ; * view_fragmentation - If the ratio (as an integer percentage) of the amount
 ;                        of old data (and its supporting metadata) over the
 ;                        view index (view group) file size is equal to or
 ;                        greater than this value, then this view index
 ;                        compaction condition is satisfied. This value is
 ;                        computed as:
 ;
 ;                            (file_size - data_size) / file_size * 100
 ;
 ;                        The data_size and file_size values can be obtained
 ;                        when querying a view group's information URI
 ;                        (GET /dbname/_design/groupname/_info).
 ;
 ; * period - The period for which a database (and its view groups) compaction
 ;            is allowed. This value must obey the following format:
 ;
 ;                HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
 ;
 ; * strict_window - If a compaction is still running after the end of the
 ;                   allowed period, it will be canceled if this parameter is
 ;                   set to yes. It defaults to no and it's meaningful only if
 ;                   the *period* parameter is also specified.
 ;
 ; * parallel_view_compaction - If set to yes, the database and its views are
 ;                              compacted in parallel. This is only useful on
 ;                              certain setups, for example when the database
 ;                              and view index directories point to different
 ;                              disks. It defaults to no.
 ;
 ; Before a compaction is triggered, an estimation of how much free disk space
 ; is
 
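
To make the fragmentation rules above concrete, here is a minimal Erlang
sketch (not the daemon's actual code; the function name is illustrative) of
the test, where FileSize and DataSize are the file_size and data_size values
from the info URIs and Threshold is the configured integer percentage:

    %% Returns true when the db_fragmentation / view_fragmentation
    %% condition described in the .ini comment above holds.
    needs_compaction(FileSize, DataSize, Threshold) when FileSize > 0 ->
        Fragmentation = (FileSize - DataSize) / FileSize * 100,
        Fragmentation >= Threshold.

For example, a 100 MB file holding 40 MB of live data is 60% fragmented, so a
db_fragmentation = 60% rule would trigger compaction of that database.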

[jira] [Commented] (COUCHDB-1256) Incremental requests to _changes can skip revisions

2011-08-23 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089755#comment-13089755
 ] 

Damien Katz commented on COUCHDB-1256:
--

I agree with the fix Adam proposes. The code in question is an optimization to 
prevent the sending/checking of documents we've already examined, but with 
checkpointing it breaks. Removal of the code is the right fix for now.

In the future, we can add the optimization back if the checkpointing can keep 
note of completed replications vs. merely checkpointed ones. Checkpoint records 
would keep a high water mark of the last completed replication, and both the 
seq num and that high mark would be sent to the _changes handler. The _changes 
handler would not send docs with a seq below the checkpoint value. When the 
replication checkpoints, it saves the current seq and the last completed high 
water mark. When replication completes, it sets the last seq and the high water 
mark to the same seq, and that is what gets sent for the next replication.

Also, continuous replication would need a way to signal when a replication is 
complete, so that the high water mark can be set there as well.
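
A hypothetical sketch of that checkpoint bookkeeping (the record and function
names are illustrative, not from the codebase):

    %% Each checkpoint carries the seq the in-flight replication has
    %% reached plus the high water mark of the last completed replication.
    -record(rep_checkpoint, {current_seq, completed_seq}).

    %% Periodic checkpoint: save the current seq, keep the old high mark.
    checkpoint(#rep_checkpoint{completed_seq = Done}, CurSeq) ->
        #rep_checkpoint{current_seq = CurSeq, completed_seq = Done}.

    %% Completion: both fields collapse to the same seq, and that value
    %% is what gets sent to the _changes handler next time.
    complete(CurSeq) ->
        #rep_checkpoint{current_seq = CurSeq, completed_seq = CurSeq}.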

 Incremental requests to _changes can skip revisions
 ---

 Key: COUCHDB-1256
 URL: https://issues.apache.org/jira/browse/COUCHDB-1256
 Project: CouchDB
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.10, 0.10.1, 0.10.2, 0.11.1, 0.11.2, 1.0, 1.0.1, 1.0.2, 
 1.1, 1.0.3
 Environment: confirmed on Apache CouchDB 1.1.0, bug appears to be 
 present in 1.0.3 and trunk
Reporter: Adam Kocoloski
Assignee: Adam Kocoloski
Priority: Blocker
 Fix For: 1.0.4, 1.1.1, 1.2

 Attachments: jira-1256-test.diff


 Requests to _changes with style=all_docs&since=N (requests made by the 
 replicator) are liable to suppress revisions of a document.  The following 
 sequence of curl commands demonstrates the bug:
 curl -X PUT localhost:5985/revseq
 {"ok":true}
 curl -X PUT -H"content-type:application/json" localhost:5985/revseq/foo -d '{"a":123}'
 {"ok":true,"id":"foo","rev":"1-0dc33db52a43872b6f3371cef7de0277"}
 curl -X PUT -H"content-type:application/json" localhost:5985/revseq/bar -d '{"a":456}'
 {"ok":true,"id":"bar","rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a"}
 % stick a conflict revision in foo
 curl -X PUT -H"content-type:application/json" localhost:5985/revseq/foo?new_edits=false -d '{"_rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a", "a":123}'
 {"ok":true,"id":"foo","rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a"}
 % request without since= gives the expected result
 curl -H"content-type:application/json" localhost:5985/revseq/_changes?style=all_docs
 {"results":[
 {"seq":2,"id":"bar","changes":[{"rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a"}]},
 {"seq":3,"id":"foo","changes":[{"rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a"},{"rev":"1-0dc33db52a43872b6f3371cef7de0277"}]}
 ],
 "last_seq":3}
 % request starting from since=2 suppresses revision 1-0dc33db52a43872b6f3371cef7de0277 of foo
 macbook:~ (master) $ curl localhost:5985/revseq/_changes?style=all_docs\&since=2
 {"results":[
 {"seq":3,"id":"foo","changes":[{"rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a"}]}
 ],
 "last_seq":3}
 I believe the fix is something like this (though we could refactor further 
 because Style is unused):
 diff --git a/src/couchdb/couch_db.erl b/src/couchdb/couch_db.erl
 index e8705be..65aeca3 100644
 --- a/src/couchdb/couch_db.erl
 +++ b/src/couchdb/couch_db.erl
 @@ -1029,19 +1029,7 @@ changes_since(Db, Style, StartSeq, Fun, Acc) ->
      changes_since(Db, Style, StartSeq, Fun, [], Acc).
  
  changes_since(Db, Style, StartSeq, Fun, Options, Acc) ->
 -    Wrapper = fun(DocInfo, _Offset, Acc2) ->
 -        #doc_info{revs=Revs} = DocInfo,
 -        DocInfo2 =
 -        case Style of
 -        main_only ->
 -            DocInfo;
 -        all_docs ->
 -            % remove revs before the seq
 -            DocInfo#doc_info{revs=[RevInfo ||
 -                #rev_info{seq=RevSeq}=RevInfo <- Revs, StartSeq < RevSeq]}
 -        end,
 -        Fun(DocInfo2, Acc2)
 -    end,
 +    Wrapper = fun(DocInfo, _Offset, Acc2) -> Fun(DocInfo, Acc2) end,
      {ok, _LastReduction, AccOut} = couch_btree:fold(by_seq_btree(Db),
          Wrapper, Acc, [{start_key, StartSeq + 1}] ++ Options),
      {ok, AccOut}.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (COUCHDB-1243) Compact and copy feature that resets changes

2011-08-08 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081324#comment-13081324
 ] 

Damien Katz commented on COUCHDB-1243:
--

I mostly agree with Robert Newson that what you are asking for is a dangerous 
thing for CouchDB replication. However, there is the purge option, which 
forgets documents, deleted or otherwise, completely removing them from the 
internal indexes. Once documents are purged, compaction will completely 
remove them from the file forever. Unfortunately, I couldn't find actual 
documentation on the purge functionality, so the best place to figure out how 
to use it is the purge test in the browser test suite, which can be found here:

http://svn.apache.org/viewvc/couchdb/trunk/share/www/script/test/purge.js?view=co&revision=1086241&content-type=text%2Fplain

I've often thought it would be useful to purge docs during compaction, by 
providing a user-defined function that signals which unwanted docs/stubs to 
remove. But no such thing exists; in the meantime you can accomplish it with a 
purge plus compaction.
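
For reference, a rough sketch of that purge-plus-compaction sequence using
OTP's inets/httpc (the db name, doc id, and rev below are placeholders; the
_purge body maps each doc id to the list of revs to forget):

    purge_then_compact() ->
        inets:start(),
        Db = "http://localhost:5984/dbname",
        %% forget the given rev of doc "foo" completely
        {ok, _} = httpc:request(post, {Db ++ "/_purge", [],
            "application/json",
            "{\"foo\": [\"1-0dc33db52a43872b6f3371cef7de0277\"]}"}, [], []),
        %% compaction then removes the purged docs from the file
        {ok, _} = httpc:request(post, {Db ++ "/_compact", [],
            "application/json", ""}, [], []).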

 Compact and copy feature that resets changes
 

 Key: COUCHDB-1243
 URL: https://issues.apache.org/jira/browse/COUCHDB-1243
 Project: CouchDB
  Issue Type: New Feature
  Components: Database Core
Affects Versions: 1.0.1, 1.1
 Environment: Ubuntu, but not important
Reporter: Henrik Hofmeister
  Labels: cleanup, compaction
 Attachments: dump_load.php


 After running db and view compaction on a 70K doc db with 6+ million changes, 
 it takes up 0.8 GB. If copying the same documents to a new db (get and bulk 
 insert), the same data with 70K changes (only the inserts) takes up 40 MB. 
 That is a huge difference. It has been verified on 2 dbs that the difference 
 is more than 65 times the size of the data.
 A compact-and-copy feature that copies only the documents, and resets the 
 changes for a db, would be very nice to try and limit the disk usage a little 
 bit. (Our current test environment takes up nearly 100 GB...)
 I've attached the dump load php script for your convenience.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (COUCHDB-1141) Docs deleted via PUT or POST do not have contents removed.

2011-04-26 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz resolved COUCHDB-1141.
--

Resolution: Not A Problem
  Assignee: Damien Katz  (was: Robert Newson)

This is by design. Deleted documents are supposed to be able to contain meta 
information about who deleted them, etc., because they replicate. The problem 
might be a documentation issue, as clients need to make sure the document body 
is empty when deleting via PUT, POST, or _bulk_docs.
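
For illustration, a hedged httpc snippet of a bulk delete that follows this
rule (the URL, doc id, and rev are placeholders): the body carries only _id,
_rev, and _deleted, so no stale content is kept around:

    %% Deleting through _bulk_docs with a minimal body.
    Body = "{\"docs\": [{\"_id\": \"foo\","
           " \"_rev\": \"1-0dc33db52a43872b6f3371cef7de0277\","
           " \"_deleted\": true}]}",
    httpc:request(post, {"http://localhost:5984/dbname/_bulk_docs", [],
        "application/json", Body}, [], []).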

 Docs deleted via PUT or POST do not have contents removed.
 --

 Key: COUCHDB-1141
 URL: https://issues.apache.org/jira/browse/COUCHDB-1141
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
 Environment: All
Reporter: Wendall Cada
Assignee: Damien Katz
 Fix For: 1.2


 If a doc is deleted via -X DELETE, the resulting doc contains only 
 _id, _rev and _deleted:true. However, if a doc is deleted via PUT or POST, by 
 adding _deleted:true, the entire contents of the doc remain stored. Even 
 after compaction, the original document contents remain.
 This issue is causing databases with large docs to become bloated over time, 
 as the original doc contents remain in the database even after being deleted.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (COUCHDB-1140) fetching _local docs by revision in URL fails

2011-04-25 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024871#comment-13024871
 ] 

Damien Katz commented on COUCHDB-1140:
--

I don't consider this a bug, as we don't store previous revisions of local docs 
like we do for regular docs (and technically, getting docs by older revision 
wasn't ever really supposed to be a feature). The error message could probably 
be better here, though.

 fetching _local docs by revision in URL fails
 -

 Key: COUCHDB-1140
 URL: https://issues.apache.org/jira/browse/COUCHDB-1140
 Project: CouchDB
  Issue Type: Bug
  Components: HTTP Interface
Affects Versions: 1.0.2, 1.1
Reporter: Jan Lehnardt
Priority: Minor

 Via dev@
 Hi,
 Seems like a bug. You need to pass the current rev in the body of the
 document. Passing it as ?rev= does not work at all.
 B.
 On 24 April 2011 19:39, Pedro Landeiro lande...@gmail.com wrote:
 Already tried that, but the rev argument does not accept it, and returns
 instead:
 {"error":"unknown_error","reason":"badarg"}
 On Sun, Apr 24, 2011 at 11:16 AM, Robert Newson 
 robert.new...@gmail.com wrote:
 try ?rev=0-1
 B.
 On 23 April 2011 22:41, Pedro Landeiro lande...@gmail.com wrote:
 Hi,
 I cannot retrieve a local doc by revision.
 I can get the doc like this:
 http://127.0.0.1:5984/thisisatempdb/_local/mylocaldoc
 {"_id":"_local/mylocaldoc","_rev":"0-1","name":"pedro","surname":"landeiro","islocal":"oh yeah"}
 but if I request the same doc with the revision:
 http://127.0.0.1:5984/thisisatempdb/_local/mylocaldoc?rev=0-1
 {"error":"not_found","reason":"missing"}
 Am I doing something wrong?
 Thanks.
 --
 Pedro Landeiro
 http://www.linkedin.com/in/pedrolandeiro
 --
 Pedro Landeiro
 http://www.linkedin.com/in/pedrolandeiro

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (COUCHDB-1124) Refactor couch_btree.erl

2011-04-15 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020554#comment-13020554
 ] 

Damien Katz commented on COUCHDB-1124:
--

I haven't looked closely at the patch, but with this module it's most important 
to not lose performance. One thing that jumps out at me is the cmp_keys 
function. I'd be sure to benchmark the view indexing with large, complex 
keys, as the less() comparisons will likely be happening more often, and we've 
seen that be a performance bottleneck in the past.

 Refactor couch_btree.erl
 

 Key: COUCHDB-1124
 URL: https://issues.apache.org/jira/browse/COUCHDB-1124
 Project: CouchDB
  Issue Type: Improvement
Reporter: Paul Joseph Davis
 Attachments: 0001-Refactor-couch_btree.erl.patch


 I've completely refactored couch_btree.erl in an attempt to make it more 
 palatable for people that want to learn it. The current version is quite 
 organic in its nature and this cleans up the code to be more consumable. Most 
 everyone that's seen this patch has wanted it in trunk but I never got around 
 to committing it.
 The patch I'm about to attach is quite gnarly as it's basically deleting and 
 recreating the entire file. I find it quite a bit more helpful to read the 
 end result which you can do at [1].
 Also, if we do commit this then the code in COUCHDB-1084 will be quite broken 
 for the btree section. If that patch still applies cleanly to the other files 
 I'm going to try and update the btree code for it tonight.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (COUCHDB-1118) Adding a NIF based JSON decoding/encoding module

2011-04-02 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015085#comment-13015085
 ] 

Damien Katz commented on COUCHDB-1118:
--

Looks good to me. Check it in!

 Adding a NIF based JSON decoding/encoding module
 

 Key: COUCHDB-1118
 URL: https://issues.apache.org/jira/browse/COUCHDB-1118
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Reporter: Filipe Manana
 Fix For: 1.2


 Currently, all the Erlang based JSON encoders and decoders are very slow, and 
 decoding and encoding JSON is something that we do basically everywhere.
 Via IRC, adding a JSON NIF encoder/decoder was recently discussed. 
 Damien also started a thread at the development mailing list about adding 
 NIFs to trunk.
 The patch/branch at [1] adds such a JSON encoder/decoder. It is based on Paul 
 Davis' eep0018 project [2]. Damien made some modifications [3] to it, mostly 
 to add support for big numbers (Paul's eep0018 limits the precision to 32/64 
 bits) and a few optimizations. I made a few corrections and minor 
 enhancements on top of Damien's fork as well [4]. Finally Benoît identified 
 some missing capabilities compared to mochijson2 (on encoding, allow atoms as 
 strings and strings as object properties).
 Also, the version added in the patch at [1] uses mochijson2 when the C NIF is 
 not loaded. Autotools configuration was adapted to compile the NIF only when 
 we're using an OTP release >= R13B04 (the R13B03 NIF API is too limited and 
 suffered many changes compared to R13B04 and R14) - therefore it should work 
 on any OTP release >= R13B at least.
 I successfully tested this on R13B03, R13B04 and R14B02 in an Ubuntu 
 environment.
 I'm not sure if it builds at all on Windows - I would appreciate it if someone 
 could verify it.
 Also, I'm far from being good with the autotools, so I probably missed 
 something important or I'm doing something in a not very standard way.
 This NIF encoder/decoder is about one order of magnitude faster than 
 mochijson2 and other Erlang-only solutions such as jsx. A reads and writes 
 test with relaximation shows this has a very positive impact, especially on 
 reads (the EJSON encoding is more expensive than the JSON decoding) - 
 http://graphs.mikeal.couchone.com/#/graph/698bf36b6c64dbd19aa2bef634052381
 @Paul, since this is based on your eep0018 effort, do you think any other 
 missing files should be added (README, etap tests, etc)? Also, should we put 
 a note somewhere that this is based on your project?
 [1] - https://github.com/fdmanana/couchdb/compare/json_nif
 [2] - https://github.com/davisp/eep0018
 [3] - https://github.com/Damienkatz/eep0018/commits/master
 [4] - https://github.com/fdmanana/eep0018/commits/final_damien

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (COUCHDB-1092) Storing documents bodies as raw JSON binaries instead of serialized JSON terms

2011-03-15 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007262#comment-13007262
 ] 

Damien Katz commented on COUCHDB-1092:
--

Great work Filipe! The size win alone is enough to make this patch compelling. 
I think most of the perf gain is coming from keeping the JSON in binary format, 
so that as the docs get passed around from Erlang process to process, only 
pointers to the doc bodies are copied, not the actual complex JSON terms 
themselves. The dramatic gains in the indexer are evidence of this, as the 
pipelined processor passes documents and view rows from process to process. 
This new work makes that much more efficient.
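
As a small illustration of that point (plain Erlang, nothing CouchDB-specific):
binaries larger than 64 bytes are reference-counted and live on a shared heap,
so sending one to another process moves only a handle, while an equivalent
deep EJSON term would be copied element by element:

    %% A 100 KB binary: the message below transfers a pointer, not the payload.
    Doc = list_to_binary(lists:duplicate(100000, $x)),
    Pid = spawn(fun() ->
        receive B -> io:format("got ~p bytes~n", [byte_size(B)]) end
    end),
    Pid ! Doc.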

 Storing documents bodies as raw JSON binaries instead of serialized JSON terms
 --

 Key: COUCHDB-1092
 URL: https://issues.apache.org/jira/browse/COUCHDB-1092
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Reporter: Filipe Manana
Assignee: Filipe Manana

 Currently we store documents as Erlang serialized (via the term_to_binary/1 
 BIF) EJSON.
 The proposed patch changes the database file format so that instead of 
 storing serialized
 EJSON document bodies, it stores raw JSON binaries.
 The github branch is at:  
 https://github.com/fdmanana/couchdb/tree/raw_json_docs
 Advantages:
 * what we write to disk is much smaller - a raw JSON binary can easily get up 
 to 50% smaller
   (at least according to the tests I did)
 * when serving documents to a client we no longer need to JSON encode the 
 document body
   read from the disk - this applies to individual document requests, view 
 queries with
   ?include_docs=true, pull and push replications, and possibly other use 
 cases.
   We just grab its body and prepend the _id, _rev and all the necessary 
 metadata fields
   (this is via simple Erlang binary operations)
 * we avoid the EJSON term copying between request handlers and the db updater 
 processes,
   between the work queues and the view updater process, between replicator 
 processes, etc
 * before sending a document to the JavaScript view server, we no longer need 
 to convert it
   from EJSON to JSON
 The changes done to the document write workflow are minimal - after JSON 
 decoding the
 document's JSON into EJSON and removing the metadata top level fields (_id, 
 _rev, etc), it
 JSON encodes the resulting EJSON body into a binary - this consumes CPU of 
 course but it
 brings 2 advantages:
 1) we avoid the EJSON copy between the request process and the database 
 updater process -
for any realistic document size (4kb or more) this can be very expensive, 
 specially
when there are many nested structures (lists inside objects inside lists, 
 etc)
 2) before writing anything to the file, we do a term_to_binary([Len, Md5, 
 TheThingToWrite])
and then write the result to the file. A term_to_binary call with a binary 
 as the input
is very fast compared to a term_to_binary call with EJSON as input (or 
 some other nested
structure)
 I think both advantages compensate for the JSON encoding done after separating 
 metadata fields from non-metadata fields.
 The following relaximation graph, for documents with sizes of 4Kb, shows a 
 significant
 performance increase both for writes and reads - especially reads.   
 http://graphs.mikeal.couchone.com/#/graph/698bf36b6c64dbd19aa2bef63400b94f
 I've also made a few tests to see how much the improvement is when querying a 
 view, for the
 first time, without ?stale=ok. The size difference of the databases (after 
 compaction) is
 also very significant - this change can reduce the size at least 50% in 
 common cases.
 The test databases were created in an instance built from that experimental 
 branch.
 Then they were replicated into a CouchDB instance built from the current 
 trunk.
 At the end both databases were compacted (to fairly compare their final 
 sizes).
 The databases contain the following view:
 {
     "_id": "_design/test",
     "language": "javascript",
     "views": {
         "simple": {
             "map": "function(doc) { emit(doc.float1, doc.strings[1]); }"
         }
     }
 }
 ## Database with 500 000 docs of 2.5Kb each
 Document template is at:  
 https://github.com/fdmanana/couchdb/blob/raw_json_docs/doc_2_5k.json
 Sizes (branch vs trunk):
 $ du -m couchdb/tmp/lib/disk_json_test.couch 
 1996  couchdb/tmp/lib/disk_json_test.couch
 $ du -m couchdb-trunk/tmp/lib/disk_ejson_test.couch 
 2693  couchdb-trunk/tmp/lib/disk_ejson_test.couch
 Time, from a user's perspective, to build the view index from scratch:
 $ time curl 
 http://localhost:5984/disk_json_test/_design/test/_view/simple?limit=1
 {"total_rows":500000,"offset":0,"rows":[
 {"id":"076a-c1ae-4999-b508-c03f4d0620c5","key":null,"value":"wfxuF3N8XEK6"}
 ]}
 real  6m6.740s
 

[jira] Commented: (COUCHDB-1084) Remove unnecessary btree lookup inside couch_db_updater

2011-03-08 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004089#comment-13004089
 ] 

Damien Katz commented on COUCHDB-1084:
--

Thanks for the feedback on the code style; we definitely want to clean it up 
before committing. Right now I'm more interested in the performance impact and 
how fruitful removing the btree lookup is. I'm hoping this patch will improve 
performance for all writes, both inserts and updates, but I don't have time to 
set up benchmarks right now.

 Remove unnecessary btree lookup inside couch_db_updater
 ---

 Key: COUCHDB-1084
 URL: https://issues.apache.org/jira/browse/COUCHDB-1084
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 1.2
Reporter: Damien Katz
Assignee: Damien Katz
 Attachments: remove_btree_lookup.patch


 The CouchDB update process has an unnecessary btree lookup, where it reads 
 the values in bulk, checks for conflicts, writes the docs to disk, updates 
 the values appropriately and writes them back out to the btree in a second 
 step. It's possible to avoid this second step, and instead do all the 
 checking, doc writing and value transformation in a single btree lookup, 
 thereby reducing the number of btree traversals and disk IO.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (COUCHDB-1084) Remove unnecessary btree lookup inside couch_db_updater

2011-03-08 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004376#comment-13004376
 ] 

Damien Katz commented on COUCHDB-1084:
--

This isn't the first time I've seen weird unexpected results from relaximation. 
I'm really thinking the benchmarks need some work to be more usable and useful. 
I can't explain how the improvements on the write path would cause this read 
impact. Perhaps more comprehensive tests would give a clearer picture of 
what's going on, or maybe there is a bug in the tests themselves.

I've also had a hard time interpreting the graphs; I think they need some 
smoothing or something to make it easier to visualize the differences.

 Remove unnecessary btree lookup inside couch_db_updater
 ---

 Key: COUCHDB-1084
 URL: https://issues.apache.org/jira/browse/COUCHDB-1084
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 1.2
Reporter: Damien Katz
Assignee: Damien Katz
 Attachments: remove_btree_lookup.patch


 The CouchDB update process has an unnecessary btree lookup, where it reads 
 the values in bulk, checks for conflicts, writes the docs to disk, updates 
 the values appropriately and writes them back out to the btree in a second 
 step. It's possible to avoid this second step, and instead do all the 
 checking, doc writing and value transformation in a single btree lookup, 
 thereby reducing the number of btree traversals and disk IO.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Created: (COUCHDB-1084) Remove unnecessary btree lookup inside couch_db_updater

2011-03-07 Thread Damien Katz (JIRA)
Remove unnecessary btree lookup inside couch_db_updater
---

 Key: COUCHDB-1084
 URL: https://issues.apache.org/jira/browse/COUCHDB-1084
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 1.2
Reporter: Damien Katz
Assignee: Damien Katz


The CouchDB update process has an unnecessary btree lookup, where it reads the 
values in bulk, checks for conflicts, writes the docs to disk, updates the 
values appropriately and writes them back out to the btree in a second step. 
It's possible to avoid this second step, and instead do all the checking, doc 
writing and value transformation in a single btree lookup, thereby reducing the 
number of btree traversals and disk IO.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (COUCHDB-1084) Remove unnecessary btree lookup inside couch_db_updater

2011-03-07 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz updated COUCHDB-1084:
-

Attachment: remove_btree_lookup.patch

Applies to couchdb trunk revision 1078680

 Remove unnecessary btree lookup inside couch_db_updater
 ---

 Key: COUCHDB-1084
 URL: https://issues.apache.org/jira/browse/COUCHDB-1084
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 1.2
Reporter: Damien Katz
Assignee: Damien Katz
 Attachments: remove_btree_lookup.patch


 The CouchDB update process has an unnecessary btree lookup, where it reads 
 the values in bulk, checks for conflicts, writes the docs to disk, updates 
 the values appropriately and writes them back out to the btree in a second 
 step. It's possible to avoid this second step, and instead do all the 
 checking, doc writing and value transformation in a single btree lookup, 
 thereby reducing the number of btree traversals and disk IO.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (COUCHDB-1084) Remove unnecessary btree lookup inside couch_db_updater

2011-03-07 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003794#comment-13003794
 ] 

Damien Katz commented on COUCHDB-1084:
--

The attached patch might have stability issues, but should give an idea of the 
performance impact of the change. I would like to see some benchmarks to see 
if it actually helps.

 Remove unnecessary btree lookup inside couch_db_updater
 ---

 Key: COUCHDB-1084
 URL: https://issues.apache.org/jira/browse/COUCHDB-1084
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 1.2
Reporter: Damien Katz
Assignee: Damien Katz
 Attachments: remove_btree_lookup.patch


 The CouchDB update process has an unnecessary btree lookup, where it reads 
 the values in bulk, checks for conflicts, writes the docs to disk, updates 
 the values appropriately and writes them back out to the btree in a second 
 step. It's possible to avoid this second step, and instead do all the 
 checking, doc writing and value transformation in a single btree lookup, 
 thereby reducing the number of btree traversals and disk IO.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (COUCHDB-864) multipart/related PUT's always close the connection.

2010-08-26 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903044#action_12903044
 ] 

Damien Katz commented on COUCHDB-864:
-

Last patch looks good. It's got my ok to check in to trunk and 1.0.x.

 multipart/related PUT's always close the connection.
 

 Key: COUCHDB-864
 URL: https://issues.apache.org/jira/browse/COUCHDB-864
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Robert Newson
 Attachments: chunked.erl, mp_doc_put_http_pipeline.patch, 
 mp_pipeline.patch


 I noticed that mochiweb always closes the connection when doing a 
 multipart/related PUT (to insert the JSON document and accompanying 
 attachments in one call). Ultimately it's because we call recv(0) and not 
 recv_body, thus consuming more data than we actually process. Mochiweb 
 notices that there is unread data on the socket and closes the connection.
 This impacts replication with attachments, as I believe they go through this 
 code path (and, thus, are forever reconnecting).
 The code below demonstrates a fix for this issue but isn't good enough for 
 trunk. Adam provided the important process dictionary fix.
 ---
  src/couchdb/couch_doc.erl  |1 +
  src/couchdb/couch_httpd_db.erl |   13 +
  2 files changed, 10 insertions(+), 4 deletions(-)
 diff --git a/src/couchdb/couch_doc.erl b/src/couchdb/couch_doc.erl
 index 5009f8f..f8c874b 100644
 --- a/src/couchdb/couch_doc.erl
 +++ b/src/couchdb/couch_doc.erl
 @@ -455,6 +455,7 @@ doc_from_multi_part_stream(ContentType, DataFun) ->
      Parser ! {get_doc_bytes, self()},
      receive 
      {doc_bytes, DocBytes} ->
 +        erlang:put(mochiweb_request_recv, true),
          Doc = from_json_obj(?JSON_DECODE(DocBytes)),
          % go through the attachments looking for 'follows' in the data,
          % replace with function that reads the data from MIME stream.
 diff --git a/src/couchdb/couch_httpd_db.erl b/src/couchdb/couch_httpd_db.erl
 index b0fbe8d..eff7d67 100644
 --- a/src/couchdb/couch_httpd_db.erl
 +++ b/src/couchdb/couch_httpd_db.erl
 @@ -651,12 +651,13 @@ db_doc_req(#httpd{method='PUT'}=Req, Db, DocId) ->
      } = parse_doc_query(Req),
      couch_doc:validate_docid(DocId),
  
 +    Len = couch_httpd:header_value(Req, "Content-Length"),
      Loc = absolute_uri(Req, "/" ++ ?b2l(Db#db.name) ++ "/" ++ ?b2l(DocId)),
      RespHeaders = [{"Location", Loc}],
      case couch_util:to_list(couch_httpd:header_value(Req, "Content-Type")) of
      ("multipart/related;" ++ _) = ContentType ->
          {ok, Doc0} = couch_doc:doc_from_multi_part_stream(ContentType,
 -            fun() -> receive_request_data(Req) end),
 +            fun() -> receive_request_data(Req, Len) end),
          Doc = couch_doc_from_req(Req, DocId, Doc0),
          update_doc(Req, Db, DocId, Doc, RespHeaders, UpdateType);
      _Else ->
 @@ -775,9 +776,13 @@ send_docs_multipart(Req, Results, Options) ->
      couch_httpd:send_chunk(Resp, <<"--">>),
      couch_httpd:last_chunk(Resp).
  
 -receive_request_data(Req) ->
 -    {couch_httpd:recv(Req, 0), fun() -> receive_request_data(Req) end}.
 -
 +receive_request_data(Req, undefined) ->
 +    receive_request_data(Req, "0");
 +receive_request_data(Req, Len) when is_list(Len) ->
 +    Remaining = list_to_integer(Len),
 +    Bin = couch_httpd:recv(Req, Remaining),
 +    {Bin, fun() -> receive_request_data(Req, Remaining - iolist_size(Bin)) end}.
 +
  update_doc_result_to_json({{Id, Rev}, Error}) ->
      {_Code, Err, Msg} = couch_httpd:error_info(Error),
      {[{id, Id}, {rev, couch_doc:rev_to_str(Rev)},
 -- 
 1.7.2.2
 Umbra

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-863) be quiet about dropping invalid references

2010-08-20 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900706#action_12900706
 ] 

Damien Katz commented on COUCHDB-863:
-

This sounds like a deeper bug. Under what circumstances does it attempt to drop 
references when the reference counter is already closed?

 be quiet about dropping invalid references
 --

 Key: COUCHDB-863
 URL: https://issues.apache.org/jira/browse/COUCHDB-863
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 1.0.1
Reporter: Randall Leeds
Priority: Trivial

 couch_ref_counter:drop will complain, dying with noproc, if the reference 
 counter does not exist. Since dropping a reference to a non-existent process 
 isn't exactly an error I think we should squelch this one. I hate log noise 
 and I've noticed this pop up in the logs a bunch, especially running the test 
 suite. Extra noise doesn't make debugging easier and it could confuse people 
 trying to solve real problems.
 Trivial, trivial patch unless I'm missing something really silly. I'll save 
 everyone the extra emails from JIRA and just paste it here.
 diff --git a/src/couchdb/couch_ref_counter.erl 
 b/src/couchdb/couch_ref_counter.erl
 index 5a111ab..1edc474 100644
 --- a/src/couchdb/couch_ref_counter.erl
 +++ b/src/couchdb/couch_ref_counter.erl
 @@ -24,7 +24,9 @@ drop(RefCounterPid) ->
      drop(RefCounterPid, self()).
  
  drop(RefCounterPid, Pid) ->
 -    gen_server:call(RefCounterPid, {drop, Pid}).
 +    try gen_server:call(RefCounterPid, {drop, Pid})
 +    catch exit:{noproc, _} -> ok
 +    end.
  
  
  add(RefCounterPid) ->

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-844) Documents missing after CouchDB restart

2010-08-06 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896209#action_12896209
 ] 

Damien Katz commented on COUCHDB-844:
-

Hello Sascha.

What file system are you running? Can you run a consistency check on it?

It's strange. It looks like your file was either truncated or the header was 
never written. There is a bunch of data after the last header, and it contains 
your missing data, but none of it looks like a header for it. All the interval 
markers are set for data. This is consistent with a file that's been truncated. 
Still doing a bit more investigation to check the data regions to see if they 
might actually have a header.

We have seen instances in the past (0.8.0 and earlier) where file systems have 
truncated the db file, making recovery difficult, which is why we switched to 
pure tail append format. As I recall, those reports were associated with the 
file system running out of space.

Barring physical corruption or truncation, the only other possibility I can 
think of is that somehow there is a bug where CouchDB isn't writing the header. 
I don't know of any other instances of that happening, but if that's what it 
is, it's a very serious bug.

 Documents missing after CouchDB restart
 ---

 Key: COUCHDB-844
 URL: https://issues.apache.org/jira/browse/COUCHDB-844
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 1.0
 Environment: Debian Version 5.0.5, Linux *** 2.6.29-xs5.5.0.17 #1 SMP 
 Mon Aug 3 17:37:37 UTC 2009 i686 GNU/Linux, XenServer Guest
Reporter: Sascha Reuter
Priority: Critical

 After a CouchDB restart, recently added/changed documents and design documents 
 (min. 2 weeks timeline!) are missing and can't be accessed through REST calls / 
 Futon. 
 All documents that are still available through REST/Futon only exist in old 
 revisions.
 All documents/revisions can be found by doing a manual search (less/egrep/...) 
 in the datafile (/var/lib/couchdb/database.couch)
 Example:
 Example:
 strings dtap.couch | grep -i 226b2e6c-24b7-4336-92c7-257abf923b11
 $226b2e6c-24b7-4336-92c7-257abf923b11h
 $226b2e6c-24b7-4336-92c7-257abf923b11l
 $226b2e6c-24b7-4336-92c7-257abf923b11l
 $226b2e6c-24b7-4336-92c7-257abf923b11l
 $226b2e6c-24b7-4336-92c7-257abf923b11l
 $226b2e6c-24b7-4336-92c7-257abf923b11l
 $226b2e6c-24b7-4336-92c7-257abf923b11l
 $226b2e6c-24b7-4336-92c7-257abf923b11l
 $226b2e6c-24b7-4336-92c7-257abf923b11l
 $226b2e6c-24b7-4336-92c7-257abf923b11h
 $226b2e6c-24b7-4336-92c7-257abf923b11h
 curl http://localhost:5984/dtap/226b2e6c-24b7-4336-92c7-257abf923b11
 {"error":"not_found","reason":"missing"}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-812) implement randomization in views resultset

2010-06-28 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883212#action_12883212
 ] 

Damien Katz commented on COUCHDB-812:
-

I think this is a fairly useful feature. Many moons ago I needed something 
similar in Lotus Notes, to randomly display a document from the database. It 
was difficult to get working.

It should be possible to do a random view get by randomly navigating btree 
nodes until you reach a leaf node, though there will be some bias when the 
tree is unbalanced.
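
A rough sketch of that descent, assuming the nodes have already been read into
memory in couch_btree's {kp_node, Children} / {kv_node, Rows} shape (in the
real btree the children are disk pointers that would need a read at each
step). Because each child is picked uniformly, rows under small subtrees are
over-represented, which is the bias mentioned above:

    %% Hypothetical helper, not existing CouchDB code.
    random_row({kv_node, Rows}) ->
        lists:nth(rand:uniform(length(Rows)), Rows);
    random_row({kp_node, Children}) ->
        {_Key, Child} = lists:nth(rand:uniform(length(Children)), Children),
        random_row(Child).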

 implement randomization in views resultset
 --

 Key: COUCHDB-812
 URL: https://issues.apache.org/jira/browse/COUCHDB-812
 Project: CouchDB
  Issue Type: Wish
  Components: Database Core
Affects Versions: 0.11
 Environment: CouchDB
Reporter: Mickael Bailly
Priority: Minor

 This is a proposal for a new feature in CouchDB : allow a randomization of 
 rows in a view response. We can for example add a randomize query parameter...
 This request would probably not return the same results for the same request.
 As an example :
 GET /db/_design/doc/_view/example :
 {
   ..
   "rows": [
     {"key": 1, ...},
     {"key": 2, ...},
     {"key": 3, ...}
   ]
 }
 GET /db/_design/doc/_view/example?randomize=true :
 {
   ..
   "rows": [
     {"key": 2, ...},
     {"key": 3, ...},
     {"key": 1, ...}
   ]
 }
 GET /db/_design/doc/_view/example?randomize=true :
 {
   ..
   "rows": [
     {"key": 1, ...},
     {"key": 3, ...},
     {"key": 2, ...}
   ]
 }
 This is a feature that is hard to implement client-side (except by reading all 
 doc ids and using a client-side random function). It has been implemented by 
 RDBMSs for ages, probably for the very same reasons: if we have to read all 
 the rows client-side to randomly select some of them, performance is awful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-86) (CouchDB on Windows) compaction can not be done.

2010-06-24 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882329#action_12882329
 ] 

Damien Katz commented on COUCHDB-86:


Can anyone verify if the patch in COUCHDB-780 fixes the problem on windows with 
the latest Erlang?

 (CouchDB on Windows) compaction can not be done.
 

 Key: COUCHDB-86
 URL: https://issues.apache.org/jira/browse/COUCHDB-86
 Project: CouchDB
  Issue Type: Bug
  Components: Build System
Affects Versions: 0.8
 Environment: Windows XP,Erlang/OTP R12B-3
Reporter: Li Zhengji
Assignee: Paul Joseph Davis
Priority: Blocker
 Fix For: 1.0

 Attachments: windows_file_fix_2.patch

   Original Estimate: 5h
  Remaining Estimate: 5h

 During compaction, renaming the current DB file to a .old file is not allowed 
 on Windows.
 A possible workaround for this could be: 
 1. Close the current DB file (.couch);
 2. Send db_updated to update to use .compact;
 3. After 5 sec, delete the .couch file. This is done in a linked 
 process; after that, this process sends a message to update_loop;
 4. After receiving the message in update_loop, close the current DB file, 
 which is a .compact file, then rename it to .couch;
 5. Finally, db_updated again to use this new .couch file.
 Maybe there would be a pause in service?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (COUCHDB-780) Don't block the updater process while compaction deletes old files

2010-06-23 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz resolved COUCHDB-780.
-

Fix Version/s: 1.0
   (was: 1.1)
   Resolution: Fixed

 Don't block the updater process while compaction deletes old files
 --

 Key: COUCHDB-780
 URL: https://issues.apache.org/jira/browse/COUCHDB-780
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 0.10.2, 0.11
Reporter: Randall Leeds
 Fix For: 1.0

 Attachments: 0001-async-file-deletions.-COUCHDB-780.patch, 
 async_compact_delete.patch


 I have what I think is a simple patch I'll attach. I don't see any reason not 
 to include it unless rename operations can be seriously slow on some 
 filesystems (but I expect this is not the case).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (COUCHDB-807) authentication cache (user docs cache)

2010-06-23 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz closed COUCHDB-807.
---

Fix Version/s: 1.0
   Resolution: Fixed

 authentication cache (user docs cache)
 --

 Key: COUCHDB-807
 URL: https://issues.apache.org/jira/browse/COUCHDB-807
 Project: CouchDB
  Issue Type: Improvement
 Environment: trunk
Reporter: Filipe Manana
Assignee: Filipe Manana
 Fix For: 1.0

 Attachments: auth_cache.patch, auth_cache_2.patch


 Currently, in order to authenticate an incoming request, each authentication 
 handler will read a user doc from the _users DB.
 By default, 3 authentication handlers are defined (default.ini), which means 
 we can have 3 _users DB lookups (besides 3 DB open and close operations).
 Taking into account that this is done for each incoming HTTP request, for 
 very busy servers this current behaviour might be overkill.
 The following patch adds a new gen_server which implements an authentication 
 cache and keeps the _users DB open all the time, so that cache misses and 
 refreshes are as quick as possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (COUCHDB-800) Problem when writing larger than 4kb file headers

2010-06-15 Thread Damien Katz (JIRA)
Problem when writing larger than 4kb file headers
-

 Key: COUCHDB-800
 URL: https://issues.apache.org/jira/browse/COUCHDB-800
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.11
Reporter: Damien Katz
Assignee: Damien Katz
 Fix For: 0.11.1, 0.12


From Andrey Somov:

Hi,
while reading the CouchDB source I found a question in couch_file.erl,
I am not sure whether it is a bug or not.

Lines 297-311:

handle_call({write_header, Bin}, _From, #file{fd=Fd, eof=Pos}=File) ->
    BinSize = size(Bin),
    case Pos rem ?SIZE_BLOCK of
    0 ->
        Padding = <<>>;
    BlockOffset ->
        Padding = <<0:(8*(?SIZE_BLOCK-BlockOffset))>>
    end,
    FinalBin = [Padding, <<1, BinSize:32/integer>> | make_blocks(1, [Bin])],
    case file:write(Fd, FinalBin) of
    ok ->
        {reply, ok, File#file{eof=Pos+iolist_size(FinalBin)}};
    Error ->
        {reply, Error, File}
    end;


Because <<1, BinSize:32/integer>> occupies 5 bytes, make_blocks() should
use offset=5, but the offset is only 1.
(It should be make_blocks(5, [Bin]).)

Since the header is smaller than 4k there is no difference and it
works (the tests succeed with both 1 and 5). But it makes it more
difficult to understand the code for those who study the source to
understand how it works.

-
Thank you,
Andrey

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (COUCHDB-800) Problem when writing larger than 4kb file headers

2010-06-15 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz closed COUCHDB-800.
---

Resolution: Fixed

 Problem when writing larger than 4kb file headers
 -

 Key: COUCHDB-800
 URL: https://issues.apache.org/jira/browse/COUCHDB-800
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.11
Reporter: Damien Katz
Assignee: Damien Katz
 Fix For: 0.11.1, 0.12


 From Andrey Somov:
 Hi,
 while reading the CouchDB source I found a question in couch_file.erl,
 I am not sure whether it is a bug or not.
 Lines 297-311:
 handle_call({write_header, Bin}, _From, #file{fd=Fd, eof=Pos}=File) ->
     BinSize = size(Bin),
     case Pos rem ?SIZE_BLOCK of
     0 ->
         Padding = <<>>;
     BlockOffset ->
         Padding = <<0:(8*(?SIZE_BLOCK-BlockOffset))>>
     end,
     FinalBin = [Padding, <<1, BinSize:32/integer>> | make_blocks(1, [Bin])],
     case file:write(Fd, FinalBin) of
     ok ->
         {reply, ok, File#file{eof=Pos+iolist_size(FinalBin)}};
     Error ->
         {reply, Error, File}
     end;
 Because <<1, BinSize:32/integer>> occupies 5 bytes, make_blocks() should
 use offset=5, but the offset is only 1.
 (It should be make_blocks(5, [Bin]).)
 Since the header is smaller than 4k there is no difference and it
 works (the tests succeed with both 1 and 5). But it makes it more
 difficult to understand the code for those who study the source to
 understand how it works.
 -
 Thank you,
 Andrey

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-791) Changes not written if server shutdown during delayed_commits period

2010-06-14 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878667#action_12878667
 ] 

Damien Katz commented on COUCHDB-791:
-

Sleeping while waiting doesn't give a guarantee. On a heavily loaded server, it 
could take many seconds to completely flush everything.

If you want to ensure your data is on disk, use full commits. Nothing else 
gives any guarantees.
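
For completeness, a minimal sketch of requesting a full commit over HTTP with
the _ensure_full_commit endpoint (the db name is a placeholder):

    %% Flush everything for the db before a shutdown.
    inets:start(),
    {ok, _} = httpc:request(post,
        {"http://localhost:5984/scratch/_ensure_full_commit",
         [], "application/json", ""}, [], []).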

 Changes not written if server shutdown during delayed_commits period
 

 Key: COUCHDB-791
 URL: https://issues.apache.org/jira/browse/COUCHDB-791
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 0.11.1
 Environment: Linux (Ubuntu 10.04)
Reporter: Matt Goodall

 If the couchdb server is shutdown (couchdb -d, Ctrl+C at the console, etc) 
 during the delayed commits period then buffered updates are lost.
 Simple script to demonstrate the problem is:
 db=http://localhost:5984/scratch
 curl $db -X DELETE
 curl $db -X PUT
 curl $db -X POST -d '{}'
 /path/to/couchdb/bin/couchdb -d
 When couchdb is started again the database is empty.
 Affects 0.11.x and trunk branches.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-780) Don't block the updater process while compaction deletes old files

2010-06-12 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878327#action_12878327
 ] 

Damien Katz commented on COUCHDB-780:
-

Haven't looked at the patch, but this sounds similar to Mark Hammond's patch 
for fixing the Windows file problems (which requires a yet unreleased version 
of Erlang, I think). Maybe Mark's patch also fixes this problem, or could with 
a little more work. 

https://issues.apache.org/jira/browse/COUCHDB-86

 Don't block the updater process while compaction deletes old files
 --

 Key: COUCHDB-780
 URL: https://issues.apache.org/jira/browse/COUCHDB-780
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 0.10.2, 0.11
Reporter: Randall Leeds
 Fix For: 1.1

 Attachments: 0001-async-file-deletions.-COUCHDB-780.patch, 
 async_compact_delete.patch


 I have what I think is a simple patch I'll attach. I don't see any reason not 
 to include it unless rename operations can be seriously slow on some 
 filesystems (but I expect this is not the case).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-767) do a non-blocking file:sync

2010-06-08 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876722#action_12876722
 ] 

Damien Katz commented on COUCHDB-767:
-

The fsync on a separate thread/process might not work. Definitely load test the 
patch to ensure it's giving you what you expect.

http://antirez.com/post/fsync-different-thread-useless.html

 do a non-blocking file:sync
 ---

 Key: COUCHDB-767
 URL: https://issues.apache.org/jira/browse/COUCHDB-767
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 0.11
Reporter: Adam Kocoloski
 Fix For: 1.1

 Attachments: 767-async-fsync.patch, async_fsync.patch


 I've been taking a close look at couch_file performance in our production 
 systems.  One of things I've noticed is that reads are occasionally blocked 
 for a long time by a slow call to file:sync.  I think this is unnecessary.  I 
 think we could do something like
 handle_call(sync, From, #file{name=Name}=File) ->
     spawn_link(fun() -> sync_file(Name, From) end),
     {noreply, File};
 and then
 sync_file(Name, From) ->
     {ok, Fd} = file:open(Name, [read, raw]),
     gen_server:reply(From, file:sync(Fd)),
     file:close(Fd).
 Does anyone see a downside to this?  Individual clients of couch_file still 
 see exactly the same behavior as before, only readers are not blocked by 
 syncs initiated in the db_updater process.  When data needs to be flushed 
 file:sync is _much_ slower than spawning a local process and opening the file 
 again --  in the neighborhood of 1000x slower even on Linux with its 
 less-than-durable use of vanilla fsync.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-782) Restarting replication

2010-06-03 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875190#action_12875190
 ] 

Damien Katz commented on COUCHDB-782:
-

Per-database UUIDs have the problem of databases being copied around on the 
file system, or restored from backup. A better option is to convert the URIs 
to a canonical format so they always look the same.

 Restarting replication 
 ---

 Key: COUCHDB-782
 URL: https://issues.apache.org/jira/browse/COUCHDB-782
 Project: CouchDB
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.10
 Environment: Ubuntu, 9.10
Reporter: Till Klampaeckel

 So we had to restart replication on a server, and here's something I noticed.
 At first I restarted the replication via the following command from localhost:
 curl -X POST -d '{"source":"http://localhost:5984/foo", 
 "target":"http://remote:5984/foo"}' http://localhost:5984/_replicate
 In response, futon stats:
 W Processed source update #176841152
 That part is great.
 Last night I did not have immediate access to the shell, so I restarted 
 replication from remote (through curl on my mobile):
 curl -X POST -d '{"source":"http://user:p...@public.host:5984/foo", 
 "target":"http://remote:5984/foo"}' 
 http://user:p...@public.host:5984/_replicate
 The response in futon this morning:
 W Processed source update #1066
 ... and it kept sitting there like it was stalled and only continued in 
 smaller increments.
 I restarted CouchDB and restarted from localhost - instant jump to 176 
 million.
 I'm just wondering what might be different, except that one is against the 
 public interface vs. localhost. I'd assume that replication behaves the same 
 regardless.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-763) duplicate and or missing revisions in changes feed

2010-05-17 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868454#action_12868454
 ] 

Damien Katz commented on COUCHDB-763:
-

This looks to be the same issue as the one Simon Eisenmann encountered. I 
believe the problem was 2 database servers running at the same time for a 
short time during an internal auto-restart. This problem has been fixed in 
trunk.

 duplicate and or missing revisions in changes feed
 --

 Key: COUCHDB-763
 URL: https://issues.apache.org/jira/browse/COUCHDB-763
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.10.1
Reporter: Randall Leeds
Priority: Critical

 I have no idea if this is unique to 0.10.1 or if it shows up on 0.11/trunk 
 since I have no clue how to repro.
 If we can identify why this happens we should work to be very sure it's fixed.
 I see something like the following in my changes feed (taken from consecutive 
 lines of an actual changes feed):
 {"seq":36527,"id":"anonymized_docid","changes":[{"rev":"2186-967dbcd9d960b77955fcf6048fb219cc"}]},
 {"seq":36530,"id":"anonymized_docid","changes":[{"rev":"2188-ae8481b29fd3a42d5190aba7c13a522b"}]},
 I was under the impression that _changes only showed the newest revision for 
 any document.
 Furthermore, the first of these two is actually missing. Querying the 
 document with ?revs_info=true shows it as such and this is confirmed by 
 trying to query for ?rev=2186-967dbcd9d960b77955fcf6048fb219cc
 1) Missing revisions should never show up in changes
 2) Changes shouldn't list a document twice
 3) This makes replication impossible since the reader tries to open missing 
 revisions.
 Mostly for number (3) I'm marking this as critical.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)

2010-05-13 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867197#action_12867197
 ] 

Damien Katz commented on COUCHDB-738:
-

I've been thinking about this issue, and I think storing the whole rev tree in 
the by_seq index is a bad idea.

I'm also thinking of ways to make the compaction faster.

Store the full_doc_info outside the by_id btree; in that btree, store the doc_info 
(with its pointers to the main doc and conflicts) plus a pointer to the 
full_doc_info as well. On reads, this avoids the overhead of loading up the 
full_doc_info just to get the main doc. Updates and replication reads (those that 
request the rev_info) will have to load up the full_doc_info with an extra 
read, but it's unlikely to be an extra disk IO since it will be close to the most 
recent doc revision.

The by_seq index will also have the doc_info and a pointer to the full_doc_info.

Then on compaction, scan the by_seq index, copying over the full_doc_info and 
the documents and attachments into a new file. Each full_doc_info should be 
linked to the next with an offset to next's file pos.

Then scan the newly written full_doc_info, converting them to doc_infos and 
pointers to the full_doc_info, and writing them out consecutively to the end of 
the file.

Then sort just this portion of the file, on disk, by the id in the doc_info. 
This is the most expensive part of the compaction, but sorting things on disk 
is a common problem, with lots of open-source libraries out there that are highly 
optimized.

Then convert this id-sorted portion of the file to btree leaf nodes. Then 
rescan the leaf nodes and build up the inner nodes, recursing until you have a 
single root node left. This is now your by_id index.

Then rescan the full_doc_infos, and write out to the end of the file the 
doc_infos and pointers back to the full_doc_infos. This is already sorted 
by_seq.

Then convert this seq-sorted portion of the file to btree leaf nodes. Then 
rescan the leaf nodes and build up the inner nodes, recursing until you have a 
single root node left. This is now your by_seq index.

You now have a fully compacted database with no wasted btree nodes. I think this 
will be a lot faster. With the exception of the by_id sorting phase, this 
eliminates random disk seeks.
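
For the by_id sorting phase, Erlang's standard library already ships an external 
merge sort, file_sorter. Below is a minimal sketch of sorting doc_infos on disk 
by id; the #doc_info{} fields, file names, and the length-prefixed writer are 
illustrative assumptions, not CouchDB's actual on-disk format.

-module(by_id_sort).
-export([demo/0]).

-record(doc_info, {id, seq, body_pos}).

demo() ->
    Infos = [#doc_info{id = <<"b">>, seq = 2, body_pos = 200},
             #doc_info{id = <<"a">>, seq = 1, body_pos = 100}],
    ok = write_objects("infos.unsorted", Infos),
    %% merge-sort the tuples on disk by element 2, i.e. the id field
    ok = file_sorter:keysort(2, ["infos.unsorted"], "infos.by_id",
                             [{format, binary_term}]).

%% file_sorter's binary_term object layout: each term_to_binary blob
%% preceded by a four-byte big-endian size header.
write_objects(File, Terms) ->
    {ok, Fd} = file:open(File, [write, raw, binary]),
    lists:foreach(fun(T) ->
                          Bin = term_to_binary(T),
                          ok = file:write(Fd, [<<(byte_size(Bin)):32>>, Bin])
                  end,
                  Terms),
    ok = file:close(Fd).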

 more efficient DB compaction (fewer seeks)
 --

 Key: COUCHDB-738
 URL: https://issues.apache.org/jira/browse/COUCHDB-738
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 0.9.2, 0.10.1, 0.11
Reporter: Adam Kocoloski
Assignee: Adam Kocoloski
 Fix For: 1.1

 Attachments: 738-efficient-compaction-v1.patch, 
 738-efficient-compaction-v2.patch


 CouchDB's database compaction algorithm walks the by_seq btree, then does a 
 lookup in the by_id btree for every document in the database.  It does this 
 because the #full_doc_info{} record with the full revision tree is only 
 stored in the by_id tree.  I'm proposing instead to store duplicate copies of 
 #full_doc_info{} in both trees, and to have the compactor use the by_seq tree 
 exclusively.  The net effect is significantly fewer calls to pread(), and a 
 compaction IO pattern where reads tend to be clustered close to each other in 
 the file.
 If the by_id tree is fully cached, or if the id tree nodes are located near 
 the seq tree nodes, the performance improvement is small but noticeable (~10% 
 in some simple tests).  On the other hand, in the worst-case scenario of 
 randomly-generated docids and a database much larger than main memory the 
 improvement is huge.  Joe Williams did some simple benchmarks with a 50k 
 document, 600 MB database on a 256MB VPS.  The compaction time for that DB 
 dropped from 15m to 2m20s, so more than 6x faster.
 Storing the #full_doc_info{} in the seq tree also allows for some similar 
 optimizations in the replicator.
 This patch might have downsides when documents have a large number of edits.  
 These include an increase in the size of the database and slower view 
 indexing.  I expect both to be small effects.
 The patch can be applied directly to tr...@934272.  Existing DBs are still 
 readable, new updates will be written in the new format, and databases can be 
 fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)

2010-05-03 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863632#action_12863632
 ] 

Damien Katz commented on COUCHDB-738:
-

Definitely I'd like to see performance metrics of view building on heavily 
edited documents before committing this.

 more efficient DB compaction (fewer seeks)
 --

 Key: COUCHDB-738
 URL: https://issues.apache.org/jira/browse/COUCHDB-738
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 0.9.2, 0.10.1, 0.11
Reporter: Adam Kocoloski
Assignee: Adam Kocoloski
 Fix For: 1.1

 Attachments: 738-efficient-compaction-v1.patch


 CouchDB's database compaction algorithm walks the by_seq btree, then does a 
 lookup in the by_id btree for every document in the database.  It does this 
 because the #full_doc_info{} record with the full revision tree is only 
 stored in the by_id tree.  I'm proposing instead to store duplicate copies of 
 #full_doc_info{} in both trees, and to have the compactor use the by_seq tree 
 exclusively.  The net effect is significantly fewer calls to pread(), and a 
 compaction IO pattern where reads tend to be clustered close to each other in 
 the file.
 If the by_id tree is fully cached, or if the id tree nodes are located near 
 the seq tree nodes, the performance improvement is small but noticeable (~10% 
 in some simple tests).  On the other hand, in the worst-case scenario of 
 randomly-generated docids and a database much larger than main memory the 
 improvement is huge.  Joe Williams did some simple benchmarks with a 50k 
 document, 600 MB database on a 256MB VPS.  The compaction time for that DB 
 dropped from 15m to 2m20s, so more than 6x faster.
 Storing the #full_doc_info{} in the seq tree also allows for some similar 
 optimizations in the replicator.
 This patch might have downsides when documents have a large number of edits.  
 These include an increase in the size of the database and slower view 
 indexing.  I expect both to be small effects.
 The patch can be applied directly to tr...@934272.  Existing DBs are still 
 readable, new updates will be written in the new format, and databases can be 
 fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (COUCHDB-623) File format for views is space and time inefficient - use a better one

2010-01-13 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz closed COUCHDB-623.
---

Resolution: Invalid
  Assignee: Damien Katz

Closing as Invalid; this has no objective criteria for being resolved.

 File format for views is space and time inefficient - use a better one
 --

 Key: COUCHDB-623
 URL: https://issues.apache.org/jira/browse/COUCHDB-623
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 0.10
Reporter: Roger Binns
Assignee: Damien Katz

 This was discussed on the dev mailing list over the last few days and noted 
 here so it isn't forgotten.
 The main database file format is optimised for data integrity - not losing or 
 mangling documents - and rightly so.
 That same append-only format is also used for views where it is a poor fit.  
 The more random the ordering of data supplied, the larger the btree.  The 
 larger the keys (in bytes) the larger the btree.  As an example my 2GB of raw 
 JSON data turns into a 3.9GB CouchDB database but a 27GB view file (before 
 compacting to 900MB).  Since views are not replicated, this requires a 
 disproportionate amount of disk space on each receiving server (not to 
 mention I/O load).  The format also affects view generation performance.  By 
 loading my documents into CouchDB in an order by the most emitted value in 
 views I was able to reduce load time from 75 minutes to 40 minutes with the 
 view file size being 15GB instead of 27GB, but still very distant from the 
 900MB post compaction.
 Views are a performance enhancement.  They save you from having to visit 
 every document when doing some queries.  The data within a view is 
 generated, and hence the only consequence of losing view data is a performance 
 one; the view can be regenerated anyway.  Consequently the file format 
 should be one that is optimised for performance and size.  The only integrity 
 feature needed is the ability to tell that the view is potentially corrupt 
 (eg the power failed while it was being generated/updated).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

2010-01-13 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12799937#action_12799937
 ] 

Damien Katz commented on COUCHDB-583:
-

I haven't looked at the patch, but I agree with most of Paul's comments, except 
for figuring out when to compress files. Lots of compressed files might have 
uncompressed headers in the file, leading to unnecessary compression. MP3s with 
id3v2 tags immediately come to mind.
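
For illustration, the negotiation described in the issue looks like this from 
the client side; the host and attachment names are invented. curl's --compressed 
flag sends Accept-Encoding: gzip and inflates the response locally.

$ curl --compressed http://localhost:5984/somedb/somedoc/notes.txt
  (stored gzip bytes can be streamed as-is)
$ curl http://localhost:5984/somedb/somedoc/notes.txt
  (no Accept-Encoding, so CouchDB decompresses before sending)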

 storing attachments in compressed form and serving them in compressed form if 
 accepted by the client
 

 Key: COUCHDB-583
 URL: https://issues.apache.org/jira/browse/COUCHDB-583
 Project: CouchDB
  Issue Type: New Feature
  Components: Database Core, HTTP Interface
 Environment: CouchDB trunk
Reporter: Filipe Manana
 Attachments: couchdb-583-trunk-3rd-try.patch, 
 couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, 
 couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, 
 couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, 
 jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


 This feature allows Couch to gzip compress attachments as they are being 
 received and store them in compressed form.
 When a client asks for downloading an attachment (e.g. GET 
 somedb/somedoc/attachment.txt), the attachment is sent in compressed form if 
 the client's http request has gzip specified as a valid transfer encoding for 
 the response (using the http header Accept-Encoding). Otherwise couch 
 decompresses the attachment before sending it back to the client.
 Attachments are compressed only if their MIME type matches one of those 
 listed in a separate config file. Compression level is also configurable in 
 the default.ini file.
 This follows Damien's suggestion from 30 November:
 Perhaps we need a separate user-editable ini file to specify compressible or 
 non-compressible files (would probably be too big for the regular ini file). 
 What do other web servers do?
 Also, a potential optimization is to compress the file while writing to disk, 
 and serve the compressed bytes directly to clients that can handle it, and 
 decompressed for those that can't. For compressable types, it's a win for 
 both disk IO for reads and writes, and CPU on read.
 Patch attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-604) _changes feed with ?feed=continuous does not return valid JSON

2009-12-21 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793257#action_12793257
 ] 

Damien Katz commented on COUCHDB-604:
-

Wow, lots of comments on this one.

I originally implemented this as a single JSON stream; it was switched to 
newline-separated JSON objects for ease of parsing by clients. I don't have an 
opinion one way or the other, but the thing that starts to bother me is the 
culture of offering more options so that everyone can have it exactly as they 
want it.

This stems from the growing surface of the API: what must get 
documented and tested, and the burden of what must get implemented for those 
who want to make compatible CouchDB implementations. I tend to favor simpler 
APIs, to the point of occasionally pushing some of the complexity to the client, 
to ensure the server itself isn't completely overloaded with complexity and 
options.

 _changes feed with ?feed=continuous does not return valid JSON
 --

 Key: COUCHDB-604
 URL: https://issues.apache.org/jira/browse/COUCHDB-604
 Project: CouchDB
  Issue Type: Improvement
  Components: HTTP Interface
Affects Versions: 0.10
Reporter: Joscha Feth
Priority: Trivial

 When using the _changes interface via ?feed=continuous, the JSON returned is 
 more a stream of JSON documents than a single valid JSON document:
 {"seq":38,"id":"f473fe61a8a53778d91c38b23ed6e20f","changes":[{"rev":"9-d3e71c7f5f991b26fe014d884a27087f"}]}
 {"seq":68,"id":"2a574814d61d9ec8a0ebbf43fa03d75b","changes":[{"rev":"6-67179f215e42d63092dc6b2199a3bf51"}],"deleted":true}
 {"seq":70,"id":"75dbdacca8e475f5909e3cc298905ef8","changes":[{"rev":"1-0dee261a2bd4c7fb7f2abd811974d3f8"}]}
 {"seq":71,"id":"09fb03236f80ea0680a3909c2d788e43","changes":[{"rev":"1-a9646389608c13a5c26f4c14c6863753"}]}
 To be valid there needs to be a root element (and then an array with commas), 
 like in the non-continuous feed:
 {"results":[
 {"seq":38,"id":"f473fe61a8a53778d91c38b23ed6e20f","changes":[{"rev":"9-d3e71c7f5f991b26fe014d884a27087f"}]},
 {"seq":68,"id":"2a574814d61d9ec8a0ebbf43fa03d75b","changes":[{"rev":"6-67179f215e42d63092dc6b2199a3bf51"}],"deleted":true},
 {"seq":70,"id":"75dbdacca8e475f5909e3cc298905ef8","changes":[{"rev":"1-0dee261a2bd4c7fb7f2abd811974d3f8"}]},
 {"seq":71,"id":"09fb03236f80ea0680a3909c2d788e43","changes":[{"rev":"1-a9646389608c13a5c26f4c14c6863753"}]},
 In short, this means that if someone does not parse the change events line by 
 line (waiting for a line ending and then parsing the line), but uses a 
 SAX-like parser (throwing events for each new object, etc.) and expects the 
 response to be JSON (which it is not, because it's not {"x":[{},{},{}]} but 
 {}{}{}, which is not valid), an error is thrown.
 I can see that people doing this line by line might be okay with the above 
 approach, but the response is not valid JSON and it would be nice if there 
 were a flag to make the response valid JSON.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

2009-11-30 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783732#action_12783732
 ] 

Damien Katz commented on COUCHDB-583:
-

One problem I think I see with the patch is that we are compressing regardless 
of MIME type. For already-compressed files (images, music, and video), it does 
nothing but add CPU overhead.

Perhaps we need a separate user-editable ini file to specify compressible or 
non-compressible files (would probably be too big for the regular ini file). 
What do other web servers do?

Also, a potential optimization is to compress the file while writing to disk, 
and serve the compressed bytes directly to clients that can handle it, and 
decompressed for those that can't. For compressable types, it's a win for both 
disk IO for reads and writes, and CPU on read.

 adding ?compression=(gzip|deflate) optional parameter to the attachment 
 download API
 

 Key: COUCHDB-583
 URL: https://issues.apache.org/jira/browse/COUCHDB-583
 Project: CouchDB
  Issue Type: New Feature
  Components: HTTP Interface
 Environment: CouchDB trunk revision 885240
Reporter: Filipe Manana
 Attachments: jira-couchdb-583-1st-try-trunk.patch, 
 jira-couchdb-583-2nd-try-trunk.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 The following new feature is added in the patch following this ticket 
 creation.
 A new optional http query parameter compression is added to the attachments 
 API.
 This parameter can have one of the values:  gzip or deflate.
 When asking for an attachment (GET http request), if the query parameter 
 compression is found, CouchDB will send the attachment compressed to the 
 client (and sets the header Content-Encoding with gzip or deflate).
 Further, it adds a new config option treshold_for_chunking_comp_responses 
 (httpd section) that specifies an attachment length threshold. If an 
 attachment has a length >= this threshold, the http response will be 
 chunked (besides compressed).
 Note that using non-chunked compressed body responses requires storing all 
 the compressed blocks in memory and then sending each one to the client. This 
 is a necessary evil, as we only know the length of the compressed body 
 after compressing the whole body, and we need to set the Content-Length 
 header for non-chunked responses. By sending chunked responses, we can send 
 each compressed block immediately, without accumulating all of them in memory.
 Examples:
 $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
 $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
 $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will 
 not be compressed
 $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # 
 will give a 500 error code
 Etap test case included.
 Feedback would be very welcome.
 cheers

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (COUCHDB-292) A deleted document may be resaved with an old revision and is then considered undeleted

2009-11-23 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz closed COUCHDB-292.
---

Resolution: Fixed
  Assignee: Damien Katz

Fix with tests in svn r883494.

 A deleted document may be resaved with an old revision and is then considered 
 undeleted
 ---

 Key: COUCHDB-292
 URL: https://issues.apache.org/jira/browse/COUCHDB-292
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.9
Reporter: Paul Carey
Assignee: Damien Katz
 Fix For: 0.11


 If a document is deleted, a PUT request may be issued with the same revision 
 that was passed to the DELETE request. When this happens the previously 
 deleted document is assigned a new revision and is no longer considered 
 deleted.
 This behaviour is new within the last few weeks.
 The following curl session illustrates the issue. 
 08:18 : ~ $ curl -X PUT -d '{"_id":"foo"}' localhost:5984/scratch/foo
 {"ok":true,"id":"foo","rev":"1-3690485448"}
 08:19 : ~ $ curl -X PUT -d '{"_id":"foo","_rev":"1-3690485448"}' 
 localhost:5984/scratch/foo
 {"ok":true,"id":"foo","rev":"2-966942539"}
 08:19 : ~ $ curl -X DELETE localhost:5984/scratch/foo?rev=2-966942539
 {"ok":true,"id":"foo","rev":"3-421182311"}
 08:20 : ~ $ curl -X GET localhost:5984/scratch/foo
 {"error":"not_found","reason":"deleted"}
 08:20 : ~ $ curl -X PUT -d '{"_id":"foo","_rev":"2-966942539"}' 
 localhost:5984/scratch/foo
 {"ok":true,"id":"foo","rev":"3-1867999175"}
 08:20 : ~ $ curl -X GET localhost:5984/scratch/foo
 {"_id":"foo","_rev":"3-1867999175"}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-558) Validate Content-MD5 request headers on uploads

2009-11-02 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12772742#action_12772742
 ] 

Damien Katz commented on COUCHDB-558:
-

Robert is correct, MD5 is fine for validating integrity, and as far as I know, 
it's the only hash function that's standardized in HTTP.

For fully secure, unspoofable transmission, SSL is the way to go anyway.
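
For illustration, a client-side upload carrying the header; the database and 
document names are invented. Per RFC 1864 the value is the base64-encoded 
binary MD5 digest of the body.

$ MD5=$(openssl dgst -md5 -binary doc.json | base64)
$ curl -X PUT --data-binary @doc.json -H "Content-MD5: $MD5" \
    http://localhost:5984/db/doc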

 Validate Content-MD5 request headers on uploads
 ---

 Key: COUCHDB-558
 URL: https://issues.apache.org/jira/browse/COUCHDB-558
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core, HTTP Interface
Reporter: Adam Kocoloski
 Fix For: 0.11


 We could detect in-flight data corruption if a client sends a Content-MD5 
 header along with the data and Couch validates the MD5 on arrival.
 RFC1864 - The Content-MD5 Header Field
 http://www.faqs.org/rfcs/rfc1864.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-517) changing uuid algorithm causes client errors

2009-10-03 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761906#action_12761906
 ] 

Damien Katz commented on COUCHDB-517:
-

I've reviewed the patch and it looks good. I'd commit it myself but I'm not 
sure of the patch flags I need for this diff.

 changing uuid algorithm causes client errors
 

 Key: COUCHDB-517
 URL: https://issues.apache.org/jira/browse/COUCHDB-517
 Project: CouchDB
  Issue Type: Bug
Reporter: Robert Newson
 Attachments: couchdb-517.patch


 When changing the uuid configuration (by PUT to _config/uuids/algorithm), a 
 client attempting an operation at the same time experiences a transitory 
 connection refused problem.
 Attached is a patch that changes couch_uuid.erl so that it changes its 
 internal state when the configuration changes rather than the current 
 behavior of stopping and then being restarted by the supervisor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-448) Support Gzip encoding for replicating over slow connections

2009-09-15 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755753#action_12755753
 ] 

Damien Katz commented on COUCHDB-448:
-

Ideally we'll store attachments gzipped, and then just stream them unchanged 
to clients that can handle that, and decompress for clients that can't.

We'll probably need a config file to exclude MIME types that aren't compressible, 
like images and movies.

 Support Gzip encoding for replicating over slow connections
 ---

 Key: COUCHDB-448
 URL: https://issues.apache.org/jira/browse/COUCHDB-448
 Project: CouchDB
  Issue Type: Improvement
  Components: HTTP Interface
Reporter: Jason Davies
Assignee: Adam Kocoloski

 This shouldn't be too hard to add, we should support it in general for all 
 HTTP requests to the server and also allow it to be enabled in the replicator 
 client for pull/push replication.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (COUCHDB-495) Make views twice as fast

2009-09-14 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz closed COUCHDB-495.
---

Resolution: Fixed

We now have a raw collation option, and regular JSON collation is much faster 
too.

 Make views twice as fast
 

 Key: COUCHDB-495
 URL: https://issues.apache.org/jira/browse/COUCHDB-495
 Project: CouchDB
  Issue Type: Improvement
  Components: JavaScript View Server
Reporter: Chris Anderson
 Fix For: 0.11

 Attachments: binary_collate.diff, couch_perf.py, less_json.patch, 
 numbers-davisp.txt, outputv.patch, perf.py, R13B1-uca-bif.patch, 
 term_collate.diff


 Devs,
 Damien's identified view collation as the most significant bottleneck for the 
 view generation. We've done some testing, and some preliminary patches, and 
 the upshot seems to be that even removing ICU from the collator is not a 
 significant boost. What does speed things up greatly is using raw Erlang term 
 comparison. E.g., instead of using couch_view:less_json, using fun(A,B) -> A < B 
 end provides a roughly 2x speedup.
 However, the patch is challenging for a few reasons: Making the collation 
 strategy switchable at all is tough. It's actually quite easy to get an 
 alternate less function into the btree writer (all you've got to do is set it 
 in couch_view_group:init_group). The hard part is propagating the same less 
 function to the PassedEndFun. There's a secondary problem that when you use 
 raw term comparison, a lot of terms turn out to come before nil, and after 
 {}, which we use as artificial first and last terms in the less_json 
 function. So just switching to raw collation alone will leave you with a view 
 with unreachable rows.
 I tried two different approaches to the problem last night, and both of them 
 led to (instructive) dead ends. I'll attach them for illustration purposes.
 The next line of attack we think should be tried is this:
 First - remove _all_docs_by_seq, as it is just adding complexity to the 
 problem, and has been deprecated by _changes anyway. Along the same lines, 
 _all_docs should no longer use couch_httpd_view:make_view_fold_fun as it has 
 completely different collation needs than make_view_fold_fun. We'll end up 
 duplicating a little code in the _all_docs implementation, but it should be 
 worth it because it will make the other work much simpler.
 Once those changes have laid the groundwork, the next step is to change 
 make_view_fold_fun and couch_view:fold so that make_view_fold_fun is no 
 longer responsible for detecting when we've passed the endkey. That means 
 make_passed_end_fun and all references to PassedEnd and 
 PassedEnd fun will be stripped from couch_httpd_view and moved to couch_btree.
 couch_view:fold (and the underlying btree) will need to accept not just a 
 start, but also an endkey. This will make it much easier to use the less fun 
 that is stored on View#view.btree#btree.less to determine PassedEnd funs. 
 This will move some complexity to the btree code from the view code, but will 
 keep the concerns more aligned. This also means that the btree will need to 
 accept not only an endkey for folds, but also an inclusive_end parameter.
 Once we have all these refactorings done, it will be easy to make the less 
 fun for an index configurable, as both the index writer and the index reader 
 will look for it in the same place (on the #btree record).
 My aim is to start a discussion and get someone excited to work on this 
 patch. Think of all the fast-views glory you'll get! Please ask questions and 
 otherwise force me to clarify the above discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (COUCHDB-495) Make views twice as fast

2009-09-03 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz updated COUCHDB-495:


Attachment: couch_perf.py

 Make views twice as fast
 

 Key: COUCHDB-495
 URL: https://issues.apache.org/jira/browse/COUCHDB-495
 Project: CouchDB
  Issue Type: Improvement
  Components: JavaScript View Server
Reporter: Chris Anderson
 Fix For: 0.11

 Attachments: binary_collate.diff, couch_perf.py, term_collate.diff


 Devs,
 Damien's identified view collation as the most significant bottleneck for the 
 view generation. We've done some testing, and some preliminary patches, and 
 the upshot seems to be that even removing ICU from the collator is not a 
 significant boost. What does speed things up greatly is using raw Erlang term 
 comparison. E.g., instead of using couch_view:less_json, using fun(A,B) -> A < B 
 end provides a roughly 2x speedup.
 However, the patch is challenging for a few reasons: Making the collation 
 strategy switchable at all is tough. It's actually quite easy to get an 
 alternate less function into the btree writer (all you've got to do is set it 
 in couch_view_group:init_group). The hard part is propagating the same less 
 function to the PassedEndFun. There's a secondary problem that when you use 
 raw term comparison, a lot of terms turn out to come before nil, and after 
 {}, which we use as artificial first and last terms in the less_json 
 function. So just switching to raw collation alone will leave you with a view 
 with unreachable rows.
 I tried two different approaches to the problem last night, and both of them 
 led to (instructive) dead ends. I'll attach them for illustration purposes.
 The next line of attack we think should be tried is this:
 First - remove _all_docs_by_seq, as it is just adding complexity to the 
 problem, and has been deprecated by _changes anyway. Along the same lines, 
 _all_docs should no longer use couch_httpd_view:make_view_fold_fun as it has 
 completely different collation needs than make_view_fold_fun. We'll end up 
 duplicating a little code in the _all_docs implementation, but it should be 
 worth it because it will make the other work much simpler.
 Once those changes have laid the groundwork, the next step is to change 
 make_view_fold_fun and couch_view:fold so that make_view_fold_fun is no 
 longer responsible for detecting when we've passed the endkey. That means 
 make_passed_end_fun and all references to PassedEnd and 
 PassedEnd fun will be stripped from couch_httpd_view and moved to couch_btree.
 couch_view:fold (and the underlying btree) will need to accept not just a 
 start, but also an endkey. This will make it much easier to use the less fun 
 that is stored on View#view.btree#btree.less to determine PassedEnd funs. 
 This will move some complexity to the btree code from the view code, but will 
 keep the concerns more aligned. This also means that the btree will need to 
 accept not only an endkey for folds, but also an inclusive_end parameter.
 Once we have all these refactorings done, it will be easy to make the less 
 fun for an index configurable, as both the index writer and the index reader 
 will look for it in the same place (on the #btree record).
 My aim is to start a discussion and get someone excited to work on this 
 patch. Think of all the fast-views glory you'll get! Please ask questions and 
 otherwise force me to clarify the above discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-495) Make views twice as fast

2009-09-03 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751215#action_12751215
 ] 

Damien Katz commented on COUCHDB-495:
-

I've attached the file I'm using for performance benchmarking, couch_perf.py. In 
my tests, the majority of the time was spent inside the less comparator 
function. Part of the problem is the expense of the callout to ICU for 
collation, which copies the strings to buffers before comparing them. That 
can be fixed by using a more efficient method of sending data to Erlang native 
ports, which is something I'm working on.

However, our json comparison function is also far more expensive than the 
built-in Erlang term comparison operators.

So the easy solution is to just add a native Erlang term collation option. 
This is the option for views that don't need collation, just performance.

A better, but not yet feasible, solution is to code our own comparison 
function in C, on par with Erlang's built-in comparison. I think this 
isn't yet possible with Erlang and C code without hacking the core VM.
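
For a feel of what raw Erlang term collation means, a shell session: the 
built-in term order sorts across types (numbers before atoms before binaries), 
an ordering no ICU collation would produce.

1> lists:sort(fun(A, B) -> A =< B end, [<<"b">>, 10, true, <<"a">>]).
[10,true,<<"a">>,<<"b">>]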

 Make views twice as fast
 

 Key: COUCHDB-495
 URL: https://issues.apache.org/jira/browse/COUCHDB-495
 Project: CouchDB
  Issue Type: Improvement
  Components: JavaScript View Server
Reporter: Chris Anderson
 Fix For: 0.11

 Attachments: binary_collate.diff, couch_perf.py, term_collate.diff


 Devs,
 Damien's identified view collation as the most significant bottleneck for the 
 view generation. We've done some testing, and some preliminary patches, and 
 the upshot seems to be that even removing ICU from the collator is not a 
 significant boost. What does speed things up greatly is using raw Erlang term 
 comparison. E.g., instead of using couch_view:less_json, using fun(A,B) -> A < B 
 end provides a roughly 2x speedup.
 However, the patch is challenging for a few reasons: Making the collation 
 strategy switchable at all is tough. It's actually quite easy to get an 
 alternate less function into the btree writer (all you've got to do is set it 
 in couch_view_group:init_group). The hard part is propagating the same less 
 function to the PassedEndFun. There's a secondary problem that when you use 
 raw term comparison, a lot of terms turn out to come before nil, and after 
 {}, which we use as artificial first and last terms in the less_json 
 function. So just switching to raw collation alone will leave you with a view 
 with unreachable rows.
 I tried two different approaches to the problem last night, and both of them 
 led to (instructive) dead ends. I'll attach them for illustration purposes.
 The next line of attack we think should be tried is this:
 First - remove _all_docs_by_seq, as it is just adding complexity to the 
 problem, and has been deprecated by _changes anyway. Along the same lines, 
 _all_docs should no longer use couch_httpd_view:make_view_fold_fun as it has 
 completely different collation needs than make_view_fold_fun. We'll end up 
 duplicating a little code in the _all_docs implementation, but it should be 
 worth it because it will make the other work much simpler.
 Once those changes have laid the groundwork, the next step is to change 
 make_view_fold_fun and couch_view:fold so that make_view_fold_fun is no 
 longer responsible for detecting when we've passed the endkey. That means 
 make_passed_end_fun and all references to PassedEnd and 
 PassedEnd fun will be stripped from couch_httpd_view and moved to couch_btree.
 couch_view:fold (and the underlying btree) will need to accept not just a 
 start, but also an endkey. This will make it much easier to use the less fun 
 that is stored on View#view.btree#btree.less to determine PassedEnd funs. 
 This will move some complexity to the btree code from the view code, but will 
 keep the concerns more aligned. This also means that the btree will need to 
 accept not only an endkey for folds, but also an inclusive_end parameter.
 Once we have all these refactorings done, it will be easy to make the less 
 fun for an index configurable, as both the index writer and the index reader 
 will look for it in the same place (on the #btree record).
 My aim is to start a discussion and get someone excited to work on this 
 patch. Think of all the fast-views glory you'll get! Please ask questions and 
 otherwise force me to clarify the above discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-486) Better separation between httpd and core through api layer

2009-08-26 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748022#action_12748022
 ] 

Damien Katz commented on COUCHDB-486:
-

I like the ideas behind this patch, but I don't think I like everything dumped 
into a single module. I'd prefer instead to keep the same module names 
and the wrapper calls in the same modules, with the implementation code in a 
new file. So the public wrapper calls for couch_db would remain in couch_db, 
but the code moves to couch_db_imp or couch_db_priv.

I think export_all is fine if everything is going to be exported anyway.
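
A minimal sketch of that layout, assuming the couch_db_imp module name suggested 
above; open/2 stands in for whatever the real public API is.

-module(couch_db).
-export([open/2]).

%% public wrapper stays where callers expect it...
open(DbName, Options) ->
    %% ...while the implementation lives in the private module
    couch_db_imp:open(DbName, Options).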

 Better separation between httpd and core through api layer
 --

 Key: COUCHDB-486
 URL: https://issues.apache.org/jira/browse/COUCHDB-486
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Reporter: Adam Kocoloski
 Fix For: 0.11

 Attachments: couch_api.patch


 I'm attaching a patch that routes non-purely-functional calls into core 
 CouchDB modules through a new couch_api module.  I also went ahead and wrote 
 down dialyzer specs for everything in couch_api.  I think this will be a 
 useful reference, will make the codebase a bit more accessible to newcomers, 
 and will help us maintain better separation between the purely functional 
 httpd layer and the core (useful in e.g. partitioning work).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-464) Allow POST to _log for external processes

2009-08-12 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742341#action_12742341
 ] 

Damien Katz commented on COUCHDB-464:
-

Why not just write the log messages to a db?

 Allow POST to _log for external processes
 -

 Key: COUCHDB-464
 URL: https://issues.apache.org/jira/browse/COUCHDB-464
 Project: CouchDB
  Issue Type: New Feature
Reporter: Robert Newson
 Attachments: 0001-Add-POST-support-to-_log.patch, 
 0001-Add-POST-support-to-_log.patch, 0001-Add-POST-support-to-_log.patch


 Add POST support to _log so that external processes can also log to 
 couch.log. This would allow couchdb-lucene (to pick a random example) to log 
 consistently. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-462) built-in conflicts view

2009-08-12 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742542#action_12742542
 ] 

Damien Katz commented on COUCHDB-462:
-

I think we should reconsider this patch.

For one thing, it's expensive at runtime: it requires doing a linear scan over 
the full doc index. If you have millions of docs and no conflicts, it must 
still scan through every doc meta record just to tell you that.

Another problem is you don't get any filtering or formatting. Using a CouchDB 
view, a user can construct a view that shows conflicts by author, customer, 
area, etc., and format the results for display. Using this facility, you get no 
formatting or collation options.

I favor backing this change out.
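
For reference, the kind of user-defined view Damien is describing is an ordinary 
map function; doc._conflicts is exposed to views, while the author field here is 
an invented application-level key.

function(doc) {
  if (doc._conflicts) {
    emit(doc.author, doc._conflicts);
  }
}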

 built-in conflicts view
 ---

 Key: COUCHDB-462
 URL: https://issues.apache.org/jira/browse/COUCHDB-462
 Project: CouchDB
  Issue Type: Improvement
  Components: HTTP Interface
Reporter: Adam Kocoloski
 Fix For: 0.10

 Attachments: 462-jan-2.patch, conflicts_view.diff, 
 COUCHDB-462-adam-updated.patch, COUCHDB-462-jan.patch


 This patch adds a built-in _conflicts view indexed by document ID that looks 
 like
 GET /dbname/_conflicts
 {"rows":[
 {"id":"foo", "rev":"1-1aa8851c9bb2777e11ba56e0bf768649", 
 "conflicts":["1-bdc15320c0850d4ee90ff43d1d298d5d"]}
 ]}
 GET /dbname/_conflicts?deleted=true
 {"rows":[
 {"id":"bar", "rev":"5-dd31186f5aa11ebd47eb664fb342f1b1", 
 "conflicts":["5-a0efbb1990c961a078dc5308d03b7044"], 
 "deleted_conflicts":["3-bdc15320c0850d4ee90ff43d1d298d5d","2-cce334eeeb02d04870e37dac6d33198a"]},
 {"id":"baz", "rev":"2-eec205a9d413992850a6e32678485900", "deleted":true, 
 "deleted_conflicts":["2-10009b36e28478b213e04e71c1e08beb"]}
 ]}
 As the HTTPd and view layers are a bit outside my specialty I figured I 
 should ask for a Review before Commit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (COUCHDB-420) OAuth authentication support (2-legged initially) and cookie-based authentication

2009-08-04 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz resolved COUCHDB-420.
-

Resolution: Fixed

 OAuth authentication support (2-legged initially) and cookie-based 
 authentication
 -

 Key: COUCHDB-420
 URL: https://issues.apache.org/jira/browse/COUCHDB-420
 Project: CouchDB
  Issue Type: New Feature
  Components: HTTP Interface
Reporter: Jason Davies
Priority: Blocker
 Fix For: 0.10

 Attachments: oauth.1.diff, oauth.2.patch, oauth.3.patch


 This patch adds two-legged OAuth support to CouchDB.
 1. In order to do this, a couple of changes have been made to the way auth 
 handlers are used.  Essentially, the patch allows multiple handlers to be 
 specified in a comma-separated list in the following in the [httpd] section 
 of your .ini config e.g.
 authentication_handlers = {couch_httpd_oauth, 
 oauth_authentication_handler}, {couch_httpd_auth, 
 default_authentication_handler}
 The handlers are tried in order until one of them successfully authenticates 
 and sets user_ctx on the request.  Then the request is passed to the main 
 handler.
 2. Now for the OAuth consumer keys and secrets: as Ubuntu needs to be able to 
 bootstrap this, i.e. add tokens without a running CouchDB, I have advised 
 creating a new config file in $PREFIX/etc/couchdb/default.d/ called oauth.ini 
 or similar.  This should get read by CouchDB's startup script when it loads 
 its config files (e.g. default.ini and local.ini as well).  There are three 
 sections available:
 i. [oauth_consumer_secrets] consumer_key = consumer_secret
 ii. [oauth_token_secrets] oauth_token = oauth_token_secret
 iii. [oauth_token_users] oauth_token = username
 The format I've used above is [section name] followed by how the keys and 
 values for that section will look on subsequent lines.  The secrets are a way 
 for the consumer to prove that it owns the corresponding consumer key or 
 access token.  The mapping of auth tokens to usernames is a way to specify 
 which user/roles to give to a consumer with a given access token.
 In the future we will also store tokens in the user database (see below).
 3. OAuth replication.  I've extended the JSON sent via POST when initiating a 
 replication as follows:
 {
   "source": {
     "url": "url",
     "auth": {
       "oauth": {
         "consumer_key": "oauth_consumer_key",
         "consumer_secret": "oauth_consumer_secret",
         "token_secret": "oauth_token_secret",
         "token": "oauth_token"
       }
     }
   },
   "target": /* same syntax as source, or string for a URL with no auth info, or 
  string for a local database name */
 }
 4. This patch also includes cookie-authentication support to CouchDB.  I've 
 covered this here: 
 http://www.jasondavies.com/blog/2009/05/27/secure-cookie-authentication-couchdb/
 The cookie-authentication branch is being used on a couple of live sites and 
 the branch has also been worked on by jchris and benoitc.  As well as cookie 
 auth it includes the beginnings of support for a per-node user database, with 
 APIs for creating/deleting users etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-370) If the CouchDB VM dies or is killed, view subprocesses (js) are not automatically killed

2009-08-03 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12738390#action_12738390
 ] 

Damien Katz commented on COUCHDB-370:
-

While I agree we should prevent unnecessary VM exits, we can't prevent them all 
and fortunately the VM is designed to terminate and restart quickly. This is 
part of CouchDB's design too, restarts are always fast.

A correct solution is one or more watchdog processes that watch the VM and the 
subprocesses; if the VM dies, the watchdog kills all the subprocesses and then 
exits itself.
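
A minimal sketch of such a watchdog in shell, with illustrative argument 
handling: it polls the Erlang VM's pid and kills the listed view subprocesses 
once the VM is gone.

#!/bin/sh
# usage: watchdog.sh BEAM_PID CHILD_PID [CHILD_PID ...]
BEAM_PID=$1
shift
while kill -0 "$BEAM_PID" 2>/dev/null; do
    sleep 1                      # VM still alive, keep waiting
done
kill "$@" 2>/dev/null            # VM died: kill the subprocesses, then exit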

 If the CouchDB VM dies or is killed, view subprocesses (js) are not 
 automatically killed
 ---

 Key: COUCHDB-370
 URL: https://issues.apache.org/jira/browse/COUCHDB-370
 Project: CouchDB
  Issue Type: Bug
  Components: JavaScript View Server
Affects Versions: 0.9
Reporter: Damien Katz
Priority: Minor

 If CouchDB dies or is killed, its subprocesses are not forcefully killed. If 
 the subprocesses are in infinite loops, they will never die. We need some 
 kind of external watchdog process, or processes, that kill the subprocesses 
 automatically if the CouchDB Erlang VM dies.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (COUCHDB-434) 500 error when working with deleted bulk docs

2009-07-29 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz closed COUCHDB-434.
---

   Resolution: Fixed
Fix Version/s: 0.10

Fixed in trunk.

 500 error when working with deleted bulk docs
 -

 Key: COUCHDB-434
 URL: https://issues.apache.org/jira/browse/COUCHDB-434
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Mark Hammond
Assignee: Damien Katz
 Fix For: 0.10

 Attachments: bulk_save_500.patch, bulk_save_500.patch


 When upgrading our app from 0.9 to trunk, I encountered a 500 error 
 attempting to update previously deleted documents.  I've hacked together a 
 patch to the test suite which demonstrates an almost identical error, but 
 note:
 * The test code is misplaced because that test otherwise fails for me when 
 attempting to compact.  It should go either at the end of the test, or into 
 its own test.
 * As the comments note, the attempt to delete the documents appears to fail 
 with conflict errors - which it probably shouldn't.
 * If you ignore these conflicts (as the test does), the next attempt to 
 'resurrect' these docs causes a 500 error with a badmatch exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-421) add longpolling for _changes

2009-07-20 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733361#action_12733361
 ] 

Damien Katz commented on COUCHDB-421:
-

Style-wise this patch looks good; however, you must change the indentation tabs 
to spaces.

Instead of adding the longpolling_changes call in the code, I recommend just 
reusing the keep_sending_changes call and, after the call to send_changes, 
adding a check: if EndSeq > StartSeq and the long poll option is on, stop.

Also I'm not sure about calling it long_poll, but I don't have a better name 
myself.
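
A hedged sketch of that control flow, using the function names from the comment 
above; the bodies are stand-ins, not couch_httpd_db code. With LongPoll = false 
it loops like the continuous feed; with LongPoll = true it stops after the 
first batch that advanced the sequence.

-module(longpoll_sketch).
-export([keep_sending_changes/2]).

keep_sending_changes(StartSeq, LongPoll) ->
    EndSeq = send_changes(StartSeq),
    case LongPoll andalso EndSeq > StartSeq of
        true  -> stop;                              % long poll: close response
        false -> keep_sending_changes(EndSeq, LongPoll)
    end.

%% stand-in for the real send_changes: pretend one change arrived
send_changes(Seq) -> Seq + 1.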

 add longpolling for _changes
 

 Key: COUCHDB-421
 URL: https://issues.apache.org/jira/browse/COUCHDB-421
 Project: CouchDB
  Issue Type: New Feature
Affects Versions: 0.10
Reporter: Benoit Chesneau
 Attachments: longpoll.diff


 Implement longpolling on _changes. Instead of continuous, longpolling holds 
 the request until an update is available, then closes it. The client will have 
 to ask for a new connection. Should solve the problem for XHRs that don't get 
 that status change (on IE, Opera, ...).
 I've put all the code in my github repo :
 http://github.com/benoitc/couchdb/tree/longpoll
 diff against trunk is attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (COUCHDB-285) tail append headers

2009-07-17 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz closed COUCHDB-285.
---

   Resolution: Fixed
Fix Version/s: (was: 1.0)
   0.10
 Assignee: Damien Katz

 tail append headers
 ---

 Key: COUCHDB-285
 URL: https://issues.apache.org/jira/browse/COUCHDB-285
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Reporter: Chris Anderson
Assignee: Damien Katz
 Fix For: 0.10


 this will make .couch files resilient even when truncated (data-loss but 
 still usable). also cuts down on the # of disk seeks.
 [3:02pm] jchris
 damienkatz: the offset in a header would corresponds to number of bytes from 
 the front of the file?
 [3:03pm]
 andysky joined the chat room.
 [3:03pm] damienkatz
 jchris: yes
 [3:03pm] jchris
 because my offset seems to suggest that just over a MB of the file is missing
 [3:03pm] » jchris 
 blames not couchdb
 [3:03pm] jchris
 but the streaming writes you've talked about would make this more resilient, 
 eh?
 [3:03pm] jchris
 where the header is also appended each time
 [3:04pm] jchris
 there could be data lost but the db would still be usable
 [3:04pm] damienkatz
 yes, a file truncation just gives you an earlier version of the file
 [3:05pm] jchris
 now's not a good time for me to work on that, but after Amsterdam I may want 
 to pick it up
 [3:05pm] damienkatz
 the hardest part is finding the header again
 [3:06pm] jan
 hu? isn't the header the firs 4k?
 [3:06pm] jan
 t
 [3:06pm] jchris
 it would only really change couch_file:read_header and write_header I think
 [3:06pm] jchris
 jan: we're talking about moving it to the end
 [3:06pm] jchris
 so it never gets overwritten
 [3:06pm] damienkatz
 jan: this is for tail append headers
 [3:06pm] jan
 duh
 [3:06pm] jan
 futuretalk
 [3:06pm] jan
 n/m me
 [3:07pm] damienkatz
 jchris: so one way is to sign the header regions, but you need to make it 
 unforgable.
 [3:08pm] jchris
 basically a boundary problem...
 [3:08pm] damienkatz
 because if a client wrote a binary that looked like it had a header, they 
 could do bad things.
 [3:08pm] jchris
 like for instance an attachment that's a .couch file :)
 [3:08pm] damienkatz
 right
 [3:09pm] damienkatz
 so you can salt the db file on creation with a key in the header. And use 
 that key to sign and verify headers.
 [3:09pm]
 tlrobinson joined the chat room.
 [3:09pm] jchris
 doesn't sound too tough
 [3:10pm] jan
 damienkatz: I looked into adding conflict inducing bulk docs in rep_security. 
 would this work: POST /db/_bulk_docs?allow_conflicts=true could do a regular 
 bulk save but grab the error responses and do a update_docs() call with the 
 replicated_changes option for all errors from the first bulk save while 
 assigning new _revs for new docs?
 [3:10pm] damienkatz
 the key is crypto-random, and must stay hidden from clients.
 [3:10pm] jchris
 if you have the file, you could forge headers...
 [3:10pm] jchris
 but under normal operation, it sounds like not a big deal
 [3:10pm]
 Qaexl joined the chat room.
 [3:11pm] jchris
 so we just give the db an internal secret uuid
 [3:11pm]
 mmalone left the chat room. (Connection reset by peer)
 [3:11pm]
 peritus_ joined the chat room.
 [3:11pm] damienkatz
 I'm not sure I like this approach.
 [3:11pm] jchris
 damienkatz: drawbacks?
 [3:11pm] damienkatz
 if a client can see a file share with the db, they can attack it.
 [3:12pm]
 mmalone joined the chat room.
 [3:12pm]
 mmalone left the chat room. (Read error: 104 (Connection reset by peer))
 [3:12pm] damienkatz
 how about this approach. every 4k, we write a NULL byte.
 [3:13pm] damienkatz
 we always write headers at the 4k boundary
 [3:13pm]
 mmalone joined the chat room.
 [3:13pm] damienkatz
 and make that byte 1
 [3:13pm] jan
 grr
 [3:13pm] jan
 did my bulk-docs proposal get through?
 [3:13pm] jchris
 the attacker could still get lucky
 [3:13pm] jan
 (or got it shot down? :)
 [3:13pm] damienkatz
 jan: sorry.
 [3:13pm] jan
 damienkatz: I couldn't read the backlog
 [3:13pm] damienkatz
 Let me think about the conflict stuff a little bit.
 [3:13pm] jan
 sure
 [3:13pm] jan
 no baclog then
 [3:14pm] jan
 +k
 [3:14pm] jchris
 jan: your paragraph is dense there -
 [3:14pm] damienkatz
 jchris: no, this is immune from attack
 [3:14pm] jchris
 because you'd write an attachment marker after the null byte for attachments?
 [3:14pm] damienkatz
 every 4k, we just write a 0 byte, we skip that byte.
 [3:15pm] jan
 jchris: yeah, sorry, will let you finish the file stuff
 [3:15pm] damienkatz
 no matter what, we never write anything into that byte.
 [3:15pm] jan
 wasting all these 0 bytes
 [3:15pm] damienkatz
 a big file write will write all the surrounding bytes, but not 

[jira] Closed: (COUCHDB-391) Restoring the couchDB (restore Documents Views on different Servers)

2009-06-23 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz closed COUCHDB-391.
---

Resolution: Invalid

Please ask questions in the appropriate CouchDB mailing list, not in the bug 
system:
http://couchdb.apache.org/community/lists.html

 Restoring the couchDB (restore Documents Views on different Servers)
 

 Key: COUCHDB-391
 URL: https://issues.apache.org/jira/browse/COUCHDB-391
 Project: CouchDB
  Issue Type: Question
 Environment: Microsoft Windows XP Professional version 2002
 service Pack2
 erlang : 5.6.5
 couchDb : 0.9.0
Reporter: Ajay jagdish Pawaskar

 I want to restore the CouchDB documents & views from one server to another 
 (on the client's server).
 There is an option for replication (remote server), but I don't have access to 
 that server (so I can't replicate the DB to that server using the replicator), 
 and yet I need to deploy the DB on that server.
 Also, in the future I may need the documents from the client's server 
 for debugging purposes.
 How can I do this?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-377) allow native view servers

2009-06-22 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12722757#action_12722757
 ] 

Damien Katz commented on COUCHDB-377:
-

I think we should not try to avoid the JSON term conversion, as it's the most 
stable API and it's a low-cost conversion; everything is still Erlang terms. 
It's the serialization to a string and back that's expensive.

I think we should go ahead and change the get os_process call to return a 
record, instead of a tuple, something like:

-record(proc, {
    pid,
    lang,
    type,
    prompt_fun
}).

The prompt_fun replaces and wraps couch_os_process:prompt and now we call the 
prompt_fun with the same args. If it's an OS process, it's the same call to 
couch_os_process:prompt, otherwise it's a function that calls apply(Mod, Fun, 
[Pid, Args]) or something like that.

The patch otherwise looks good, I don't see any obvious bugs or style problems.
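
A hedged sketch of the record in use: couch_os_process:prompt/2 is named in the 
comment above, while the native module's prompt/2 protocol and all other names 
here are illustrative.

-module(proc_sketch).
-export([os_proc/1, native_proc/2, prompt/2]).

-record(proc, {pid, lang, type, prompt_fun}).

os_proc(Pid) ->
    #proc{pid = Pid, lang = <<"javascript">>, type = external,
          prompt_fun = fun couch_os_process:prompt/2}.

native_proc(Pid, Mod) ->
    #proc{pid = Pid, lang = <<"erlang">>, type = native,
          prompt_fun = fun(P, Args) -> apply(Mod, prompt, [P, Args]) end}.

%% callers no longer care which kind of view server they hold
prompt(#proc{pid = Pid, prompt_fun = F}, Args) ->
    F(Pid, Args).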

 allow native view servers
 -

 Key: COUCHDB-377
 URL: https://issues.apache.org/jira/browse/COUCHDB-377
 Project: CouchDB
  Issue Type: Improvement
Reporter: Mark Hammond
 Attachments: native_query_servers.patch


 There has been some discussion on IRC etc about how to support 'native' view 
 servers, such as 'erlview' in a generic way.  Currently using erlview 
 requires you to modify couch.
 I'm attaching a patch as a first attempt at supporting this.  In summary, the 
 patch now looks up a new 'native_query_servers' config file section for a 
 list of view_server names with a {Module, Func, Args} style string specifying 
 the entry-point of the view server.  The code now passes an additional atom 
 around indicating if the PID is 'native' or 'external', and map_docs takes 
 advantage of this to avoid the json step.  This patch allows erlview to work 
 for me, but in theory any erlang code could be used here.
 I'm very new at erlang - please let me know if I should make stylistic or 
 other changes, or indeed if I should take a different approach completely.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-263) require valid user for all database operations

2009-06-18 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721425#action_12721425
 ] 

Damien Katz commented on COUCHDB-263:
-

This patch looks okay, but we actually need something like this at the database 
level: the ability to say who can and can't access a database, and the ability 
to disallow anonymous access.

 require valid user for all database operations
 --

 Key: COUCHDB-263
 URL: https://issues.apache.org/jira/browse/COUCHDB-263
 Project: CouchDB
  Issue Type: Improvement
  Components: HTTP Interface
Affects Versions: 0.9
 Environment: All platforms.
Reporter: Jack Moffitt
Priority: Minor
 Attachments: couchauth.diff


 Admin accounts currently restrict a few operations, but leave all other 
 operations completely open.  Many use cases will require all operations to be 
 authenticated.   This can certainly be done by overriding the 
 default_authentication_handler, but I think this very common use case can be 
 handled in default_authentication_handler without increasing the complexity 
 much.
 Attached is a patch which adds a new config option, require_valid_user, 
 which restricts all operations to authenticated users only.  Since CouchDB 
 currently only has admins, this means that all operations are restricted to 
 admins.  In a future CouchDB where there are also normal users, the intention 
 is that this would let them pass through as well.
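
 For example, the option would presumably be toggled in the ini file like so 
 (the section name is an assumption; the patch may place it elsewhere):

     [httpd]
     ; hypothetical placement: reject all unauthenticated requests
     require_valid_user = true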

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-263) require valid user for all database operations

2009-06-18 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721435#action_12721435
 ] 

Damien Katz commented on COUCHDB-263:
-

Hmmm, on second thought, we do need this both as a server-wide setting and at 
the database level.

However, this check, and throwing exceptions for unauthenticated users, should 
not be done in the authentication function but by the caller of the auth 
function, so the setting works with all auth handlers.

Also, it would be nice to have a more complete solution with more settings: 
allowed users, disallowed users, and allow anonymous.

 require valid user for all database operations
 --

 Key: COUCHDB-263
 URL: https://issues.apache.org/jira/browse/COUCHDB-263
 Project: CouchDB
  Issue Type: Improvement
  Components: HTTP Interface
Affects Versions: 0.9
 Environment: All platforms.
Reporter: Jack Moffitt
Priority: Minor
 Attachments: couchauth.diff


 Admin accounts currently restrict a few operations, but leave all other 
 operations completely open.  Many use cases will require all operations to be 
 authenticated.   This can certainly be done by overriding the 
 default_authentication_handler, but I think this very common use case can be 
 handled in default_authentication_handler without increasing the complexity 
 much.
 Attached is a patch which adds a new config option, require_valid_user, 
 which restricts all operations to authenticated users only.  Since CouchDB 
 currently only has admins, this means that all operations are restricted to 
 admins.  In a future CouchDB where there are also normal users, the intention 
 is that this would let them pass through as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-204) CouchDB stops/crashes/hangs (?) after resume from Mac OS X system hibernation and/or stand-by (sleep)

2009-06-01 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715223#action_12715223
 ] 

Damien Katz commented on COUCHDB-204:
-

It looks like all we need is a special flag passed to the emulator (see 
http://www.erlang.org/doc/man/erl.html):

+c
Disable compensation for sudden changes of system time.
Normally, erlang:now/0 will not immediately reflect sudden changes in the 
system time, in order to keep timers (including receive-after) working. 
Instead, the time maintained by erlang:now/0 is slowly adjusted towards the new 
system time. (Slowly means in one percent adjustments; if the time is off by 
one minute, the time will be adjusted in 100 minutes.)
When the +c option is given, this slow adjustment will not take place. Instead 
erlang:now/0 will always reflect the current system time. Note that timers are 
based on erlang:now/0. If the system time jumps, timers then time out at the 
wrong time.
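
For example (assuming the stock couchdb wrapper script honors ERL_FLAGS, which 
the erl emulator itself reads):

    # pass +c so erlang:now/0 tracks system time directly after a resume
    export ERL_FLAGS="+c"
    couchdb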


 CouchDB stops/crashes/hangs (?) after resume from Mac OS X system hibernation 
 and/or stand-by (sleep)
 ---

 Key: COUCHDB-204
 URL: https://issues.apache.org/jira/browse/COUCHDB-204
 Project: CouchDB
  Issue Type: Bug
  Components: Administration Console, Database Core, HTTP Interface, 
 Infrastructure
Affects Versions: 0.8.1
 Environment: Mac OS X 10.5.6 Leopard
Reporter: Philipp Schumann
Priority: Critical

 I'm running CouchDB 0.8.1 on Mac OS X 10.5.6 Leopard and after resuming 
 from system hibernation (safe sleep -- by closing and reopening the laptop 
 lid in my case, which is the factory default), the process either refuses all 
 incoming connections, including my own Python scripts, web browser and the 
 Futon, or has stopped running altogether. That is, I don't know which exactly 
 is the case here but the fact is that CouchDB cannot be connected to after 
 resuming.
 This issue always appears with smart sleep / safe sleep (standby plus 
 hibernation) but only sometimes appears using fast sleep (hibernation 
 turned off, standby only).
 This isn't a critical issue for server deployments, of course, but one of 
 the core ideas of CouchDB is that eventually it will be deployed even to 
 desktop clients for app & data replication across machines, so in this 
 context this *is* a critical issue, since you can't ask ordinary Mac OS X 
 users to change their sleep settings from safe to fast using 
 incomprehensible terminal commands.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (COUCHDB-370) If the CouchDB vm dies or is killed, view subprocesses (js) are not automatically killed

2009-05-30 Thread Damien Katz (JIRA)
If the CouchDB vm dies or is killed, view subprocesses (js) are not 
automatically killed
---

 Key: COUCHDB-370
 URL: https://issues.apache.org/jira/browse/COUCHDB-370
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 0.9
Reporter: Damien Katz
Priority: Minor


If CouchDB dies or is killed, its subprocesses are not forcefully killed. If 
the subprocesses are in infinite loops, they will never die. We need some kind 
of external watchdog process that kills the subprocesses automatically if the 
CouchDB Erlang VM dies.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (COUCHDB-371) Need a way to limit memory used by a subprocess

2009-05-30 Thread Damien Katz (JIRA)
Need a way to limit memory used by a subprocess
---

 Key: COUCHDB-371
 URL: https://issues.apache.org/jira/browse/COUCHDB-371
 Project: CouchDB
  Issue Type: Improvement
  Components: JavaScript View Server
Affects Versions: 0.9
Reporter: Damien Katz
Assignee: Damien Katz
Priority: Minor


We need a way to limit the total memory used by subprocesses, such as a view 
process, so they cannot use up all the available memory due to a coding error 
or malicious attack. We can probably do this by setting ulimit in the 
couchspawnkillable script.
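
A minimal sketch of what that could look like in the spawning script (the 
limit value is arbitrary and the placement is an assumption):

    #!/bin/sh
    # cap the view server's virtual memory (in kB) before exec'ing it,
    # so a runaway or malicious map function cannot exhaust the machine
    ulimit -v 1048576
    exec "$@"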

-Damien

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (COUCHDB-370) If the CouchDB vm dies or is killed, view subprocesses (js) are not automatically killed

2009-05-30 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz updated COUCHDB-370:


Component/s: JavaScript View Server
Description: If CouchDB dies or is killed, its subprocesses are not 
forcefully killed. If the subprocesses are in infinite loops, they will never 
die. We need some kind of external watchdog process, or processes, that kill 
the subprocesses automatically if the CouchDB Erlang VM dies.  (was: If 
CouchDB dies or is killed, its subprocesses are not forcefully killed. If the 
subprocesses are in infinite loops, they will never die. We need some kind of 
external watchdog process that kills the subprocesses automatically if the 
CouchDB Erlang VM dies.)

 If the CouchDB vm dies or is killed, view subprocesses (js) are not 
 automatically killed
 ---

 Key: COUCHDB-370
 URL: https://issues.apache.org/jira/browse/COUCHDB-370
 Project: CouchDB
  Issue Type: Bug
  Components: JavaScript View Server
Affects Versions: 0.9
Reporter: Damien Katz
Priority: Minor

 If CouchDB dies or is killed, its subprocesses are not forcefully killed. If 
 the subprocesses are in infinite loops, they will never die. We need some 
 kind of external watchdog process, or processes, that kill the subprocesses 
 automatically if the CouchDB Erlang VM dies.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (COUCHDB-366) Error Uploading Attachment

2009-05-29 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz closed COUCHDB-366.
---

   Resolution: Fixed
Fix Version/s: 0.10

 Error Uploading Attachment
 --

 Key: COUCHDB-366
 URL: https://issues.apache.org/jira/browse/COUCHDB-366
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.10
 Environment: Ubuntu 8.04, CouchDB Trunk rev 779803
Reporter: Ben Browning
 Fix For: 0.10

 Attachments: attachment_traceback.txt, bespin.zip, 
 couchdb-366-test.patch


 20:21 <davisp> damienkatz: uploading a large attachment ends up
throwing a function_clause error on split_iolist and
the parameter types are binary(), int(), [binary()]
 Traceback attached

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (COUCHDB-337) attachments from old/conflict revisions are not accessible via standalone API

2009-04-29 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz updated COUCHDB-337:


Attachment: replication_test.diff

Here is a replication test that shows the failures when replicating conflicts 
with attachments; it doesn't yet pass with attachment_revisions.diff.

 attachments from old/conflict revisions are not accessible via standalone API
 -

 Key: COUCHDB-337
 URL: https://issues.apache.org/jira/browse/COUCHDB-337
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 0.9
Reporter: Adam Kocoloski
 Fix For: 0.10

 Attachments: attachment_revisions.diff, replication_test.diff


 Couch ignores the rev query-string parameter for attachment GETs.  I believe 
 it should not.  Attaching proposed patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (COUCHDB-334) With deferred commits and 100+ active dbs, CouchDB can lose uncommitted changes

2009-04-27 Thread Damien Katz (JIRA)
With deferred commits and 100+ active dbs, CouchDB can lose uncommitted changes
---

 Key: COUCHDB-334
 URL: https://issues.apache.org/jira/browse/COUCHDB-334
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.9
Reporter: Damien Katz
Assignee: Damien Katz
 Fix For: 0.9.1


By default, CouchDB keeps a maximum of 100 databases open and active. This is 
controlled by the ini setting max_dbs_open in the [couchdb] section.

This limit controls the number of Erlang server processes that are readily 
available and hold resources, like file handles, and hold state for deferred 
commits. Once CouchDB hits the open database limit, it will always close an 
idle database and its files before opening a new database file. The problem is 
that CouchDB would consider instances to be idle even if they still had 
deferred commits pending. It would then close the instance and drop its 
deferred commits.
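
For reference, the setting in question looks like this in the ini file (100 
being the default mentioned above):

    [couchdb]
    ; maximum number of databases kept open and active at once
    max_dbs_open = 100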

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (COUCHDB-334) With deferred commits and 100+ active dbs, CouchDB can lose uncommitted changes

2009-04-27 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz updated COUCHDB-334:


Fix Version/s: (was: 0.10)
   0.9.1

 With deferred commits and 100+ active dbs, CouchDB can lose uncommitted 
 changes
 ---

 Key: COUCHDB-334
 URL: https://issues.apache.org/jira/browse/COUCHDB-334
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.9
Reporter: Damien Katz
Assignee: Damien Katz
 Fix For: 0.9.1


 By default, CouchDB keeps a maximum of 100 databases open and active. This 
 is controlled by the ini setting max_dbs_open in the [couchdb] section.
 This limit controls the number of Erlang server processes that are readily 
 available and hold resources, like file handles, and hold state for deferred 
 commits. Once CouchDB hits the open database limit, it will always close an 
 idle database and its files before opening a new database file. The problem 
 is that CouchDB would consider instances to be idle even if they still had 
 deferred commits pending. It would then close the instance and drop its 
 deferred commits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-240) Replication breaks with large Attachments.

2009-04-15 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699333#action_12699333
 ] 

Damien Katz commented on COUCHDB-240:
-

Adam, I think you are right. If the fix isn't too hairy, we should also add it 
to 0.9.1.

 Replication breaks with large Attachments.
 --

 Key: COUCHDB-240
 URL: https://issues.apache.org/jira/browse/COUCHDB-240
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.9
 Environment: r 741265. Debian Linux unknown revision, FreeBSD 7.0. 
 GBit Network connection between the hosts.
Reporter: Maximillian Dornseif
Assignee: Adam Kocoloski
 Fix For: 0.10


 I use the code in http://code.google.com/p/couchdb-python/issues/detail?id=54 
 to do replication between two machines.
 I'm running r741265 on both machines. I have a database with big attachments 
 (high-res images, 31.1 GB, 34026 docs). Pull replication breaks with the 
 following message sent via HTTP:
 couchdb.client.ServerError: (500, ('function_clause', 
 [{lists,map,[#Fun<couch_rep.10.28922857>,ok]},\n 
 {couch_rep,open_doc_revs,4},\n 
 {couch_rep,'-enum_docs_parallel/3-fun-1-',3},\n 
 {couch_rep,'-spawn_worker/3-fun-0-',3}]))
 With push replication the server just drops the connection 
 (httplib2/__init__.py, line 715, in connect
 socket.error: (61, 'Connection refused') - why refused instead of 
 closed?). I have only been able to replicate the first 100 documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (COUCHDB-220) Extreme sparseness in couch files

2009-04-09 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz closed COUCHDB-220.
---

   Resolution: Fixed
Fix Version/s: 0.10

 Extreme sparseness in couch files
 -

 Key: COUCHDB-220
 URL: https://issues.apache.org/jira/browse/COUCHDB-220
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.9
 Environment: ubuntu 8.10 64-bit, ext3
Reporter: Robert Newson
 Fix For: 0.10

 Attachments: 220.patch, 220.patch, attachment_sparseness.js, 
 stream.diff


 When adding ten thousand documents, each with a small attachment, the 
 discrepancy between reported file size and actual file size becomes huge;
 ls -lh shard0.couch
 698M 2009-01-23 13:42 shard0.couch
 du -sh shard0.couch
 57M   shard0.couch
 On filesystems that do not support write holes, this will cause an order of 
 magnitude more I/O.
 I think it was introduced by the streaming attachment patch, as each 
 attachment is followed by huge swathes of zeroes when viewed with 'hd -v'.
 Compacting this database reduced it to 7.8 MB, indicating other sparseness 
 besides attachments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-290) Include sequence number in update notifications

2009-04-03 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12695494#action_12695494
 ] 

Damien Katz commented on COUCHDB-290:
-

I'm actually working on this HTTP functionality right now. Then not only child 
processes but also remote processes will be able to easily register for 
notifications over HTTP, and we can present a richer interface.

Once in place, the stdio notifications would be removed completely, and we 
could then use stdio for logging and error reporting of the child process.

 Include sequence number in update notifications
 ---

 Key: COUCHDB-290
 URL: https://issues.apache.org/jira/browse/COUCHDB-290
 Project: CouchDB
  Issue Type: Improvement
Affects Versions: 0.9
Reporter: Elliot Murphy
Priority: Minor
 Fix For: 0.10

 Attachments: couchdb-sequences.patch, couchdb-sequences.patch


 Hi! There have been requests to include the sequence number when sending an 
 update notification.  Thanks to the guidance from davisp on #couchdb on March 
 13th, I've been able to put together a little patch that does just that. In 
 the future I'm interested in doing the same for the create notification, and 
 perhaps extending create/delete/update notifications to include a list of 
 affected doc IDs.
 For now though, just this simple patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-300) Update Sequence broken

2009-03-21 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12688028#action_12688028
 ] 

Damien Katz commented on COUCHDB-300:
-

Sven, I think you missed the fix checked in for this bug? The fix prevents 
databases from getting into this state.

But if you already have a database in this state, you can fix it by touching 
all the docs (or just the affected one). It's a one-time thing.

 Update Sequence broken
 --

 Key: COUCHDB-300
 URL: https://issues.apache.org/jira/browse/COUCHDB-300
 Project: CouchDB
  Issue Type: Bug
 Environment: ubuntu hardy
Reporter: Sven Helmberger
 Fix For: 0.9

 Attachments: all_docs_by_seq.js, update_seq_kaputt.js


 Database gets into a state where there is one document but an empty update 
 sequence.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (COUCHDB-221) Test that validation and authorization work properly with replicated edits.

2009-03-16 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz resolved COUCHDB-221.
-

Resolution: Fixed
  Assignee: Damien Katz

Fixed in trunk as of r753448.

 Test that validation and authorization work properly with replicated edits.
 ---

 Key: COUCHDB-221
 URL: https://issues.apache.org/jira/browse/COUCHDB-221
 Project: CouchDB
  Issue Type: Test
  Components: Test Suite
Reporter: Dean Landolt
Assignee: Damien Katz
Priority: Blocker
 Fix For: 0.9


 Test that the validation and authorization stuff works properly with 
 replicated edits, the same as it does with live edits. This should already 
 work, but it's not tested. Also there is a good chance 
 validation/authorization failures might not be handled gracefully by the 
 replicator. It should eat failures, keeping statistics about the failures and 
 maybe a record of the last failure, or last N failures.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (COUCHDB-275) couch crashes erlang vm under heavy load

2009-03-10 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz updated COUCHDB-275:


Attachment: term_to_binary_fix.diff

This is a patch to the Erlang VM for this crash. It fixes a problem with 
Erlang's term_to_binary code, where it blows the C stack on deeply nested terms 
(i.e. deep trees). For example, this will crash any unpatched Erlang VM:

 term_to_binary(lists:foldl(fun(E,A) -> [E, A] end, [], lists:seq(1, 10))).


This patch fixes the Erlang VM by changing the term_to_binary code from a 
recursive C implementation to one using its own stack.

 couch crashes erlang vm under heavy load
 

 Key: COUCHDB-275
 URL: https://issues.apache.org/jira/browse/COUCHDB-275
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 0.9
 Environment: Linux melkjug.com 2.6.23-gentoo-r8 #1 SMP Wed Feb 13 
 14:28:49 EST 2008 x86_64 QEMU Virtual CPU version 0.9.1 GenuineIntel GNU/Linux
Reporter: Joshua Bronson
 Attachments: 2009-03-05-couch.log.snippet, term_to_binary_fix.diff


 I clicked Compact in futon for my 11G database at 9:04 AM EST:
 [Mon, 02 Mar 2009 14:04:32 GMT] [info] [0.59.0] Starting compaction for db 
 melkjug
 An hour and a half later it was 85% finished and then the following was 
 output to stderr:
 heart: Mon Mar  2 10:33:20 2009: heart-beat time-out.
 /usr/bin/couchdb: line 255: echo: write error: Broken pipe
 heart: Mon Mar  2 10:33:22 2009: Executed /usr/bin/couchdb -k. Terminating.
 I am retaining my 4.3G melkjug.couch.compact file in case it's useful in 
 debugging this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-190) _uuid should respond to GET, not POST

2009-02-11 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12672666#action_12672666
 ] 

Damien Katz commented on COUCHDB-190:
-

Patch merged to trunk.

 _uuid should respond to GET, not POST
 -

 Key: COUCHDB-190
 URL: https://issues.apache.org/jira/browse/COUCHDB-190
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
Affects Versions: 0.9
Reporter: Matt Goodall
Priority: Blocker
 Fix For: 0.9

 Attachments: COUCH-190.diff


 The /_uuid resource can happily return a response to a GET without being 
 un-RESTy. In fact, supporting POST is probably incorrect, as it implies it 
 would change server state.
 Quick summary:
 * _uuid never changes server state
 * calling _uuid multiple times does not impact other clients
 * that the resource returns something different each time it is requested 
 does not mean it cannot be a GET
 * GET with proper cache control (i.e. don't cache it ever) will work equally 
 well
 Full discussion can be found on the user m.l., 
 http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3c21939021.1440421230910477169.javamail.serv...@perfora%3e.
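
 For example, after this change a plain GET is all a client needs (the issue 
 names the resource _uuid; the exact path and port here are assumptions):

     # fetch a fresh uuid; no request body, no server-side state change
     curl -X GET http://127.0.0.1:5984/_uuid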

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (COUCHDB-238) should throw error on creating docs with illegal private names

2009-02-11 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz resolved COUCHDB-238.
-

Resolution: Fixed

Fixes checked into trunk.

 should throw error on creating docs with illegal private names
 --

 Key: COUCHDB-238
 URL: https://issues.apache.org/jira/browse/COUCHDB-238
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Chris Anderson
Priority: Blocker
 Fix For: 0.9

 Attachments: COUCHDB-238.patch


 currently the only legal _ prefixes are _local and _design. We should test 
 for this and return http errors. The applies to PUT and bulk-docs

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-247) The log process should be started before any other process

2009-02-11 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12672733#action_12672733
 ] 

Damien Katz commented on COUCHDB-247:
-

Changing the logging implementation to something more standardized is a very 
good thing. Right now the log output format is just some random stuff I threw 
together.

 The log process should be started before any other process
 --

 Key: COUCHDB-247
 URL: https://issues.apache.org/jira/browse/COUCHDB-247
 Project: CouchDB
  Issue Type: Improvement
  Components: Database Core
 Environment: Any
Reporter: Ulises Cervino Beresi
Priority: Minor

 Processes should be able to log their operations from the very beginning of 
 their existence, to avoid resorting to io:format() when they need to log. 
 Only processes started after the log process are able to make use of ?LOG_X. 
 See issue 153 (https://issues.apache.org/jira/browse/COUCHDB-153) for a 
 scenario where it would be desirable for the log process to have been started 
 before the db process.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (COUCHDB-215) errors when creating and deleting multiple databases

2009-01-19 Thread Damien Katz (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Katz closed COUCHDB-215.
---

   Resolution: Fixed
Fix Version/s: 0.9

 errors when creating and deleting multiple databases
 

 Key: COUCHDB-215
 URL: https://issues.apache.org/jira/browse/COUCHDB-215
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
 Environment: OS X 10.4 Erlang latest, CouchDB trunk
Reporter: Bob Dionne
 Fix For: 0.9


 Creating multiple databases and then deleting them causes couchdb to start 
 throwing exceptions (http://gist.github.com/49063). A cursory debugging 
 session indicates that should_close in couch_file never returns true; the 
 monitors list always contains the pid of the next process. Repeated use of 
 the server in this way eventually makes it unusable.
 The following JS, http://gist.github.com/48465, will also exhibit the issue 
 using the Futon tests. It can appear with as few as 60 dbs, depending on the 
 client and level of concurrency.
 The database names do have slashes in them, but this seems irrelevant on the 
 face of it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-197) Replication renders CouchDB unresponsive.

2009-01-16 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12664731#action_12664731
 ] 

Damien Katz commented on COUCHDB-197:
-

FYI, I just looked into this issue briefly; the no_scheme errors are from the 
inets HTTP client and come from attempting operations without the http: or 
https: portion of the URI.


 Replication renders CouchDB unresponsive.
 -

 Key: COUCHDB-197
 URL: https://issues.apache.org/jira/browse/COUCHDB-197
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Reporter: Maximillian Dornseif

 I am quite sure this is not the same issue as in COUCHDB-193.
 I'm trying to replicate a somewhat big database 
 {"doc_count":541394,"doc_del_count":265692,"update_seq":2118390,"purge_seq":0,"compact_running":false,"disk_size":16552608803}
  to another machine. 
 I started replication with this:
 send: 'POST /_replicate HTTP/1.1\r\nHost: 
 couchdb1.local.xxx:5984\r\nAccept-Encoding: identity\r\ncontent-length: 
 90\r\ncontent-type: application/json\r\naccept: 
 application/json\r\nuser-agent: couchdb-python 0.5dev-r127\r\n\r\n'
 send: '{"source": "hulog_events", "target": 
 "http://couchdb2.local.xxx:5984/hulog_events"}'
 reply: ''
 connect: (couchdb1.local.hudora.biz, 5984)
 send: 'POST /_replicate HTTP/1.1\r\nHost: 
 couchdb1.local.:5984\r\nAccept-Encoding: identity\r\ncontent-length: 
 90\r\ncontent-type: application/json\r\naccept: 
 application/json\r\nuser-agent: couchdb-python 0.5dev-r127\r\n\r\n'
 send: '{"source": "hulog_events", "target": 
 "http://couchdb2.local.:5984/hulog_events"}'
 (no reply so far)
 On the source server (couchdb1) I see following logentries:
 Mon, 05 Jan 2009 19:34:21 GMT] [info] [0.12745.45] 192.168.0.30 - - 'POST' 
 /_replicate 200
 [Mon, 05 Jan 2009 19:35:36 GMT] [info] [0.107.0] Compaction for db 
 hulog_events_test completed.
 [Mon, 05 Jan 2009 19:35:45 GMT] [info] [0.12746.45] 127.0.0.1 - - 'GET' 
 /hulog_events/ 200
 [Mon, 05 Jan 2009 19:35:46 GMT] [info] [0.95.0] Compaction for db eap 
 completed.
 [Mon, 05 Jan 2009 19:42:17 GMT] [error] [0.12765.45] ** Generic server 
 0.12765.45 terminating 
 ** Last message in was {'EXIT',0.12762.45,
 {timeout,
  {gen_server,call,
   [0.12768.45,
{write,
 0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2,
   109,0,0,0,7,112,114,111,100,117,99,116,109,
   0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0,
   0,0,11,116,114,97,110,115,97,99,116,105,111,
   110,109,0,0,0,8,114,101,116,114,105,101,118,
   101,104,2,109,0,0,0,4,116,121,112,101,109,0,
   0,0,4,117,110,105,116,104,2,109,0,0,0,11,97,
   114,99,104,105,118,101,100,95,97,116,109,0,
   0,0,22,50,48,48,56,48,50,50,50,84,49,50,49,
   52,48,53,46,53,50,54,51,56,52,104,2,109,0,0,
   0,10,99,114,101,97,116,101,100,95,97,116,
   109,0,0,0,22,50,48,48,55,49,49,50,56,84,49,
   53,52,50,48,54,46,51,52,52,54,49,56,104,2,
   109,0,0,0,4,112,114,111,112,104,1,108,0,0,0,
   2,104,2,109,0,0,0,8,108,111,99,97,116,105,
   111,110,109,0,0,0,6,65,85,83,76,65,71,104,2,
   109,0,0,0,6,104,101,105,103,104,116,98,0,0,
   7,158,106,104,2,109,0,0,0,3,109,117,105,109,
   0,0,0,18,51,52,48,48,53,57,57,56,49,48,48,
   48,48,51,49,50,53,50,104,2,109,0,0,0,8,113,
   117,97,110,116,105,116,121,97,11,106,106}]}}}
 ** When Server state == {file_descriptor,prim_file,{#Port0.904761,24}}
 ** Reason for termination == 
 ** {timeout,{gen_server,call,
 [0.12768.45,
  {write,0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,
   2,109,0,0,0,7,112,114,111,100,117,99,116,
   109,0,0,0,8,54,53,49,52,48,47,69,75,104,
   2,109,0,0,0,11,116,114,97,110,115,97,99,
   116,105,111,110,109,0,0,0,8,114,101,116,
   114,105,101,118,101,104,2,109,0,0,0,4,
   116,121,112,101,109,0,0,0,4,117,110,105,
   116,104,2,109,0,0,0,11,97,114,99,104,105,
   118,101,100,95,97,116,109,0,0,0,22,50,48,