[jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089512#comment-13089512 ] Damien Katz commented on COUCHDB-1153: -- Robert, Benoit, your issues can still be addressed. You can submit patches that improve upon Filipe's work. But telling Filipe to code the patch your way, without code, is not how this community works. Filipe's work is a feature people care about, and any objections about correctness have been addressed. Switching the code to an evented model, or any other improvement, is welcome from you or any other community member, but users want this feature, and Filipe should not be expected to code it up to everyone else's expectations before any check-in can occur. Improvement can, and should, happen continuously.

Database and view index compaction daemon - Key: COUCHDB-1153 URL: https://issues.apache.org/jira/browse/COUCHDB-1153 Project: CouchDB Issue Type: New Feature Environment: trunk Reporter: Filipe Manana Assignee: Filipe Manana Priority: Minor Labels: compaction

I've recently written an Erlang process to automatically compact databases and their views based on some configurable parameters. These parameters can be global or per database and are: minimum database fragmentation, minimum view fragmentation, allowed period, and strict_window (whether an ongoing compaction should be canceled if it doesn't finish within the allowed period). These fragmentation values are based on the recently added data_size parameter of the database and view group information URIs (COUCHDB-1132). I've documented the .ini configuration, as a comment in default.ini, which I paste here:

[compaction_daemon]
; The delay, in seconds, between each check for which database and view indexes
; need to be compacted.
check_interval = 60
; If a database or view index file is smaller than this value (in bytes),
; compaction will not happen. Very small files always have a very high
; fragmentation, therefore it's not worth compacting them.
min_file_size = 131072

[compactions]
; List of compaction rules for the compaction daemon.
; The daemon compacts databases and their respective view groups when all the
; condition parameters are satisfied. Configuration can be per database or
; global, and it has the following format:
;
; database_name = parameter=value [, parameter=value]*
; _default = parameter=value [, parameter=value]*
;
; Possible parameters:
;
; * db_fragmentation - If the ratio (as an integer percentage) of the amount
;   of old data (and its supporting metadata) over the database file size is
;   equal to or greater than this value, this database compaction condition
;   is satisfied. This value is computed as:
;
;       (file_size - data_size) / file_size * 100
;
;   The data_size and file_size values can be obtained when querying a
;   database's information URI (GET /dbname/).
;
; * view_fragmentation - If the ratio (as an integer percentage) of the amount
;   of old data (and its supporting metadata) over the view index (view group)
;   file size is equal to or greater than this value, then this view index
;   compaction condition is satisfied. This value is computed as:
;
;       (file_size - data_size) / file_size * 100
;
;   The data_size and file_size values can be obtained when querying a
;   view group's information URI (GET /dbname/_design/groupname/_info).
;
; * period - The period for which a database (and its view groups) compaction
;   is allowed.
;   This value must obey the following format:
;
;       HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
;
; * strict_window - If a compaction is still running after the end of the
;   allowed period, it will be canceled if this parameter is set to yes.
;   It defaults to no and it's meaningful only if the *period* parameter
;   is also specified.
;
; * parallel_view_compaction - If set to yes, the database and its views are
;   compacted in parallel. This is only useful on certain setups, for example
;   when the database and view index directories point to different disks.
;   It defaults to no.
;
; Before a compaction is triggered, an estimation of how much free disk space is
; needed is computed. When there's not enough free disk space to compact a
; particular database or view index, a warning message is logged.
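To make the rule format concrete, here is a hypothetical per-database entry (the database name and thresholds are illustrative, not from the patch; the % suffix reflects the integer-percentage values described above). With db_fragmentation=70%, a 1 GB database file holding 300 MB of live data would qualify, since (1073741824 - 314572800) / 1073741824 * 100 is roughly 70.7:

[compactions]
mydb = db_fragmentation=70%, view_fragmentation=60%, period=23:00-05:00, strict_window=yes

This compacts mydb (and its view groups) only between 23:00 and 05:00, canceling any compaction still running past the end of the window.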
[jira] [Commented] (COUCHDB-1256) Incremental requests to _changes can skip revisions
[ https://issues.apache.org/jira/browse/COUCHDB-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089755#comment-13089755 ] Damien Katz commented on COUCHDB-1256: -- I agree with the fix Adam proposes. The code in question is an optimization to prevent the sending/checking of documents we've already examined, but with checkpointing it breaks. Removal of the code is the right fix for now. In the future, we can add the optimization back if the checkpointing can keep note of completed replications vs. checkpointed ones. Checkpointed records would keep a high-water mark of the last completed replication, and the seq num and that high mark for the completed replication would both be sent to the _changes handler. The _changes handler would not send docs with a seq below the checkpoint value. When the replication checkpoints, it saves the current seq and the last completed high-water mark. When replication completes, it sets the last seq and high-water mark to the same seq, and that is what gets sent for the next replication. Also, continuous replication would need a way to signal when a replication is complete, so that the high-water mark can be set there as well.

Incremental requests to _changes can skip revisions --- Key: COUCHDB-1256 URL: https://issues.apache.org/jira/browse/COUCHDB-1256 Project: CouchDB Issue Type: Bug Components: Replication Affects Versions: 0.10, 0.10.1, 0.10.2, 0.11.1, 0.11.2, 1.0, 1.0.1, 1.0.2, 1.1, 1.0.3 Environment: confirmed on Apache CouchDB 1.1.0, bug appears to be present in 1.0.3 and trunk Reporter: Adam Kocoloski Assignee: Adam Kocoloski Priority: Blocker Fix For: 1.0.4, 1.1.1, 1.2 Attachments: jira-1256-test.diff

Requests to _changes with style=all_docs&since=N (requests made by the replicator) are liable to suppress revisions of a document. The following sequence of curl commands demonstrates the bug:

curl -X PUT localhost:5985/revseq
{"ok":true}
curl -X PUT -H "Content-Type: application/json" localhost:5985/revseq/foo -d '{"a":123}'
{"ok":true,"id":"foo","rev":"1-0dc33db52a43872b6f3371cef7de0277"}
curl -X PUT -H "Content-Type: application/json" localhost:5985/revseq/bar -d '{"a":456}'
{"ok":true,"id":"bar","rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a"}

% stick a conflict revision in foo
curl -X PUT -H "Content-Type: application/json" "localhost:5985/revseq/foo?new_edits=false" -d '{"_rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a", "a":123}'
{"ok":true,"id":"foo","rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a"}

% request without since= gives the expected result
curl -H "Content-Type: application/json" localhost:5985/revseq/_changes?style=all_docs
{"results":[
{"seq":2,"id":"bar","changes":[{"rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a"}]},
{"seq":3,"id":"foo","changes":[{"rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a"},{"rev":"1-0dc33db52a43872b6f3371cef7de0277"}]}
],
"last_seq":3}

% request starting from since=2 suppresses revision 1-0dc33db52a43872b6f3371cef7de0277 of foo
macbook:~ (master) $ curl localhost:5985/revseq/_changes?style=all_docs\&since=2
{"results":[
{"seq":3,"id":"foo","changes":[{"rev":"1-cc609831f0ca66e8cd3d4c1e0d98108a"}]}
],
"last_seq":3}

I believe the fix is something like this (though we could refactor further because Style is unused):

diff --git a/src/couchdb/couch_db.erl b/src/couchdb/couch_db.erl
index e8705be..65aeca3 100644
--- a/src/couchdb/couch_db.erl
+++ b/src/couchdb/couch_db.erl
@@ -1029,19 +1029,7 @@ changes_since(Db, Style, StartSeq, Fun, Acc) ->
     changes_since(Db, Style, StartSeq, Fun, [], Acc).
 
 changes_since(Db, Style, StartSeq, Fun, Options, Acc) ->
-    Wrapper = fun(DocInfo, _Offset, Acc2) ->
-        #doc_info{revs=Revs} = DocInfo,
-        DocInfo2 = case Style of
-        main_only ->
-            DocInfo;
-        all_docs ->
-            % remove revs before the seq
-            DocInfo#doc_info{revs=[RevInfo ||
-                #rev_info{seq=RevSeq}=RevInfo <- Revs, StartSeq < RevSeq]}
-        end,
-        Fun(DocInfo2, Acc2)
-    end,
+    Wrapper = fun(DocInfo, _Offset, Acc2) -> Fun(DocInfo, Acc2) end,
     {ok, _LastReduction, AccOut} = couch_btree:fold(by_seq_btree(Db),
         Wrapper, Acc, [{start_key, StartSeq + 1}] ++ Options),
     {ok, AccOut}.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (COUCHDB-1243) Compact and copy feature that resets changes
[ https://issues.apache.org/jira/browse/COUCHDB-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081324#comment-13081324 ] Damien Katz commented on COUCHDB-1243: -- I mostly agree with Robert Newson that what you are asking for is a dangerous thing for CouchDB replication. However, there is the purge option, which forgets documents, deleted or otherwise, completely removing them from the internal indexes. Once documents are purged, compaction will completely remove them from the file forever. Unfortunately, I couldn't find actual documentation on the purge functionality, so the best place to figure out how to use purge is to look at the purge test in the browser test suite, which can be found here: http://svn.apache.org/viewvc/couchdb/trunk/share/www/script/test/purge.js?view=co&revision=1086241&content-type=text%2Fplain I've often thought it would be useful to purge docs during compaction, by providing a user-defined function to signal which unwanted docs/stubs to remove. But no such thing exists; in the meantime you can accomplish it with a purge + compaction.

Compact and copy feature that resets changes Key: COUCHDB-1243 URL: https://issues.apache.org/jira/browse/COUCHDB-1243 Project: CouchDB Issue Type: New Feature Components: Database Core Affects Versions: 1.0.1, 1.1 Environment: Ubuntu, but not important Reporter: Henrik Hofmeister Labels: cleanup, compaction Attachments: dump_load.php

After running db and view compaction on a 70K doc db with 6+ million changes, it takes up 0.8 GB. If copying the same documents to a new db (get and bulk insert), the same data with 70K changes (only the inserts) takes up 40 MB. That is a huge difference. It has been verified on 2 dbs that the difference is more than 65 times the size of the data. A compact-and-copy feature that copies only documents and resets the changes for a db would be very nice, to try and limit the disk usage a little bit. (Our current test environment takes up nearly 100 GB...) I've attached the dump/load php script for your convenience. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
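For reference, the purge + compaction workflow Damien describes is driven over HTTP. A minimal sketch using OTP's inets httpc (the db name, doc id, and rev are placeholders - see the purge.js test linked above for authoritative usage):

-module(purge_then_compact).
-export([run/0]).

%% _purge takes a JSON object mapping doc ids to the revisions to
%% forget; a subsequent compaction then removes them from the file.
run() ->
    inets:start(),
    Base = "http://localhost:5984/mydb",                      %% placeholder
    PurgeBody = "{\"some_doc_id\": [\"1-0dc33db52a43872b6f3371cef7de0277\"]}",
    {ok, {{_, 200, _}, _, _}} = httpc:request(post,
        {Base ++ "/_purge", [], "application/json", PurgeBody}, [], []),
    {ok, {{_, 202, _}, _, _}} = httpc:request(post,
        {Base ++ "/_compact", [], "application/json", ""}, [], []).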
[jira] [Resolved] (COUCHDB-1141) Docs deleted via PUT or POST do not have contents removed.
[ https://issues.apache.org/jira/browse/COUCHDB-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz resolved COUCHDB-1141. -- Resolution: Not A Problem Assignee: Damien Katz (was: Robert Newson) This is by design. Deleted documents are supposed to be able to contain meta information about who deleted them, etc., because they replicate. The problem might be a documentation issue, as clients need to make sure the document body is empty when bulk deleting.

Docs deleted via PUT or POST do not have contents removed. -- Key: COUCHDB-1141 URL: https://issues.apache.org/jira/browse/COUCHDB-1141 Project: CouchDB Issue Type: Bug Components: Database Core Environment: All Reporter: Wendall Cada Assignee: Damien Katz Fix For: 1.2

If a doc is deleted via -X DELETE, the resulting doc contains only the id/rev and _deleted:true. However, if a doc is deleted via PUT or POST, through adding _deleted:true, the entire contents of the doc remain stored. Even after compaction, the original document contents remain. This issue is causing databases with large docs to become bloated over time, as the original doc contents remain in the database even after being deleted. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
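A minimal sketch of the documentation point above - when deleting via PUT, send only the tombstone fields rather than the full body (OTP inets httpc; the db name, doc id, and rev are placeholders):

-module(delete_with_stub).
-export([run/0]).

%% PUT '{"_rev": Rev, "_deleted": true}' stores a minimal tombstone,
%% matching what -X DELETE produces; PUT-ing the full body plus
%% _deleted:true keeps the whole body in the tombstone, by design.
run() ->
    inets:start(),
    Url = "http://localhost:5984/mydb/some_doc_id",           %% placeholders
    Rev = "1-0dc33db52a43872b6f3371cef7de0277",
    Stub = "{\"_rev\":\"" ++ Rev ++ "\",\"_deleted\":true}",
    {ok, {{_, 201, _}, _, Body}} = httpc:request(put,
        {Url, [], "application/json", Stub}, [], []),
    io:format("~s~n", [Body]).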
[jira] [Commented] (COUCHDB-1140) fetching _local docs by revision in URL fails
[ https://issues.apache.org/jira/browse/COUCHDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024871#comment-13024871 ] Damien Katz commented on COUCHDB-1140: -- I don't consider this a bug, as we don't store previous revisions of local docs like we do for regular docs (and technically, getting docs by older revision wasn't ever really supposed to be a feature). The error message could probably be better here, though.

fetching _local docs by revision in URL fails - Key: COUCHDB-1140 URL: https://issues.apache.org/jira/browse/COUCHDB-1140 Project: CouchDB Issue Type: Bug Components: HTTP Interface Affects Versions: 1.0.2, 1.1 Reporter: Jan Lehnardt Priority: Minor

Via dev@:

Hi, seems like a bug. You need to pass the current rev in the body of the document. Passing it as ?rev= does not work at all. B.

On 24 April 2011 19:39, Pedro Landeiro lande...@gmail.com wrote: Already tried that but the rev argument does not accept (), returns instead: {"error":"unknown_error","reason":"badarg"}

On Sun, Apr 24, 2011 at 11:16 AM, Robert Newson robert.new...@gmail.com wrote: try ?rev=0-1 B.

On 23 April 2011 22:41, Pedro Landeiro lande...@gmail.com wrote: Hi, I cannot retrieve a local doc by revision. I can get the doc like this: http://127.0.0.1:5984/thisisatempdb/_local/mylocaldoc {"_id":"_local/mylocaldoc","_rev":"0-1","name":"pedro","surname":"landeiro","islocal":"oh yeah"} but if I request the same doc with the revision: http://127.0.0.1:5984/thisisatempdb/_local/mylocaldoc?rev=0-1 {"error":"not_found","reason":"missing"} Am I doing something wrong? Thanks. -- Pedro Landeiro http://www.linkedin.com/in/pedrolandeiro

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (COUCHDB-1124) Refactor couch_btree.erl
[ https://issues.apache.org/jira/browse/COUCHDB-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020554#comment-13020554 ] Damien Katz commented on COUCHDB-1124: -- I haven't looked closely at the patch, but with this module it's most important not to lose performance. One thing that jumps out at me is the cmp_keys function. I'd be sure to benchmark the view indexing with large, complex keys, as the less comparisons will likely happen more often, and we've seen them be a performance bottleneck in the past.

Refactor couch_btree.erl Key: COUCHDB-1124 URL: https://issues.apache.org/jira/browse/COUCHDB-1124 Project: CouchDB Issue Type: Improvement Reporter: Paul Joseph Davis Attachments: 0001-Refactor-couch_btree.erl.patch

I've completely refactored couch_btree.erl in an attempt to make it more palatable for people who want to learn it. The current version is quite organic in nature, and this cleans up the code to be more consumable. Most everyone who's seen this patch has wanted it in trunk, but I never got around to committing it. The patch I'm about to attach is quite gnarly, as it's basically deleting and recreating the entire file. I find it quite a bit more helpful to read the end result, which you can do at [1]. Also, if we do commit this, then the code in COUCHDB-1084 will be quite broken for the btree section. If that patch still applies cleanly to the other files, I'm going to try and update the btree code for it tonight. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
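A minimal sketch of the kind of micro-benchmark being asked for - timing a sort of large, complex view-like keys, since sorting exercises the comparison function heavily (the key shape and the plain =< comparison are stand-ins for the patch's cmp_keys; CouchDB's real view collation is ICU-based and costlier):

-module(cmp_bench).
-export([run/0]).

%% Sort 100k nested keys with an explicit comparison fun and report
%% the elapsed time, as a rough proxy for btree key-compare cost
%% during view indexing.
run() ->
    Keys = [ [I rem 7, {[{<<"a">>, I}]}, lists:duplicate(20, I band 15)]
             || I <- lists:seq(1, 100000) ],
    Less = fun(A, B) -> A =< B end,   %% stand-in for cmp_keys
    {Micros, _Sorted} = timer:tc(lists, sort, [Less, Keys]),
    io:format("sorted ~p keys in ~p ms~n", [length(Keys), Micros div 1000]).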
[jira] [Commented] (COUCHDB-1118) Adding a NIF based JSON decoding/encoding module
[ https://issues.apache.org/jira/browse/COUCHDB-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015085#comment-13015085 ] Damien Katz commented on COUCHDB-1118: -- Looks good to me. Check it in!

Adding a NIF based JSON decoding/encoding module Key: COUCHDB-1118 URL: https://issues.apache.org/jira/browse/COUCHDB-1118 Project: CouchDB Issue Type: Improvement Components: Database Core Reporter: Filipe Manana Fix For: 1.2

Currently, all the Erlang based JSON encoders and decoders are very slow, and decoding and encoding JSON is something that we do basically everywhere. Adding a JSON NIF encoder/decoder was recently discussed via IRC. Damien also started a thread on the development mailing list about adding NIFs to trunk. The patch/branch at [1] adds such a JSON encoder/decoder. It is based on Paul Davis' eep0018 project [2]. Damien made some modifications [3] to it, mostly to add support for big numbers (Paul's eep0018 limits the precision to 32/64 bits) and a few optimizations. I made a few corrections and minor enhancements on top of Damien's fork as well [4]. Finally, Benoît identified some missing capabilities compared to mochijson2 (on encoding, allow atoms as strings and strings as object properties). Also, the version added in the patch at [1] uses mochijson2 when the C NIF is not loaded. Autotools configuration was adapted to compile the NIF only when we're using an OTP release >= R13B04 (the R13B03 NIF API is too limited and suffered many changes compared to R13B04 and R14) - therefore it should work on any OTP release >= R13B at least. I successfully tested this on R13B03, R13B04 and R14B02 in an Ubuntu environment. I'm not sure if it builds at all on Windows - I would appreciate it if someone could verify. Also, I'm far from being good with the autotools, so I probably missed something important or I'm doing something in a not very standard way. This NIF encoder/decoder is about one order of magnitude faster compared to mochijson2 and other Erlang-only solutions such as jsx. A read and write test with relaximation shows this has a very positive impact, especially on reads (the EJSON encoding is more expensive than JSON decoding) - http://graphs.mikeal.couchone.com/#/graph/698bf36b6c64dbd19aa2bef634052381 @Paul, since this is based on your eep0018 effort, do you think any other missing files should be added (README, etap tests, etc)? Also, should we put somewhere a note that this is based on your project? [1] - https://github.com/fdmanana/couchdb/compare/json_nif [2] - https://github.com/davisp/eep0018 [3] - https://github.com/Damienkatz/eep0018/commits/master [4] - https://github.com/fdmanana/eep0018/commits/final_damien -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
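A minimal sketch of the kind of comparison being discussed, assuming the NIF is exposed as ejson:encode/1 / ejson:decode/1 with mochijson2 as the fallback (module names per the linked branch - adjust if the patch exports them differently; both must be on the code path, e.g. a shell started from a CouchDB checkout built with the NIF):

-module(json_bench).
-export([run/1]).

%% Decode the same JSON binary N times with each decoder and report
%% the elapsed time in microseconds.
run(JsonBin) when is_binary(JsonBin) ->
    N = 10000,
    {NifUs, _} = timer:tc(fun() ->
        [ejson:decode(JsonBin) || _ <- lists:seq(1, N)]
    end),
    {MochiUs, _} = timer:tc(fun() ->
        [mochijson2:decode(JsonBin) || _ <- lists:seq(1, N)]
    end),
    io:format("ejson: ~p us, mochijson2: ~p us~n", [NifUs, MochiUs]).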
[jira] Commented: (COUCHDB-1092) Storing documents bodies as raw JSON binaries instead of serialized JSON terms
[ https://issues.apache.org/jira/browse/COUCHDB-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007262#comment-13007262 ] Damien Katz commented on COUCHDB-1092: -- Great work Filipe! The size win alone is enough to make this patch compelling. I think most of the perf gain is coming from keeping the JSON in binary format, so that as the doc bodies get passed around from Erlang process to process, only pointers to them are copied, not the actual complex JSON terms themselves. The dramatic gains in the indexer are evidence of this, as the pipelined processor passes documents and view rows from process to process. This new work makes that much more efficient.

Storing documents bodies as raw JSON binaries instead of serialized JSON terms -- Key: COUCHDB-1092 URL: https://issues.apache.org/jira/browse/COUCHDB-1092 Project: CouchDB Issue Type: Improvement Components: Database Core Reporter: Filipe Manana Assignee: Filipe Manana

Currently we store documents as Erlang-serialized (via the term_to_binary/1 BIF) EJSON. The proposed patch changes the database file format so that instead of storing serialized EJSON document bodies, it stores raw JSON binaries. The github branch is at: https://github.com/fdmanana/couchdb/tree/raw_json_docs

Advantages:

* what we write to disk is much smaller - a raw JSON binary can easily get up to 50% smaller (at least according to the tests I did)
* when serving documents to a client we no longer need to JSON encode the document body read from the disk - this applies to individual document requests, view queries with ?include_docs=true, pull and push replications, and possibly other use cases. We just grab its body and prepend the _id, _rev and all the necessary metadata fields (this is via simple Erlang binary operations)
* we avoid the EJSON term copying between request handlers and the db updater processes, between the work queues and the view updater process, between replicator processes, etc
* before sending a document to the JavaScript view server, we no longer need to convert it from EJSON to JSON

The changes done to the document write workflow are minimalist - after JSON decoding the document's JSON into EJSON and removing the metadata top level fields (_id, _rev, etc), it JSON encodes the resulting EJSON body into a binary. This consumes CPU of course, but it brings 2 advantages:

1) we avoid the EJSON copy between the request process and the database updater process - for any realistic document size (4kb or more) this can be very expensive, especially when there are many nested structures (lists inside objects inside lists, etc)

2) before writing anything to the file, we do a term_to_binary([Len, Md5, TheThingToWrite]) and then write the result to the file. A term_to_binary call with a binary as the input is very fast compared to a term_to_binary call with EJSON as input (or some other nested structure)

I think both compensate for the JSON encoding after the separation of metadata fields and non-metadata fields. The following relaximation graph, for documents with sizes of 4Kb, shows a significant performance increase both for writes and reads - especially reads. http://graphs.mikeal.couchone.com/#/graph/698bf36b6c64dbd19aa2bef63400b94f I've also made a few tests to see how much the improvement is when querying a view, for the first time, without ?stale=ok. The size difference of the databases (after compaction) is also very significant - this change can reduce the size at least 50% in common cases.
The test databases were created in an instance built from that experimental branch. Then they were replicated into a CouchDB instance built from the current trunk. At the end both databases were compacted (to fairly compare their final sizes). The databases contain the following view:

{
  "_id": "_design/test",
  "language": "javascript",
  "views": {
    "simple": {
      "map": "function(doc) { emit(doc.float1, doc.strings[1]); }"
    }
  }
}

## Database with 500 000 docs of 2.5Kb each

Document template is at: https://github.com/fdmanana/couchdb/blob/raw_json_docs/doc_2_5k.json

Sizes (branch vs trunk):

$ du -m couchdb/tmp/lib/disk_json_test.couch
1996 couchdb/tmp/lib/disk_json_test.couch
$ du -m couchdb-trunk/tmp/lib/disk_ejson_test.couch
2693 couchdb-trunk/tmp/lib/disk_ejson_test.couch

Time, from a user's perspective, to build the view index from scratch:

$ time curl http://localhost:5984/disk_json_test/_design/test/_view/simple?limit=1
{"total_rows":500000,"offset":0,"rows":[
{"id":"076a-c1ae-4999-b508-c03f4d0620c5","key":null,"value":"wfxuF3N8XEK6"}
]}

real 6m6.740s
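A minimal, self-contained sketch of the copying effect Damien points to - sending a large refc binary to another process copies only a pointer, while sending an equivalent nested EJSON-style term copies the whole structure (payload sizes and iteration counts are arbitrary):

-module(copy_bench).
-export([run/0]).

%% Round-trip a payload through an echo process N times. Binaries
%% larger than 64 bytes are reference-counted and shared between
%% processes; nested terms are copied on every send.
run() ->
    Echo = spawn(fun Loop() ->
        receive {From, Msg} -> From ! Msg, Loop() end
    end),
    Bin = list_to_binary(lists:duplicate(100000, $x)),        %% ~100 KB binary
    Term = [{[{<<"k">>, I}, {<<"v">>, [I, I, I]}]} || I <- lists:seq(1, 5000)],
    io:format("binary: ~p us, term: ~p us~n",
              [roundtrips(Echo, Bin, 1000), roundtrips(Echo, Term, 1000)]).

roundtrips(Echo, Msg, N) ->
    {Us, _} = timer:tc(fun() ->
        [begin Echo ! {self(), Msg}, receive Msg -> ok end end
         || _ <- lists:seq(1, N)]
    end),
    Us.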
[jira] Commented: (COUCHDB-1084) Remove unnecessary btree lookup inside couch_db_updater
[ https://issues.apache.org/jira/browse/COUCHDB-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004089#comment-13004089 ] Damien Katz commented on COUCHDB-1084: -- Thanks for the feedback on the code style; we definitely want to clean it up before committing. Right now I'm more interested in the performance impact and how fruitful removing the btree lookup is. I'm hoping this patch will improve performance for all writes, both inserts and updates, but I don't have time to set up benchmarks right now.

Remove unnecessary btree lookup inside couch_db_updater --- Key: COUCHDB-1084 URL: https://issues.apache.org/jira/browse/COUCHDB-1084 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 1.2 Reporter: Damien Katz Assignee: Damien Katz Attachments: remove_btree_lookup.patch

The CouchDB update process has an unnecessary btree lookup, where it reads the values in bulk, checks for conflicts, writes the docs to disk, updates the values appropriately, and writes them out to the btree in a second step. It's possible to avoid this second step, and instead do all the checking, doc writing and value transformation in a single btree lookup, thereby reducing the number of btree traversals and disk IO. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (COUCHDB-1084) Remove unnecessary btree lookup inside couch_db_updater
[ https://issues.apache.org/jira/browse/COUCHDB-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004376#comment-13004376 ] Damien Katz commented on COUCHDB-1084: -- This isn't the first time I've seen weird, unexpected results from relaximation. I'm really thinking the benchmarks need some work to be more usable and useful. I can't explain how the improvements on the write path would cause this read impact. Perhaps more comprehensive tests would give a clearer picture of what's going on, or maybe there is a bug in the tests themselves. I've also had a hard time interpreting the graphs; I think they need some smoothing or something to make it easier to visualize the differences.

Remove unnecessary btree lookup inside couch_db_updater --- Key: COUCHDB-1084 URL: https://issues.apache.org/jira/browse/COUCHDB-1084 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 1.2 Reporter: Damien Katz Assignee: Damien Katz Attachments: remove_btree_lookup.patch

The CouchDB update process has an unnecessary btree lookup, where it reads the values in bulk, checks for conflicts, writes the docs to disk, updates the values appropriately, and writes them out to the btree in a second step. It's possible to avoid this second step, and instead do all the checking, doc writing and value transformation in a single btree lookup, thereby reducing the number of btree traversals and disk IO. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (COUCHDB-1084) Remove unnecessary btree lookup inside couch_db_updater
Remove unnecessary btree lookup inside couch_db_updater --- Key: COUCHDB-1084 URL: https://issues.apache.org/jira/browse/COUCHDB-1084 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 1.2 Reporter: Damien Katz Assignee: Damien Katz

The CouchDB update process has an unnecessary btree lookup, where it reads the values in bulk, checks for conflicts, writes the docs to disk, updates the values appropriately, and writes them out to the btree in a second step. It's possible to avoid this second step, and instead do all the checking, doc writing and value transformation in a single btree lookup, thereby reducing the number of btree traversals and disk IO. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
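For context, the two-step flow the description refers to is, in couch_db_updater terms, roughly a couch_btree:lookup/2 followed later by a couch_btree:add_remove/3. A shape-only sketch (couch_btree is CouchDB's own module, so this only compiles against a CouchDB checkout; the conflict-checking step is elided):

-module(two_pass_sketch).
-export([update_ids/3]).

%% Current shape: two btree traversals per update batch.
update_ids(IdTree, Ids, NewKVs) ->
    %% traversal 1: read the existing values for these ids
    OldResults = couch_btree:lookup(IdTree, Ids),
    %% ... conflict checking and value transformation happen here ...
    _ = OldResults,
    %% traversal 2: write the transformed values back
    couch_btree:add_remove(IdTree, NewKVs, []).

The patch collapses the two traversals into one, doing the checking and transformation while the btree is already being walked for the write.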
[jira] Updated: (COUCHDB-1084) Remove unnecessary btree lookup inside couch_db_updater
[ https://issues.apache.org/jira/browse/COUCHDB-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz updated COUCHDB-1084: - Attachment: remove_btree_lookup.patch Applies to couchdb trunk revision 1078680

Remove unnecessary btree lookup inside couch_db_updater --- Key: COUCHDB-1084 URL: https://issues.apache.org/jira/browse/COUCHDB-1084 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 1.2 Reporter: Damien Katz Assignee: Damien Katz Attachments: remove_btree_lookup.patch

The CouchDB update process has an unnecessary btree lookup, where it reads the values in bulk, checks for conflicts, writes the docs to disk, updates the values appropriately, and writes them out to the btree in a second step. It's possible to avoid this second step, and instead do all the checking, doc writing and value transformation in a single btree lookup, thereby reducing the number of btree traversals and disk IO. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (COUCHDB-1084) Remove unnecessary btree lookup inside couch_db_updater
[ https://issues.apache.org/jira/browse/COUCHDB-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003794#comment-13003794 ] Damien Katz commented on COUCHDB-1084: -- The attached patch might have stability issues, but should give an idea of the performance impact of the change. Would like to see some benchmarks to see if it actually helps.

Remove unnecessary btree lookup inside couch_db_updater --- Key: COUCHDB-1084 URL: https://issues.apache.org/jira/browse/COUCHDB-1084 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 1.2 Reporter: Damien Katz Assignee: Damien Katz Attachments: remove_btree_lookup.patch

The CouchDB update process has an unnecessary btree lookup, where it reads the values in bulk, checks for conflicts, writes the docs to disk, updates the values appropriately, and writes them out to the btree in a second step. It's possible to avoid this second step, and instead do all the checking, doc writing and value transformation in a single btree lookup, thereby reducing the number of btree traversals and disk IO. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (COUCHDB-864) multipart/related PUT's always close the connection.
[ https://issues.apache.org/jira/browse/COUCHDB-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903044#action_12903044 ] Damien Katz commented on COUCHDB-864: - Last patch looks good. It's got my ok to check in to trunk and 1.0.x.

multipart/related PUT's always close the connection. Key: COUCHDB-864 URL: https://issues.apache.org/jira/browse/COUCHDB-864 Project: CouchDB Issue Type: Bug Components: Database Core Reporter: Robert Newson Attachments: chunked.erl, mp_doc_put_http_pipeline.patch, mp_pipeline.patch

I noticed that mochiweb always closes the connection when doing a multipart/related PUT (to insert the JSON document and accompanying attachments in one call). Ultimately it's because we call recv(0) and not recv_body, thus consuming more data than we actually process. Mochiweb notices that there is unread data on the socket and closes the connection. This impacts replication with attachments, as I believe they go through this code path (and, thus, are forever reconnecting). The code below demonstrates a fix for this issue but isn't good enough for trunk. Adam provided the important process dictionary fix.

---
 src/couchdb/couch_doc.erl      |    1 +
 src/couchdb/couch_httpd_db.erl |   13 +++++++++----
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/couchdb/couch_doc.erl b/src/couchdb/couch_doc.erl
index 5009f8f..f8c874b 100644
--- a/src/couchdb/couch_doc.erl
+++ b/src/couchdb/couch_doc.erl
@@ -455,6 +455,7 @@ doc_from_multi_part_stream(ContentType, DataFun) ->
     Parser ! {get_doc_bytes, self()},
     receive
     {doc_bytes, DocBytes} ->
+        erlang:put(mochiweb_request_recv, true),
         Doc = from_json_obj(?JSON_DECODE(DocBytes)),
         % go through the attachments looking for 'follows' in the data,
         % replace with function that reads the data from MIME stream.
diff --git a/src/couchdb/couch_httpd_db.erl b/src/couchdb/couch_httpd_db.erl
index b0fbe8d..eff7d67 100644
--- a/src/couchdb/couch_httpd_db.erl
+++ b/src/couchdb/couch_httpd_db.erl
@@ -651,12 +651,13 @@ db_doc_req(#httpd{method='PUT'}=Req, Db, DocId) ->
     } = parse_doc_query(Req),
     couch_doc:validate_docid(DocId),
+    Len = couch_httpd:header_value(Req, "Content-Length"),
     Loc = absolute_uri(Req, "/" ++ ?b2l(Db#db.name) ++ "/" ++ ?b2l(DocId)),
     RespHeaders = [{"Location", Loc}],
     case couch_util:to_list(couch_httpd:header_value(Req, "Content-Type")) of
     ("multipart/related;" ++ _) = ContentType ->
         {ok, Doc0} = couch_doc:doc_from_multi_part_stream(ContentType,
-            fun() -> receive_request_data(Req) end),
+            fun() -> receive_request_data(Req, Len) end),
         Doc = couch_doc_from_req(Req, DocId, Doc0),
         update_doc(Req, Db, DocId, Doc, RespHeaders, UpdateType);
     _Else ->
@@ -775,9 +776,13 @@ send_docs_multipart(Req, Results, Options) ->
     couch_httpd:send_chunk(Resp, <<"--">>),
     couch_httpd:last_chunk(Resp).
 
-receive_request_data(Req) ->
-    {couch_httpd:recv(Req, 0), fun() -> receive_request_data(Req) end}.
-
+receive_request_data(Req, undefined) ->
+    receive_request_data(Req, "0");
+receive_request_data(Req, Len) when is_list(Len) ->
+    Remaining = list_to_integer(Len),
+    Bin = couch_httpd:recv(Req, Remaining),
+    {Bin, fun() -> receive_request_data(Req, Remaining - iolist_size(Bin)) end}.
+
 update_doc_result_to_json({{Id, Rev}, Error}) ->
     {_Code, Err, Msg} = couch_httpd:error_info(Error),
     {[{id, Id}, {rev, couch_doc:rev_to_str(Rev)},
--
1.7.2.2 Umbra

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-863) be quiet about dropping invalid references
[ https://issues.apache.org/jira/browse/COUCHDB-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900706#action_12900706 ] Damien Katz commented on COUCHDB-863: - This sounds like a deeper bug. Under what circumstances does it attempt to drop references when it's already closed?

be quiet about dropping invalid references -- Key: COUCHDB-863 URL: https://issues.apache.org/jira/browse/COUCHDB-863 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 1.0.1 Reporter: Randall Leeds Priority: Trivial

couch_ref_counter:drop will complain, dying with noproc, if the reference counter does not exist. Since dropping a reference to a non-existent process isn't exactly an error, I think we should squelch this one. I hate log noise and I've noticed this pop up in the logs a bunch, especially running the test suite. Extra noise doesn't make debugging easier and it could confuse people trying to solve real problems. Trivial, trivial patch unless I'm missing something really silly. I'll save everyone the extra emails from JIRA and just paste it here.

diff --git a/src/couchdb/couch_ref_counter.erl b/src/couchdb/couch_ref_counter.erl
index 5a111ab..1edc474 100644
--- a/src/couchdb/couch_ref_counter.erl
+++ b/src/couchdb/couch_ref_counter.erl
@@ -24,7 +24,9 @@ drop(RefCounterPid) ->
     drop(RefCounterPid, self()).
 
 drop(RefCounterPid, Pid) ->
-    gen_server:call(RefCounterPid, {drop, Pid}).
+    try gen_server:call(RefCounterPid, {drop, Pid})
+    catch exit:{noproc, _} -> ok
+    end.
 
 add(RefCounterPid) ->

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-844) Documents missing after CouchDB restart
[ https://issues.apache.org/jira/browse/COUCHDB-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896209#action_12896209 ] Damien Katz commented on COUCHDB-844: - Hello Sascha. What file system are you running? Can you run a consistency check on it? It's strange. It looks like your file was either truncated or the header was never written. There is a bunch of data after the last header, and it contains your missing data, but none of it looks like a header for it. All the interval markers are set for data. This is consistent with a file that's been truncated. I'm still doing a bit more investigation to check the data regions to see if they might actually have a header. We have seen instances in the past (0.8.0 and earlier) where file systems have truncated the db file, making recovery difficult, which is why we switched to a pure tail-append format. As I recall, those reports were associated with the file system running out of space. Barring a physical corruption or truncation, the only other possibility I can think of is that somehow there is a bug where couchdb isn't writing the header. I don't know of any other instances of that happening, but if that's what it is, it's a very serious bug.

Documents missing after CouchDB restart --- Key: COUCHDB-844 URL: https://issues.apache.org/jira/browse/COUCHDB-844 Project: CouchDB Issue Type: Bug Components: Database Core Affects Versions: 1.0 Environment: Debian Version 5.0.5, Linux *** 2.6.29-xs5.5.0.17 #1 SMP Mon Aug 3 17:37:37 UTC 2009 i686 GNU/Linux, XenServer Guest Reporter: Sascha Reuter Priority: Critical

After a CouchDB restart, recently added/changed documents and design documents (min. 2 weeks timeline!) are missing and can't be accessed through REST calls / Futon. All documents that are still available through REST/Futon only exist in old revisions. All documents/revisions can be found by doing a manual search (less/egrep/...) in the datafile (/var/lib/couchdb/database.couch). Example:

strings dtap.couch | grep -i 226b2e6c-24b7-4336-92c7-257abf923b11
$226b2e6c-24b7-4336-92c7-257abf923b11h
$226b2e6c-24b7-4336-92c7-257abf923b11l
$226b2e6c-24b7-4336-92c7-257abf923b11l
$226b2e6c-24b7-4336-92c7-257abf923b11l
$226b2e6c-24b7-4336-92c7-257abf923b11l
$226b2e6c-24b7-4336-92c7-257abf923b11l
$226b2e6c-24b7-4336-92c7-257abf923b11l
$226b2e6c-24b7-4336-92c7-257abf923b11l
$226b2e6c-24b7-4336-92c7-257abf923b11l
$226b2e6c-24b7-4336-92c7-257abf923b11h
$226b2e6c-24b7-4336-92c7-257abf923b11h

curl http://localhost:5984/dtap/226b2e6c-24b7-4336-92c7-257abf923b11
{"error":"not_found","reason":"missing"}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-812) implement randomization in views resultset
[ https://issues.apache.org/jira/browse/COUCHDB-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883212#action_12883212 ] Damien Katz commented on COUCHDB-812: - I think this is a fairly useful feature. Many moons ago I needed something similar in Lotus Notes, to randomly display a document from the database. It was difficult to get working. It should be possible to do a random view get by randomly navigating btree nodes until you reach a leaf node, though there will be some bias when the tree is unbalanced.

implement randomization in views resultset -- Key: COUCHDB-812 URL: https://issues.apache.org/jira/browse/COUCHDB-812 Project: CouchDB Issue Type: Wish Components: Database Core Affects Versions: 0.11 Environment: CouchDB Reporter: Mickael Bailly Priority: Minor

This is a proposal for a new feature in CouchDB: allow a randomization of rows in a view response. We can for example add a randomize query parameter... This request would probably not return the same results for the same request. As an example:

GET /db/_design/doc/_view/example :
{ ... "rows": [ {"key": 1, ...}, {"key": 2, ...}, {"key": 3, ...} ] }

GET /db/_design/doc/_view/example?randomize=true :
{ ... "rows": [ {"key": 2, ...}, {"key": 3, ...}, {"key": 1, ...} ] }

GET /db/_design/doc/_view/example?randomize=true :
{ ... "rows": [ {"key": 1, ...}, {"key": 3, ...}, {"key": 2, ...} ] }

This is a feature hard to implement client-side (except by reading all doc ids and using a client-side random function). RDBMSs have implemented it for ages, probably for the very same reasons: if you have to read all the rows client-side to random-select some of them, performance is awful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
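A minimal sketch of the random descent Damien describes, over a toy tree using couch_btree-like node shapes ({kp_node, Children} for inner nodes, {kv_node, KVs} for leaves - both shapes and the uniform choice per level are illustrative). As noted above, a plain uniform choice at each level is biased when the tree is unbalanced, since every subtree is picked with equal probability regardless of how many rows it holds:

-module(random_row).
-export([pick/1, demo/0]).

%% Descend from the root, picking a uniformly random child at each
%% level; at a leaf, pick a random key/value pair.
pick({kp_node, Children}) ->
    {_Key, Child} = lists:nth(rand:uniform(length(Children)), Children),
    pick(Child);
pick({kv_node, KVs}) ->
    lists:nth(rand:uniform(length(KVs)), KVs).

demo() ->
    Leaf = fun(Ks) -> {kv_node, [{K, K * K} || K <- Ks]} end,
    Root = {kp_node, [{3, Leaf([1, 2, 3])},
                      {6, Leaf([4, 5, 6])},
                      {9, Leaf([7, 8, 9])}]},
    pick(Root).

Weighting each choice by the per-subtree row counts that CouchDB btrees already maintain in their reductions would remove the bias.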
[jira] Commented: (COUCHDB-86) (CouchDB on Windows) compaction can not be done.
[ https://issues.apache.org/jira/browse/COUCHDB-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882329#action_12882329 ] Damien Katz commented on COUCHDB-86: Can anyone verify if the patch in COUCHDB-780 fixes the problem on Windows with the latest Erlang?

(CouchDB on Windows) compaction can not be done. Key: COUCHDB-86 URL: https://issues.apache.org/jira/browse/COUCHDB-86 Project: CouchDB Issue Type: Bug Components: Build System Affects Versions: 0.8 Environment: Windows XP, Erlang/OTP R12B-3 Reporter: Li Zhengji Assignee: Paul Joseph Davis Priority: Blocker Fix For: 1.0 Attachments: windows_file_fix_2.patch Original Estimate: 5h Remaining Estimate: 5h

During compacting, renaming the current DB file to a .old file is not allowed on Windows. A possible workaround for this could be: 1. Close the current DB file (.couch); 2. Send db_updated to update to use .compact; 3. After 5 sec, delete the .couch file; this is done in a linked process, and after that, this process sends a message to update_loop; 4. After receiving the message in update_loop, close the current DB file (which is a .compact file), then rename it to .couch; 5. Finally, db_updated again to use this new .couch file. Maybe there would be a pause in service? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (COUCHDB-780) Don't block the updater process while compaction deletes old files
[ https://issues.apache.org/jira/browse/COUCHDB-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz resolved COUCHDB-780. - Fix Version/s: 1.0 (was: 1.1) Resolution: Fixed Don't block the updater process while compaction deletes old files -- Key: COUCHDB-780 URL: https://issues.apache.org/jira/browse/COUCHDB-780 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 0.10.2, 0.11 Reporter: Randall Leeds Fix For: 1.0 Attachments: 0001-async-file-deletions.-COUCHDB-780.patch, async_compact_delete.patch I have what I think is a simple patch I'll attach. I don't see any reason not to include it unless rename operations can be seriously slow on some filesystems (but I expect this is not the case). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (COUCHDB-807) authentication cache (user docs cache)
[ https://issues.apache.org/jira/browse/COUCHDB-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz closed COUCHDB-807. --- Fix Version/s: 1.0 Resolution: Fixed authentication cache (user docs cache) -- Key: COUCHDB-807 URL: https://issues.apache.org/jira/browse/COUCHDB-807 Project: CouchDB Issue Type: Improvement Environment: trunk Reporter: Filipe Manana Assignee: Filipe Manana Fix For: 1.0 Attachments: auth_cache.patch, auth_cache_2.patch Currently, in order to authenticate an incoming request, each authentication handler will read a user doc from the _users DB. By default, 3 authentication handlers are defined (default.ini), which means we can have 3 _users DB lookups (besides 3 DB open and close operations). Taking into account that this is done for each incoming HTTP request, for very busy servers this current behaviour might be overkill. The following patch adds a new gen_server which implements an authentication cache and keeps the _users DB open all the time, so that cache misses and refreshes are as quick as possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
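A minimal sketch of the caching pattern described above - a gen_server owning an ETS table that request handlers can read directly, so credential lookups skip the _users DB on a hit (the table name and API are illustrative; the real auth_cache.patch also keeps the _users DB open and refreshes entries when it changes):

-module(auth_cache_sketch).
-behaviour(gen_server).
-export([start_link/0, get_user_creds/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Fast path: callers read the ETS table directly; no server call.
get_user_creds(UserName) ->
    case ets:lookup(?MODULE, UserName) of
        [{_, Creds}] -> {ok, Creds};
        [] -> gen_server:call(?MODULE, {fetch, UserName})
    end.

init([]) ->
    ets:new(?MODULE, [named_table, protected, {read_concurrency, true}]),
    {ok, nostate}.

%% Slow path: on a miss, load the user doc once and cache it. The real
%% patch reads from the _users DB here; this sketch fakes the load.
handle_call({fetch, UserName}, _From, State) ->
    Creds = {UserName, fake_loaded_doc},
    ets:insert(?MODULE, {UserName, Creds}),
    {reply, {ok, Creds}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.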
[jira] Created: (COUCHDB-800) Problem when writing larger than 4kb file headers
Problem when writing larger than 4kb file headers - Key: COUCHDB-800 URL: https://issues.apache.org/jira/browse/COUCHDB-800 Project: CouchDB Issue Type: Bug Components: Database Core Affects Versions: 0.11 Reporter: Damien Katz Assignee: Damien Katz Fix For: 0.11.1, 0.12

From Andrey Somov: Hi, while reading the CouchDB source I found a question in couch_file.erl; I am not sure whether it is a bug or not. Lines 297-311:

handle_call({write_header, Bin}, _From, #file{fd=Fd, eof=Pos}=File) ->
    BinSize = size(Bin),
    case Pos rem ?SIZE_BLOCK of
    0 ->
        Padding = <<>>;
    BlockOffset ->
        Padding = <<0:(8*(?SIZE_BLOCK-BlockOffset))>>
    end,
    FinalBin = [Padding, <<1, BinSize:32/integer>> | make_blocks(1, [Bin])],
    case file:write(Fd, FinalBin) of
    ok ->
        {reply, ok, File#file{eof=Pos+iolist_size(FinalBin)}};
    Error ->
        {reply, Error, File}
    end;

Because <<1, BinSize:32/integer>> occupies 5 bytes, make_blocks() should use offset=5, but the offset is only 1 (it should be make_blocks(5, [Bin])). Since the header is smaller than 4k there is no difference and it works (the tests succeed with both 1 and 5). But it makes it more difficult to understand the code for those who study the source to understand how it works. - Thank you, Andrey -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (COUCHDB-800) Problem when writing larger than 4kb file headers
[ https://issues.apache.org/jira/browse/COUCHDB-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz closed COUCHDB-800. --- Resolution: Fixed

Problem when writing larger than 4kb file headers - Key: COUCHDB-800 URL: https://issues.apache.org/jira/browse/COUCHDB-800 Project: CouchDB Issue Type: Bug Components: Database Core Affects Versions: 0.11 Reporter: Damien Katz Assignee: Damien Katz Fix For: 0.11.1, 0.12

From Andrey Somov: Hi, while reading the CouchDB source I found a question in couch_file.erl; I am not sure whether it is a bug or not. Lines 297-311:

handle_call({write_header, Bin}, _From, #file{fd=Fd, eof=Pos}=File) ->
    BinSize = size(Bin),
    case Pos rem ?SIZE_BLOCK of
    0 ->
        Padding = <<>>;
    BlockOffset ->
        Padding = <<0:(8*(?SIZE_BLOCK-BlockOffset))>>
    end,
    FinalBin = [Padding, <<1, BinSize:32/integer>> | make_blocks(1, [Bin])],
    case file:write(Fd, FinalBin) of
    ok ->
        {reply, ok, File#file{eof=Pos+iolist_size(FinalBin)}};
    Error ->
        {reply, Error, File}
    end;

Because <<1, BinSize:32/integer>> occupies 5 bytes, make_blocks() should use offset=5, but the offset is only 1 (it should be make_blocks(5, [Bin])). Since the header is smaller than 4k there is no difference and it works (the tests succeed with both 1 and 5). But it makes it more difficult to understand the code for those who study the source to understand how it works. - Thank you, Andrey -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-791) Changes not written if server shutdown during delayed_commits period
[ https://issues.apache.org/jira/browse/COUCHDB-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878667#action_12878667 ] Damien Katz commented on COUCHDB-791: - Sleeping while waiting doesn't give a guarantee. On a heavily loaded server, it could take many seconds to completely flush everything. If you want to ensure your data is on disk, use full commits. Nothing else gives any guarantees.

Changes not written if server shutdown during delayed_commits period Key: COUCHDB-791 URL: https://issues.apache.org/jira/browse/COUCHDB-791 Project: CouchDB Issue Type: Bug Affects Versions: 0.11.1 Environment: Linux (Ubuntu 10.04) Reporter: Matt Goodall

If the couchdb server is shut down (couchdb -d, Ctrl+C at the console, etc) during the delayed commits period, then buffered updates are lost. A simple script to demonstrate the problem is:

db=http://localhost:5984/scratch
curl $db -X DELETE
curl $db -X PUT
curl $db -X POST -d '{}'
/path/to/couchdb/bin/couchdb -d

When couchdb is started again the database is empty. Affects 0.11.x and trunk branches. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
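For reference, the full-commit controls Damien refers to are exposed over HTTP: an X-Couch-Full-Commit: true header on the write itself, or POST /db/_ensure_full_commit afterwards. A minimal sketch with OTP's inets httpc (the database URL matches the demo script above):

-module(full_commit).
-export([run/0]).

%% Write a doc and force an fsync so the update survives an immediate
%% shutdown despite delayed_commits.
run() ->
    inets:start(),
    Db = "http://localhost:5984/scratch",
    %% Option 1: request a full commit on the write itself.
    {ok, _} = httpc:request(post,
        {Db, [{"X-Couch-Full-Commit", "true"}], "application/json", "{}"},
        [], []),
    %% Option 2: flush everything buffered so far.
    {ok, {{_, 201, _}, _, Body}} = httpc:request(post,
        {Db ++ "/_ensure_full_commit", [], "application/json", ""}, [], []),
    io:format("~s~n", [Body]).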
[jira] Commented: (COUCHDB-780) Don't block the updater process while compaction deletes old files
[ https://issues.apache.org/jira/browse/COUCHDB-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878327#action_12878327 ] Damien Katz commented on COUCHDB-780: - Haven't looked at the patch, but this sounds similar to Mark Hammond's patch for fixing the Windows file problems (which requires a yet-unreleased version of Erlang, I think). Maybe Mark's patch also fixes this problem, or could with a little more work. https://issues.apache.org/jira/browse/COUCHDB-86

Don't block the updater process while compaction deletes old files -- Key: COUCHDB-780 URL: https://issues.apache.org/jira/browse/COUCHDB-780 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 0.10.2, 0.11 Reporter: Randall Leeds Fix For: 1.1 Attachments: 0001-async-file-deletions.-COUCHDB-780.patch, async_compact_delete.patch

I have what I think is a simple patch I'll attach. I don't see any reason not to include it unless rename operations can be seriously slow on some filesystems (but I expect this is not the case). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-767) do a non-blocking file:sync
[ https://issues.apache.org/jira/browse/COUCHDB-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876722#action_12876722 ] Damien Katz commented on COUCHDB-767: - The fsync on a separate thread/process might not work. Definitely load test the patch to ensure it's giving you what you expect. http://antirez.com/post/fsync-different-thread-useless.html

do a non-blocking file:sync --- Key: COUCHDB-767 URL: https://issues.apache.org/jira/browse/COUCHDB-767 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 0.11 Reporter: Adam Kocoloski Fix For: 1.1 Attachments: 767-async-fsync.patch, async_fsync.patch

I've been taking a close look at couch_file performance in our production systems. One of the things I've noticed is that reads are occasionally blocked for a long time by a slow call to file:sync. I think this is unnecessary. I think we could do something like

handle_call(sync, From, #file{name=Name}=File) ->
    spawn_link(fun() -> sync_file(Name, From) end),
    {noreply, File};

and then

sync_file(Name, From) ->
    {ok, Fd} = file:open(Name, [read, raw]),
    gen_server:reply(From, file:sync(Fd)),
    file:close(Fd).

Does anyone see a downside to this? Individual clients of couch_file still see exactly the same behavior as before, only readers are not blocked by syncs initiated in the db_updater process. When data needs to be flushed, file:sync is _much_ slower than spawning a local process and opening the file again -- in the neighborhood of 1000x slower, even on Linux with its less-than-durable use of vanilla fsync. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-782) Restarting replication
[ https://issues.apache.org/jira/browse/COUCHDB-782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875190#action_12875190 ] Damien Katz commented on COUCHDB-782: - Per-database UUIDs have the problem of databases being copied around on the file system, or restored from backup. A better option is to convert the URIs to a canonical format so they always look the same.

Restarting replication --- Key: COUCHDB-782 URL: https://issues.apache.org/jira/browse/COUCHDB-782 Project: CouchDB Issue Type: Bug Components: Replication Affects Versions: 0.10 Environment: Ubuntu, 9.10 Reporter: Till Klampaeckel

So we had to restart replication on a server, and here's something I noticed. At first I restarted the replication via the following command from localhost:

curl -X POST -d '{"source":"http://localhost:5984/foo", "target":"http://remote:5984/foo"}' http://localhost:5984/_replicate

In response, futon states: W Processed source update #176841152. That part is great. Last night I did not have immediate access to the shell, so I restarted replication from remote (through curl on my mobile):

curl -X POST -d '{"source":"http://user:p...@public.host:5984/foo", "target":"http://remote:5984/foo"}' http://user:p...@public.host:5984/_replicate

The response in futon this morning: W Processed source update #1066 ... and it kept sitting there like it was stalled, and only continued in smaller increments. I restarted CouchDB and restarted from localhost - instant jump to 176 million. I'm just wondering what might be different, except that one is against the public interface vs. localhost. I'd assume that replication behaves the same regardless. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
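A minimal sketch of the canonicalization Damien suggests. Modern OTP's uri_string:normalize/1 (which postdates this thread) implements the usual rules - lowercasing the scheme and host, dropping default ports, removing dot segments - so equivalent spellings of the same endpoint compare equal:

%% In an Erlang shell (OTP 21 or later):
1> uri_string:normalize("HTTP://LocalHost:5984/foo/../db").
"http://localhost:5984/db"
2> uri_string:normalize("http://localhost:5984/db").
"http://localhost:5984/db"

Replication checkpoint identity could then be derived from the normalized source/target strings rather than from per-database UUIDs.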
[jira] Commented: (COUCHDB-763) duplicate and or missing revisions in changes feed
[ https://issues.apache.org/jira/browse/COUCHDB-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868454#action_12868454 ] Damien Katz commented on COUCHDB-763: - This looks to be the same issue as what Simon Eisenmann encountered. I believe the problem was 2 database servers running at the same time for a short time during an internal auto-restart. This problem has been fixed in trunk.

duplicate and or missing revisions in changes feed -- Key: COUCHDB-763 URL: https://issues.apache.org/jira/browse/COUCHDB-763 Project: CouchDB Issue Type: Bug Components: Database Core Affects Versions: 0.10.1 Reporter: Randall Leeds Priority: Critical

I have no idea if this is unique to 0.10.1 or if it shows up on 0.11/trunk, since I have no clue how to repro. If we can identify why this happens, we should work to be very sure it's fixed. I see something like the following in my changes feed (taken from consecutive lines of an actual changes feed):

{"seq":36527,"id":"anonymized_docid","changes":[{"rev":"2186-967dbcd9d960b77955fcf6048fb219cc"}]},
{"seq":36530,"id":"anonymized_docid","changes":[{"rev":"2188-ae8481b29fd3a42d5190aba7c13a522b"}]},

I was under the impression that _changes only showed the newest revision for any document. Furthermore, the first of these two is actually missing. Querying the document with ?revs_info=true shows it as such, and this is confirmed by trying to query for ?rev=2186-967dbcd9d960b77955fcf6048fb219cc 1) Missing revisions should never show up in changes 2) Changes shouldn't list a document twice 3) This makes replication impossible since the reader tries to open missing revisions. Mostly for number (3) I'm marking this as critical. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)
[ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867197#action_12867197 ] Damien Katz commented on COUCHDB-738: - I've been thinking about this issue, and I think storing the whole rev tree in the by_seq index is a bad idea. I'm also thinking of ways to make the compaction faster. Store the full_doc_info outside the by_id btree, and instead store the doc_info (with its pointers to the main doc and conflicts) together with a pointer to the full_doc_info. On reads, this avoids the overhead of loading up the full_doc_info just to get the main doc. Updates and replication reads (that request the rev_info) will have to load up the full_doc_info with an extra read, but it's unlikely to be an extra disk IO since it will be close to the most recent doc revision. The by_seq index will also have the doc_info and a pointer to the full_doc_info. Then on compaction, scan the by_seq index, copying over the full_doc_infos and the documents and attachments into a new file. Each full_doc_info should be linked to the next with an offset to the next one's file position. Then scan the newly written full_doc_infos, converting them to doc_infos plus pointers to the full_doc_info, and write them out consecutively to the end of the file. Then sort just this portion of the file, on disk, by the id in the doc_info. This is the most expensive part of the compaction, but sorting things on disk is a common problem with lots of open source libraries out there that are highly optimized. Then convert this id-sorted portion of the file to btree leaf nodes. Then rescan the leaf nodes and build up the inner nodes; recurse until you have a single root node left. This is now your by_id index. Then rescan the full_doc_infos, and write out to the end of the file the doc_infos and pointers back to the full_doc_infos. This is already sorted by_seq. Then convert this seq-sorted portion of the file to btree leaf nodes. Then rescan the leaf nodes and build up the inner nodes; recurse until you have a single root node left. This is now your by_seq index. You now have a fully compacted database with no wasted btree nodes. I think this will be a lot faster. With the exception of the by_id sorting phase, this eliminates random disk seeks.

more efficient DB compaction (fewer seeks) -- Key: COUCHDB-738 URL: https://issues.apache.org/jira/browse/COUCHDB-738 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 0.9.2, 0.10.1, 0.11 Reporter: Adam Kocoloski Assignee: Adam Kocoloski Fix For: 1.1 Attachments: 738-efficient-compaction-v1.patch, 738-efficient-compaction-v2.patch

CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database. It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree. I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively. The net effect is significantly fewer calls to pread(), and a compaction IO pattern where reads tend to be clustered close to each other in the file. If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests). On the other hand, in the worst-case scenario of randomly generated docids and a database much larger than main memory, the improvement is huge.
Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS. The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster. Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator. This patch might have downsides when documents have a large number of edits. These include an increase in the size of the database and slower view indexing. I expect both to be small effects. The patch can be applied directly to tr...@934272. Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)
[ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863632#action_12863632 ] Damien Katz commented on COUCHDB-738: - Definitely I'd like to see performance metrics of view building on heaviily editted documents before committing this. more efficient DB compaction (fewer seeks) -- Key: COUCHDB-738 URL: https://issues.apache.org/jira/browse/COUCHDB-738 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 0.9.2, 0.10.1, 0.11 Reporter: Adam Kocoloski Assignee: Adam Kocoloski Fix For: 1.1 Attachments: 738-efficient-compaction-v1.patch CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database. It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree. I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively. The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file. If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests). On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge. Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS. The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster. Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator. This patch might have downsides when documents have a large number of edits. These include an increase in the size of the database and slower view indexing. I expect both to be small effects. The patch can be applied directly to tr...@934272. Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (COUCHDB-623) File format for views is space and time inefficient - use a better one
[ https://issues.apache.org/jira/browse/COUCHDB-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz closed COUCHDB-623. --- Resolution: Invalid Assignee: Damien Katz Closing as Invalid this has no objective criteria for being resolved. File format for views is space and time inefficient - use a better one -- Key: COUCHDB-623 URL: https://issues.apache.org/jira/browse/COUCHDB-623 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 0.10 Reporter: Roger Binns Assignee: Damien Katz This was discussed on the dev mailing list over the last few days and noted here so it isn't forgotten. The main database file format is optimised for data integrity - not losing or mangling documents - and rightly so. That same append-only format is also used for views where it is a poor fit. The more random the ordering of data supplied, the larger the btree. The larger the keys (in bytes) the larger the btree. As an example my 2GB of raw JSON data turns into a 3.9GB CouchDB database but a 27GB view file (before compacting to 900MB). Since views are not replicated, this requires a disproportionate amount of disk space on each receiving server (not to mention I/O load). The format also affects view generation performance. By loading my documents into CouchDB in an order by the most emitted value in views I was able to reduce load time from 75 minutes to 40 minutes with the view file size being 15GB instead of 27GB, but still very distant from the 900MB post compaction. Views are a performance enhancement. They save you from having to visit every document when doing some queries. The data within in a view is generated and hence the only consequence of losing view data is a performance one and the view can be regenerated anyway. Consequently the file format should be one that is optimised for performance and size. The only integrity feature needed is the ability to tell that the view is potentially corrupt (eg the power failed while it was being generated/updated). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client
[ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12799937#action_12799937 ] Damien Katz commented on COUCHDB-583: - I haven't looked at the patch, but I agree with most of Paul comments, except for figuring out when to compress files. Lots of compressed files might have uncompressed headers in the file, leading to unnecessary compression. MP3s with id3v2 tags immediately come to mind. storing attachments in compressed form and serving them in compressed form if accepted by the client Key: COUCHDB-583 URL: https://issues.apache.org/jira/browse/COUCHDB-583 Project: CouchDB Issue Type: New Feature Components: Database Core, HTTP Interface Environment: CouchDB trunk Reporter: Filipe Manana Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch This feature allows Couch to gzip compress attachments as they are being received and store them in compressed form. When a client asks for downloading an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's http request has gzip specified as a valid transfer encoding for the response (using the http header Accept-Encoding). Otherwise couch decompresses the attachment before sending it back to the client. Attachments are compressed only if their MIME type matches one of those listed in a separate config file. Compression level is also configurable in the default.ini file. This follows Damien's suggestion from 30 November: Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do? Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read. Patch attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-604) _changes feed with ?feed=continuous does not return valid JSON
[ https://issues.apache.org/jira/browse/COUCHDB-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793257#action_12793257 ] Damien Katz commented on COUCHDB-604: - Wow, lots of comments on this one. I originally implemented this as a single JSON stream, it was switched to newline separated json objects for ease parsing by clients. I don't have an opinion one way or the other, but the thing starts to bothers me is the culture offering more options so that everyone can have it exactly as they want it. The stems from the increasing texture of the API, and what must get documented and tested, and the burden of what must get implemented for those who want to make compatible CouchDB implementations. I tend to favor simpler APIs to the point of occasionally pushing some of the complexity to the client to ensure the server itself isn't completely overloaded with complexity and options. _changes feed with ?feed=continuous does not return valid JSON -- Key: COUCHDB-604 URL: https://issues.apache.org/jira/browse/COUCHDB-604 Project: CouchDB Issue Type: Improvement Components: HTTP Interface Affects Versions: 0.10 Reporter: Joscha Feth Priority: Trivial When using the _changes interface via ?feed=continuous the JSON returned is rather a stream of JSON documents than a valid JSON file itself: {seq:38,id:f473fe61a8a53778d91c38b23ed6e20f,changes:[{rev:9-d3e71c7f5f991b26fe014d884a27087f}]} {seq:68,id:2a574814d61d9ec8a0ebbf43fa03d75b,changes:[{rev:6-67179f215e42d63092dc6b2199a3bf51}],deleted:true} {seq:70,id:75dbdacca8e475f5909e3cc298905ef8,changes:[{rev:1-0dee261a2bd4c7fb7f2abd811974d3f8}]} {seq:71,id:09fb03236f80ea0680a3909c2d788e43,changes:[{rev:1-a9646389608c13a5c26f4c14c6863753}]} to be valid there needs to be a root element (and then an array with commata) like in the non-continuous feed: {results:[ {seq:38,id:f473fe61a8a53778d91c38b23ed6e20f,changes:[{rev:9-d3e71c7f5f991b26fe014d884a27087f}]}, {seq:68,id:2a574814d61d9ec8a0ebbf43fa03d75b,changes:[{rev:6-67179f215e42d63092dc6b2199a3bf51}],deleted:true}, {seq:70,id:75dbdacca8e475f5909e3cc298905ef8,changes:[{rev:1-0dee261a2bd4c7fb7f2abd811974d3f8}]}, {seq:71,id:09fb03236f80ea0680a3909c2d788e43,changes:[{rev:1-a9646389608c13a5c26f4c14c6863753}]}, in short this means that if someone does not parse the change events in an object like manner (e.g. waiting for a line-ending and then parsing the line), but using a SAX-like parser (throwing events of each new object, etc.) and expecting the response to be JSON (which it is not, because its not {x:[{},{},{}]} but {}{}{} which is not valid) there is an error thrown. I can see, that people doing this line by line might be okay with the above approach, but the response is not valid JSON and it would be nice if there were a flag to make the response valid JSON. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API
[ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783732#action_12783732 ] Damien Katz commented on COUCHDB-583: - One problem I think I see with the patch is that we are compressing regardless of mime type. For already compressed files (image, music and video), it does nothing but add CPU overhead. Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do? Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read. adding ?compression=(gzip|deflate) optional parameter to the attachment download API Key: COUCHDB-583 URL: https://issues.apache.org/jira/browse/COUCHDB-583 Project: CouchDB Issue Type: New Feature Components: HTTP Interface Environment: CouchDB trunk revision 885240 Reporter: Filipe Manana Attachments: jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch Original Estimate: 24h Remaining Estimate: 24h The following new feature is added in the patch following this ticket creation. A new optional http query parameter compression is added to the attachments API. This parameter can have one of the values: gzip or deflate. When asking for an attachment (GET http request), if the query parameter compression is found, CouchDB will send the attachment compressed to the client (and sets the header Content-Encoding with gzip or deflate). Further, it adds a new config option treshold_for_chunking_comp_responses (httpd section) that specifies an attachment length threshold. If an attachment has a length = than this threshold, the http response will be chunked (besides compressed). Note that using non chunked compressed body responses requires storing all the compressed blocks in memory and then sending each one to the client. This is a necessary evil, as we only know the length of the compressed body after compressing all the body, and we need to set the Content-Length header for non chunked responses. By sending chunked responses, we can send each compressed block immediately, without accumulating all of them in memory. Examples: $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate $ curl http://localhost:5984/testdb/testdoc1/readme.txt # attachment will not be compressed $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar # will give a 500 error code Etap test case included. Feedback would be very welcome. cheers -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (COUCHDB-292) A deleted document may be resaved with an old revision and is then considered undeleted
[ https://issues.apache.org/jira/browse/COUCHDB-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz closed COUCHDB-292. --- Resolution: Fixed Assignee: Damien Katz Fix with tests in svn r883494. A deleted document may be resaved with an old revision and is then considered undeleted --- Key: COUCHDB-292 URL: https://issues.apache.org/jira/browse/COUCHDB-292 Project: CouchDB Issue Type: Bug Components: Database Core Affects Versions: 0.9 Reporter: Paul Carey Assignee: Damien Katz Fix For: 0.11 If a document is deleted, a PUT request may be issued with the same revision that was passed to the DELETE request. When this happens the previously deleted document is assigned a new revision and is no longer considered deleted. This behaviour is new within the last few weeks. The following curl session illustrates the issue. 08:18 : ~ $ curl -X PUT -d '{_id:foo}' localhost:5984/scratch/foo {ok:true,id:foo,rev:1-3690485448} 08:19 : ~ $ curl -X PUT -d '{_id:foo,_rev:1-3690485448}' localhost:5984/scratch/foo {ok:true,id:foo,rev:2-966942539} 08:19 : ~ $ curl -X DELETE localhost:5984/scratch/foo?rev=2-966942539 {ok:true,id:foo,rev:3-421182311} 08:20 : ~ $ curl -X GET localhost:5984/scratch/foo {error:not_found,reason:deleted} 08:20 : ~ $ curl -X PUT -d '{_id:foo,_rev:2-966942539}' localhost:5984/scratch/foo {ok:true,id:foo,rev:3-1867999175} 08:20 : ~ $ curl -X GET localhost:5984/scratch/foo {_id:foo,_rev:3-1867999175} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-558) Validate Content-MD5 request headers on uploads
[ https://issues.apache.org/jira/browse/COUCHDB-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12772742#action_12772742 ] Damien Katz commented on COUCHDB-558: - Robert is correct, MD5 is fine for validating integrity, and as far as I know, it's the only hash function that's standardized in HTTP. For fully secure, unspoofable transmission, SSL is the way to go anyway. Validate Content-MD5 request headers on uploads --- Key: COUCHDB-558 URL: https://issues.apache.org/jira/browse/COUCHDB-558 Project: CouchDB Issue Type: Improvement Components: Database Core, HTTP Interface Reporter: Adam Kocoloski Fix For: 0.11 We could detect in-flight data corruption if a client sends a Content-MD5 header along with the data and Couch validates the MD5 on arrival. RFC1864 - The Content-MD5 Header Field http://www.faqs.org/rfcs/rfc1864.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-517) changing uuid algorithm causes client errors
[ https://issues.apache.org/jira/browse/COUCHDB-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761906#action_12761906 ] Damien Katz commented on COUCHDB-517: - I've reviewed the patch and it looks good. I'd commit it myself but I'm not sure of the patch flags I need for this diff. changing uuid algorithm causes client errors Key: COUCHDB-517 URL: https://issues.apache.org/jira/browse/COUCHDB-517 Project: CouchDB Issue Type: Bug Reporter: Robert Newson Attachments: couchdb-517.patch When changing the uuid configuration (by PUT to _config/uuids/algorithm), a client attempting an operation at the same time experiences a transitory connection refused problem. Attached is a patch that changes couch_uuid.erl so that it changes its internal state when the configuration changes rather than the current behavior of stopping and then being restarted by the supervisor. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-448) Support Gzip encoding for replicating over slow connections
[ https://issues.apache.org/jira/browse/COUCHDB-448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755753#action_12755753 ] Damien Katz commented on COUCHDB-448: - Ideally we'll store attachments gzipped, and then just stream them unchanged for clients that can handle, decompress for clients that can't. We'll probably need a config file to avoid mime types that aren't compressable, like images and movies. Support Gzip encoding for replicating over slow connections --- Key: COUCHDB-448 URL: https://issues.apache.org/jira/browse/COUCHDB-448 Project: CouchDB Issue Type: Improvement Components: HTTP Interface Reporter: Jason Davies Assignee: Adam Kocoloski This shouldn't be too hard to add, we should support it in general for all HTTP requests to the server and also allow it to be enabled in the replicator client for pull/push replication. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (COUCHDB-495) Make views twice as fast
[ https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz closed COUCHDB-495. --- Resolution: Fixed We now have a raw collation option, and regular json collation is much faster too. Make views twice as fast Key: COUCHDB-495 URL: https://issues.apache.org/jira/browse/COUCHDB-495 Project: CouchDB Issue Type: Improvement Components: JavaScript View Server Reporter: Chris Anderson Fix For: 0.11 Attachments: binary_collate.diff, couch_perf.py, less_json.patch, numbers-davisp.txt, outputv.patch, perf.py, R13B1-uca-bif.patch, term_collate.diff Devs, Damien's identified view collation as the most significant bottleneck for the view generation. We've done some testing, and some preliminary patches, and the upshot seems to be that even removing ICU from the collator is not a significant boost. What does speed things up greatly is using raw Erlang term comparison. Eg, instead of using couch_view:less_json, using fun(A,B) A B end. provides a roughly 2x speedup. However, the patch is challenging for a few reasons: Making the collation strategy switchable at all is tough. It's actually quite easy to get an alternate less function into the btree writer (all you've got to do is set it in couch_view_group:init_group). The hard part is propagating the same less function to the PassedEndFun. There's a secondary problem that when you use raw term comparison, a lot of terms turn out to come before nil, and after {}, which we use as artificial first and last terms in the less_json function. So just switching to raw collation alone will leave you with a view with unreachable rows. I tried two different approaches to the problem last night, and both of them led to (instructive) dead ends. I'll attach them for illustration purposes. The next line of attack we think should be tried is this: First - remove _all_docs_by_seq, as it is just adding complexity to the problem, and has been deprecated by _changes anyway. Along the same lines, _all_docs should no longer use couch_httpd_view:make_view_fold_fun as it has completely different collation needs than make_view_fold_fun. We'll end up duplicating a little code in the _all_docs implementation, but it should be worth it because it will make the other work much simpler. Once those changes have laid the groundwork, the next step is to change make_view_fold_fun and couch_view:fold, so that rather than make_view_fold_fun being responsible for detecting when we've passed the endkey. That means make_passed_end_fun and all references to PassedEnd and PassedEnd fun will be stripped from couch_httpd_view and moved to couch_btree. couch_view:fold (and the underlying btree) will need to accept not just a start, but also an endkey. This will make it much easier to use the less fun that is stored on View#view.btree#btree.less to determine PassedEnd funs. This will move some complexity to the btree code from the view code, but will keep the concerns more aligned. This also means that the btree will need to accept not only an endkey for folds, but also an inclusive_end parameter. Once we have all these refactorings done, it will be easy to make the less fun for an index configurable, as both the index writer and the index reader will look for it in the same place (on the #btree record). My aim is to start a discussion and get someone excited to work on this patch. Think of all the fast-views glory you'll get! Please ask questions and otherwise force me to clarify the above discussion. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (COUCHDB-495) Make views twice as fast
[ https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz updated COUCHDB-495: Attachment: couch_perf.py Make views twice as fast Key: COUCHDB-495 URL: https://issues.apache.org/jira/browse/COUCHDB-495 Project: CouchDB Issue Type: Improvement Components: JavaScript View Server Reporter: Chris Anderson Fix For: 0.11 Attachments: binary_collate.diff, couch_perf.py, term_collate.diff Devs, Damien's identified view collation as the most significant bottleneck for the view generation. We've done some testing, and some preliminary patches, and the upshot seems to be that even removing ICU from the collator is not a significant boost. What does speed things up greatly is using raw Erlang term comparison. Eg, instead of using couch_view:less_json, using fun(A,B) A B end. provides a roughly 2x speedup. However, the patch is challenging for a few reasons: Making the collation strategy switchable at all is tough. It's actually quite easy to get an alternate less function into the btree writer (all you've got to do is set it in couch_view_group:init_group). The hard part is propagating the same less function to the PassedEndFun. There's a secondary problem that when you use raw term comparison, a lot of terms turn out to come before nil, and after {}, which we use as artificial first and last terms in the less_json function. So just switching to raw collation alone will leave you with a view with unreachable rows. I tried two different approaches to the problem last night, and both of them led to (instructive) dead ends. I'll attach them for illustration purposes. The next line of attack we think should be tried is this: First - remove _all_docs_by_seq, as it is just adding complexity to the problem, and has been deprecated by _changes anyway. Along the same lines, _all_docs should no longer use couch_httpd_view:make_view_fold_fun as it has completely different collation needs than make_view_fold_fun. We'll end up duplicating a little code in the _all_docs implementation, but it should be worth it because it will make the other work much simpler. Once those changes have laid the groundwork, the next step is to change make_view_fold_fun and couch_view:fold, so that rather than make_view_fold_fun being responsible for detecting when we've passed the endkey. That means make_passed_end_fun and all references to PassedEnd and PassedEnd fun will be stripped from couch_httpd_view and moved to couch_btree. couch_view:fold (and the underlying btree) will need to accept not just a start, but also an endkey. This will make it much easier to use the less fun that is stored on View#view.btree#btree.less to determine PassedEnd funs. This will move some complexity to the btree code from the view code, but will keep the concerns more aligned. This also means that the btree will need to accept not only an endkey for folds, but also an inclusive_end parameter. Once we have all these refactorings done, it will be easy to make the less fun for an index configurable, as both the index writer and the index reader will look for it in the same place (on the #btree record). My aim is to start a discussion and get someone excited to work on this patch. Think of all the fast-views glory you'll get! Please ask questions and otherwise force me to clarify the above discussion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-495) Make views twice as fast
[ https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751215#action_12751215 ] Damien Katz commented on COUCHDB-495: - I've attach the file I'm using for performance benchmarking couch_perf.py. In my tests, the majority of the time was spent inside the less comparator function. Part of the problem is the expense of the callout the ICU for collation, which is copying the strings to buffers before comparing them. That can be fixed by using a more efficient method of sending data to Erlang native ports, which is something I'm working on. However, our json comparison function is also far more expensive than the built-in Erlang term comparison operators. So the easy solution is to just do a native Erlang term collation option. This is the option used by views that don't need collation, just performance. A better, but not really possible yet solution, is to code our own comparison function in C, to be on par with Erlangs built-in comparison. I think this isn't yet possible with Erlang and C code without hacking the core VM. Make views twice as fast Key: COUCHDB-495 URL: https://issues.apache.org/jira/browse/COUCHDB-495 Project: CouchDB Issue Type: Improvement Components: JavaScript View Server Reporter: Chris Anderson Fix For: 0.11 Attachments: binary_collate.diff, couch_perf.py, term_collate.diff Devs, Damien's identified view collation as the most significant bottleneck for the view generation. We've done some testing, and some preliminary patches, and the upshot seems to be that even removing ICU from the collator is not a significant boost. What does speed things up greatly is using raw Erlang term comparison. Eg, instead of using couch_view:less_json, using fun(A,B) A B end. provides a roughly 2x speedup. However, the patch is challenging for a few reasons: Making the collation strategy switchable at all is tough. It's actually quite easy to get an alternate less function into the btree writer (all you've got to do is set it in couch_view_group:init_group). The hard part is propagating the same less function to the PassedEndFun. There's a secondary problem that when you use raw term comparison, a lot of terms turn out to come before nil, and after {}, which we use as artificial first and last terms in the less_json function. So just switching to raw collation alone will leave you with a view with unreachable rows. I tried two different approaches to the problem last night, and both of them led to (instructive) dead ends. I'll attach them for illustration purposes. The next line of attack we think should be tried is this: First - remove _all_docs_by_seq, as it is just adding complexity to the problem, and has been deprecated by _changes anyway. Along the same lines, _all_docs should no longer use couch_httpd_view:make_view_fold_fun as it has completely different collation needs than make_view_fold_fun. We'll end up duplicating a little code in the _all_docs implementation, but it should be worth it because it will make the other work much simpler. Once those changes have laid the groundwork, the next step is to change make_view_fold_fun and couch_view:fold, so that rather than make_view_fold_fun being responsible for detecting when we've passed the endkey. That means make_passed_end_fun and all references to PassedEnd and PassedEnd fun will be stripped from couch_httpd_view and moved to couch_btree. couch_view:fold (and the underlying btree) will need to accept not just a start, but also an endkey. 
This will make it much easier to use the less fun that is stored on View#view.btree#btree.less to determine PassedEnd funs. This will move some complexity to the btree code from the view code, but will keep the concerns more aligned. This also means that the btree will need to accept not only an endkey for folds, but also an inclusive_end parameter. Once we have all these refactorings done, it will be easy to make the less fun for an index configurable, as both the index writer and the index reader will look for it in the same place (on the #btree record). My aim is to start a discussion and get someone excited to work on this patch. Think of all the fast-views glory you'll get! Please ask questions and otherwise force me to clarify the above discussion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-486) Better separation between httpd and core through api layer
[ https://issues.apache.org/jira/browse/COUCHDB-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748022#action_12748022 ] Damien Katz commented on COUCHDB-486: - I like the ideas behind this patch, but I think I don't like everything dumped into a single module. I think I'd prefer instead to have the same module names and the wrapper calls in the same modules, with the implementation code in a new file. So the public wrapper calls for couch_db would remain in couch_db, but the code is now moved to couch_db_imp or couch_db_priv. I think export_all is fine if everything is going to be exported anyway. Better separation between httpd and core through api layer -- Key: COUCHDB-486 URL: https://issues.apache.org/jira/browse/COUCHDB-486 Project: CouchDB Issue Type: Improvement Components: Database Core Reporter: Adam Kocoloski Fix For: 0.11 Attachments: couch_api.patch I'm attaching a patch that routes non-purely-functional calls into core CouchDB modules through a new couch_api module. I also went ahead and wrote down dialyzer specs for everything in couch_api. I think this will be a useful reference, will make the codebase a bit more accessible to newcomers, and will help us maintain better separation between the purely functional httpd layer and the core (useful in e.g. partitioning work). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-464) Allow POST to _log for external processes
[ https://issues.apache.org/jira/browse/COUCHDB-464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742341#action_12742341 ] Damien Katz commented on COUCHDB-464: - Why not just write the log messages to a db? Allow POST to _log for external processes - Key: COUCHDB-464 URL: https://issues.apache.org/jira/browse/COUCHDB-464 Project: CouchDB Issue Type: New Feature Reporter: Robert Newson Attachments: 0001-Add-POST-support-to-_log.patch, 0001-Add-POST-support-to-_log.patch, 0001-Add-POST-support-to-_log.patch Add POST support to _log so that external processes can also log to couch.log. This would allow couchdb-lucene (to pick a random example) to log consistently. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-462) built-in conflicts view
[ https://issues.apache.org/jira/browse/COUCHDB-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742542#action_12742542 ] Damien Katz commented on COUCHDB-462: - I think we should reconsider this patch. For one thing, it's expensive at runtime, it requires doing a linear scan on the full doc index. If you have millions of docs and no conflicts, it will must scan through every doc meta record just to tell you that. Another problem is you don't get any filtering or formatting. Using couchdb view, a user can construct a view that shows conflicts by author, customer, area, etc and format the results for display. Using this facility, you get no formatting or collation options. I favor backing this change out. built-in conflicts view --- Key: COUCHDB-462 URL: https://issues.apache.org/jira/browse/COUCHDB-462 Project: CouchDB Issue Type: Improvement Components: HTTP Interface Reporter: Adam Kocoloski Fix For: 0.10 Attachments: 462-jan-2.patch, conflicts_view.diff, COUCHDB-462-adam-updated.patch, COUCHDB-462-jan.patch This patch adds a built-in _conflicts view indexed by document ID that looks like GET /dbname/_conflicts {rows:[ {id:foo, rev:1-1aa8851c9bb2777e11ba56e0bf768649, conflicts:[1-bdc15320c0850d4ee90ff43d1d298d5d]} ]} GET /dbname/_conflicts?deleted=true {rows:[ {id:bar, rev:5-dd31186f5aa11ebd47eb664fb342f1b1, conflicts:[5-a0efbb1990c961a078dc5308d03b7044], deleted_conflicts:[3-bdc15320c0850d4ee90ff43d1d298d5d,2-cce334eeeb02d04870e37dac6d33198a]}, {id:baz, rev:2-eec205a9d413992850a6e32678485900, deleted:true, deleted_conflicts:[2-10009b36e28478b213e04e71c1e08beb]} ]} As the HTTPd and view layers are a bit outside my specialty I figured I should ask for a Review before Commit. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (COUCHDB-420) OAuth authentication support (2-legged initially) and cookie-based authentication
[ https://issues.apache.org/jira/browse/COUCHDB-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz resolved COUCHDB-420. - Resolution: Fixed OAuth authentication support (2-legged initially) and cookie-based authentication - Key: COUCHDB-420 URL: https://issues.apache.org/jira/browse/COUCHDB-420 Project: CouchDB Issue Type: New Feature Components: HTTP Interface Reporter: Jason Davies Priority: Blocker Fix For: 0.10 Attachments: oauth.1.diff, oauth.2.patch, oauth.3.patch This patch adds two-legged OAuth support to CouchDB. 1. In order to do this, a couple of changes have been made to the way auth handlers are used. Essentially, the patch allows multiple handlers to be specified in a comma-separated list in the following in the [httpd] section of your .ini config e.g. authentication_handlers = {couch_httpd_oauth, oauth_authentication_handler}, {couch_httpd_auth, default_authentication_handler} The handlers are tried in order until one of them successfully authenticates and sets user_ctx on the request. Then the request is passed to the main handler. 2. Now for the OAuth consumer keys and secrets: as Ubuntu need to be able to bootstrap this i.e. add tokens without a running CouchDB, I have advised creating a new config file in $PREFIX/etc/couchdb/default.d/ called oauth.ini or similar. This should get read by CouchDB's startup script when it loads its config files (e.g. default.ini and local.ini as well). There are three sections available: i. [oauth_consumer_secrets] consumer_key = consumer_secret ii. [oauth_token_secrets] oauth_token = oauth_token_secret iii. [oauth_token_users] oauth_token = username The format I've used above is [section name] followed by how the keys and values for that section will look on subsequent lines. The secrets are a way for the consumer to prove that it owns the corresponding consumer key or access token. The mapping of auth tokens to usernames is a way to specify which user/roles to give to a consumer with a given access token. In the future we will also store tokens in the user database (see below). 3. OAuth replication. I've extended the JSON sent via POST when initiating a replication as follows: { source: { url: url, auth: { oauth: { consumer_key: oauth_consumer_key, consumer_secret: oauth_consumer_secret, token_secret: oauth_token_secret, token: oauth_token } } }, target: /* same syntax as source, or string for a URL with no auth info, or string for local database name */ } 4. This patch also includes cookie-authentication support to CouchDB. I've covered this here: http://www.jasondavies.com/blog/2009/05/27/secure-cookie-authentication-couchdb/ The cookie-authentication branch is being used on a couple of live sites and the branch has also been worked on by jchris and benoitc. As well as cookie auth it includes the beginnings of support for a per-node user database, with APIs for creating/deleting users etc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-370) If the CouchDB vm is dies or is killed, view subprocesses (js) are not automatically killed
[ https://issues.apache.org/jira/browse/COUCHDB-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12738390#action_12738390 ] Damien Katz commented on COUCHDB-370: - While I agree we should prevent unnecessary VM exits, we can't prevent them all and fortunately the VM is designed to terminate and restart quickly. This is part of CouchDB's design too, restarts are always fast. A correct solution is one or more watchdog processes that watch the VM and the subprocess, and if the VM dies, it kills all the subprocesses and then itself. If the CouchDB vm is dies or is killed, view subprocesses (js) are not automatically killed --- Key: COUCHDB-370 URL: https://issues.apache.org/jira/browse/COUCHDB-370 Project: CouchDB Issue Type: Bug Components: JavaScript View Server Affects Versions: 0.9 Reporter: Damien Katz Priority: Minor If CouchDB dies or is killed, it's subprocess are not forcefully killed. If the subprocesses are in infinite loops, they will never die. We need some kind of external watchdog process, or processes that kill the subprocess automatically if the CouchDB erlang vm dies. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (COUCHDB-434) 500 error when working with deleted bulk docs
[ https://issues.apache.org/jira/browse/COUCHDB-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz closed COUCHDB-434. --- Resolution: Fixed Fix Version/s: 0.10 Fixed in trunk. 500 error when working with deleted bulk docs - Key: COUCHDB-434 URL: https://issues.apache.org/jira/browse/COUCHDB-434 Project: CouchDB Issue Type: Bug Components: Database Core Reporter: Mark Hammond Assignee: Damien Katz Fix For: 0.10 Attachments: bulk_save_500.patch, bulk_save_500.patch When upgrading our app from 0.9 to trunk, I encountered a 500 error attempting to update previously deleted documents. I've hacked together a patch to the test suite which demonstrates an almost identical error, but note: * The test code is misplaced due to that test otherwise failing for me when attempting to compact. It should go either at the end of the test, or into its own test. * As the comments note, the attempt to delete the documents appears to fail with conflict errors - which it probably shouldn't. * If you ignore these conflicts (as the test does), the next attempt to 'resurrect' these docs causes a 500 error with a badmatch exception. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-421) add longpolling for _changes
[ https://issues.apache.org/jira/browse/COUCHDB-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733361#action_12733361 ] Damien Katz commented on COUCHDB-421: - Style wise this patch looks good, however you must change the indention tabs to spaces. I recommend instead of adding the longpolling_changes call in the code, just reuse the keep_sending_changes call, and after the call to send_changes, add a check if EndSeq StartSeq and the long poll option is on, stop, Also I'm not sure about calling it long_poll, but I don't have a better name myself. add longpolling for _changes Key: COUCHDB-421 URL: https://issues.apache.org/jira/browse/COUCHDB-421 Project: CouchDB Issue Type: New Feature Affects Versions: 0.10 Reporter: Benoit Chesneau Attachments: longpoll.diff Implement longpolling on _changes. Instead of continuous, longpolling hold request until an update is available then close it. The client will have to ask a new connection. Should solve problem for XHR's that don't have that status changed (on ie, opera..) . I've put all the code in my github repo : http://github.com/benoitc/couchdb/tree/longpoll diff against trunk is attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (COUCHDB-285) tail append headers
[ https://issues.apache.org/jira/browse/COUCHDB-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz closed COUCHDB-285. --- Resolution: Fixed Fix Version/s: (was: 1.0) 0.10 Assignee: Damien Katz tail append headers --- Key: COUCHDB-285 URL: https://issues.apache.org/jira/browse/COUCHDB-285 Project: CouchDB Issue Type: Improvement Components: Database Core Reporter: Chris Anderson Assignee: Damien Katz Fix For: 0.10 this will make .couch files resilient even when truncated (data-loss but still usable). also cuts down on the # of disk seeks. [3:02pm] jchris damienkatz: the offset in a header would corresponds to number of bytes from the front of the file? [3:03pm] andysky joined the chat room. [3:03pm] damienkatz jchris: yes [3:03pm] jchris because my offset seems to suggest that just over a MB of the file is missing [3:03pm] » jchris blames not couchdb [3:03pm] jchris but the streaming writes you've talked about would make this more resilent, eh? [3:03pm] jchris where the header is also appended each time [3:04pm] jchris there could be data lost but the db would still be usable [3:04pm] damienkatz yes, a file truncation just gives you and earlier version of the file [3:05pm] jchris now's not a good time for me to work on that, but after Amsterdam I may want to pick it up [3:05pm] damienkatz the hardest part is finding the header again [3:06pm] jan hu? isn't the header the firs 4k? [3:06pm] jan t [3:06pm] jchris it would only really change couch_file:read_header and write_header I think [3:06pm] jchris jan: we're talking about moving it to the end [3:06pm] jchris so it never gets overwritten [3:06pm] damienkatz jan: this is for tail append headers [3:06pm] jan duh [3:06pm] jan futuretalk [3:06pm] jan n/m me [3:07pm] damienkatz jchris: so one way is to sign the header regions, but you need to make it unforgable. [3:08pm] jchris basically a boundary problem... [3:08pm] damienkatz because if a client wrote a binary that looked like it had a header, they could do bad things. [3:08pm] jchris like for instance an attachment that's a .couch file :) [3:08pm] damienkatz right [3:09pm] damienkatz so you can salt the db file on creation with a key in the header. And use that key to sign and verify headers. [3:09pm] tlrobinson joined the chat room. [3:09pm] jchris doesn't sound too tough [3:10pm] jan damienkatz: I looked into adding conflict inducing bulk docs in rep_security. would this work: POST /db/_bulk_docs?allow_conflicts=true could do a regular bulk save but grab the error responses and do a update_docs() call with the replicated_changes option for all errors from the first bulk save while assigning new _revs for new docs? [3:10pm] damienkatz the key is crypto-random, and must stay hidden from clients. [3:10pm] jchris if you have the file, you could forge headers... [3:10pm] jchris but under normal operation, it sounds like not a big deal [3:10pm] Qaexl joined the chat room. [3:11pm] jchris so we just give the db an internal secret uuid [3:11pm] mmalone left the chat room. (Connection reset by peer) [3:11pm] peritus_ joined the chat room. [3:11pm] damienkatz I'm not sure I like this approach. [3:11pm] jchris damienkatz: drawbacks? [3:11pm] damienkatz if a client can see a file share with the db, they can attack it. [3:12pm] mmalone joined the chat room. [3:12pm] mmalone left the chat room. (Read error: 104 (Connection reset by peer)) [3:12pm] damienkatz how about this approach. every 4k, we write a NULL byte. 
[3:13pm] damienkatz we always write headers at the 4k boundary [3:13pm] mmalone joined the chat room. [3:13pm] damienkatz and make that byte 1 [3:13pm] jan grr [3:13pm] jan did my bulk-docs proposal get through? [3:13pm] jchris the attacker could still get lucky [3:13pm] jan (or got it shot down? :) [3:13pm] damienkatz jan: sorry. [3:13pm] jan damienkatz: I couldn't read the backlog [3:13pm] damienkatz Let me think about the conflict stuff a little bit. [3:13pm] jan sure [3:13pm] jan no baclog then [3:14pm] jan +k [3:14pm] jchris jan: your paragraph is dense there - [3:14pm] damienkatz jchris: no, this is immune from attack [3:14pm] jchris because you'd write an attachment marker after the null byte for attachments? [3:14pm] damienkatz every 4k, we just write a 0 byte, we skip that byte. [3:15pm] jan jchris: yeah, sorry, will let you finish the file stuff [3:15pm] damienkatz no matter what, we never write anything into that byte. [3:15pm] jan wasting all these 0 bytes [3:15pm] damienkatz a big file right whil write all the surrounding bytes, but not
[jira] Closed: (COUCHDB-391) Restoring the couchDB (restore Documents Views on different Servers)
[ https://issues.apache.org/jira/browse/COUCHDB-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz closed COUCHDB-391. --- Resolution: Invalid Please ask questions in the appropriate CouchDB mailing list, not in the bug system: http://couchdb.apache.org/community/lists.html Restoring the couchDB (restore Documents Views on different Servers) Key: COUCHDB-391 URL: https://issues.apache.org/jira/browse/COUCHDB-391 Project: CouchDB Issue Type: Question Environment: Microsoft Windows XP Professional version 2002 service Pack2 erlang : 5.6.5 couchDb : 0.9.0 Reporter: Ajay jagdish Pawaskar i want to restore the CouchDB Documents Views from One server to another (on clients server) there is option for replication (remote server)..but i don't have access to that server...(so i can't Replicate DB on that server using Replicator) ..but i need to deploy the Db on that server .. also in future maybe i will be needing the documents from client server for debugging purpose how i can do this? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-377) allow native view servers
[ https://issues.apache.org/jira/browse/COUCHDB-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12722757#action_12722757 ] Damien Katz commented on COUCHDB-377: - I think we should not try to avoid the JSON term conversion, as it's the most stable API and it's a low cost conversion, everything is still Erlang terms. It's the serialization to a string and back that's expensive. I think we should go ahead and change the get os_process call to return a record, instead of a tuple, something like: -record(proc, { pid, lang, type, prompt_fun }}. The prompt_fun replaces and wraps couch_os_process:prompt and now we call the prompt_fun with the same args. If it's an OS process, it's the same call to couch_os_process:prompt, otherwise it's a function that calls apply(Mod, Fun, [Pid, Args]) or something like that. The patch otherwise looks good, I don't see any obvious bugs or style problems. allow native view servers - Key: COUCHDB-377 URL: https://issues.apache.org/jira/browse/COUCHDB-377 Project: CouchDB Issue Type: Improvement Reporter: Mark Hammond Attachments: native_query_servers.patch There has been some discussion on IRC etc about how to support 'native' view servers, such as 'erlview' in a generic way. Currently using erlview requires you to modify couch. I'm attaching a patch as a first attempt at supporting this. In summary, the patch now looks up a new 'native_query_servers' config file section for a list of view_server names with a {Module, Func, Args} style string specifying the entry-point of the view server. The code now passes an additional atom around indicating if the PID is 'native' or 'external', and map_docs takes advantage of this to avoid the json step. This patch allows erlview to work for me, but in theory any erlang code could be used here. I'm very new at erlang - please let me know if I should make stylistic or other changes, or indeed if I should take a different approach completely. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-263) require valid user for all database operations
[ https://issues.apache.org/jira/browse/COUCHDB-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721425#action_12721425 ] Damien Katz commented on COUCHDB-263: - This patch looks okay, but we actually need something like this at the database level, the ability to say who can and can't access a database, and the ability to disallow anonymous access. require valid user for all database operations -- Key: COUCHDB-263 URL: https://issues.apache.org/jira/browse/COUCHDB-263 Project: CouchDB Issue Type: Improvement Components: HTTP Interface Affects Versions: 0.9 Environment: All platforms. Reporter: Jack Moffitt Priority: Minor Attachments: couchauth.diff Admin accounts currently restrict a few operations, but leave all other operations completely open. Many use cases will require all operations to be authenticated. This can certainly be done by overriding the default_authentication_handler, but I think this very common use case can be handled in default_authentication_handler without increasing the complexity much. Attached is a patch which adds a new config option, require_valid_user, which restricts all operations to authenticated users only. Since CouchDB currently only has admins, this means that all operations are restricted to admins. In a future CouchDB where there are also normal users, the intention is that this would let them pass through as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-263) require valid user for all database operations
[ https://issues.apache.org/jira/browse/COUCHDB-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721435#action_12721435 ] Damien Katz commented on COUCHDB-263: - hmmm, on second thought, we do need this both as a server wide setting and at the database level. However, this check and throwing exceptions for authenticated users should not be done in the authentication function, but by the caller of the auth function, so the setting works with all auth handlers. Also, it would be nice to have a more complete solution with more settings: allowed users, disallowed users and allow anonymous require valid user for all database operations -- Key: COUCHDB-263 URL: https://issues.apache.org/jira/browse/COUCHDB-263 Project: CouchDB Issue Type: Improvement Components: HTTP Interface Affects Versions: 0.9 Environment: All platforms. Reporter: Jack Moffitt Priority: Minor Attachments: couchauth.diff Admin accounts currently restrict a few operations, but leave all other operations completely open. Many use cases will require all operations to be authenticated. This can certainly be done by overriding the default_authentication_handler, but I think this very common use case can be handled in default_authentication_handler without increasing the complexity much. Attached is a patch which adds a new config option, require_valid_user, which restricts all operations to authenticated users only. Since CouchDB currently only has admins, this means that all operations are restricted to admins. In a future CouchDB where there are also normal users, the intention is that this would let them pass through as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-204) CouchDB stops/crashes/hangs (?) after resume from Mac OS X system hibernation and/or stand-by (sleep)
[ https://issues.apache.org/jira/browse/COUCHDB-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715223#action_12715223 ] Damien Katz commented on COUCHDB-204: - It looks like all we need is a special flag passed to the emulator http://www.erlang.org/doc/man/erl.html: +c Disable compensation for sudden changes of system time. Normally, erlang:now/0 will not immediately reflect sudden changes in the system time, in order to keep timers (including receive-after) working. Instead, the time maintained by erlang:now/0 is slowly adjusted towards the new system time. (Slowly means in one percent adjustments; if the time is off by one minute, the time will be adjusted in 100 minutes.) When the +c option is given, this slow adjustment will not take place. Instead erlang:now/0 will always reflect the current system time. Note that timers are based on erlang:now/0. If the system time jumps, timers then time out at the wrong time. CouchDB stops/crashes/hangs (?) after resume from Mac OS X system hibernation and/or stand-by (sleep) --- Key: COUCHDB-204 URL: https://issues.apache.org/jira/browse/COUCHDB-204 Project: CouchDB Issue Type: Bug Components: Administration Console, Database Core, HTTP Interface, Infrastructure Affects Versions: 0.8.1 Environment: Mac OS X 10.5.6 Leopard Reporter: Philipp Schumann Priority: Critical I'm running CouchDB 0.8.1 on Mac OS X 10.5.6 Leopard and after resuming from system hibernation (safe sleep -- by closing and reopening the laptop lid in my case, which is the factory default), the process either refuses all incoming connections, including my own Python scripts, web browser and the Futon, or has stopped running altogether. That is, I don't know which exactly is the case here but the fact is that CouchDB cannot be connected to after resuming. This issue always appears with smart sleep / safe sleep (standby plus hibernation) but only sometimes appears using fast sleep (hibernation turned off, standby only). This isn't a critical issue for server deployments, of course, but one of the core ideas of CouchDB is that eventually it will be deployed even to desktop clients for app data replication across machines, so in this context this *is* a critical issue since you can't ask ordinary Mac OS X users to change their sleep settings from safe to fast using uncomprehensable terminal commands. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (COUCHDB-370) If the CouchDB vm is dies or is killed, view subprocesses (js) are not automatically killed
If the CouchDB vm is dies or is killed, view subprocesses (js) are not automatically killed --- Key: COUCHDB-370 URL: https://issues.apache.org/jira/browse/COUCHDB-370 Project: CouchDB Issue Type: Bug Affects Versions: 0.9 Reporter: Damien Katz Priority: Minor If CouchDB dies or is killed, it's subprocess are not forcefully killed. If the subprocesses are in infinite loops, they will never die. We need some kind of external watchdog process that kill the subprocess automatically if the CouchDB erlang vm dies. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (COUCHDB-371) Need a way to limit memory used by a subprocess
Need a way to limit memory used by a subprocess --- Key: COUCHDB-371 URL: https://issues.apache.org/jira/browse/COUCHDB-371 Project: CouchDB Issue Type: Improvement Components: JavaScript View Server Affects Versions: 0.9 Reporter: Damien Katz Assignee: Damien Katz Priority: Minor We need a way to limit the total memory used by subprocesses, such as a view process, so they cannot use up all the available memory due to a coding error or a malicious attack. We can probably do this by setting ulimit in the couchspawnkillable script. -Damien -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
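A minimal sketch of the ulimit idea inside such a spawn wrapper; the 256 MB cap and the wrapper shape are assumptions, not the shipped script:

    #!/bin/sh
    # cap the virtual address space (in KB) for everything exec'd below
    ulimit -v 262144
    # replace this shell with the actual view server process
    exec "$@"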
[jira] Updated: (COUCHDB-370) If the CouchDB vm dies or is killed, view subprocesses (js) are not automatically killed
[ https://issues.apache.org/jira/browse/COUCHDB-370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz updated COUCHDB-370: Component/s: JavaScript View Server Description: If CouchDB dies or is killed, its subprocesses are not forcefully killed. If the subprocesses are in infinite loops, they will never die. We need some kind of external watchdog process, or processes, that kill the subprocesses automatically if the CouchDB erlang vm dies. (was: If CouchDB dies or is killed, its subprocesses are not forcefully killed. If the subprocesses are in infinite loops, they will never die. We need some kind of external watchdog process that kills the subprocesses automatically if the CouchDB erlang vm dies.) If the CouchDB vm dies or is killed, view subprocesses (js) are not automatically killed --- Key: COUCHDB-370 URL: https://issues.apache.org/jira/browse/COUCHDB-370 Project: CouchDB Issue Type: Bug Components: JavaScript View Server Affects Versions: 0.9 Reporter: Damien Katz Priority: Minor If CouchDB dies or is killed, its subprocesses are not forcefully killed. If the subprocesses are in infinite loops, they will never die. We need some kind of external watchdog process, or processes, that kill the subprocesses automatically if the CouchDB erlang vm dies. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (COUCHDB-366) Error Uploading Attachment
[ https://issues.apache.org/jira/browse/COUCHDB-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz closed COUCHDB-366. --- Resolution: Fixed Fix Version/s: 0.10 Error Uploading Attachment -- Key: COUCHDB-366 URL: https://issues.apache.org/jira/browse/COUCHDB-366 Project: CouchDB Issue Type: Bug Components: Database Core Affects Versions: 0.10 Environment: Ubuntu 8.04, CouchDB Trunk rev 779803 Reporter: Ben Browning Fix For: 0.10 Attachments: attachment_traceback.txt, bespin.zip, couchdb-366-test.patch 20:21 davisp damienkatz: uploading a large attachment ends up throwing a function_clause error on split_iolist and the parameter types are binary(), int(), [binary()] Traceback attached -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
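For readers unfamiliar with the error named above: a function_clause exit means no clause head of the function matched the arguments it was handed. A tiny illustration, not CouchDB's actual split_iolist:

    %% only a clause for lists is defined...
    head_size([H | _]) when is_binary(H) -> byte_size(H).
    %% ...so head_size(<<1,2,3>>) exits with function_clause, just as
    %% split_iolist reportedly did when handed binary(), int(), [binary()].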
[jira] Updated: (COUCHDB-337) attachments from old/conflict revisions are not accessible via standalone API
[ https://issues.apache.org/jira/browse/COUCHDB-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz updated COUCHDB-337: Attachment: replication_test.diff Here is a replication test to show the failures of replicating conflicts with attachments, which doesn't yet pass with attachment_revisions.diff attachments from old/conflict revisions are not accessible via standalone API - Key: COUCHDB-337 URL: https://issues.apache.org/jira/browse/COUCHDB-337 Project: CouchDB Issue Type: Bug Affects Versions: 0.9 Reporter: Adam Kocoloski Fix For: 0.10 Attachments: attachment_revisions.diff, replication_test.diff Couch ignores rev qs parameter for attachment GETs. I believe it should not. Attaching proposed patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
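For illustration, the behavior under discussion is fetching an attachment from a non-current revision through the rev query-string parameter; the database, document, attachment, and revision values below are made up:

    GET /dbname/docid/photo.jpg?rev=946b7d1c

With the patch, this should return the attachment as it existed in that revision, instead of the rev parameter being ignored.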
[jira] Created: (COUCHDB-334) With deferred commits and 100+ active dbs, CouchDB can lose uncommitted changes
With deferred commits and 100+ active dbs, CouchDB can lose uncommitted changes --- Key: COUCHDB-334 URL: https://issues.apache.org/jira/browse/COUCHDB-334 Project: CouchDB Issue Type: Bug Components: Database Core Affects Versions: 0.9 Reporter: Damien Katz Assignee: Damien Katz Fix For: 0.9.1 By default, CouchDB keeps a maximum of 100 databases open and active. This is controlled by the ini setting max_dbs_open in [couchdb]. This limit controls the number of Erlang server processes that are readily available and hold resources, like file handles, and hold state for deferred commits. Once CouchDB hits the open database limit, it will always close an idle database and its files before opening a new database file. The problem is that CouchDB would consider instances to be idle even if they still had deferred commits pending. It would then close the instance and drop its deferred commits. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
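For reference, the limit in question is plain ini configuration; 100 is the default stated in the description:

    [couchdb]
    ; maximum number of database files held open (each may hold deferred commits)
    max_dbs_open = 100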
[jira] Updated: (COUCHDB-334) With deferred commits and 100+ active dbs, CouchDB can lose uncommitted changes
[ https://issues.apache.org/jira/browse/COUCHDB-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz updated COUCHDB-334: Fix Version/s: (was: 0.10) 0.9.1 With deferred commits and 100+ active dbs, CouchDB can lose uncommitted changes --- Key: COUCHDB-334 URL: https://issues.apache.org/jira/browse/COUCHDB-334 Project: CouchDB Issue Type: Bug Components: Database Core Affects Versions: 0.9 Reporter: Damien Katz Assignee: Damien Katz Fix For: 0.9.1 By default, CouchDB keeps a maximum of 100 databases open and active. This is controlled by the ini setting max_dbs_open in [couchdb]. This limit controls the number of Erlang server processes that are readily available and hold resources, like file handles, and hold state for deferred commits. Once CouchDB hits the open database limit, it will always close an idle database and its files before opening a new database file. The problem is that CouchDB would consider instances to be idle even if they still had deferred commits pending. It would then close the instance and drop its deferred commits. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-240) Replication breaks with large Attachments.
[ https://issues.apache.org/jira/browse/COUCHDB-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699333#action_12699333 ] Damien Katz commented on COUCHDB-240: - Adam, I think you are right. If the fix isn't too hairy, we should also add it to 0.9.1. Replication breaks with large Attachments. -- Key: COUCHDB-240 URL: https://issues.apache.org/jira/browse/COUCHDB-240 Project: CouchDB Issue Type: Bug Components: Database Core Affects Versions: 0.9 Environment: r 741265. Debian Linux unknown revision, FreeBSD 7.0. GBit Network connection between the hosts. Reporter: Maximillian Dornseif Assignee: Adam Kocoloski Fix For: 0.10 I use the code in http://code.google.com/p/couchdb-python/issues/detail?id=54 to do replication between two machines. I'm running 741265 on both machines. I have a database with big attachments (high-res images, 31.1 GB, 34026 docs). Pull replication breaks with the following message sent via http: couchdb.client.ServerError: (500, ('function_clause', [{lists,map,[#Fun<couch_rep.10.28922857>,ok]},\n {couch_rep,open_doc_revs,4},\n {couch_rep,'-enum_docs_parallel/3-fun-1-',3},\n {couch_rep,'-spawn_worker/3-fun-0-',3}])) With push replication the server just drops the connection (httplib2/__init__.py, line 715, in connect socket.error: (61, 'Connection refused') - why refused instead of closed?). I have only been able to replicate the first 100 documents. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (COUCHDB-220) Extreme sparseness in couch files
[ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz closed COUCHDB-220. --- Resolution: Fixed Fix Version/s: 0.10 Extreme sparseness in couch files - Key: COUCHDB-220 URL: https://issues.apache.org/jira/browse/COUCHDB-220 Project: CouchDB Issue Type: Bug Components: Database Core Affects Versions: 0.9 Environment: ubuntu 8.10 64-bit, ext3 Reporter: Robert Newson Fix For: 0.10 Attachments: 220.patch, 220.patch, attachment_sparseness.js, stream.diff When adding ten thousand documents, each with a small attachment, the discrepancy between reported file size and actual file size becomes huge; ls -lh shard0.couch 698M 2009-01-23 13:42 shard0.couch du -sh shard0.couch 57M shard0.couch On filesystems that do not support write holes, this will cause an order of magnitude more I/O. I think it was introduced by the streaming attachment patch as each attachment is followed by huge swathes of zeroes when viewed with 'hd -v'. Compacting this database reduced it to 7.8mb, indicating other sparseness besides attachments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-290) Include sequence number in update notifications
[ https://issues.apache.org/jira/browse/COUCHDB-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12695494#action_12695494 ] Damien Katz commented on COUCHDB-290: - I'm actually working on this HTTP functionality right now. Then not only child processes, but remote processes will be able to easily register for notifications over HTTP, and we can present a richer interface. Once in place, the stdio notifications would be removed completely and we could then use stdio for logging and error reporting of the child process. Include sequence number in update notifications --- Key: COUCHDB-290 URL: https://issues.apache.org/jira/browse/COUCHDB-290 Project: CouchDB Issue Type: Improvement Affects Versions: 0.9 Reporter: Elliot Murphy Priority: Minor Fix For: 0.10 Attachments: couchdb-sequences.patch, couchdb-sequences.patch Hi! There's been requests to include the sequence number when sending an update notification. Thanks to the guidance from davisp on #couchdb on March 13th, I've been able to put together a little patch that does just that. In the future I'm interested in doing the same for the create notification, and perhaps extending create/delete/update notifications to include a list of affected doc IDs. For now though, just this simple patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
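For illustration, update notifications are line-oriented JSON written to the external notification process; with the patch, the object would also carry the database's update sequence. The exact field names and the seq value shown here are assumptions:

    {"type": "updated", "db": "dbname", "seq": 42}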
[jira] Commented: (COUCHDB-300) Update Sequence broken
[ https://issues.apache.org/jira/browse/COUCHDB-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12688028#action_12688028 ] Damien Katz commented on COUCHDB-300: - Sven, I think you missed the fix checked in for this bug? The fix prevents databases from getting into this state. But if you already have a database in this state, you can fix it by touching all the docs (or just the affected one). It's a one time thing. Update Sequence broken -- Key: COUCHDB-300 URL: https://issues.apache.org/jira/browse/COUCHDB-300 Project: CouchDB Issue Type: Bug Environment: ubuntu hardy Reporter: Sven Helmberger Fix For: 0.9 Attachments: all_docs_by_seq.js, update_seq_kaputt.js Database gets into a state where there is one document but an empty update sequence. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
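For illustration, "touching" a document just means writing it back unchanged so it gets a new revision and re-enters the update sequence; sketched over HTTP with made-up ids:

    GET /dbname/docid
    -> {"_id": "docid", "_rev": "946b7d1c", ...}
    PUT /dbname/docid          (same body, including the current _rev)
    -> {"ok": true, "id": "docid", "rev": ...}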
[jira] Resolved: (COUCHDB-221) Test that validation and authorization work properly with replicated edits.
[ https://issues.apache.org/jira/browse/COUCHDB-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz resolved COUCHDB-221. - Resolution: Fixed Assignee: Damien Katz Fixed in trunk as of r753448. Test that validation and authorization work properly with replicated edits. --- Key: COUCHDB-221 URL: https://issues.apache.org/jira/browse/COUCHDB-221 Project: CouchDB Issue Type: Test Components: Test Suite Reporter: Dean Landolt Assignee: Damien Katz Priority: Blocker Fix For: 0.9 Test that the validation and authorization stuff work properly with replicated edits, the same as it does with live edits. This should already work, but it's not tested. Also there is a good chance validation/authorization failures might not be handled gracefully by the replicator. It should eat failures, keeping statistics about the failures and maybe a record of the last failure, or last N failures. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (COUCHDB-275) couch crashes erlang vm under heavy load
[ https://issues.apache.org/jira/browse/COUCHDB-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz updated COUCHDB-275: Attachment: term_to_binary_fix.diff This is a patch to the Erlang vm for this crash. It fixes a problem with Erlang's term_to_binary code where it blows the C stack on deeply nested terms (i.e. deep trees). For example, this will crash any unpatched Erlang VM: term_to_binary(lists:foldl(fun(E,A) -> [E, A] end, [], lists:seq(1, 10))). This patch fixes the Erlang vm by changing the term_to_binary code from a recursive C implementation to one using its own stack. couch crashes erlang vm under heavy load Key: COUCHDB-275 URL: https://issues.apache.org/jira/browse/COUCHDB-275 Project: CouchDB Issue Type: Bug Affects Versions: 0.9 Environment: Linux melkjug.com 2.6.23-gentoo-r8 #1 SMP Wed Feb 13 14:28:49 EST 2008 x86_64 QEMU Virtual CPU version 0.9.1 GenuineIntel GNU/Linux Reporter: Joshua Bronson Attachments: 2009-03-05-couch.log.snippet, term_to_binary_fix.diff I clicked Compact in futon for my 11G database at 9:04 AM EST: [Mon, 02 Mar 2009 14:04:32 GMT] [info] [0.59.0] Starting compaction for db melkjug An hour and a half later it was 85% finished and then the following was output to stderr: heart: Mon Mar 2 10:33:20 2009: heart-beat time-out. /usr/bin/couchdb: line 255: echo: write error: Broken pipe heart: Mon Mar 2 10:33:22 2009: Executed /usr/bin/couchdb -k. Terminating. I am retaining my 4.3G melkjug.couch.compact file in case it's useful in debugging this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
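The recursion-to-explicit-stack idea behind the patch can be sketched in Erlang itself; this is an analogue of the technique, not the C fix:

    %% count the leaves of an arbitrarily deep list-of-lists by keeping
    %% our own worklist (stack) instead of recursing into each sublist
    count_leaves(Term) -> count_leaves([Term], 0).

    count_leaves([], N) -> N;
    count_leaves([[] | Rest], N) -> count_leaves(Rest, N);
    count_leaves([[H | T] | Rest], N) -> count_leaves([H, T | Rest], N);
    count_leaves([_Leaf | Rest], N) -> count_leaves(Rest, N + 1).

Because every call is a tail call, the nesting depth of the input no longer translates into call-stack depth, which is exactly what the patch achieves for term_to_binary inside the VM.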
[jira] Commented: (COUCHDB-190) _uuid should respond to GET, not POST
[ https://issues.apache.org/jira/browse/COUCHDB-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12672666#action_12672666 ] Damien Katz commented on COUCHDB-190: - Patch merged to trunk. _uuid should respond to GET, not POST - Key: COUCHDB-190 URL: https://issues.apache.org/jira/browse/COUCHDB-190 Project: CouchDB Issue Type: Improvement Components: Database Core Affects Versions: 0.9 Reporter: Matt Goodall Priority: Blocker Fix For: 0.9 Attachments: COUCH-190.diff The /_uuid resource can happily return a response to a GET without being unRESTful. In fact, supporting POST is probably incorrect as it implies it would change server state. Quick summary: * _uuid never changes server state * calling _uuid multiple times does not impact other clients * that the resource returns something different each time it is requested does not mean it cannot be a GET * GET with proper cache control (i.e. don't cache it ever) will work equally well Full discussion can be found on the user mailing list, http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3c21939021.1440421230910477169.javamail.serv...@perfora%3e. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
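For illustration, the GET shape this leads to; the plural /_uuids path and the count parameter reflect where the API later settled and are assumptions relative to this ticket, and the uuid value is made up:

    GET /_uuids?count=1
    -> {"uuids": ["75480ca477454894678e22eec6002413"]}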
[jira] Resolved: (COUCHDB-238) should throw error on creating docs with illegal private names
[ https://issues.apache.org/jira/browse/COUCHDB-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz resolved COUCHDB-238. - Resolution: Fixed Fixes checked into trunk. should throw error on creating docs with illegal private names -- Key: COUCHDB-238 URL: https://issues.apache.org/jira/browse/COUCHDB-238 Project: CouchDB Issue Type: Bug Components: Database Core Reporter: Chris Anderson Priority: Blocker Fix For: 0.9 Attachments: COUCHDB-238.patch currently the only legal _ prefixes are _local and _design. We should test for this and return HTTP errors. This applies to PUT and bulk-docs -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
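For illustration, the desired behavior is an HTTP-level rejection rather than a silent write; the status code and error string below are assumptions, not the committed wording:

    PUT /dbname/_not_allowed
    -> 400 Bad Request
       {"error": "illegal_docid", "reason": ...}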
[jira] Commented: (COUCHDB-247) The log process should be started before any other process
[ https://issues.apache.org/jira/browse/COUCHDB-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12672733#action_12672733 ] Damien Katz commented on COUCHDB-247: - Changing the logging implementation to something more standardized is a very good thing. Right now the log output format is just some random stuff I threw together. The log process should be started before any other process -- Key: COUCHDB-247 URL: https://issues.apache.org/jira/browse/COUCHDB-247 Project: CouchDB Issue Type: Improvement Components: Database Core Environment: Any Reporter: Ulises Cervino Beresi Priority: Minor Processes should be able to log their operations from the very beginning of their existence, so that they do not have to fall back to io:format() when they need to log. Only processes started after the log process will be able to make use of ?LOG_X. See issue 153 (https://issues.apache.org/jira/browse/COUCHDB-153) for a scenario where it would be desirable for the log process to have been started before the db process. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (COUCHDB-215) errors when creating and deleting multiple databases
[ https://issues.apache.org/jira/browse/COUCHDB-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Katz closed COUCHDB-215. --- Resolution: Fixed Fix Version/s: 0.9 errors when creating and deleting multiple databases Key: COUCHDB-215 URL: https://issues.apache.org/jira/browse/COUCHDB-215 Project: CouchDB Issue Type: Bug Components: Database Core Environment: OS X 10.4 Erlang latest, CouchDB trunk Reporter: Bob Dionne Fix For: 0.9 creating multiple databases and then deleting them causes couchdb to start throwing exceptions ( http://gist.github.com/49063 ) A cursory debugging session indicates that should_close in couch_file never returns true; the monitors list always contains the pid of the next process. Repeated use of the server in this way eventually makes it unusable. The following JS, http://gist.github.com/48465, will also exhibit the issue using the Futon tests. It can appear with as few as 60 dbs depending on the client and level of concurrency. The database names do have slashes in them but this seems irrelevant on the face of it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-197) Replication renders CouchDB unresponsive.
[ https://issues.apache.org/jira/browse/COUCHDB-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12664731#action_12664731 ] Damien Katz commented on COUCHDB-197: - FYI, I just looked into this issue briefly and the no_scheme errors are from the inets http client and come from attempting operations without the http:|https: portion of the URI. Replication renders CouchDB unresponsive. - Key: COUCHDB-197 URL: https://issues.apache.org/jira/browse/COUCHDB-197 Project: CouchDB Issue Type: Bug Components: Database Core Reporter: Maximillian Dornseif I am quite sure this is not the same issue as in COUCHDB-193. I'm trying to replicate a somewhat big database {"doc_count":541394,"doc_del_count":265692,"update_seq":2118390,"purge_seq":0,"compact_running":false,"disk_size":16552608803} to another machine. I started replication with this: send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.xxx:5984\r\nAccept-Encoding: identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent: couchdb-python 0.5dev-r127\r\n\r\n' send: '{"source": "hulog_events", "target": "http://couchdb2.local.xxx:5984/hulog_events"}' reply: '' connect: (couchdb1.local.hudora.biz, 5984) send: 'POST /_replicate HTTP/1.1\r\nHost: couchdb1.local.:5984\r\nAccept-Encoding: identity\r\ncontent-length: 90\r\ncontent-type: application/json\r\naccept: application/json\r\nuser-agent: couchdb-python 0.5dev-r127\r\n\r\n' send: '{"source": "hulog_events", "target": "http://couchdb2.local.:5984/hulog_events"}' (no reply so far) On the source server (couchdb1) I see the following log entries: [Mon, 05 Jan 2009 19:34:21 GMT] [info] [0.12745.45] 192.168.0.30 - - 'POST' /_replicate 200 [Mon, 05 Jan 2009 19:35:36 GMT] [info] [0.107.0] Compaction for db hulog_events_test completed. [Mon, 05 Jan 2009 19:35:45 GMT] [info] [0.12746.45] 127.0.0.1 - - 'GET' /hulog_events/ 200 [Mon, 05 Jan 2009 19:35:46 GMT] [info] [0.95.0] Compaction for db eap completed. 
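To make the no_scheme remark concrete: the inets client of that era rejected URIs lacking a scheme, so a remote endpoint must spell out http:// or https:// while a bare name is only valid for a local database. A sketch in an Erlang shell (the exact error tuple and success shape are assumptions about that inets version):

    1> http_uri:parse("couchdb2.local.xxx:5984/hulog_events").
    {error, no_scheme}
    2> http_uri:parse("http://couchdb2.local.xxx:5984/hulog_events").
    %% parses successfully into its scheme/host/port/path parts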
[Mon, 05 Jan 2009 19:42:17 GMT] [error] [0.12765.45] ** Generic server 0.12765.45 terminating ** Last message in was {'EXIT',0.12762.45, {timeout, {gen_server,call, [0.12768.45, {write, 0,0,1,36,131,104,2,104,1,108,0,0,0,8,104,2, 109,0,0,0,7,112,114,111,100,117,99,116,109, 0,0,0,8,54,53,49,52,48,47,69,75,104,2,109,0, 0,0,11,116,114,97,110,115,97,99,116,105,111, 110,109,0,0,0,8,114,101,116,114,105,101,118, 101,104,2,109,0,0,0,4,116,121,112,101,109,0, 0,0,4,117,110,105,116,104,2,109,0,0,0,11,97, 114,99,104,105,118,101,100,95,97,116,109,0, 0,0,22,50,48,48,56,48,50,50,50,84,49,50,49, 52,48,53,46,53,50,54,51,56,52,104,2,109,0,0, 0,10,99,114,101,97,116,101,100,95,97,116, 109,0,0,0,22,50,48,48,55,49,49,50,56,84,49, 53,52,50,48,54,46,51,52,52,54,49,56,104,2, 109,0,0,0,4,112,114,111,112,104,1,108,0,0,0, 2,104,2,109,0,0,0,8,108,111,99,97,116,105, 111,110,109,0,0,0,6,65,85,83,76,65,71,104,2, 109,0,0,0,6,104,101,105,103,104,116,98,0,0, 7,158,106,104,2,109,0,0,0,3,109,117,105,109, 0,0,0,18,51,52,48,48,53,57,57,56,49,48,48, 48,48,51,49,50,53,50,104,2,109,0,0,0,8,113, 117,97,110,116,105,116,121,97,11,106,106}]}}} ** When Server state == {file_descriptor,prim_file,{#Port0.904761,24}} ** Reason for termination == ** {timeout,{gen_server,call, [0.12768.45, {write,0,0,1,36,131,104,2,104,1,108,0,0,0,8,104, 2,109,0,0,0,7,112,114,111,100,117,99,116, 109,0,0,0,8,54,53,49,52,48,47,69,75,104, 2,109,0,0,0,11,116,114,97,110,115,97,99, 116,105,111,110,109,0,0,0,8,114,101,116, 114,105,101,118,101,104,2,109,0,0,0,4, 116,121,112,101,109,0,0,0,4,117,110,105, 116,104,2,109,0,0,0,11,97,114,99,104,105, 118,101,100,95,97,116,109,0,0,0,22,50,48,