couch_httpd inconsistency?
Hi,

I noticed that we register the HTTP server process as `couch_httpd` instead of `http`, while for HTTPS we register it as `https`. Is there any reason for that? It's a little inconsistent, so I propose this patch to fix it:

diff --git a/src/couchdb/couch_httpd.erl b/src/couchdb/couch_httpd.erl
index 97475c5..51e2a11 100644
--- a/src/couchdb/couch_httpd.erl
+++ b/src/couchdb/couch_httpd.erl
@@ -35,7 +35,7 @@ start_link() ->
     start_link(http).
 start_link(http) ->
     Port = couch_config:get("httpd", "port", "5984"),
-    start_link(?MODULE, [{port, Port}]);
+    start_link(http, [{port, Port}]);
 start_link(https) ->
     Port = couch_config:get("ssl", "port", "6984"),
     CertFile = couch_config:get("ssl", "cert_file", nil),

Thoughts?

- benoît
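[Editor's note: for context, a minimal sketch of why the atom matters. This is an assumed shape based on mochiweb's {name, Name} option, not the literal couch_httpd internals; the loop body is a placeholder.]

    -module(listener_names).
    -export([start_link/2]).

    %% Hedged sketch: whatever atom is passed as Name becomes the
    %% registered name of the mochiweb listener process, which is what
    %% whereis/1 and process listings show.
    start_link(Name, Options) when is_atom(Name) ->
        Loop = fun(Req) -> Req:respond({200, [], "ok"}) end,
        mochiweb_http:start([{name, Name}, {loop, Loop} | Options]).

With the patch applied, whereis(http) and whereis(https) would then name the two listeners consistently.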
Re: Understanding the CouchDB file format
Thank you Robert, fixed.

On Wed, Dec 21, 2011 at 1:42 PM, Robert Dionne wrote:
> Riyad,
>
> You're welcome. At a quick glance your post has one error: internal nodes
> do contain values (from the reductions). The appendix in the couchdb book
> also makes this error[1], which I've opened a ticket for.
>
> [1] https://github.com/oreilly/couchdb-guide/issues/450
Re: Understanding the CouchDB file format
Riyad,

You're welcome. At a quick glance your post has one error: internal nodes do contain values (from the reductions). The appendix in the couchdb book also makes this error[1], which I've opened a ticket for.

Cheers,

Bob

[1] https://github.com/oreilly/couchdb-guide/issues/450
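[Editor's note: Bob's correction, that inner nodes carry reduction values and not just child pointers, is easiest to see from the node shapes themselves. A hedged sketch, loosely after the 1.x couch_btree representation; the exact field layout may differ, and couch_btree.erl is authoritative.]

    -module(btree_nodes).
    -export([examples/0]).

    %% Leaf (kv) nodes pair keys with values. Inner (kp) nodes pair
    %% each child's last key with {FilePointer, Reduction}, so a
    %% reduce result is stored at every internal level of the tree.
    examples() ->
        Leaf  = {kv_node, [{<<"a">>, val_a}, {<<"b">>, val_b}]},
        Inner = {kp_node, [{<<"m">>, {1024, red_of_left_subtree}},
                           {<<"z">>, {2048, red_of_right_subtree}}]},
        {Leaf, Inner}.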
Re: Understanding the CouchDB file format
Bob,

Really appreciate the link; Rick has a handful of articles that helped a lot.

Alongside all the CouchDB reading I've been looking at SSD-optimized data storage mechanisms and tried to coalesce all of this information into this post on Couch's file storage format:
https://plus.google.com/u/0/107397941677313236670/posts/CyvwRcvh4vv

It is uncanny how many things Couch seems to have gotten right with regard to existing storage systems and future flash-based storage systems. I'd appreciate any corrections, additions or feedback on the post from anyone interested.

Best,
R
Re: Understanding the CouchDB file format
I think this is largely correct, Riyad. I dug out an old article[1] by Rick Ho that you may also find helpful, though it might be slightly dated.

Generally the best performance will be had if the ids are sequential and updates are done in bulk. Write-heavy applications will eat up a lot of space and require compaction. At the leaf nodes, what are stored are either full_doc_info records or doc_info records, which store pointers to the data. So the main things that impact the branching at each level are the key size and, in the case of views, the sizes of the reductions, as these are stored with the intermediate nodes.

All in all it works pretty well, but as always you need to test and evaluate it for your specific case to see what the limits are.

Regards,

Bob

[1] http://horicky.blogspot.com/2008/10/couchdb-implementation.html
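[Editor's note: for readers unfamiliar with the records Bob names, a hedged sketch of their shapes, loosely after couch_db.hrl in the 1.x tree. Field names are abridged and illustrative, not a verbatim copy.]

    %% doc_info: the compact per-document entry in the by-seq index.
    -record(doc_info, {id, high_seq, revs = []}).

    %% full_doc_info: the richer entry in the by-id index; the rev
    %% tree holds the file offsets that lead to the document bodies.
    -record(full_doc_info, {id, update_seq, deleted = false,
                            rev_tree = []}).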
Re: Understanding the CouchDB file format
Adding to this conversation, I found this set of slides by Chris explaining the append-only index update format:
http://www.slideshare.net/jchrisa/btree-nosql-oak?from=embed

Specifically slides 16, 17 and 18.

Using this example tree, rewriting the updated path (in reverse order) appended to the end of the file makes sense... you see how index queries can simply read backwards from the end of the file and not only find the latest revisions of docs, but also every other doc that wasn't touched (it will just seek into the existing inner nodes of the b+ tree for searching).

What I am hoping for clarification on is the following pain-point that I perceive with this approach:

1. In a sufficiently shallow B+ tree (like CouchDB's), the paths to elements are short (typically no more than 3 to 5 levels deep), as opposed to a trie or some other construct that would have much longer paths to elements.

2. Because the depth of the tree is so shallow, the breadth of it becomes large to compensate... more specifically, each internal node can have 100s, 1000s or more children. Using the example slides, consider the nodes [A...M] and [R...Z] -- in a good-sized CouchDB database, those internal index nodes would have 100s (or more) elements in them pointing at deeper internal nodes that themselves had thousands of elements, instead of the 13 or so implied by [A...M].

3. Looking at slides 17 and 18, where you see the direct B+ tree path to the updated node getting appended to the end of the file after the revision is written (leaf-to-root ordering: [J' M] -> [A M] -> [A Z]), it implies that those internal nodes with *all* their child elements are getting rewritten as well.

In this example tree it isn't such a big issue... but in a sufficiently large CouchDB database, these nodes denoted by [A...M] and [A...Z] could be quite large... I don't know the format of the node elements in the B+ tree, but it would be whatever the size of a node element is times however many elements are contained at each level (1 for root, say 100 for level 2, 1000 for level 3 and 10,000 for level 4 -- there is a lot of hand-waving going on here; of course it depends on the size of the data store).

Am I missing something, or is CouchDB really rewriting that much index information between document revisions on every update?

What was previously confusing me is I thought it was *only* rewriting a direct path to the updated revision, like [B]>[E]>[J'], and Couch was somehow patching in that updated path info to the B+ index at runtime.

If Couch is rewriting entire node paths with all their elements then I am no longer confused about the B+ index updates, but am curious about the on-disk cost of this.

In my own rough insertion testing, that would explain why I see my collections absolutely explode in size until they are compacted (not using bulk insert, but intentionally doing single inserts for millions of docs to see what kind of cost the index path duplication would be like).

Can anyone confirm/deny/correct this assessment? I want to make sure I am on the right track understanding this.

Best wishes,
Riyad

On Tue, Dec 20, 2011 at 6:13 PM, Riyad Kalla wrote:
> @Filipe - I was just not clear on how CouchDB operated; you and Robert
> cleared that up for me. Thank you.
>
> @Robert - The writeup is excellent so far (I am not familiar with Erlang,
> so there is a bit of stickiness there), thank you for taking the time to
> put this together!
>
> At this point I am curious how the _id and _seq indices are read, as their
> data is continually appended to the end of the data file in small
> diff-trees for every updated doc.
>
> If CouchDB kept all the indices in-memory and simply patched-in the
> updated paths at runtime (maybe something akin to memory-mapped indices in
> MongoDB) I would be fairly clear on the operation... but as I understand
> it, CouchDB keeps such a small memory footprint by doing no in-memory
> caching and relying on the intelligence of the OS and filesystem (and/or
> drives) to cache frequently accessed data.
>
> I am trying to understand the logic used by CouchDB to answer a query
> using the index once updates to the tree have been appended to the data
> file... for example, consider a CouchDB datastore like the one Filipe
> has... 10 million documents, and let's say it is freshly compacted.
>
> If I send in a request to that Couch instance, it hits the header of the
> data file along with the index and walks the B+ tree to the leaf node,
> where it finds the offset into the data file where the actual doc lives...
> let's say 1,000,000 bytes away.
>
> These B+ trees are shallow, so it might look something like this:
>
> Level 1: 1 node, the root node.
> Level 2: 100 nodes, inner child nodes.
> Level 3: 10,000 nodes, inner child nodes.
> Level 4: 1,000,000 leaf nodes, all with pointers to the data offsets in
> the data file.
>
> Now let's say I write 10 updates to documents in that file. There are 10
> new revisions appended
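[Editor's note: Riyad's own hand-wavy per-node entry counts make it easy to put a rough figure on the rewrite amplification he describes. A back-of-the-envelope sketch; the 64-byte entry size is an assumption for illustration, not a measured CouchDB constant.]

    -module(rewrite_cost).
    -export([bytes_per_update/0]).

    %% One node per level is rewritten in full on each update, root to
    %% leaf. Using the entry counts from the message (1, 100, 1000,
    %% 10000) and an assumed 64 bytes per node entry:
    bytes_per_update() ->
        EntrySize = 64,
        EntriesPerLevel = [1, 100, 1000, 10000],
        lists:sum([N * EntrySize || N <- EntriesPerLevel]).
    %% => 710,464 bytes, i.e. roughly 700 KB appended per single-doc
    %% update under these assumptions -- consistent with files
    %% ballooning until compaction.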
[jira] [Updated] (COUCHDB-1347) Filtered replication does not work when a target document is purged
[ https://issues.apache.org/jira/browse/COUCHDB-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin ter Kuile updated COUCHDB-1347:

    Attachment: passing_replication.log
                failing_replication.log

Running the test script: the difference between the failing and passing runs is specifying the filter in the replication command.

> Filtered replication does not work when a target document is purged
> --------------------------------------------------------------------
>
>              Key: COUCHDB-1347
>              URL: https://issues.apache.org/jira/browse/COUCHDB-1347
>          Project: CouchDB
>       Issue Type: Bug
>       Components: Replication
> Affects Versions: 1.0.1, 1.1
>      Environment: OS X Lion: {"couchdb":"Welcome","version":"1.1.0"} (brew installation)
>                   Ubuntu 11.04: {"couchdb":"Welcome","version":"1.0.1"}
>         Reporter: Benjamin ter Kuile
>           Labels: purge, replication
>      Attachments: failing_replication.log, passing_replication.log
>
> When a document with an id is deleted and purged, and a replication process
> tries to create a document with that id, it does not happen. The replication
> without the filter works. Ruby test script:
>
> require 'rubygems'
> require 'couchrest'
> require 'rest-client'
> require 'json'
>
> # Setup
> server = CouchRest.new("http://localhost:5984")
> a = server.database('a')
> b = server.database('b')
> a.recreate!
> b.recreate!
>
> # Add a document doc1 to databases a and b
> a.save_doc("_id" => 'doc1')
> b_doc1 = b.save_doc("_id" => 'doc1')
>
> # Delete and purge doc1 from b
> b.delete_doc("_id" => 'doc1', "_rev" => b_doc1['rev'])
> RestClient.post(File.join(b.root, '_purge'),
>                 {'doc1' => [b_doc1['rev']]}.to_json,
>                 :content_type => :json)
>
> # Add a design doc with a filter
> design = a.save_doc("_id" => "_design/temp", "filters" => {"test" =>
>   %|function(doc, req){if(['doc1'].indexOf(doc['_id']) >= 0){return true;}{return false;}}|})
>
> # Replicate and wait for it to finish
> RestClient.post("http://localhost:5984/_replicate",
>                 {:source => a.root, :filter => "temp/test", :target => b.root}.to_json,
>                 :content_type => :json)
> sleep(0.01) while JSON.parse(RestClient.get("http://localhost:5984/_active_tasks")).size > 0
>
> abort "oops" unless b.all_docs['total_rows'] == 1
> puts "Test successful"
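[Editor's note: the _purge call in the script posts a JSON body mapping each doc id to the list of revisions to purge. As a hedged illustration, the same request expressed with OTP's inets httpc; DbUrl, DocId and Rev are placeholders, and inets must be started first.]

    -module(purge_example).
    -export([purge/3]).

    %% POST /db/_purge with a body of the form {"doc1": ["<rev>"]}.
    %% Assumes application:start(inets) has already been called.
    purge(DbUrl, DocId, Rev) ->
        Body = iolist_to_binary(["{\"", DocId, "\":[\"", Rev, "\"]}"]),
        httpc:request(post,
                      {DbUrl ++ "/_purge", [], "application/json", Body},
                      [], []).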