couch_httpd inconsistency?

2011-12-21 Thread Benoit Chesneau
Hi,

I noticed that we register the http server process as `couch_httpd` instead
of `http`, while for https we register it as `https`. Is there any reason
for that? It's a little inconsistent.

I propose this patch to fix that:

diff --git a/src/couchdb/couch_httpd.erl b/src/couchdb/couch_httpd.erl
index 97475c5..51e2a11 100644
--- a/src/couchdb/couch_httpd.erl
+++ b/src/couchdb/couch_httpd.erl
@@ -35,7 +35,7 @@ start_link() ->
     start_link(http).
 start_link(http) ->
     Port = couch_config:get("httpd", "port", "5984"),
-    start_link(?MODULE, [{port, Port}]);
+    start_link(http, [{port, Port}]);
 start_link(https) ->
     Port = couch_config:get("ssl", "port", "6984"),
     CertFile = couch_config:get("ssl", "cert_file", nil),
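
For what it's worth, here is a tiny hypothetical helper (mine, not part of the
patch or of CouchDB) to inspect the registered names from a shell attached to a
running node; before the patch the HTTP listener shows up as couch_httpd, with
it both listeners follow the scheme-atom convention:

%% Hypothetical sketch, not CouchDB source: list which listener names are
%% currently registered (whereis/1 returns undefined for unregistered names).
-module(check_listener_names).
-export([names/0]).

names() ->
    [{Name, whereis(Name)} || Name <- [couch_httpd, http, https]].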


thoughts?

- benoît


Re: Understanding the CouchDB file format

2011-12-21 Thread Riyad Kalla
Thank you Robert, fixed.

On Wed, Dec 21, 2011 at 1:42 PM, Robert Dionne  wrote:

> Riyad,
>
> Your welcome. At a quick glance your post has one error, internal nodes do
> contain values (from the reductions). The appendix in the couchdb book also
> makes this error[1] which I've opened a ticket for.
>
> Cheers,
>
> Bob
>
>
> [1] https://github.com/oreilly/couchdb-guide/issues/450
>
>
>
>
> On Dec 21, 2011, at 3:28 PM, Riyad Kalla wrote:
>
> > Bob,
> >
> > Really appreciate the link; Rick has a handful of articles that helped a
> > lot.
> >
> > Along side all the CouchDB reading I've been looking at SSD-optimized
> data
> > storage mechanisms and tried to coalesce all of this information into
> this
> > post on Couch's file storage format:
> > https://plus.google.com/u/0/107397941677313236670/posts/CyvwRcvh4vv
> >
> > It is uncanny how many things Couch seems to have gotten right with
> regard
> > to existing storage systems and future flash-based storage systems. I'd
> > appreciate any corrections, additions or feedback to the post for anyone
> > interested.
> >
> > Best,
> > R
> >
> > On Wed, Dec 21, 2011 at 12:53 PM, Robert Dionne <
> > dio...@dionne-associates.com> wrote:
> >
> >> I think this is largely correct Riyad, I dug out an old article[1] by
> Rick
> >> Ho that you may also find helpful though it might be slightly dated.
> >> Generally the best performance will be had if the ids are sequential and
> >> updates are done in bulk. Write heavy applications will eat up a lot of
> >> space and require compaction. At the leaf nodes what are stored are
> either
> >> full_doc_info records or doc_info records which store pointers to the
> data
> >> so the main thing that impacts the branching at each level are the key
> size
> >> and in the case of views the sizes of the reductions as these are stored
> >> with the intermediate nodes.
> >>
> >> All in all it works pretty well but as always you need to test and
> >> evaluate it for you specific case to see what the limits are.
> >>
> >> Regards,
> >>
> >> Bob
> >>
> >>
> >> [1] http://horicky.blogspot.com/2008/10/couchdb-implementation.html
> >>
> >>
> >>
> >>
> >> On Dec 21, 2011, at 2:17 PM, Riyad Kalla wrote:
> >>
> >>> Adding to this conversation, I found this set of slides by Chris
> >> explaining
> >>> the append-only index update format:
> >>> http://www.slideshare.net/jchrisa/btree-nosql-oak?from=embed
> >>>
> >>> Specifically slides 16, 17 and 18.
> >>>
> >>> Using this example tree, rewriting the updated path (in reverse order)
> >>> appended to the end of the file makes sense... you see how index
> queries
> >>> can simply read backwards from the end of the file and not only find
> the
> >>> latest revisions of docs, but also every other doc that wasn't touched
> >> (it
> >>> will just seek into the existing inner nodes of the b+ tree for
> >> searching).
> >>>
> >>> What I am hoping for clarification on is the following pain-point that
> I
> >>> perceive with this approach:
> >>>
> >>> 1. In a sufficiently shallow B+ tree (like CouchDB), the paths
> themselves
> >>> to elements are short (typically no more than 3 to 5 levels deep) as
> >>> opposed to a trie or some other construct that would have much longer
> >> paths
> >>> to elements.
> >>>
> >>> 2. Because the depth of the tree is so shallow, the breadth of it
> becomes
> >>> large to compensate... more specifically, each internal node can have
> >> 100s,
> >>> 1000s or more children. Using the example slides, consider the nodes
> >>> [A...M] and [R...Z] -- in a good sized CouchDB database, those internal
> >>> index nodes would have 100s (or more) elements in them pointing at
> deeper
> >>> internal nodes that themselves had thousands of elements; instead of
> the
> >> 13
> >>> or so as implied by [A...M].
> >>>
> >>> 3. Looking at slide 17 and 18, where you see the direct B+ tree path to
> >> the
> >>> update node getting appended to the end of the file after the revision
> is
> >>> written (leaf to root ordering: [J' M] -> [A M] -> [A Z]) it implies
> that
> >>> those internal nodes with *all* their child elements are getting
> >> rewritten
> >>> as well.
> >>>
> >>> In this example tree, it is isn't such a big issue... but in a
> >> sufficiently
> >>> large CouchDB database, these nodes denoted by [A...M] and [A...Z]
> could
> >> be
> >>> quite large... I don't know the format of the node elements in the B+
> >> tree,
> >>> but it would be whatever the size of a node is times however many
> >> elements
> >>> are contained at each level (1 for root, say 100 for level 2, 1000 for
> >>> level 3 and 10,000 for level 4 -- there is a lot of hand-waving going
> on
> >>> here, of course it depends on the size of the data store).
> >>>
> >>> Am I missing something or is CouchDB really rewriting that much index
> >>> information between document revisions on every update?
> >>>
> >>> What was previously confusing me is I thought it was *only* rewriting a
> >>> direct path to the updated revision

Re: Understanding the CouchDB file format

2011-12-21 Thread Robert Dionne
Riyad,

You're welcome. At a quick glance your post has one error: internal nodes do
contain values (from the reductions). The appendix in the CouchDB book also
makes this error [1], for which I've opened a ticket.
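
To make that concrete, here is a purely illustrative sketch (the real
couch_btree term format is not exactly this shape, and the names below are
made up): an interior node keeps a reduction value next to each child pointer,
so reduce queries can be answered, and rereduced upward, without descending to
the leaves.

%% Hypothetical shapes and values, for illustration only.
-module(rereduce_demo).
-export([interior_reduction/1]).

%% Each child entry: {LastKeyInChild, ChildPointer, ChildReduction}
interior_reduction(ChildEntries) ->
    Reds = [Red || {_LastKey, _Ptr, Red} <- ChildEntries],
    lists:sum(Reds).   %% rereduce for a simple count reduction

%% e.g. interior_reduction([{<<"f">>, 1024, 12}, {<<"m">>, 2048, 30}]) -> 42.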

Cheers,

Bob


[1] https://github.com/oreilly/couchdb-guide/issues/450




On Dec 21, 2011, at 3:28 PM, Riyad Kalla wrote:

> Bob,
> 
> Really appreciate the link; Rick has a handful of articles that helped a
> lot.
> 
> Along side all the CouchDB reading I've been looking at SSD-optimized data
> storage mechanisms and tried to coalesce all of this information into this
> post on Couch's file storage format:
> https://plus.google.com/u/0/107397941677313236670/posts/CyvwRcvh4vv
> 
> It is uncanny how many things Couch seems to have gotten right with regard
> to existing storage systems and future flash-based storage systems. I'd
> appreciate any corrections, additions or feedback to the post for anyone
> interested.
> 
> Best,
> R
> 
> On Wed, Dec 21, 2011 at 12:53 PM, Robert Dionne <
> dio...@dionne-associates.com> wrote:
> 
>> I think this is largely correct Riyad, I dug out an old article[1] by Rick
>> Ho that you may also find helpful though it might be slightly dated.
>> Generally the best performance will be had if the ids are sequential and
>> updates are done in bulk. Write heavy applications will eat up a lot of
>> space and require compaction. At the leaf nodes what are stored are either
>> full_doc_info records or doc_info records which store pointers to the data
>> so the main thing that impacts the branching at each level are the key size
>> and in the case of views the sizes of the reductions as these are stored
>> with the intermediate nodes.
>> 
>> All in all it works pretty well but as always you need to test and
>> evaluate it for you specific case to see what the limits are.
>> 
>> Regards,
>> 
>> Bob
>> 
>> 
>> [1] http://horicky.blogspot.com/2008/10/couchdb-implementation.html
>> 
>> 
>> 
>> 
>> On Dec 21, 2011, at 2:17 PM, Riyad Kalla wrote:
>> 
>>> Adding to this conversation, I found this set of slides by Chris
>> explaining
>>> the append-only index update format:
>>> http://www.slideshare.net/jchrisa/btree-nosql-oak?from=embed
>>> 
>>> Specifically slides 16, 17 and 18.
>>> 
>>> Using this example tree, rewriting the updated path (in reverse order)
>>> appended to the end of the file makes sense... you see how index queries
>>> can simply read backwards from the end of the file and not only find the
>>> latest revisions of docs, but also every other doc that wasn't touched
>> (it
>>> will just seek into the existing inner nodes of the b+ tree for
>> searching).
>>> 
>>> What I am hoping for clarification on is the following pain-point that I
>>> perceive with this approach:
>>> 
>>> 1. In a sufficiently shallow B+ tree (like CouchDB), the paths themselves
>>> to elements are short (typically no more than 3 to 5 levels deep) as
>>> opposed to a trie or some other construct that would have much longer
>> paths
>>> to elements.
>>> 
>>> 2. Because the depth of the tree is so shallow, the breadth of it becomes
>>> large to compensate... more specifically, each internal node can have
>> 100s,
>>> 1000s or more children. Using the example slides, consider the nodes
>>> [A...M] and [R...Z] -- in a good sized CouchDB database, those internal
>>> index nodes would have 100s (or more) elements in them pointing at deeper
>>> internal nodes that themselves had thousands of elements; instead of the
>> 13
>>> or so as implied by [A...M].
>>> 
>>> 3. Looking at slide 17 and 18, where you see the direct B+ tree path to
>> the
>>> update node getting appended to the end of the file after the revision is
>>> written (leaf to root ordering: [J' M] -> [A M] -> [A Z]) it implies that
>>> those internal nodes with *all* their child elements are getting
>> rewritten
>>> as well.
>>> 
>>> In this example tree, it is isn't such a big issue... but in a
>> sufficiently
>>> large CouchDB database, these nodes denoted by [A...M] and [A...Z] could
>> be
>>> quite large... I don't know the format of the node elements in the B+
>> tree,
>>> but it would be whatever the size of a node is times however many
>> elements
>>> are contained at each level (1 for root, say 100 for level 2, 1000 for
>>> level 3 and 10,000 for level 4 -- there is a lot of hand-waving going on
>>> here, of course it depends on the size of the data store).
>>> 
>>> Am I missing something or is CouchDB really rewriting that much index
>>> information between document revisions on every update?
>>> 
>>> What was previously confusing me is I thought it was *only* rewriting a
>>> direct path to the updated revision, like [B]>[E]>[J'] and Couch was
>>> some-how patching in that updated path info to the B+ index at runtime.
>>> 
>>> If couch is rewriting entire node paths with all their elements then I am
>>> no longer confused about the B+ index updates, but am curious about the
>>> on-disk cost of this.
>>> 
>>> In my own rough inserti

Re: Understanding the CouchDB file format

2011-12-21 Thread Riyad Kalla
Bob,

Really appreciate the link; Rick has a handful of articles that helped a
lot.

Alongside all the CouchDB reading, I've been looking at SSD-optimized data
storage mechanisms and tried to coalesce all of this information into this
post on Couch's file storage format:
https://plus.google.com/u/0/107397941677313236670/posts/CyvwRcvh4vv

It is uncanny how many things Couch seems to have gotten right with regard
to existing storage systems and future flash-based storage systems. I'd
appreciate any corrections, additions, or feedback on the post from anyone
interested.

Best,
R

On Wed, Dec 21, 2011 at 12:53 PM, Robert Dionne <
dio...@dionne-associates.com> wrote:

> I think this is largely correct Riyad, I dug out an old article[1] by Rick
> Ho that you may also find helpful though it might be slightly dated.
> Generally the best performance will be had if the ids are sequential and
> updates are done in bulk. Write heavy applications will eat up a lot of
> space and require compaction. At the leaf nodes what are stored are either
> full_doc_info records or doc_info records which store pointers to the data
> so the main thing that impacts the branching at each level are the key size
> and in the case of views the sizes of the reductions as these are stored
> with the intermediate nodes.
>
> All in all it works pretty well but as always you need to test and
> evaluate it for you specific case to see what the limits are.
>
> Regards,
>
> Bob
>
>
> [1] http://horicky.blogspot.com/2008/10/couchdb-implementation.html
>
>
>
>
> On Dec 21, 2011, at 2:17 PM, Riyad Kalla wrote:
>
> > Adding to this conversation, I found this set of slides by Chris
> explaining
> > the append-only index update format:
> > http://www.slideshare.net/jchrisa/btree-nosql-oak?from=embed
> >
> > Specifically slides 16, 17 and 18.
> >
> > Using this example tree, rewriting the updated path (in reverse order)
> > appended to the end of the file makes sense... you see how index queries
> > can simply read backwards from the end of the file and not only find the
> > latest revisions of docs, but also every other doc that wasn't touched
> (it
> > will just seek into the existing inner nodes of the b+ tree for
> searching).
> >
> > What I am hoping for clarification on is the following pain-point that I
> > perceive with this approach:
> >
> > 1. In a sufficiently shallow B+ tree (like CouchDB), the paths themselves
> > to elements are short (typically no more than 3 to 5 levels deep) as
> > opposed to a trie or some other construct that would have much longer
> paths
> > to elements.
> >
> > 2. Because the depth of the tree is so shallow, the breadth of it becomes
> > large to compensate... more specifically, each internal node can have
> 100s,
> > 1000s or more children. Using the example slides, consider the nodes
> > [A...M] and [R...Z] -- in a good sized CouchDB database, those internal
> > index nodes would have 100s (or more) elements in them pointing at deeper
> > internal nodes that themselves had thousands of elements; instead of the
> 13
> > or so as implied by [A...M].
> >
> > 3. Looking at slide 17 and 18, where you see the direct B+ tree path to
> the
> > update node getting appended to the end of the file after the revision is
> > written (leaf to root ordering: [J' M] -> [A M] -> [A Z]) it implies that
> > those internal nodes with *all* their child elements are getting
> rewritten
> > as well.
> >
> > In this example tree, it is isn't such a big issue... but in a
> sufficiently
> > large CouchDB database, these nodes denoted by [A...M] and [A...Z] could
> be
> > quite large... I don't know the format of the node elements in the B+
> tree,
> > but it would be whatever the size of a node is times however many
> elements
> > are contained at each level (1 for root, say 100 for level 2, 1000 for
> > level 3 and 10,000 for level 4 -- there is a lot of hand-waving going on
> > here, of course it depends on the size of the data store).
> >
> > Am I missing something or is CouchDB really rewriting that much index
> > information between document revisions on every update?
> >
> > What was previously confusing me is I thought it was *only* rewriting a
> > direct path to the updated revision, like [B]>[E]>[J'] and Couch was
> > some-how patching in that updated path info to the B+ index at runtime.
> >
> > If couch is rewriting entire node paths with all their elements then I am
> > no longer confused about the B+ index updates, but am curious about the
> > on-disk cost of this.
> >
> > In my own rough insertion testing, that would explain why I see my
> > collections absolutely explode in size until they are compacted (not
> using
> > bulk insert, but intentionally doing single inserts for a million(s) of
> > docs to see what kind of cost the index path duplication would be like).
> >
> > Can anyone confirm/deny/correct this assessment? I want to make sure I am
> > on the right track understanding this.
> >
> > Best wishes,
> > Riyad
> >

Re: Understanding the CouchDB file format

2011-12-21 Thread Robert Dionne
I think this is largely correct, Riyad. I dug out an old article [1] by Rick Ho
that you may also find helpful, though it might be slightly dated. Generally the
best performance will be had if the ids are sequential and updates are done in
bulk. Write-heavy applications will eat up a lot of space and require
compaction. At the leaf nodes, what are stored are either full_doc_info or
doc_info records, which hold pointers to the data; so the main things that
impact the branching at each level are the key size and, in the case of views,
the sizes of the reductions, as these are stored with the intermediate nodes.
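
As a rough, hypothetical sketch of that size-driven fanout (the real
couch_btree splitting logic and its threshold differ in detail, and the module
and numbers here are assumptions): entries accumulate into a node until its
serialized size passes a byte threshold, so larger keys or larger reductions
directly translate into fewer entries per node.

%% Hypothetical sketch only; not the real couch_btree code.
-module(chunk_demo).
-export([chunkify/1]).

-define(CHUNK_THRESHOLD, 1279).   %% assumed value, for illustration only

chunkify(KVs) ->
    chunkify(KVs, 0, [], []).

chunkify([], _Size, Acc, Chunks) ->
    lists:reverse([lists:reverse(Acc) | Chunks]);
chunkify([KV | Rest], Size, Acc, Chunks) ->
    KVSize = byte_size(term_to_binary(KV)),   %% rough on-disk size of one entry
    case Size + KVSize > ?CHUNK_THRESHOLD andalso Acc =/= [] of
        true ->
            %% current node is "full" by bytes; start a new one with this entry
            chunkify(Rest, KVSize, [KV], [lists:reverse(Acc) | Chunks]);
        false ->
            chunkify(Rest, Size + KVSize, [KV | Acc], Chunks)
    end.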

All in all it works pretty well, but as always you need to test and evaluate it
for your specific case to see what the limits are.

Regards,

Bob


[1] http://horicky.blogspot.com/2008/10/couchdb-implementation.html




On Dec 21, 2011, at 2:17 PM, Riyad Kalla wrote:

> Adding to this conversation, I found this set of slides by Chris explaining
> the append-only index update format:
> http://www.slideshare.net/jchrisa/btree-nosql-oak?from=embed
> 
> Specifically slides 16, 17 and 18.
> 
> Using this example tree, rewriting the updated path (in reverse order)
> appended to the end of the file makes sense... you see how index queries
> can simply read backwards from the end of the file and not only find the
> latest revisions of docs, but also every other doc that wasn't touched (it
> will just seek into the existing inner nodes of the b+ tree for searching).
> 
> What I am hoping for clarification on is the following pain-point that I
> perceive with this approach:
> 
> 1. In a sufficiently shallow B+ tree (like CouchDB), the paths themselves
> to elements are short (typically no more than 3 to 5 levels deep) as
> opposed to a trie or some other construct that would have much longer paths
> to elements.
> 
> 2. Because the depth of the tree is so shallow, the breadth of it becomes
> large to compensate... more specifically, each internal node can have 100s,
> 1000s or more children. Using the example slides, consider the nodes
> [A...M] and [R...Z] -- in a good sized CouchDB database, those internal
> index nodes would have 100s (or more) elements in them pointing at deeper
> internal nodes that themselves had thousands of elements; instead of the 13
> or so as implied by [A...M].
> 
> 3. Looking at slide 17 and 18, where you see the direct B+ tree path to the
> update node getting appended to the end of the file after the revision is
> written (leaf to root ordering: [J' M] -> [A M] -> [A Z]) it implies that
> those internal nodes with *all* their child elements are getting rewritten
> as well.
> 
> In this example tree, it is isn't such a big issue... but in a sufficiently
> large CouchDB database, these nodes denoted by [A...M] and [A...Z] could be
> quite large... I don't know the format of the node elements in the B+ tree,
> but it would be whatever the size of a node is times however many elements
> are contained at each level (1 for root, say 100 for level 2, 1000 for
> level 3 and 10,000 for level 4 -- there is a lot of hand-waving going on
> here, of course it depends on the size of the data store).
> 
> Am I missing something or is CouchDB really rewriting that much index
> information between document revisions on every update?
> 
> What was previously confusing me is I thought it was *only* rewriting a
> direct path to the updated revision, like [B]>[E]>[J'] and Couch was
> some-how patching in that updated path info to the B+ index at runtime.
> 
> If couch is rewriting entire node paths with all their elements then I am
> no longer confused about the B+ index updates, but am curious about the
> on-disk cost of this.
> 
> In my own rough insertion testing, that would explain why I see my
> collections absolutely explode in size until they are compacted (not using
> bulk insert, but intentionally doing single inserts for a million(s) of
> docs to see what kind of cost the index path duplication would be like).
> 
> Can anyone confirm/deny/correct this assessment? I want to make sure I am
> on the right track understanding this.
> 
> Best wishes,
> Riyad
> 
> On Tue, Dec 20, 2011 at 6:13 PM, Riyad Kalla  wrote:
> 
>> @Filipe - I was just not clear on how CouchDB operated; you and Robert
>> cleared that up for me. Thank you.
>> 
>> @Robert - The writeup is excellent so far (I am not familiar with erlang,
>> so there is a bit of stickiness there), thank you for taking the time to
>> put this together!
>> 
>> At this point I am curious how the _id and _seq indices are read as their
>> data is continually appended to the end of the data file in small
>> diff-trees for every updated doc.
>> 
>> If CouchDB kept all the indices in-memory and simply patched-in the
>> updated paths at runtime (maybe something akin to memory-mapped indices in
>> MongoDB) I would be fairly clear on the operation... but as I understand
>> it, CouchDB keeps such a small memory footprint by doing no in-memory
>> ca

Re: Understanding the CouchDB file format

2011-12-21 Thread Riyad Kalla
Adding to this conversation, I found this set of slides by Chris explaining
the append-only index update format:
http://www.slideshare.net/jchrisa/btree-nosql-oak?from=embed

Specifically slides 16, 17 and 18.

Using this example tree, rewriting the updated path (in reverse order) and
appending it to the end of the file makes sense... you can see how index
queries can simply read backwards from the end of the file and find not only
the latest revisions of docs but also every other doc that wasn't touched
(they just seek into the existing inner nodes of the B+ tree while searching).
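
Here is a toy model of that append-only update, just to make the mechanism
concrete (the term shapes and the list-as-file below are made up for
illustration, not CouchDB's actual file format):

%% Toy model: the "file" is a list of appended terms; an offset is a position
%% in it. An update appends the rewritten leaf and every ancestor on its path,
%% and readers always start from the most recently written root.
-module(append_only_demo).
-export([demo/0]).

append(File, Term) -> {File ++ [Term], length(File)}.
read(File, Offset) -> lists:nth(Offset + 1, File).

demo() ->
    %% initial tree: one leaf, one root pointing at it
    {F0, Leaf0} = append([], {kv_node, [{a, 1}, {b, 2}]}),
    {F1, Root0} = append(F0, {kp_node, [{b, Leaf0}]}),
    %% update key b: rewrite the whole leaf and the whole root at the file end
    {F2, Leaf1} = append(F1, {kv_node, [{a, 1}, {b, 20}]}),
    {F3, Root1} = append(F2, {kp_node, [{b, Leaf1}]}),
    %% the old root is still readable at its old offset; new readers use Root1
    {read(F3, Root0), read(F3, Root1)}.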

What I am hoping for clarification on is the following pain-point that I
perceive with this approach:

1. In a sufficiently shallow B+ tree (like CouchDB), the paths themselves
to elements are short (typically no more than 3 to 5 levels deep) as
opposed to a trie or some other construct that would have much longer paths
to elements.

2. Because the depth of the tree is so shallow, its breadth becomes large to
compensate... more specifically, each internal node can have 100s, 1,000s or
more children. Using the example slides, consider the nodes [A...M] and
[R...Z] -- in a good-sized CouchDB database, those internal index nodes would
have 100s (or more) elements in them, pointing at deeper internal nodes that
themselves have thousands of elements, instead of the 13 or so implied by
[A...M].

3. Looking at slides 17 and 18, where you see the direct B+ tree path to the
updated node getting appended to the end of the file after the revision is
written (leaf-to-root ordering: [J' M] -> [A M] -> [A Z]), it implies that
those internal nodes, with *all* their child elements, are getting rewritten
as well.

In this example tree it isn't such a big issue... but in a sufficiently large
CouchDB database, these nodes denoted by [A...M] and [A...Z] could be quite
large. I don't know the format of the node elements in the B+ tree, but the
cost would be whatever the size of a node element is times however many
elements are contained at each level (1 for the root, say 100 for level 2,
1,000 for level 3 and 10,000 for level 4 -- there is a lot of hand-waving
going on here; of course it depends on the size of the data store).
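
To put entirely made-up numbers on that (the fanouts and the per-entry size
below are assumptions, not measurements): one node per level of the
root-to-leaf path gets rewritten, so the index data appended per
single-document update is roughly the sum of those rewritten node sizes.

%% Back-of-the-envelope only; all numbers are assumed for illustration.
-module(append_cost).
-export([bytes_per_update/0]).

bytes_per_update() ->
    EntryBytes = 40,                        %% assumed size of one key/pointer entry
    EntriesPerPathNode = [100, 100, 100],   %% assumed fanout of each rewritten node
    lists:sum([N * EntryBytes || N <- EntriesPerPathNode]).
%% With these numbers: 3 levels * 100 entries * 40 bytes = 12,000 bytes appended
%% to the index per single-document update, before compaction reclaims it.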

Am I missing something or is CouchDB really rewriting that much index
information between document revisions on every update?

What was previously confusing me is that I thought it was *only* rewriting a
direct path to the updated revision, like [B]>[E]>[J'], and Couch was somehow
patching that updated path info into the B+ index at runtime.

If Couch is rewriting entire node paths with all their elements, then I am no
longer confused about the B+ index updates, but I am curious about the on-disk
cost of this.

In my own rough insertion testing, that would explain why I see my collections
absolutely explode in size until they are compacted (not using bulk insert, but
intentionally doing single inserts for millions of docs to see what kind of
cost the index-path duplication would be like).

Can anyone confirm/deny/correct this assessment? I want to make sure I am
on the right track understanding this.

Best wishes,
Riyad

On Tue, Dec 20, 2011 at 6:13 PM, Riyad Kalla  wrote:

> @Filipe - I was just not clear on how CouchDB operated; you and Robert
> cleared that up for me. Thank you.
>
> @Robert - The writeup is excellent so far (I am not familiar with erlang,
> so there is a bit of stickiness there), thank you for taking the time to
> put this together!
>
> At this point I am curious how the _id and _seq indices are read as their
> data is continually appended to the end of the data file in small
> diff-trees for every updated doc.
>
> If CouchDB kept all the indices in-memory and simply patched-in the
> updated paths at runtime (maybe something akin to memory-mapped indices in
> MongoDB) I would be fairly clear on the operation... but as I understand
> it, CouchDB keeps such a small memory footprint by doing no in-memory
> caching and relying on the intelligence of the OS and filesystem (and/or
> drives) to cache frequently accessed data.
>
> I am trying to understand the logic used by CouchDB to answer a query
> using the index once updates to the tree have been appended to the data
> file... for example, consider a CouchDB datastore like the one Filipe
> has... 10 million documents and let's say it is freshly compacted.
>
> If I send in a request to that Couch instance, it hits the header of the
> data file along with the index and walks the B+ tree to the leaf node,
> where it finds the offset into the data file where the actual doc lives...
> let's say 1,000,000 bytes away.
>
> These B+ trees are shallow, so it might look something like this:
>
> Level 1: 1 node, root node.
> Level 2: 100 nodes, inner child nodes
> Level 3: 10,000 nodes, inner child nodes
> Level 4: 1,000,000, leaf nodes... all with pointers to the data offsets in
> the data file.
>
> Now let's say I write 10 updates to documents in that file. There are 10
> new revisions append

[jira] [Updated] (COUCHDB-1347) Filtered replication does not work when a target document is purged

2011-12-21 Thread Benjamin ter Kuile (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin ter Kuile updated COUCHDB-1347:


Attachment: passing_replication.log
failing_replication.log

Running the test script: the difference between the failing and passing runs is
specifying the filter in the replication command.

> Filtered replication does not work when a target document is purged
> ---
>
> Key: COUCHDB-1347
> URL: https://issues.apache.org/jira/browse/COUCHDB-1347
> Project: CouchDB
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.0.1, 1.1
> Environment: OS X Lion: {"couchdb":"Welcome","version":"1.1.0"} (brew 
> installation)
> Ubuntu 11.04 {"couchdb":"Welcome","version":"1.0.1"}
>Reporter: Benjamin ter Kuile
>  Labels: purge, replication
> Attachments: failing_replication.log, passing_replication.log
>
>
> When a document with a given id is deleted and purged, and a filtered
> replication process then tries to create a document with that id, the document
> is not created. The same replication without the filter works. Ruby test script:
> require 'rubygems'
> require 'couchrest'
> # setup
> server = CouchRest.new("http://localhost:5984")
> a = server.database('a')
> b = server.database('b')
> a.recreate!
> b.recreate!
> # Add a document doc1 to database a and b
> a.save_doc("_id" => 'doc1')
> b_doc1 = b.save_doc("_id" => 'doc1')
> # Delete and purge doc1 from b
> b.delete_doc("_id" => 'doc1', "_rev" => b_doc1['rev'])
> RestClient.post(File.join(b.root, '_purge'),
>   {'doc1' => [b_doc1['rev']]}.to_json, :content_type => :json)
> # Add design with filter
> design = a.save_doc("_id" => "_design/temp", "filters" => {"test" =>
>   %|function(doc, req){if(['doc1'].indexOf(doc['_id']) >= 0){return true;}{return false;}}|})
> # Replicate and wait for finish
> RestClient.post("http://localhost:5984/_replicate",
>   {:source => a.root, :filter => "temp/test", :target => b.root}.to_json,
>   :content_type => :json)
> sleep(0.01) while JSON.parse(RestClient.get("http://localhost:5984/_active_tasks")).size > 0
> abort "oops" unless b.all_docs['total_rows'] == 1
> puts "Test successful"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira