[jira] Updated: (COUCHDB-260) Support for reduce views in _list

2009-02-23 Thread Jason Davies (JIRA)

 [ https://issues.apache.org/jira/browse/COUCHDB-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Davies updated COUCHDB-260:
-

Attachment: list_reduce_views.3.diff

Added fix and tests for bug when _list is empty (i.e. head, row and tail are 
"").  Also started refactoring map and reduce code to remove code duplication.

> Support for reduce views in _list
> -
>
> Key: COUCHDB-260
> URL: https://issues.apache.org/jira/browse/COUCHDB-260
> Project: CouchDB
>  Issue Type: Bug
>  Components: HTTP Interface
>Reporter: Jason Davies
>Priority: Blocker
> Fix For: 0.9
>
> Attachments: list_reduce_views.2.diff, list_reduce_views.3.diff, 
> list_reduce_views.diff
>
>
> The awesomeness of _list needs the awesomeness of reduce views.  Patch to 
> follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [RESULT]: Accept newline patch into CouchDB for 0.9 (Was: Re: VOTE: accept newline patch into CouchDB for 0.9)

2009-02-23 Thread Jan Lehnardt


On 22 Feb 2009, at 23:30, Noah Slater wrote:


On Sun, Feb 22, 2009 at 05:47:00PM +0100, Jan Lehnardt wrote:

It looks like we have a draw with weigh-in from the community
on a +1 to accept the patch.

We need more discussion here.


Oh wow, I was very confused by this.

I actually announced the results on the 2009-02-01, and wondered why you were
ignoring these... but a careful check of my mail archives reveals I managed to
have an entire subthread replying to my own emails. What a loser.


Now I'm confused, you wrote, but didn't send the RESULT mail? :)

In any case, this needs discussion.

Cheers
Jan
--



Below is what I had originally written:



The results are in, and my conclusions are:

 * The community and PMC have decided to open this issue back up for
   discussion, with the proviso that we complete our final decision before
   releasing 0.9 -- which means another vote in a week or so. Heh.

 * The community was strongly in favour of accepting the patch, but the PMC was
   almost completely split down the middle, with a slight preference for not
   accepting the patch.

Over the course of the vote, we had a little discussion, but maybe not enough.

Is there anything else anyone wants to add? Now's the time!

I know this is an annoying issue, and you wouldn't believe how tremendously
boring compiling this vote result has been (learnt some new stuff about how to
tot-up vote results though) -- but we need to resolve it at some point. Our
ability to handle our very first bikeshed colour might be seen as an
opportunity to demonstrate the awesome power of the CouchDB community. Heh heh.


Go CouchDB!


 * Accept the patch (or a modified version) and add newline chars


 Community: 7, PMC: 1

   +1 Noah Slater (Binding)
   +1 Antony Blakey
   +1 Dean Landolt
   +1 Paul Davis
   +1 Christopher Lenz (Binding)
   +0 Damien Katz (Binding)
   -1 Chris Anderson (binding)
   -1 Jan Lehnardt (binding)

 * Reject the patch (and any modified version) and do not add newline chars


 Community: -4, PMC: 2

   -1 Noah Slater (Binding)
   -1 Antony Blakey
   -1 Dean Landolt
   -1 Paul Davis
   -0 Christopher Lenz (Binding)
   +0 Damien Katz (Binding)
   +1 Chris Anderson (binding)
   +1 Jan Lehnardt (binding)


 * Further discussion, to be decided before we release 0.9


 Community: 5, PMC: 5

   +0 Noah Slater (Binding)
   +0 Antony Blakey
   +0 Dean Landolt
   -1 Paul Davis
   +0 Christopher Lenz (Binding)
   +1 Damien Katz (Binding)
   +0 Chris Anderson (binding)

 (Jan Lehnardt voted "0" which I am discounting)


 * Further discussion, to be decided after we release 0.9


 Community: -3, PMC: -3

   -0 Noah Slater (Binding)
   +0 Antony Blakey
   +0 Dean Landolt
   -1 Paul Davis
   +0 Christopher Lenz (Binding)
   -0 Chris Anderson (binding)
   -1 Jan Lehnardt (binding)

 (Damien Katz didn't vote on this option)

Forgot to mention my super-duper vote toting-up scheme.

 +1 = +2
 +0 = +1
 -0 = -1
 -1 = -2

I added up these scores in total for each voting option.
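
A minimal Python sketch of the toting-up scheme above. The function name and the (vote, is_binding) representation are made up for illustration. One assumption is inferred from the published totals rather than stated in the mail: the "Community" figure appears to sum every ballot, while the "PMC" figure sums only the binding ones.

```python
# Weights from the scheme above: +1 = +2, +0 = +1, -0 = -1, -1 = -2.
WEIGHTS = {"+1": 2, "+0": 1, "-0": -1, "-1": -2}


def tally(ballots):
    """Return (community_total, pmc_total) for a list of (vote, is_binding) pairs.

    Assumption: "Community" counts every ballot, "PMC" counts binding ballots only.
    """
    community = sum(WEIGHTS[vote] for vote, _ in ballots)
    pmc = sum(WEIGHTS[vote] for vote, binding in ballots if binding)
    return community, pmc


# Ballots for "Accept the patch", in the order they are listed in the result.
accept = [
    ("+1", True), ("+1", False), ("+1", False), ("+1", False),
    ("+1", True), ("+0", True), ("-1", True), ("-1", True),
]

# Ballots for "Reject the patch", same order.
reject = [
    ("-1", True), ("-1", False), ("-1", False), ("-1", False),
    ("-0", True), ("+0", True), ("+1", True), ("+1", True),
]

print(tally(accept))  # (7, 1)
print(tally(reject))  # (-4, 2)
```

Under that assumption the sketch reproduces the "Community: 7, PMC: 1" and "Community: -4, PMC: 2" figures reported for the first two options.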




I created "couchcurl", a stupid-simple wrapper around curl that adds
a newline:

 http://github.com/janl/couchcurl/tree/master

I wanted this tool for a long time and as you can see in the TODO
section, this is indeed quite useful if you can specify default server
names and port numbers so you could run

I appreciate your input, and your contributions to this, but that we have had to
make a WRAPPER for such a popular tool, just to improve user friendliness,
should in and of itself, be an indication that something is wrong.

I understand that this does not counter all objections in the +1 camp,
but maybe we can get more support on the -1 camp through the
community :)


Actually, going by my counting:

 * The community strongly wants to accept the patch

 * The PMC was split and strongly wants to discuss it further

... which is an odd outcome, to say the least!

--
Noah Slater, http://tumbolia.org/nslater





Re: [RESULT]: Accept newline patch into CouchDB for 0.9 (Was: Re: VOTE: accept newline patch into CouchDB for 0.9)

2009-02-23 Thread Christopher Lenz

On 23.02.2009, at 10:20, Jan Lehnardt wrote:

On 22 Feb 2009, at 23:30, Noah Slater wrote:


On Sun, Feb 22, 2009 at 05:47:00PM +0100, Jan Lehnardt wrote:

It looks like we have a draw with weigh-in from the community
on a +1 to accept the patch.

We need more discussion here.


Oh wow, I was very confused by this.

I actually announced the results on the 2009-02-01, and wondered why you were
ignoring these... but a careful check of my mail archives reveals I managed to
have an entire subthread replying to my own emails. What a loser.


Now I'm confused, you wrote, but didn't send the RESULT mail? :)

In any case, this needs discussion.


Providing a reason for your -1 on accepting the patch would be a good  
start ;)


Personally, I don't think this whole thing is very important, but I  
don't see any harm in adding the trailing newline right now.


Cheers,
--
Christopher Lenz
  cmlenz at gmx.de
  http://www.cmlenz.net/



[jira] Created: (COUCHDB-264) Evaluate if we'd like to track heart-restarts and stop completely after X restarts in Y time.

2009-02-23 Thread Jan Lehnardt (JIRA)
Evaluate if we'd like to track heart-restarts and stop completely after X 
restarts in Y time.
-

 Key: COUCHDB-264
 URL: https://issues.apache.org/jira/browse/COUCHDB-264
 Project: CouchDB
  Issue Type: Improvement
  Components: Build System
Affects Versions: 0.9
Reporter: Jan Lehnardt
Priority: Minor


See http://steve.vinoski.net/blog/2009/02/22/controlling-erlangs-heart/ for a 
discussion. We could decide if we want to add the same to CouchDB.
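
To make the "X restarts in Y time" idea concrete, here is a small Python sketch of a sliding-window restart limiter. It is an illustration of the policy only, not how Erlang's heart or CouchDB implements anything; the class name, method name, and parameters are hypothetical.

```python
from collections import deque


class RestartLimiter:
    """Allow at most max_restarts restarts within any window_seconds span."""

    def __init__(self, max_restarts, window_seconds):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._restarts = deque()  # timestamps of restarts that were allowed

    def allow_restart(self, now):
        """Return True and record the restart, or False if the limit is hit."""
        # Forget restarts that have fallen out of the time window.
        while self._restarts and now - self._restarts[0] >= self.window_seconds:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return False  # too many recent restarts: stop completely
        self._restarts.append(now)
        return True


limiter = RestartLimiter(max_restarts=3, window_seconds=60)
print([limiter.allow_restart(t) for t in (0, 10, 20, 30)])
```

With X = 3 and Y = 60 seconds, the first three restarts are allowed and the fourth, arriving inside the same minute, is refused.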

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [RESULT]: Accept newline patch into CouchDB for 0.9 (Was: Re: VOTE: accept newline patch into CouchDB for 0.9)

2009-02-23 Thread Jan Lehnardt

Hi Christopher,

On 23 Feb 2009, at 10:31, Christopher Lenz wrote:

Personally, I don't think this whole thing is very important, but I  
don't see any harm in adding the trailing newline right now.


Me neither. The only thing is that it changes the API and we agreed to
solve these issues within the next one or two releases. If we decide that
we can change this particular behaviour of the API later without breaking
clients, we can drop the 'blocking' bit and move on.


Providing a reason for your -1 on accepting the patch would be a  
good start ;)


http://mail-archives.apache.org/mod_mbox/couchdb-dev/200901.mbox/%3ce282921e0901261309y51bc9519v6583f2b0a9d7e...@mail.gmail.com%3e

I haven't seen any compelling pro- arguments that can't be solved
on the client side (curl -w\\n e.g.).
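
For clients other than curl, the same client-side fix can be sketched in a few lines of Python. The helper name is made up for illustration; unlike `curl -w`, which always appends its format string, this version only adds the newline when the body does not already end with one.

```python
def with_trailing_newline(body: str) -> str:
    """Return body unchanged if it ends in a newline, else append one."""
    return body if body.endswith("\n") else body + "\n"


# A response body as a server might return it, without a trailing newline.
response_body = '{"ok":true}'
print(with_trailing_newline(response_body), end="")
```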

Cheers
Jan
--



Fwd: Fail on a simple case on replication

2009-02-23 Thread Damien Katz
This is a very common misconception about the revision system. Any ideas how
we can make this better?

random ideas:
- Remove the ability to get old revisions
- Make it much harder/verbose to get old revisions
- Make the API to get old revisions something like
  "?old_rev_that_might_still_be_on_disk="
- Don't call them revisions, call them "turd blossoms" or "hobo socks".
  People won't know what they are, but at least they won't misuse them.


-Damien

Begin forwarded message:


From: Damien Katz 
Date: February 23, 2009 9:09:09 AM EST
To: u...@couchdb.apache.org
Subject: Re: Fail on a simple case on replication
Reply-To: u...@couchdb.apache.org

Revisions are made available as a convenience, but CouchDB doesn't
replicate old revisions, only the most recent. Compaction will
remove old revisions as well.


-Damien

On Feb 23, 2009, at 9:00 AM, Manolo Padron Martinez wrote:


Hi:

I'm trying to test the replication process with two local databases and I
found that the replication process doesn't work as it should (or as I think
it would).

The case:

1º Create a db called t2.
2º Create a document called terminator.
3º Add a property to the document, called speed with the value 1, which
   creates a new revision.
4º Create a new db called t3.
5º Launch the replication process from t2 to t3.

In t3 there should be a document with two revisions, and if I point to
"t3/terminator?revs=true" two revisions appear. If I try to get the last
revision it works as it should, but if I try to get the first revision (the
one without properties) I get a "not found" message.

In the t2 database this works without problems, so I think it is a problem
with replication.

I've tried with Debian, with the latest release on the web (0.8.1), and with
the trunk svn version, with the same results.

Could anyone help me, or will the terminator kill me? :-)

Thanks in advance

Manolo Padrón Martínez






Re: Fail on a simple case on replication

2009-02-23 Thread Ulises
> - Don't call them revisions, call them "turd blossoms" or "hobo socks".
> People won't know what they are, but at least they won't misuse them.

+1 as revision is too tied up to CVS and friends. I'm not so sure about
turd blossoms though ;)

U


Re: Fail on a simple case on replication

2009-02-23 Thread Jan Lehnardt

Hi,

On 23 Feb 2009, at 15:16, Damien Katz wrote:

This is a very common misconception about the revision system. Any ideas how
we can make this better?

random ideas:
- Remove the ability to get old revisions
- Make it much harder/verbose to get old revision
- Make the api to get old revisions something like
  "?old_rev_that_might_still_be_on_disk="
- Don't call them revisions, call them "turd blossoms" or "hobo socks".
  People won't know what they are, but at least they won't misuse them.


I like to think of the _rev as an access token the user has to
provide when attempting a write. So _token would be an idea
or something else along these lines.

Since we are only operating on the "previous" revision for this,
we could also name it _prev.
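
The "access token" reading of _rev can be sketched as a toy in-memory store in Python. This is not CouchDB's API or implementation; the class, the exception, and the use of random hex tokens are all assumptions for illustration. A write succeeds only if the caller presents the token of the current version.

```python
import uuid


class Conflict(Exception):
    """Raised when a write does not carry the current token."""


class TokenStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (token, body)

    def get(self, doc_id):
        """Return (token, body) for the current version of the document."""
        return self._docs[doc_id]

    def put(self, doc_id, body, token=None):
        """Write body if token matches the current one; return the new token."""
        current = self._docs.get(doc_id)
        current_token = current[0] if current else None
        if token != current_token:
            raise Conflict(f"stale or missing token for {doc_id!r}")
        new_token = uuid.uuid4().hex
        self._docs[doc_id] = (new_token, dict(body))
        return new_token


store = TokenStore()
first = store.put("terminator", {})                      # create: no token needed
second = store.put("terminator", {"speed": 1}, first)    # update with current token
try:
    store.put("terminator", {"speed": 2}, first)         # reusing the old token
except Conflict as exc:
    print("rejected:", exc)
```

In this reading the token says nothing about being able to fetch older content; it only proves the writer saw the latest version.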


Cheers
Jan
--









Re: Fail on a simple case on replication

2009-02-23 Thread Robert Dionne



On Feb 23, 2009, at 9:16 AM, Damien Katz wrote:

This is a very common misconception about the revision system. Any  
ideas how we can make this better?


random ideas:
- Remove the ability to get old revisions


+1


- Make it much harder/verbose to get old revision
- Make the api to get old revisions something like
  "?old_rev_that_might_still_be_on_disk="
- Don't call them revisions, call them "turd blossoms" or "hobo socks".
  People won't know what they are, but at least they won't misuse them.


+1 change _rev to _internal_id












Re: Stats

2009-02-23 Thread Jan Lehnardt


On 22 Feb 2009, at 15:06, Jan Lehnardt wrote:

I mentioned this in an earlier mail but I'd like to bring it up again,
since your input is needed here. Metrics are identified with a
tuple `{Module, Key}`. `Module` is the module that initiates the
counting of the metric and `Key` uniquely identifies a metric
within a module. Until now, Alex and I just made up names as
they came up without much consideration for a consistent and intuitive
naming scheme. It would be great if you could help out by checking
whether the names are any good and suggesting alternatives.


So far we have:

{couchdb, open_databases}
{couchdb, request_time}

{httpd, bulk_requests}
{httpd, head_requests}
{httpd, get_requests}
{httpd, put_requests}
{httpd, post_requests}
{httpd, delete_requests}
{httpd, copy_requests}
{httpd, move_requests}

{httpd, document_copies}
{httpd, document_creates}
{httpd, document_deletes}
{httpd, document_moves}
{httpd, document_reads}
{httpd, document_updates}
{httpd, requests}
{httpd, temporary_view_reads}
{httpd, view_reads}

{http_status_codes, Code} (Code is one of 200, 201, 203 ... )


I'd suggest
 - to move the `document_*` keys from `httpd` to `couchdb`,
 - to rename `httpd` to `http`.
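
As an illustration of the `{Module, Key}` naming and the two suggestions above, here is a Python sketch using tuples as counter keys. The actual stats code is Erlang; the function names here are hypothetical and only model the naming scheme.

```python
from collections import Counter

# Counters keyed by (module, key), mirroring the {Module, Key} tuples above.
counters = Counter()


def increment(module, key, by=1):
    """Bump the metric identified by (module, key)."""
    counters[(module, key)] += by


def rename_per_proposal(metric):
    """Apply the suggested renames to one (module, key) pair.

    document_* keys move from httpd to couchdb; remaining httpd keys go to http.
    """
    module, key = metric
    if module == "httpd" and key.startswith("document_"):
        return ("couchdb", key)
    if module == "httpd":
        return ("http", key)
    return metric


increment("httpd", "get_requests")
increment("httpd", "document_reads")
increment("couchdb", "open_databases")

for metric in sorted(counters):
    print(metric, "->", rename_per_proposal(metric))
```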

Is there anything else that you think should look different?

Cheers
Jan
--





Re: Fail on a simple case on replication

2009-02-23 Thread Chris Anderson
On Mon, Feb 23, 2009 at 6:16 AM, Damien Katz  wrote:
> This is a very common misconception about the revision system. Any ideas how
> we can make this better?
>
> random ideas:
> - Don't call them revisions, call them "turd blossoms" or "hobo socks".
> People won't know what they are, but at least they won't misuse them.
>

+1 for _mvcc (or longer) _mvcc_token

-- 
Chris Anderson
http://jchris.mfdz.com


Re: Fail on a simple case on replication

2009-02-23 Thread Patrick Antivackis
As a reminder:

revision  (n)
1. the act or process of revising,
2. a corrected or new version of a book, article, etc.

For me this term is correct for its use in Couch.

I think a good explanation of what compaction and replication do (i.e.
removing old revs, or replicating only the current rev) is the right solution to
this misunderstanding.

> - Remove the ability to get old revisions

-1 : This functionality is interesting for some case studies

> - Make it much harder/verbose to get old revision

-1 : I don't see the utility of this

> - Make the api to get old revisions something like
> "?old_rev_that_might_still_be_on_disk="

0 :

> - Don't call them revisions, call them "turd blossoms" or "hobo socks".
> People won't know what they are, but at least they won't misuse them.

-1 : revision seems the right term to me




Re: Fail on a simple case on replication

2009-02-23 Thread Jan Lehnardt


On 23 Feb 2009, at 16:11, Patrick Antivackis wrote:


For a reminder :

revision  (n)
1. the act or process of revising,
2. a corrected or new version of a book, article, etc.

For me this term is correct with the use in Couch


Damien is not saying the usage is wrong in CouchDB, but people
associate more with "revision" than he'd like. Hence the proposal.


I think a good explanation of what a compaction/replication are doing (ie
removing  old rev, or replicating only current rev) is the right solution to
this misunderstanding


Can you suggest how we improve the wiki docs to satisfy this? In my
opinion, the docs are clear* and the term is overloaded and confusing.

* http://wiki.apache.org/couchdb/Document_revisions has
"You cannot rely on document revisions for any other purpose
than concurrency control." in bold letters.

I stated this in earlier discussions as well: Even if our documentation
were perfect, we don't control how people learn about CouchDB. We
only control the API and we should work hard to get it right.

The way it stands now, a lot of people new to CouchDB get it wrong
because "revision" is a familiar term and they expect the behaviour
they already associate with it. That's how humans learn. In this case
we make the learning hard.

Cheers
Jan
--





Re: Fail on a simple case on replication

2009-02-23 Thread Patrick Antivackis
Hi Jan,
You are right about the document revision wiki page; unfortunately there is
nothing about replication (at least I have not found anything).
The only thing on replication I found is:
http://couchdb.apache.org/docs/overview.html
where it says:
"The replication process is incremental. At the database level, replication
only examines documents updated since the last replication. Then for each
updated document, only fields and blobs that have changed are replicated
across the network. If replication fails at any step, due to network
problems or crash for example, the next replication restarts at the same
document where it left off."

I wrote something like this in the "proposed replication rev history
changes" mail list thread.

version of a document: a version of a document is identified by an id and a
revision. The version of the document contains all the fields/values as they
were at this specific revision.

revision history: the list of all the revisions of the document, beginning
with the most recent.

Compaction of a database removes all previous versions of a document but
keeps the revision history, so a revs=true for a document will return the
revision history of this document, but of course I will not be able to see
what a previous revision of the document contained.

Replication only replicates the last version of a document. If I replicate
baseA to an empty baseB, baseB will contain the last version of all non-deleted
documents, and for each document the full revision history. So for
each document I'm able to do a revs=true, but I am unable to see the content
of a previous version.
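
The behaviour described above can be sketched as a toy Python model. It is only an illustration of the description in this thread, not CouchDB's storage or replication code: the class, its methods, and the use of "1", "2", ... as revision ids are all made up.

```python
class ToyDatabase:
    """Keeps a revision history per document plus bodies for some revisions."""

    def __init__(self):
        self.history = {}  # doc_id -> list of rev ids, most recent first
        self.bodies = {}   # (doc_id, rev) -> document body

    def save(self, doc_id, body):
        revs = self.history.setdefault(doc_id, [])
        rev = str(len(revs) + 1)  # hypothetical rev ids: "1", "2", ...
        revs.insert(0, rev)
        self.bodies[(doc_id, rev)] = dict(body)
        return rev

    def revs(self, doc_id):
        """What a revs=true request would list: the revision history."""
        return list(self.history[doc_id])

    def get(self, doc_id, rev=None):
        rev = rev or self.history[doc_id][0]
        try:
            return self.bodies[(doc_id, rev)]
        except KeyError:
            raise LookupError("not found") from None

    def compact(self):
        """Drop the bodies of all but the latest revision; keep the history."""
        for doc_id, revs in self.history.items():
            for old in revs[1:]:
                self.bodies.pop((doc_id, old), None)

    def replicate_to(self, target):
        """Copy the full history but only the latest body of each document."""
        for doc_id, revs in self.history.items():
            target.history[doc_id] = list(revs)
            target.bodies[(doc_id, revs[0])] = dict(self.bodies[(doc_id, revs[0])])


t2, t3 = ToyDatabase(), ToyDatabase()
t2.save("terminator", {})
t2.save("terminator", {"speed": 1})
t2.replicate_to(t3)

print(t3.revs("terminator"))  # history survives replication
print(t3.get("terminator"))   # latest version is there
try:
    t3.get("terminator", "1")
except LookupError as exc:
    print(exc)                # the old body was never copied
```

In this model the history lists both revisions on the target, the latest body is readable, and asking for the first revision gives "not found", which matches the case reported at the start of the thread.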

Maybe I can start a wiki page on replication, but I think
http://couchdb.apache.org/docs/overview.html should be clarified too.



Re: Fail on a simple case on replication

2009-02-23 Thread Jan Lehnardt


On 23 Feb 2009, at 16:40, Patrick Antivackis wrote:


May be i can start a wiki page on replication, but i think the
http://couchdb.apache.org/docs/overview.html should be clarified too.


Hey yeah, feel free to add new pages and fix existing ones as you
see fit, thanks! :)

Cheers
Jan
--
(Still +1 for the rename :)







Re: Fail on a simple case on replication

2009-02-23 Thread Patrick Antivackis
OK, updated:
http://wiki.apache.org/couchdb/Document_revisions

Created: http://wiki.apache.org/couchdb/Replication (needs more work)

2009/2/23 Jan Lehnardt 

>
> On 23 Feb 2009, at 16:40, Patrick Antivackis wrote:
>
>  May be i can start a wiki page on replication, but i think the
>> http://couchdb.apache.org/docs/overview.html should be clarified too.
>>
>
> Hey yeah, feel free to add new pages and fix existing ones as you
> see fit, thanks! :)
>
> Cheers
> Jan
> --
> (Still +1 for the rename :)
>
>
>
>>
>>
>> 2009/2/23 Jan Lehnardt 
>>
>>
>>> On 23 Feb 2009, at 16:11, Patrick Antivackis wrote:
>>>
>>>> For a reminder:
>>>>
>>>> revision  (n)
>>>> 1. the act or process of revising,
>>>> 2. a corrected or new version of a book, article, etc.
>>>>
>>>> For me this term is correct with the use in Couch
>>>>
>>>
>>> Damien is not saying the usage is wrong in CouchDB, but people
>>> associate more with "revision" than he'd like. Hence the proposal.
>>>
>>>
>>>> I think a good explanation of what a compaction/replication are doing (ie
>>>> removing old rev, or replicating only current rev) is the right
>>>> solution to this misunderstanding
>>>>
>>>
>>> Can you suggest how we improve the wiki docs to satisfy this? In my
>>> opinion, the docs are clear* and the term is overloaded and confusing.
>>>
>>> * http://wiki.apache.org/couchdb/Document_revisions has
>>> "You cannot rely on document revisions for any other purpose
>>> than concurrency control." in bold letters.
>>>
>>> I stated this in earlier discussions as well: Even if our documentation
>>> were perfect, we don't control how people learn about CouchDB. We
>>> only control the API and we should work hard to get it right.
>>>
>>> The way it stands now, a lot of people new to CouchDB get it wrong
>>> because "revision" is a familiar term and they associate the behaviour
>>> they associate with it to them. That's how humans learn. In this case
>>> we make the learning hard.
>>>
>>> Cheers
>>> Jan
>>> --
>>>
>>>
>>>
>>>>> - Remove the ability to get old revisions
>>>>
>>>> -1 : This functionality is interesting for some case studies
>>>>
>>>>> - Make it much harder/verbose to get old revision
>>>>
>>>> -1 : I don't see the utility of this
>>>>
>>>>> - Make the api to get old revisions something like
>>>>>   "?old_rev_that_might_still_be_on_disk="
>>>>
>>>> 0 :
>>>>
>>>>> - Don't call them revisions, call them "turd blossoms" or "hobo socks".
>>>>>   People won't know what they are, but at least they won't misuse them.
>>>>
>>>> -1 : revision seems the right term to me
>>>>
>>>>> -Damien
>
>>
>> Begin forwarded message:
>>
>> From: Damien Katz 
>>
>>  Date: February 23, 2009 9:09:09 AM EST
>>> To: u...@couchdb.apache.org
>>> Subject: Re: Fail on a simple case on replication
>>> Reply-To: u...@couchdb.apache.org
>>>
>>> Revisions are made available as a convenience, but CouchDB doesn't
>>> replicate old revisions, only the most recent. Also compaction will
>>> remove
>>> old revisions as well.
>>>
>>> -Damien
>>>
>>> On Feb 23, 2009, at 9:00 AM, Manolo Padron Martinez wrote:
>>>
>>>> Hi:
>>>>
>>>> I'm trying to test the replication process with two local databases and I
>>>> found that the replication process doesn't work as it should (or as I
>>>> think it would).
>>>>
>>>> The case:
>>>>
>>>> 1º Create a db called t2.
>>>> 2º Create a document called terminator.
>>>> 3º Add a property to the document, so that makes a new revision, with a
>>>> property called speed and the value 1.
>>>> 4º Create a new db called t3.
>>>> 5º Launch the replication process from t2 to t3.
>>>>
>>>> In t3 there should be a document with two revisions, and if I point to
>>>> "t3/terminator?revs=true" two revisions appear. If I try to get the last
>>>> revision it works as it should, but if I try to get the first revision (the
>>>> one without properties) I get a "not found" message.
>>>>
>>>> In the t2 database this works without problems, so I think it is a
>>>> problem with replication.
>>>>
>>>> I've tried with Debian, with the latest on the web (0.8.1), and the trunk
>>>> svn version, with the same results.
>>>>
>>>> Can anyone help me, or will the terminator kill me? :-)
>>>>
>>>> Thanks in advance
>>>>
>>>> Manolo Padrón Martínez

Re: [RESULT]: Accept newline patch into CouchDB for 0.9 (Was: Re: VOTE: accept newline patch into CouchDB for 0.9)

2009-02-23 Thread Noah Slater
On Mon, Feb 23, 2009 at 10:20:12AM +0100, Jan Lehnardt wrote:
> Now I'm confused, you wrote, but didn't send the RESULT mail? :)

I had replied, but managed to reply to myself instead of the list.

-- 
Noah Slater, http://tumbolia.org/nslater


Re: [RESULT]: Accept newline patch into CouchDB for 0.9 (Was: Re: VOTE: accept newline patch into CouchDB for 0.9)

2009-02-23 Thread Noah Slater
On Mon, Feb 23, 2009 at 10:48:22AM +0100, Jan Lehnardt wrote:
> I haven't seen any compelling pro- arguments that can't be solved
> on the client side (curl -w\\n e.g.).

Having to solve on the client side IS the problem. :)

-- 
Noah Slater, http://tumbolia.org/nslater


[jira] Created: (COUCHDB-265) HEAD requests get a Content-Length header

2009-02-23 Thread Paul Joseph Davis (JIRA)
HEAD requests get a Content-Length header
-

 Key: COUCHDB-265
 URL: https://issues.apache.org/jira/browse/COUCHDB-265
 Project: CouchDB
  Issue Type: Bug
  Components: HTTP Interface
Affects Versions: 0.9
 Environment: curl + trunk
Reporter: Paul Joseph Davis


Looks like HEAD requests are returning a bogus Content-Length header. If I 
remember my HTTP spec correctly, HEAD requests are supposed to return no 
Content-Length or a Content-Length of 0 but I could be wrong on that. Either 
way, it confuses the crap out of curl:

$ curl -X HEAD -i http://127.0.0.1:5984/
HTTP/1.1 200 OK
Server: CouchDB/0.9.0a (Erlang OTP/R12B)
Date: Mon, 23 Feb 2009 20:56:55 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 40
Cache-Control: must-revalidate

curl: (18) transfer closed with 40 bytes remaining to read


Also, I just happened to be reading couch_http.erl the other day and I remember 
seeing a note that said mochiweb automatically strips bodies so internally HEAD 
requests are treated like a GET and mochiweb I guess just doesn't send a body. 
That's probably important.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-265) HEAD requests get a Content-Length header

2009-02-23 Thread Jens Alfke (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676072#action_12676072
 ] 

Jens Alfke commented on COUCHDB-265:


No, HEAD is supposed to return _exactly the same headers_ as GET, just without 
the body.
It can be used by clients to determine the size of a resource before 
downloading it, so the Content-Length is definitely useful.
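
The behaviour described above can be sketched with a stand-in server. This is
a minimal illustration using only Python's standard library, not CouchDB or
mochiweb; the handler, the BODY payload and the demo() helper are all
hypothetical names chosen for the example. It shows a HEAD response carrying
the same Content-Length as the GET response while sending no body.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payload; it only stands in for whatever the real server returns.
BODY = b'{"couchdb":"Welcome","version":"0.9.0a"}\n'


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        # The same headers are produced for GET and HEAD.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain;charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # Headers only: the body is deliberately not written.
        self._send_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def head_and_get(port):
    """Return {method: (Content-Length header, body bytes)} for HEAD and GET."""
    results = {}
    for method in ("HEAD", "GET"):
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request(method, "/")
        resp = conn.getresponse()
        results[method] = (resp.getheader("Content-Length"), resp.read())
        conn.close()
    return results


def demo():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), Handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return head_and_get(server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the header and body seen for each method; a client that
knows it sent HEAD does not wait for the advertised bytes.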




[jira] Commented: (COUCHDB-265) HEAD requests get a Content-Length header

2009-02-23 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676073#action_12676073
 ] 

Paul Joseph Davis commented on COUCHDB-265:
---

Yeah. Noah just corrected me and I went back and reread the spec again which 
definitely says that changes in content-length should be detectable. Guess this 
is just curl being dumb. Apologies for the noise.




[jira] Closed: (COUCHDB-265) HEAD requests get a Content-Length header

2009-02-23 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis closed COUCHDB-265.
-

   Resolution: Invalid
Fix Version/s: 0.9

Should've read the spec before filing :D




[jira] Commented: (COUCHDB-265) HEAD requests get a Content-Length header

2009-02-23 Thread Jens Alfke (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676075#action_12676075
 ] 

Jens Alfke commented on COUCHDB-265:


By the way, curl has exactly the same problem with HEAD requests to other 
servers, so I'd say this is a bug in curl itself. (For example, try "curl -X 
HEAD http://www.google.com".)




Re: [RESULT]: Accept newline patch into CouchDB for 0.9 (Was: Re: VOTE: accept newline patch into CouchDB for 0.9)

2009-02-23 Thread Brian Candler
On Mon, Feb 23, 2009 at 07:48:46PM +, Noah Slater wrote:
> On Mon, Feb 23, 2009 at 10:48:22AM +0100, Jan Lehnardt wrote:
> > I haven't seen any compelling pro- arguments that can't be solved
> > on the client side (curl -w\\n e.g.).
> 
> Having to solve on the client side IS the problem. :)

Indeed. And the current situation is anomalous: if there are objections to a
trailing newline, surely newlines should be removed from within view results
too. (I don't want this though; I want them to be kept and the trailing one
added).

Incidentally, I looked at the code and the internal newlines are \r\n - so
the trailing one should be this too, IMO.

send_json_view_row(Resp, Db, {{Key, DocId}, Value}, RowFront, IncludeDocs) ->
    JsonObj = view_row_obj(Db, {{Key, DocId}, Value}, IncludeDocs),
    RowFront2 = case RowFront of
        nil -> ",\r\n";
        _ -> RowFront
    end,
    send_chunk(Resp, RowFront2 ++  ?JSON_ENCODE(JsonObj)).
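
The separator logic in the snippet above can be sketched in Python to show how
a trailing "\r\n" would fit with the internal ones. This is only an
illustration under assumptions: the header and footer strings and the
stream_rows name are hypothetical and are not taken from the CouchDB source.

```python
import json


def stream_rows(rows, header='{"rows":[\r\n', footer="\r\n]}",
                trailing_newline=True):
    """Yield response chunks: the first row is prefixed with the header,
    every later row with a comma plus CRLF, as in the Erlang case expression."""
    row_front = header  # pending prefix for the first row
    for row in rows:
        prefix = ",\r\n" if row_front is None else row_front
        yield prefix + json.dumps(row, separators=(",", ":"))
        row_front = None  # plays the role of `nil` in the Erlang code
    if row_front is not None:
        # No rows were sent, so the header is still pending.
        yield row_front
    # The proposed change: end the response with the same CRLF used between rows.
    yield footer + ("\r\n" if trailing_newline else "")
```

Joining the chunks gives a body that still parses as JSON, since trailing
whitespace after the closing brace is permitted.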

Regards,

Brian.


Re: Fail on a simple case on replication

2009-02-23 Thread Dean Landolt
On Mon, Feb 23, 2009 at 10:30 AM, Jan Lehnardt  wrote:

>
> On 23 Feb 2009, at 16:11, Patrick Antivackis wrote:
>
>  For a reminder :
>>
>> revision  (n)
>> 1. the act or process of revising,
>> 2. a corrected or new version of a book, article, etc.
>>
>> For me this term is correct with the use in Couch
>>
>
> Damien is not saying the usage is wrong in CouchDB, but people
> associate more with "revision" than he'd like. Hence the proposal.
>
>
>  I think a good explanation of what a compaction/replication are doing (ie
>> removing  old rev, or replicating only current rev) is the right solution
>> to
>> this misunderstanding
>>
>
> Can you suggest how we improve the wiki docs to satisfy this? In my
> opinion, the docs are clear* and the term is overloaded and confusing.
>
> * http://wiki.apache.org/couchdb/Document_revisions has
> "You cannot rely on document revisions for any other purpose
> than concurrency control." in bold letters.
>
> I stated this in earlier discussions as well: Even if our documentation
> were perfect, we don't control how people learn about CouchDB. We
> only control the API and we should work hard to get it right.
>
> The way it stands now, a lot of people new to CouchDB get it wrong
> because "revision" is a familiar term and they associate the behaviour
> they associate with it to them. That's how humans learn. In this case
> we make the learning hard.


I couldn't agree more with this sentiment, but revision still strikes me as
the right term. Perhaps the easiest way to fix this misconception is for
there to actually be a way to keep old revisions around for good :)

Would it be overly difficult to just add in the ability to keep a full rev
history based on a config setting? The replication api would need to
accommodate this, of course, and if the machine you're replicating from
doesn't also keep old revisions around you're SOL, but is there any other
compelling reason to not offer this option? If it wouldn't complicate the
code base, this seems like a helpful feature. Sure, it could be wasteful and
should be off by default, but if your dataset is relatively small, this
config flag would be pretty nice to have, and it could help clear up this
confusion.


Re: Fail on a simple case on replication

2009-02-23 Thread Paul Davis
On Mon, Feb 23, 2009 at 6:02 PM, Dean Landolt  wrote:
> On Mon, Feb 23, 2009 at 10:30 AM, Jan Lehnardt  wrote:
>
>>
>> On 23 Feb 2009, at 16:11, Patrick Antivackis wrote:
>>
>>  For a reminder :
>>>
>>> revision  (n)
>>> 1. the act or process of revising,
>>> 2. a corrected or new version of a book, article, etc.
>>>
>>> For me this term is correct with the use in Couch
>>>
>>
>> Damien is not saying the usage is wrong in CouchDB, but people
>> associate more with "revision" than he'd like. Hence the proposal.
>>
>>
>>  I think a good explanation of what a compaction/replication are doing (ie
>>> removing  old rev, or replicating only current rev) is the right solution
>>> to
>>> this misunderstanding
>>>
>>
>> Can you suggest how we improve the wiki docs to satisfy this? In my
>> opinion, the docs are clear* and the term is overloaded and confusing.
>>
>> * http://wiki.apache.org/couchdb/Document_revisions has
>> "You cannot rely on document revisions for any other purpose
>> than concurrency control." in bold letters.
>>
>> I stated this in earlier discussions as well: Even if our documentation
>> were perfect, we don't control how people learn about CouchDB. We
>> only control the API and we should work hard to get it right.
>>
>> The way it stands now, a lot of people new to CouchDB get it wrong
>> because "revision" is a familiar term and they associate the behaviour
>> they associate with it to them. That's how humans learn. In this case
>> we make the learning hard.
>
>
> I couldn't agree more with this sentiment, but revision still strikes me as
> the right term. Perhaps the easiest way to fix this misconception is for
> there to actually be a way to keep old revisions around for good :)
>
> Would it be overly difficult to just add in the ability to keep a full rev
> history based on a config setting? The replication api would need to
> accommodate this, of course, and if the machine you're replicating from
> doesn't also keep old revisions around you're SOL, but is there any other
> compelling reason to not offer this option? If it wouldn't complicate the
> code base, this seems like a helpful feature. Sure, it could be wasteful and
> should be off by default, but if your dataset is relatively small, this
> config flag would be pretty nice to have, and it could help clear up this
> confusion.
>

I don't (yet) have a very thorough knowledge of everything that happens
inside the db files, but from the little I do know changing the
operation seems like it'd be a tall order. Then again, I could be
wrong.

Also, my suggestion for renaming would be _lock.

HTH,
Paul Davis


Re: Fail on a simple case on replication

2009-02-23 Thread Antony Blakey


On 24/02/2009, at 9:32 AM, Dean Landolt wrote:


>> Can you suggest how we improve the wiki docs to satisfy this? In my
>> opinion, the docs are clear* and the term is overloaded and confusing.
>>
>> * http://wiki.apache.org/couchdb/Document_revisions has
>> "You cannot rely on document revisions for any other purpose
>> than concurrency control." in bold letters.
>>
>> I stated this in earlier discussions as well: Even if our documentation
>> were perfect, we don't control how people learn about CouchDB. We
>> only control the API and we should work hard to get it right.
>>
>> The way it stands now, a lot of people new to CouchDB get it wrong
>> because "revision" is a familiar term and they associate the behaviour
>> they associate with it to them. That's how humans learn. In this case
>> we make the learning hard.


Firstly, I completely agree that one should consider the implications  
of using certain terms; the baggage and context such terms bring with  
them.



OTOH, one should use the correct term and not redefine existing terms  
to suit one's own purpose. In a tangentially related way, the use of  
the term RESTful wrt CouchDB is a marketing abomination.



The documentation about replication, the role of revisions, the lack  
of inter-document consistency guarantees (including, crucially to the  
operation model, the lack of Monotonic Write guarantees), really needs  
to be expanded.


The consequences of CouchDB's underlying model aren't immediately  
obvious, and should be spelled out, as I started to do here: http://mail-archives.apache.org/mod_mbox/couchdb-dev/200902.mbox/%3c0fddc57c-db78-4241-86de-549fecc8b...@gmail.com%3e 
 - which was obviously in the context of changing that mechanism, but  
still the explanation and references are useful.


> I couldn't agree more with this sentiment, but revision still strikes me as
> the right term. Perhaps the easiest way to fix this misconception is for
> there to actually be a way to keep old revisions around for good :)
>
> Would it be overly difficult to just add in the ability to keep a full rev
> history based on a config setting? The replication api would need to
> accommodate this, of course, and if the machine you're replicating from
> doesn't also keep old revisions around you're SOL, but is there any other
> compelling reason to not offer this option? If it wouldn't complicate the
> code base, this seems like a helpful feature. Sure, it could be wasteful and
> should be off by default, but if your dataset is relatively small, this
> config flag would be pretty nice to have, and it could help clear up this
> confusion.


Danger Will Robinson!

The problem here is that you then need to make certain guarantees  
about revisions to make them at all useful, and you get into a  
discussion like the above email thread.


IMO, discussing these issues without having read the relevant  
literature around replication models, is a waste of time. Serious  
research has been done into this, and (once again, IMO) it is more  
productive to advance that understanding than try (and possibly fail)  
to reinvent the wheel.


Antony Blakey
--
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A priest, a minister and a rabbi walk into a bar. The bartender says  
"What is this, a joke?"





Re: Fail on a simple case on replication

2009-02-23 Thread Chris Anderson
>> Would it be overly difficult to just add in the ability to keep a full rev
>> history based on a config setting?

This would be a pretty big change. As Antony says, once you go down
that path a little, you end up at something that is not really much
like Couch.

There's yet to be a really clear reference for how to do
application-versioned documents in CouchDB. Hopefully we'll address
the topic in the book, but we haven't gotten that far yet.
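
One possible shape for application-versioned documents is to keep each version
as its own ordinary document instead of relying on _rev. The sketch below is
an assumption for illustration, not an approach the thread settles on; the
archive_copy name, the "id:vN" id scheme and the type, snapshot_of and version
fields are all hypothetical.

```python
import copy


def archive_copy(doc, version):
    """Build a separate snapshot document for `doc`.

    The snapshot is a normal document with its own _id, so the application
    controls how long it is kept. All field names here are hypothetical.
    """
    snapshot = copy.deepcopy(doc)
    # The snapshot is a new document, so it must not carry the source's _rev.
    snapshot.pop("_rev", None)
    snapshot["_id"] = "%s:v%d" % (doc["_id"], version)
    snapshot["type"] = "snapshot"
    snapshot["snapshot_of"] = doc["_id"]
    snapshot["version"] = version
    return snapshot
```

An application would save the snapshot alongside each update of the main
document and query the snapshots by snapshot_of when it needs history.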

The way I see it, the salient options are:

A) leave it as _rev and answer the versioning question every week forever
B) rename it to _mvcc or _lock or _token or something else that
doesn't confuse people

The main drawback of B is that when we start renaming _rev, someone
else comes along and tries to take the opportunity to change _id, or
otherwise change the whole system. If we can stick to just renaming to
something clearer, I'm happy to go ahead with this.

Chris

-- 
Chris Anderson
http://jchris.mfdz.com


Re: Fail on a simple case on replication

2009-02-23 Thread Antony Blakey


On 24/02/2009, at 12:15 PM, Chris Anderson wrote:

>> Would it be overly difficult to just add in the ability to keep a full rev
>> history based on a config setting?
>
> This would be a pretty big change. As Antony says, once you go down
> that path a little, you end up at something that is not really much
> like Couch.


I don't want to re-open a dead issue, but to clarify this - there are  
other models of replication that provide stronger weak-consistency  
guarantees - I urge you to read a few Bayou papers if you are  
interested. Using such replication would be very close to Couch. So I  
don't agree with the strength of Chris's comment.


The issue however, is that Couch's identity is, and has always been,  
largely determined by its replication model. There's so much more to  
Couch that is independent of that, such as map/reduce views, forms,  
futon, an HTTP API, JSON etc, that it's not immediately obvious that  
it's the *replication model* that makes this product 'CouchDB'. The  
project founder and the PMC, are all committed to that replication  
model, which is derived from Notes.


You can add all of the other Couch features, and in fact reuse all of  
the Couch code, with a different replication model, but it's unlikely  
it would be accepted into the Couch code base. If you want that, you  
need to fork and call it something different (which is what I'm  
doing). It's important to note however that the Couch replication  
model has some characteristics that cannot be achieved using any  
stronger form of consistency. In fact, technically speaking, Couch  
provides coherence, but NO consistency.


Given all of that, it would be good to have a very clear 'What is
Couch' that emphasizes the primacy of the replication model (and its
implications, both pro and con), because none of the other things IMO
are as central to the identity, as consequential, or as confusing
(except maybe reduce/re-reduce) as the operational semantics of the
replication model.


As an aside to this (and I'm not being bolshy), looking further ahead,  
Eventual Consistency, which seems to be promoted as an article of  
faith, is not *strictly* achievable in a partial replication  
environment. Achieving Eventual Consistency is also dependent on some  
other constraints, so depending on your deployment model, it can be  
more theoretical than practical. At the end of the day however,  
dealing with non-Monotonic Writes subsumes dealing with Eventual  
Consistency in all but asymptotic senses.


These are all points that I think should be made clearly and up front  
in the documentation, because a failure to understand Couch's  
replication model, and the implications for applications, both pro and  
con, will IMO lead to failures that will be blamed on Couch, but are  
in fact due to misunderstanding. You don't want a 'Couch is a piece of  
shit' meme to establish. IMO the bulk of Couch users will not think  
this through themselves, because they will be tool users, not tool  
builders.



> There's yet to be a really clear reference for how to do
> application-versioned documents in CouchDB. Hopefully we'll address
> the topic in the book, but we haven't gotten that far yet.
>
> The way I see it, the salient options are:
>
> A) leave it as _rev and answer the versioning question every week forever
> B) rename it to _mvcc or _lock or _token or something else that
> doesn't confuse people
>
> The main drawback of B is that when we start renaming _rev, someone
> else comes along and tries to take the opportunity to change _id, or
> otherwise change the whole system. If we can stick to just renaming to
> something clearer, I'm happy to go ahead with this.


Orthogonally, I still think the id and rev should be wrapped in a  
_meta tag, but modulo that ...


It's not a _lock. Saying it's a _token has nothing to do with its  
function - it would be like calling a car a 'construct of metal'. It's  
not _mvcc because that's the name of a technique, not a thing.


Maybe _mvcc_commit_id - although in the current implementation it  
isn't, it philosophically is and could be implemented that way. But  
really, it is a document version/revision identifier. Maybe put  
'couch' in there to emphasize the internal nature of it e.g.  
'_couch_rev_id' i.e. something which, at the limit might be  
'_couch_private_revision_id_which_you_should_treat_as_opaque'.


Antony Blakey
-
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The intuitive mind is a sacred gift and the rational mind is a  
faithful servant. We have created a society that honours the servant  
and has forgotten the gift.

  -- Albert Einstein




Re: Fail on a simple case on replication

2009-02-23 Thread Damien Katz


On Feb 23, 2009, at 8:45 PM, Chris Anderson wrote:

>> Would it be overly difficult to just add in the ability to keep a full rev
>> history based on a config setting?
>
> This would be a pretty big change. As Antony says, once you go down
> that path a little, you end up at something that is not really much
> like Couch.
>
> There's yet to be a really clear reference for how to do
> application-versioned documents in CouchDB. Hopefully we'll address
> the topic in the book, but we haven't gotten that far yet.
>
> The way I see it, the salient options are:
>
> A) leave it as _rev and answer the versioning question every week forever
> B) rename it to _mvcc or _lock or _token or something else that
> doesn't confuse people
>
> The main drawback of B is that when we start renaming _rev, someone
> else comes along and tries to take the opportunity to change _id, or
> otherwise change the whole system. If we can stick to just renaming to
> something clearer, I'm happy to go ahead with this.


I forgot when I posted this, we still need the ability to get conflict  
revisions, which also uses the ?rev=... syntax. Maybe we should change  
that use from ?rev... to ?conflict=, since those rev ids show up  
in the _conflicts doc member.


I think if we change from _rev to something else, _cc for concurrency  
control is good. I'm not sure this is necessary.
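
What "concurrency control" means for this token can be sketched with a toy
in-memory store. This is a simplified model, not CouchDB's implementation: the
ToyDb and Conflict names are hypothetical, and the counter-based tokens only
stand in for real revision ids. An update is accepted only when the caller
presents the token of the version it last read.

```python
import itertools


class Conflict(Exception):
    """Raised when the caller's token does not match the stored one."""


class ToyDb:
    def __init__(self):
        self._docs = {}                      # doc_id -> (token, body)
        self._counter = itertools.count(1)   # hypothetical token source

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            # Someone else updated the document since the caller read it.
            raise Conflict("%s: expected %r, got %r" % (doc_id, current_rev, rev))
        new_rev = str(next(self._counter))
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)
```

The token is only compared for equality here, which is why the thread treats
it as something applications should regard as opaque.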


Maybe we should only allow getting old revisions (?disk_rev=...) with a
setting in the ini, defaulting it off. That discourages its use as a
general purpose mechanism, but is easy to turn on if you really need it.



-Damien


Re: Fail on a simple case on replication

2009-02-23 Thread Chris Anderson
On Mon, Feb 23, 2009 at 6:30 PM, Damien Katz  wrote:

> Maybe we should change that use from ?rev... to ?conflict=

If we follow your _cc idea, we could change from ?rev= to ?cc=

>
> I think if we change from _rev to something else, _cc for concurrency
> control is good. I'm not sure this is necessary.

yes, if we make the change _cc is the best so far. I can already
imagine office workers thinking it stands for "conflict catcher".

>
> Maybe we should only allow the ability to getting old revisions
> (?disk_rev=...) with a setting in the ini, defaulting it off. That
> discourages it's use as general purpose mechanism, but is easy to turn on if
> you really need it.
>

Not a bad idea. The idea that you can't depend on it being available
would discourage apps from attempting to use _cc as an easy way to
provide undo functionality for users. Undo is a good feature, but undo
that sometimes randomly has been compacted away is worse than no undo.
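
Why old revisions make an unreliable undo can be sketched with a toy model of
the behaviour described in this thread: replication copies only the most
recent revision, and compaction drops old ones. The ToyStore class, the
replicate function and the sequential revision ids are hypothetical
simplifications, not CouchDB's storage format.

```python
class NotFound(Exception):
    pass


class ToyStore:
    def __init__(self):
        self.rev_ids = {}   # doc_id -> list of revision ids, oldest first
        self.bodies = {}    # (doc_id, rev_id) -> document body

    def save(self, doc_id, body):
        revs = self.rev_ids.setdefault(doc_id, [])
        rev_id = str(len(revs) + 1)          # hypothetical id scheme
        revs.append(rev_id)
        self.bodies[(doc_id, rev_id)] = dict(body)
        return rev_id

    def get(self, doc_id, rev=None):
        revs = self.rev_ids.get(doc_id)
        if not revs:
            raise NotFound(doc_id)
        rev = rev or revs[-1]                # default to the latest revision
        try:
            return dict(self.bodies[(doc_id, rev)])
        except KeyError:
            raise NotFound("%s rev %s" % (doc_id, rev))

    def compact(self):
        # Keep the list of revision ids, drop every body except the latest.
        for doc_id, revs in self.rev_ids.items():
            for old in revs[:-1]:
                self.bodies.pop((doc_id, old), None)


def replicate(source, target):
    # Only the latest body travels; the revision id list comes along.
    for doc_id, revs in source.rev_ids.items():
        target.rev_ids[doc_id] = list(revs)
        target.bodies[(doc_id, revs[-1])] = source.get(doc_id)
```

In this model the target lists two revisions for a twice-saved document but
can only return the latest, which matches the report that started the thread.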


-- 
Chris Anderson
http://jchris.mfdz.com


Re: [RESULT]: Accept newline patch into CouchDB for 0.9 (Was: Re: VOTE: accept newline patch into CouchDB for 0.9)

2009-02-23 Thread Chris Anderson
On Mon, Feb 23, 2009 at 1:31 AM, Christopher Lenz  wrote:

>
> Providing a reason for your -1 on accepting the patch would be a good start
> ;)
>
> Personally, I don't think this whole thing is very important, but I don't
> see any harm in adding the trailing newline right now.
>

I'm basically of the same mind. It's not very important to me either
way. I don't use command line clients much (oh and my bash-prompt has
a newline in it anyway...) so I lean slightly toward the "be like
Google and others" side of the fence.

If I understand correctly, Damien's objections have to do with the
interaction between a trailing newline and a potential future
canonical JSON format. He may have a point but we'd likely have to
change other things in the future if we wanted to use that canonical
format anyway. (Or maybe I'm mischaracterizing his old argument...)

-- 
Chris Anderson
http://jchris.mfdz.com


Re: Fail on a simple case on replication

2009-02-23 Thread Jeff Hinrichs
On Mon, Feb 23, 2009 at 8:43 PM, Chris Anderson  wrote:
> On Mon, Feb 23, 2009 at 6:30 PM, Damien Katz  wrote:
>
>> Maybe we should change that use from ?rev... to ?conflict=
>
> If we follow your _cc idea, we could change from ?rev= to ?cc=
>
>>
>> I think if we change from _rev to something else, _cc for concurrency
>> control is good. I'm not sure this is necessary.
>
> yes, if we make the change _cc is the best so far. I can already
> imagine office workers thinking it stands for "conflict catcher".
>
>>
>> Maybe we should only allow the ability to getting old revisions
>> (?disk_rev=...) with a setting in the ini, defaulting it off. That
>> discourages it's use as general purpose mechanism, but is easy to turn on if
>> you really need it.
>>
>
> Not a bad idea. The idea that you can't depend on it being available
> would discourage apps from attempting to use _cc as an easy way to
> provide undo functionality for users. Undo is a good feature, but undo
> that sometimes randomly has been compacted away is worse than no undo.

I would point out that compaction is not a random event.  It is
controlled by the admin, correct?  To my knowledge, couch does not
spontaneously compact nor even currently support the idea of automated
compaction.

>
>

Also, earlier in the thread, Dean L suggested allowing unlimited rev
history.  I think that his idea has merit in light of a talked-about
patch that would limit rev history to length N.  If the ability to
control the size(N) of rev history is in the cards, why not allow N to
be infinity?  Before you just dismiss the idea, I would state that I
could see usefulness for this in special cases and remind you of the
old saw, "Accountants don't use erasers." And in the new age of
security and compliance, Auditors don't like erasers.

scenario, master-slave -- slaves only keep the most recent, while the
master keeps complete. conflict resolution is handled solely by the
master.
scenario, first-among-equals -- multi-master where a single master is
used as the basis for conflict resolution, other masters keep only a
limited rev history and escalate to the first-among-equals when
Eventual Consistency cannot be reached due to missing rev history on a
peer.

This is not an argument for changes to replication, or a desire for
replication with complete rev history.  Only to allow rev history size
of infinity.
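
To make the request concrete, here is a minimal sketch of a rev-history limit where "no limit" is a legal setting. It is an illustration only, not CouchDB code: the function name `stem_revisions`, the list-of-strings history, and the use of `None` to mean infinity are all assumptions made for this example.

```python
from typing import List, Optional


def stem_revisions(history: List[str], limit: Optional[int] = None) -> List[str]:
    """Keep the newest `limit` revisions of a document's history.

    `history` is ordered oldest to newest (an assumption of this sketch).
    `limit=None` stands for N = infinity: nothing is ever dropped.
    """
    if limit is None:
        return list(history)
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return history[-limit:]


# A slave that keeps only the most recent revision, and a master that keeps all.
history = ["rev1", "rev2", "rev3", "rev4"]
slave_view = stem_revisions(history, limit=1)
master_view = stem_revisions(history, limit=None)
```

The point of the sketch is that an unlimited history is just one more value of the same setting, so it needs no separate mechanism.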

> --
> Chris Anderson
> http://jchris.mfdz.com
>

Regards,

Jeff Hinrichs


Re: Fail on a simple case on replication

2009-02-23 Thread Antony Blakey


On 24/02/2009, at 1:00 PM, Damien Katz wrote:

> I think if we change from _rev to something else, _cc for
> concurrency control is good. I'm not sure this is necessary.


Concurrency control describes how it got there, but it's not the thing
itself, i.e. it's not 'the concurrency control'; it's an artifact of a
concurrency control mechanism. It would be good to have the name of
the thing describe it accurately.


I vote for this staying as is and being handled in the documentation,  
where it belongs. The API doesn't approach a descriptive semantics.


Antony Blakey
-
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Did you hear about the Buddhist who refused Novocain during a root  
canal?

His goal: transcend dental medication.




Re: Fail on a simple case on replication

2009-02-23 Thread Antony Blakey


On 24/02/2009, at 1:39 PM, Jeff Hinrichs - DM&T wrote:


> scenario, master-slave -- slaves only keep the most recent, while the
> master keeps complete. conflict resolution is handled solely by the
> master.
> scenario, first-among-equals -- multi-master where a single master is
> used as the basis for conflict resolution, other masters keep only a
> limited rev history and escalate to the first-among-equals when
> Eventual Consistency cannot be reached due to missing rev history on a
> peer.


My initial thought is that Eventual Consistency amongst a set of peers  
is dependent purely on anti-entropy guarantees, no partial  
replication, and deterministic conflict resolution. The idea of  
Eventual Consistency is that the peers will end up with the same  
database contents. Given pruning or revision stemming, this is only
true if you regard database equality as not including revision history,
i.e. it's the head revision + conflicts.
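
A minimal sketch of that notion of equality, assuming a toy model of a document's replicated state. The class `DocState`, its fields, and the function `converged` are hypothetical names for this example, not anything in CouchDB.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class DocState:
    head: str                   # the winning (head) revision
    conflicts: FrozenSet[str]   # conflicting revisions still visible
    history: Tuple[str, ...]    # retained ancestors; may be stemmed


def converged(a: DocState, b: DocState) -> bool:
    """Two peers agree if head and conflicts match; history is ignored."""
    return a.head == b.head and a.conflicts == b.conflicts


# Same head and conflicts, but the second peer has pruned its history.
full = DocState("A3", frozenset({"A2"}), ("A1", "A2"))
stemmed = DocState("A3", frozenset({"A2"}), ())
```

Under this definition `full` and `stemmed` count as converged even though a field-by-field comparison of the two states would say they differ.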


But then I developed a thought experiment where this wasn't clear - in  
particular whether you can rely on conflicting versions being included  
in the state. Consider 4 peers, with one document, each peer having a  
different revision.


P1 = [ A1 ] & P2 = [ A2 ] & P3 = [ A3 ] & P4 = [ A4 ]

Now bidirectionally replicate P1 & P2: [ A1, A5=A2/A1 ], where X=Y/Z
means X is the head revision, identical to Y but with a conflict
reference to Z. This is Eventual Consistency for P1 & P2.


Now bidirectionally replicate P3 & P4: [ A3, A6=A4/A3 ]. This is
Eventual Consistency for P3 & P4.


Now replicate P1 and P3. Obviously one of { A5, A6 } will be chosen  
deterministically, but are the conflicts chained? Do you end up with  
[ A2, A3, A5=A2/A1, A7=A4/[A3,A5] ]. How is it intended to treat the  
conflicts already present in documents undergoing conflict resolution?
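
One way to explore the question is a toy model in which conflicts are never chained: each peer simply holds a flat set of competing leaf revisions, replication takes the union, and the winner is picked by a fixed rule. This is a sketch under those assumptions and one possible answer, not a description of what CouchDB does. The functions `replicate` and `resolve`, and the "largest revision id wins" rule, are invented for this example.

```python
from typing import Set, Tuple


def replicate(a: Set[str], b: Set[str]) -> Set[str]:
    """Bidirectional replication in this model: both peers end up
    holding the union of all leaf revisions."""
    return a | b


def resolve(leaves: Set[str]) -> Tuple[str, Set[str]]:
    """Deterministic resolution: the largest revision id wins (an
    arbitrary rule chosen for the sketch); the rest remain conflicts."""
    winner = max(leaves)
    return winner, leaves - {winner}


p1, p2, p3, p4 = {"A1"}, {"A2"}, {"A3"}, {"A4"}

p12 = replicate(p1, p2)          # state shared by P1 and P2
p34 = replicate(p3, p4)          # state shared by P3 and P4
merged = replicate(p12, p34)     # P1 and P3 replicate

winner, conflicts = resolve(merged)
```

In this model the result is the same whichever pairs replicate first, because set union is order-independent and no new revision is created to record a conflict. Whether that is the intended treatment of conflicts already present is exactly the open question.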


Even writing this out is hard.

Antony Blakey
--
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Always have a vision. Why spend your life making other people’s dreams?
 -- Orson Welles (1915-1985)



Re: Fail on a simple case on replication

2009-02-23 Thread Antony Blakey


On 24/02/2009, at 12:51 PM, Antony Blakey wrote:

> The project founder and the PMC are all committed to that
> replication model, which is derived from Notes.


BTW I'm the only one in the community that has expressed any strong  
desire to change this - I'm not implying any community division, just  
pointing out that it's both an historical artifact, and accepted by  
the major contributors and committers.


Antony Blakey
--
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Plurality is not to be assumed without necessity
  -- William of Ockham (ca. 1285-1349)