Re: Transactional _bulk_docs

2009-02-05 Thread Antony Blakey
I'm not keen on prolonging this agony, but I am going to respond to  
these points.


On 06/02/2009, at 3:43 PM, Paul Davis wrote:


A brief history:

1. The mythical IRC conversation on 'removing' the feature: (roughly  
quoted)


It wasn't mythical - Damien has stated that is what happened. Why do  
you use the word 'mythical'?



Damien: I don't think we can support transactional commits in the face
of multiple nodes. We can do ACID writes to disk so that updates
aren't lost, but checking with an unbounded number of servers that a
commit doesn't conflict isn't feasible.


Not unbounded. Look at Scalaris. And in any case, what exactly is this  
multi-node mode? Why compromise the API for something that is so  
ephemeral that conflict management isn't feasible? What IS feasible?  
View consistency? MVCC semantics? If I write a document and then read
it, might I see an earlier version? What about in the view?
Because if views are going to have a consistency guarantee wrt.
writes, then that looks to me like distributed MVCC. Is this not 2PC?
And if views aren't consistent, then why bother? Why not have a client
layer that does content-directed load balancing?


Regardless, this discussion is also about whether supporting a
single-node style of operation is useful, because CouchDB had an ACID
_bulk_docs. IMO it's also about the danger/cost of abstracting over
operational models with considerably different operational
characteristics - cf. transparent distribution in object models.



Everyone else: That's pretty reasonable.


Everyone == ? Damien told me explicitly that this change was decided,  
and that decision was made on IRC.


What's with this revisionist history, Paul?


2. A patch was applied to trunk that made commits to CouchDB
optionally ACID compliant (which gives users the traditional
speed/safety choice) as well as removing the atomic 'all or none'
semantics.


If it's not all or none then it's not ACID with respect to the user  
data model. Conflicts are a metadata feature.



Near as I can tell Damien has been nose to the grindstone for quite
some time on this very specific part of the api. Would I like more
status updates and ideas on where he's heading? Of course. Do I trust
him? Yes. Is the community as a whole going to blindly accept some
asinine patch that has no value that removes a crap load of
functionality? No.


Is the PMC going to accept some massive patch that has significant  
benefit, but as a side effect removes a key feature, for no good  
technical reason? That's what is happening. Damien's patches are  
neither asinine, nor of no value. On IRC Chris Anderson noted in  
response to a question that Damien has a heap of changes coming, but  
that we (the community) have to wait and see what they are.



I tend to think that the 'discussion' that everyone keeps referring to
hasn't even occurred yet. I look at the patch that was applied that
caused this as an unfortunate early release.


And if commits don't represent some sort of decision, what are they? I  
saw the patch, thought WTF?, asked about it, and was eventually told  
that yes, a decision had been made that the ACID API was being removed.



Admissions first: I have no money riding on this issue. Whether or not
CouchDB has transactional _bulk_docs worries me not at all. Though, I
can't say that I have that much sympathy for a business model that
relies on an open source project's trunk to remain compatible with
required assumptions.


Having an ACID guarantee explicitly stated, and then removed with no  
replacement, is not a 'required assumption' on my part. ACID is a big  
deal.


And in any case, my 'business model' response is to fork CouchDB,  
which is the appropriate response. But still, do you want people to  
use this project or not? Promote it or not? What message does that send?



Reductio ad absurdum:


That's about right.

Antony Blakey
--
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Don't anthropomorphize computers. They hate that.




Re: Transactional _bulk_docs

2009-02-05 Thread Antony Blakey


On 06/02/2009, at 12:04 AM, Damien Katz wrote:


He mailed us privately. Now he's mailed us publicly.


BTW: Noah took me to task for emailing you privately, so I forwarded  
the email to the list. I was trying to get a resolution without  
fanning the flames.


Antony Blakey
-
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Every task involves constraint,
Solve the thing without complaint;
There are magic links and chains
Forged to loose our rigid brains.
Structures, structures, though they bind,
Strangely liberate the mind.
  -- James Fallen




Re: Transactional _bulk_docs

2009-02-05 Thread Paul Davis
On Thu, Feb 5, 2009 at 10:02 PM, Antony Blakey  wrote:
>
> On 06/02/2009, at 6:20 AM, Chris Anderson wrote:
>
>> Antony, maybe it would help for you to explain just exactly what you
>> wouldn't be able to do, without the bulk docs API. It will help to
>> inform people about the technical issue.
>
>
> My original email included this:
>
> ---
>
> For example, I have documents that can be cloned. The cloned document
> contains a reference to the originating document. When I then delete the
> original document, the clone history needs to be updated to remove the
> reference to the original document and replace it with an original-deleted
> history item. There is a business case that requires this consistency.
>
> With a transactional API this is easy. Without it, I can't see a way to
> maintain consistency in the face of concurrent application access and/or
> failure.
>
> ---
>
> However, I don't think this is really about a specific example.
>
> The problem is that if you get one side of the relationship written and
> visible, but the other side not, then other concurrent accessors will see a
> partially successful update.
>
> One response is "but you'll see this problem during replication", but I
> think this is making a big assumption about how replication is
> managed/interleaved with local application behaviour.
>
> Replication, and dealing with conflicts, is in no way automatic. As others
> have stated, there is no domain-independent way of resolving conflicts.
> Surely if it were possible to build a transactional API on top of a
> conflict-based system, then this statement would not be true?
>
> I am deploying CouchDB like a Notes CLIENT. Not as a high-performance
> database server. Replication is an explicit operation, that halts normal
> activity. For my first delivery, replicas are read-only, so replication
> conflict isn't possible, but when I move to a distributed writers scenario,
> resolving replication conflicts will involve a specialized UI, that allows
> all conflicts to be resolved before normal operation resumes. Thus the
> editing application always sees a conflict-free database.
>
> The use-case of someone doing a local operation e.g. submitting a web form,
> is very different than resolving replication conflicts. Conflict during a
> local operation is a matter of application concurrency, whereas conflict
> during replication is driven by the overall system model. It has different
> temporal, administrative and UI boundaries.
>
> In short, I think it is a mistake to try and hide the different
> characteristics of local (even clustered) operations, and replication. You
> may disagree, but if the system distinguishes between these two
> fundamentally different things (distinguished by their partition-tolerance),
> you can code as though every operation leads to conflict if you wish, but I
> can't take advantage of the difference.
>
>> I know that the long-standing vision of Couch doesn't include special
>> API exceptions for when you are running on a single node. And I'm a
>> little afraid that the transactional doc commits Antony wants us to
>> keep, are only a mirage, which would lead to trouble anyway, when
>> distributed systems are involved.
>
> I don't understand why this needs to be the case. You can do transactions in
> distributed systems. Do you have a model that isn't amenable to a Scalaris
> treatment? Especially given that we're only talking about transactions over
> a set of processes that are providing an illusion of a single system. Such a
> cluster already requires some degree of partition-tolerance, right? And if
> not, then what distinguishes a cluster from a partition-tolerant p2p mesh?
>
> Antony Blakey
> -
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> The fact that an opinion has been widely held is no evidence whatever that
> it is not utterly absurd.
>  -- Bertrand Russell
>
>
>

I'm upset that CouchDB doesn't make me coffee in the morning.

But the thing is, CouchDB is totally willing to make you coffee *and*
bacon. It loves you *that* much.

Enough with the silly. I've watched this drama avalanche for a while
and I finally think it's time for me to put out a few words on what
I've seen.

A brief history:

1. The mythical IRC conversation on 'removing' the feature: (roughly quoted)

Damien: I don't think we can support transactional commits in the face
of multiple nodes. We can do ACID writes to disk so that updates
aren't lost, but checking with an unbounded number of servers that a
commit doesn't conflict isn't feasible.

Everyone else: That's pretty reasonable.

2. A patch was applied to trunk that made commits to CouchDB
optionally ACID compliant (which gives users the traditional
speed/safety choice) as well as removing the atomic 'all or none'
semantics.

3. Huge ML threads.

History complete.

Current status (through my eyes):

Near as I can tell Dam

[jira] Commented: (COUCHDB-209) Refine httpd_db_handlers API to map the externals from paths directly to scripts

2009-02-05 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670998#action_12670998
 ] 

Paul Joseph Davis commented on COUCHDB-209:
---

I've been thinking about this ticket for a bit. I understand the desire to 
minimize the number of INI edits etc, but something keeps bothering me about 
removing the [external] section entirely.

Also, is this really 0.9 blocking? I would vote that this isn't an API 
incompatibility.

> Refine httpd_db_handlers API to map the externals from paths directly to 
> scripts
> 
>
> Key: COUCHDB-209
> URL: https://issues.apache.org/jira/browse/COUCHDB-209
> Project: CouchDB
>  Issue Type: Improvement
>  Components: HTTP Interface
>Affects Versions: 0.9
> Environment: all
>Reporter: Jeff Hinrichs
>Assignee: Chris Anderson
>Priority: Blocker
> Fix For: 0.9
>
>
> We could change the API to map the externals from paths directly to
> scripts, like
> [httpd_db_handlers]
> _mypath = {couch_httpd_external, handle_external_req, "/path/to/my/script"}
> which would be fine by me.
> The current code is like it is because the original implementation was
> designed to have multiple scripts mounted at the /db/_external path.
> Do you mind opening a ticket about this? - I'm happy to write the code
> but I'm supposed to be working on the book right now, so it'll have to
> wait.
> link to mail list thread:
> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3c5aaed53f0901120631v112916eewcc50e96c44728...@mail.gmail.com%3e
> It would appear to be a good solution to allow the flexibility desired while 
> narrowing the number of local.ini edits to accomplish.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-128) couchdb is not starting properly from init.d script in trunk

2009-02-05 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670996#action_12670996
 ] 

Paul Joseph Davis commented on COUCHDB-128:
---

Is there no more information on this ticket? I routinely use 
/usr/local/etc/init.d/couchdb to start and stop couchdb on Linux.

> couchdb is not starting properly from init.d script in trunk
> 
>
> Key: COUCHDB-128
> URL: https://issues.apache.org/jira/browse/COUCHDB-128
> Project: CouchDB
>  Issue Type: Bug
>  Components: Build System
>Reporter: Noah Slater
>Assignee: Noah Slater
>Priority: Blocker
> Fix For: 0.9
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Commented: (COUCHDB-194) [startkey, endkey[: provide a right-open range selection method

2009-02-05 Thread Antony Blakey


On 06/02/2009, at 8:36 AM, Paul Davis wrote:


I've been pondering this issue of the weird _design/ doc hack.


IMO it's not possible to get this hack right because it doesn't  
acknowledge the reality of Unicode.



I'd
either agree with Zach on having separately named keys for open or
right on *both* ends, or specific to the string and array types, a
startswith parameter. I don't much like the startswith idea though as
it's not generally applicable.


I posted a description of prefixkey with an interpretation over all  
JSON values.


http://mail-archives.apache.org/mod_mbox/couchdb-dev/200901.mbox/%3c67c42c78-4f52-409a-847b-f545f664d...@gmail.com%3e

Antony Blakey
--
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

If at first you don’t succeed, try, try again. Then quit. No use being  
a damn fool about it

  -- W.C. Fields



Re: Transactional _bulk_docs

2009-02-05 Thread Antony Blakey

Ooops...

On 06/02/2009, at 1:32 PM, Antony Blakey wrote:

In short, I think it is a mistake to try and hide the different  
characteristics of local (even clustered) operations, and  
replication. You may disagree, but if the system distinguishes  
between these two fundamentally different things (distinguished by  
their partition-tolerance), you can code as though every operation  
leads to conflict if you wish, but I can't take advantage of the  
difference.


... if the system doesn't distinguish between those two cases.  
Distinguishing between the two cases allows for a wider range of uses  
and application models.


Antony Blakey
-
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

All that is required for evil to triumph is that good men do nothing.




Re: Transactional _bulk_docs

2009-02-05 Thread Antony Blakey


On 06/02/2009, at 6:20 AM, Chris Anderson wrote:


Antony, maybe it would help for you to explain just exactly what you
wouldn't be able to do, without the bulk docs API. It will help to
inform people about the technical issue.



My original email included this:

---

For example, I have documents that can be cloned. The cloned document
contains a reference to the originating document. When I then delete the
original document, the clone history needs to be updated to remove the
reference to the original document and replace it with an
original-deleted history item. There is a business case that requires this
consistency.


With a transactional API this is easy. Without it, I can't see a way  
to maintain consistency in the face of concurrent application access  
and/or failure.


---

However, I don't think this is really about a specific example.

The problem is that if you get one side of the relationship written  
and visible, but the other side not, then other concurrent accessors  
will see a partially successful update.
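
To make the clone example concrete, here is a minimal Python sketch of the two writes involved. The helper name and the document fields (`history`, `type`, `ref`, `clone-of`, `original-deleted`) are hypothetical and not taken from the application described above; only the `{"docs": [...]}` body shape of `_bulk_docs` and the `_deleted` flag are CouchDB conventions. The sketch only builds the request body. Whether both changes become visible together depends on the server-side semantics being debated in this thread.

```python
import json


def build_delete_original_payload(original, clone):
    """Build a _bulk_docs body that deletes `original` and rewrites `clone`.

    Both changes travel in one request, so under all-or-nothing semantics
    a reader would never observe only one of them.
    The `history` / `type` / `ref` fields are hypothetical.
    """
    # Tombstone for the original document: CouchDB marks a deletion
    # with `_deleted: true` on the current revision.
    tombstone = {
        "_id": original["_id"],
        "_rev": original["_rev"],
        "_deleted": True,
    }

    # Replace the clone's reference to the original with a history marker.
    new_history = [
        {"type": "original-deleted"}
        if item.get("type") == "clone-of" and item.get("ref") == original["_id"]
        else item
        for item in clone.get("history", [])
    ]
    # Copy the clone rather than mutating the caller's document.
    updated_clone = dict(clone, history=new_history)

    return {"docs": [tombstone, updated_clone]}


original = {"_id": "doc-a", "_rev": "1-abc"}
clone = {
    "_id": "doc-b",
    "_rev": "1-def",
    "history": [{"type": "clone-of", "ref": "doc-a"}],
}
payload = build_delete_original_payload(original, clone)
print(json.dumps(payload, indent=2))
```

If the two documents in `docs` are applied independently, a concurrent reader can see the tombstone without the updated clone (or the reverse), which is the partially successful update described above.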


One response is "but you'll see this problem during replication", but  
I think this is making a big assumption about how replication is  
managed/interleaved with local application behaviour.


Replication, and dealing with conflicts, is in no way automatic. As  
others have stated, there is no domain-independent way of resolving  
conflicts. Surely if it were possible to build a transactional API on  
top of a conflict-based system, then this statement would not be true?


I am deploying CouchDB like a Notes CLIENT. Not as a high-performance  
database server. Replication is an explicit operation, that halts  
normal activity. For my first delivery, replicas are read-only, so  
replication conflict isn't possible, but when I move to a distributed  
writers scenario, resolving replication conflicts will involve a  
specialized UI, that allows all conflicts to be resolved before normal  
operation resumes. Thus the editing application always sees a conflict- 
free database.


The use-case of someone doing a local operation e.g. submitting a web  
form, is very different than resolving replication conflicts. Conflict  
during a local operation is a matter of application concurrency,  
whereas conflict during replication is driven by the overall system  
model. It has different temporal, administrative and UI boundaries.


In short, I think it is a mistake to try and hide the different  
characteristics of local (even clustered) operations, and replication.  
You may disagree, but if the system distinguishes between these two  
fundamentally different things (distinguished by their partition- 
tolerance), you can code as though every operation leads to conflict  
if you wish, but I can't take advantage of the difference.



I know that the long-standing vision of Couch doesn't include special
API exceptions for when you are running on a single node. And I'm a
little afraid that the transactional doc commits Antony wants us to
keep, are only a mirage, which would lead to trouble anyway, when
distributed systems are involved.


I don't understand why this needs to be the case. You can do  
transactions in distributed systems. Do you have a model that isn't  
amenable to a Scalaris treatment? Especially given that we're only  
talking about transactions over a set of processes that are providing  
an illusion of a single system. Such a cluster already requires some  
degree of partition-tolerance, right? And if not, then what  
distinguishes a cluster from a partition-tolerant p2p mesh?


Antony Blakey
-
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The fact that an opinion has been widely held is no evidence whatever  
that it is not utterly absurd.

  -- Bertrand Russell




Re: [jira] Commented: (COUCHDB-194) [startkey, endkey[: provide a right-open range selection method

2009-02-05 Thread Paul Davis
I've been pondering this issue of the weird _design/ doc hack. I'd
either agree with Zach on having separately named keys for open or
right on *both* ends, or specific to the string and array types, a
startswith parameter. I don't much like the startswith idea though as
it's not generally applicable.

Also, did I miss what you'd pass in the _design doc scenario as end
key assuming right open semantics?

On Thu, Feb 5, 2009 at 4:57 PM, Zachary Zolton  wrote:
> Maximillian,
>
> I'd think both _could_ be useful.
>
> I mean in Ruby we have both for the right-hand boundary of ranges:
>  irb(main):005:0> (1..5).max
>  => 5
>  irb(main):006:0> (1...5).max
>  => 4
>
> IMHO, it would be better to use a different pair of parameter names,
> such that we could easily distinguish between open and closed bounds.
>
>
> Cheers,
>
> Zach
>
>
> PS. Is it "Maximillian" or "Max"?  :^D
>
> On Thu, Feb 5, 2009 at 3:32 PM, Maximillian Dornseif (JIRA)
>  wrote:
>>
>>[ 
>> https://issues.apache.org/jira/browse/COUCHDB-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670911#action_12670911
>>  ]
>>
>> Maximillian Dornseif commented on COUCHDB-194:
>> --
>>
>> So far nobody seems against it.
>>
>> The downside is that it MIGHT break some existing code.
>>
>>> [startkey, endkey[: provide a right-open range selection method
>>> ---
>>>
>>> Key: COUCHDB-194
>>> URL: https://issues.apache.org/jira/browse/COUCHDB-194
>>> Project: CouchDB
>>>  Issue Type: Improvement
>>>  Components: HTTP Interface
>>>Affects Versions: 0.9
>>>Reporter: Maximillian Dornseif
>>>Priority: Blocker
>>> Fix For: 1.0
>>>
>>>
>>> While writing something about using CouchDB I came across the issue of 
>>> "slice indexes" (called startkey and endkey in CouchDB lingo).
>>> I found no exact definition of startkey and endkey anywhere in the 
>>> documentation. Testing reveals that access on _all_docs and on views 
>>> documents are returned in the interval
>>> [startkey, endkey] = (startkey <= k <= endkey).
>>> I don't know if this was a conscious design decision. But I like to promote 
>>> a slightly different interpretation (and thus API change):
>>> [startkey, endkey[ = (startkey <= k < endkey).
>>> Both approaches are valid and used in the real world. Ruby uses the 
>>> inclusive ("right-closed" in math speak) first approach:
>>> >> l = [1,2,3,4]
>>> >> l.slice(1..2)
>>> => [2, 3]
>>> Python uses the exclusive ("right-open" in math speak) second approach:
>>> >>> l = [1,2,3,4]
>>> >>> l[1:2]
>>> [2]
>>> For array indices both work fine and which one to prefer is mostly an issue 
>>> of habit. In spoken language both approaches are used: "Have the Software 
>>> done until saturday" probably means right-open to the client and 
>>> right-closed to the coder.
>>> But if you are working with keys that are more than array indexes, then 
>>> right-open is much easier to handle. That is because you have to *guess* 
>>> the biggest value you want to get. The Wiki at 
>>> http://wiki.apache.org/couchdb/View_collation contains an example of that 
>>> problem:
>>> It is suggested that you use
>>> startkey="_design/"&endkey="_design/Z"
>>> or
>>> startkey="_design/"&endkey="_design/\u"
>>> to get a list of all design documents - also the replication system in the 
>>> db core uses the same hack.
>>> This breaks if a design document is named "ZTop" or 
>>> "\Iñtërnâtiônàlizætiøn". Such names might be unlikely but we are 
>>> computer scientists; "unlikely" is a bad approach to software engineering.
>>> The thing we really want to ask CouchDB is to "get all documents with 
>>> keys starting with '_design/'".
>>> This is basically impossible to do with right-closed intervals. We could 
>>> use startkey="_design/"&endkey="_design0" ('0' is the ASCII character after 
>>> '/') and this will work fine ... until there is actually a document with 
>>> the key "_design0" in the system. Unlikely, but ...
>>> To make selection by intervals reliable currently clients have to guess the 
>>> last key (the  approach) or use the first key not to include (the 
>>> _design0 approach) and then post process the result to remove the last 
>>> element returned if it exactly matches the given endkey value.
>>> If couchdb would change to a right-open interval approach post processing 
>>> would go away in most cases. See 
>>> http://blogs.23.nu/c0re/2008/12/building-a-track-and-trace-application-with-couchdb/
>>>  for two real world examples.
>>> At least for string keys and float keys changing the meaning to [startkey, 
>>> endkey[ would allow selections like
>>> * "all strings starting with 'abc'"
>>> * all numbers between 10.5 and 11
>>> It also would hopefully not break too much existing code. Since the notion 
>>> of end

Re: [jira] Commented: (COUCHDB-194) [startkey, endkey[: provide a right-open range selection method

2009-02-05 Thread Zachary Zolton
Maximillian,

I'd think both _could_ be useful.

I mean in Ruby we have both for the right-hand boundary of ranges:
  irb(main):005:0> (1..5).max
  => 5
  irb(main):006:0> (1...5).max
  => 4

IMHO, it would be better to use a different pair of parameter names,
such that we could easily distinguish between open and closed bounds.


Cheers,

Zach


PS. Is it "Maximillian" or "Max"?  :^D

On Thu, Feb 5, 2009 at 3:32 PM, Maximillian Dornseif (JIRA)
 wrote:
>
>[ 
> https://issues.apache.org/jira/browse/COUCHDB-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670911#action_12670911
>  ]
>
> Maximillian Dornseif commented on COUCHDB-194:
> --
>
> So far nobody seems against it.
>
> The downside is that it MIGHT break some existing code.
>
>> [startkey, endkey[: provide a right-open range selection method
>> ---
>>
>> Key: COUCHDB-194
>> URL: https://issues.apache.org/jira/browse/COUCHDB-194
>> Project: CouchDB
>>  Issue Type: Improvement
>>  Components: HTTP Interface
>>Affects Versions: 0.9
>>Reporter: Maximillian Dornseif
>>Priority: Blocker
>> Fix For: 1.0
>>
>>
>> While writing something about using CouchDB I came across the issue of 
>> "slice indexes" (called startkey and endkey in CouchDB lingo).
>> I found no exact definition of startkey and endkey anywhere in the 
>> documentation. Testing reveals that access on _all_docs and on views 
>> documents are returned in the interval
>> [startkey, endkey] = (startkey <= k <= endkey).
>> I don't know if this was a conscious design decision. But I like to promote 
>> a slightly different interpretation (and thus API change):
>> [startkey, endkey[ = (startkey <= k < endkey).
>> Both approaches are valid and used in the real world. Ruby uses the 
>> inclusive ("right-closed" in math speak) first approach:
>> >> l = [1,2,3,4]
>> >> l.slice(1..2)
>> => [2, 3]
>> Python uses the exclusive ("right-open" in math speak) second approach:
>> >>> l = [1,2,3,4]
>> >>> l[1:2]
>> [2]
>> For array indices both work fine and which one to prefer is mostly an issue 
>> of habit. In spoken language both approaches are used: "Have the Software 
>> done until saturday" probably means right-open to the client and 
>> right-closed to the coder.
>> But if you are working with keys that are more than array indexes, then 
>> right-open is much easier to handle. That is because you have to *guess* the 
>> biggest value you want to get. The Wiki at 
>> http://wiki.apache.org/couchdb/View_collation contains an example of that 
>> problem:
>> It is suggested that you use
>> startkey="_design/"&endkey="_design/Z"
>> or
>> startkey="_design/"&endkey="_design/\u"
>> to get a list of all design documents - also the replication system in the 
>> db core uses the same hack.
>> This breaks if a design document is named "ZTop" or 
>> "\Iñtërnâtiônàlizætiøn". Such names might be unlikely but we are 
>> computer scientists; "unlikely" is a bad approach to software engineering.
>> The thing we really want to ask CouchDB is to "get all documents with 
>> keys starting with '_design/'".
>> This is basically impossible to do with right-closed intervals. We could use 
>> startkey="_design/"&endkey="_design0" ('0' is the ASCII character after '/') 
>> and this will work fine ... until there is actually a document with the key 
>> "_design0" in the system. Unlikely, but ...
>> To make selection by intervals reliable currently clients have to guess the 
>> last key (the  approach) or use the first key not to include (the 
>> _design0 approach) and then post process the result to remove the last 
>> element returned if it exactly matches the given endkey value.
>> If couchdb would change to a right-open interval approach post processing 
>> would go away in most cases. See 
>> http://blogs.23.nu/c0re/2008/12/building-a-track-and-trace-application-with-couchdb/
>>  for two real world examples.
>> At least for string keys and float keys changing the meaning to [startkey, 
>> endkey[ would allow selections like
>> * "all strings starting with 'abc'"
>> * all numbers between 10.5 and 11
>> It also would hopefully not break too much existing code. Since the notion of 
>> endkey seems to be already considered "fishy" (see the Z approach) most 
>> code seems to try to avoid that issue. For example 
>> 'startkey="_design/"&endkey="_design/Z"' still would work unless you 
>> have a design document being named exactly "Z".
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>


[jira] Created: (COUCHDB-240) Replication breaks with large Attachments.

2009-02-05 Thread Maximillian Dornseif (JIRA)
Replication breaks with large Attachments.
--

 Key: COUCHDB-240
 URL: https://issues.apache.org/jira/browse/COUCHDB-240
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.9
 Environment: r 741265. Debian Linux unknown revision, FreeBSD 7.0. 
GBit Network connection between the hosts.
Reporter: Maximillian Dornseif


I use the code in http://code.google.com/p/couchdb-python/issues/detail?id=54 
to do replication between two machines.

I'm running 741265 on both machines. I have a Database with big attachments 
(high-res images, 31.1 GB,  34026 Docs). "Pull" replication breaks with 
following message sent via http:

couchdb.client.ServerError: (500, ('function_clause', 
"[{lists,map,[#Fun,ok]},\n 
{couch_rep,open_doc_revs,4},\n {couch_rep,'-enum_docs_parallel/3-fun-1-',3},\n 
{couch_rep,'-spawn_worker/3-fun-0-',3}]"))

With "push" replication the server just drops the connection  
(httplib2/__init__.py", line 715, in connect
socket.error: (61, 'Connection refused') - why "refused" instead of "closed"?). 
I have only been able to replicate the first 100 documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-194) [startkey, endkey[: provide a right-open range selection method

2009-02-05 Thread Maximillian Dornseif (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670911#action_12670911
 ] 

Maximillian Dornseif commented on COUCHDB-194:
--

So far nobody seems against it.

The downside is that it MIGHT break some existing code.

> [startkey, endkey[: provide a right-open range selection method
> ---
>
> Key: COUCHDB-194
> URL: https://issues.apache.org/jira/browse/COUCHDB-194
> Project: CouchDB
>  Issue Type: Improvement
>  Components: HTTP Interface
>Affects Versions: 0.9
>Reporter: Maximillian Dornseif
>Priority: Blocker
> Fix For: 1.0
>
>
> While writing something about using CouchDB I came across the issue of "slice 
> indexes" (called startkey and endkey in CouchDB lingo).
> I found no exact definition of startkey and endkey anywhere in the 
> documentation. Testing reveals that access on _all_docs and on views 
> documents are returned in the interval
> [startkey, endkey] = (startkey <= k <= endkey).
> I don't know if this was a conscious design decision. But I like to promote a 
> slightly different interpretation (and thus API change):
> [startkey, endkey[ = (startkey <= k < endkey).
> Both approaches are valid and used in the real world. Ruby uses the inclusive 
> ("right-closed" in math speak) first approach:
> >> l = [1,2,3,4]
> >> l.slice(1..2)
> => [2, 3]
> Python uses the exclusive ("right-open" in math speak) second approach:
> >>> l = [1,2,3,4]
> >>> l[1:2]
> [2]
> For array indices both work fine and which one to prefer is mostly an issue 
> of habit. In spoken language both approaches are used: "Have the Software 
> done until saturday" probably means right-open to the client and right-closed 
> to the coder.
> But if you are working with keys that are more than array indexes, then 
> right-open is much easier to handle. That is because you have to *guess* the 
> biggest value you want to get. The Wiki at 
> http://wiki.apache.org/couchdb/View_collation contains an example of that 
> problem:
> It is suggested that you use
> startkey="_design/"&endkey="_design/Z"
> or
> startkey="_design/"&endkey="_design/\u"
> to get a list of all design documents - also the replication system in the db 
> core uses the same hack.
> This breaks if a design document is named "ZTop" or 
> "\Iñtërnâtiônàlizætiøn". Such names might be unlikely but we are computer 
> scientists; "unlikely" is a bad approach to software engineering.
> The thing we really want to ask CouchDB is to "get all documents with 
> keys starting with '_design/'".
> This is basically impossible to do with right-closed intervals. We could use 
> startkey="_design/"&endkey="_design0" ('0' is the ASCII character after '/') 
> and this will work fine ... until there is actually a document with the key 
> "_design0" in the system. Unlikely, but ...
> To make selection by intervals reliable currently clients have to guess the 
> last key (the  approach) or use the first key not to include (the _design0 
> approach) and then post process the result to remove the last element 
> returned if it exactly matches the given endkey value.
> If couchdb would change to a right-open interval approach post processing 
> would go away in most cases. See 
> http://blogs.23.nu/c0re/2008/12/building-a-track-and-trace-application-with-couchdb/
>  for two real world examples.
> At least for string keys and float keys changing the meaning to [startkey, 
> endkey[ would allow selections like
> * "all strings starting with 'abc'"
> * all numbers between 10.5 and 11
> It also would hopefully not break too much existing code. Since the notion of 
> endkey already seems to be considered "fishy" (see the Z approach), most 
> code seems to try to avoid that issue. For example 
> 'startkey="_design/"&endkey="_design/Z"' would still work unless you 
> have a design document named exactly "Z".
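
The difference between the two interval conventions described in the issue can be sketched in a few lines of Python. This is only an illustration over an in-memory sorted list, not CouchDB code: the function names closed_range and right_open_range are made up for this example, and plain code-point string ordering is assumed rather than CouchDB's view collation.

```python
from bisect import bisect_left, bisect_right


def closed_range(sorted_keys, startkey, endkey):
    """Behaviour reported in the issue: startkey <= k <= endkey."""
    lo = bisect_left(sorted_keys, startkey)
    hi = bisect_right(sorted_keys, endkey)
    return sorted_keys[lo:hi]


def right_open_range(sorted_keys, startkey, endkey):
    """Behaviour proposed in the issue: startkey <= k < endkey."""
    lo = bisect_left(sorted_keys, startkey)
    hi = bisect_left(sorted_keys, endkey)
    return sorted_keys[lo:hi]


# Hypothetical document ids, sorted by code point (an assumption, see above).
keys = sorted(["_design/Z", "_design/ZTop", "_design/app", "_design0", "doc1"])

# Guessing a "largest" key misses every design document that sorts after it.
guessed = closed_range(keys, "_design/", "_design/Z")

# Using the first key NOT to include works only if the interval is right-open;
# with a closed interval a document named "_design0" leaks into the result.
leaky = closed_range(keys, "_design/", "_design0")
prefix = right_open_range(keys, "_design/", "_design0")
```

With the right-open variant the prefix query needs no post-processing, which is the point the issue is making; with the closed variant the caller has to drop a trailing row that equals endkey.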

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-135) Offset regression between 0.8.0 and trunk

2009-02-05 Thread Paul Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670893#action_12670893
 ] 

Paul Carey commented on COUCHDB-135:


The new patch nails this issue. I've run all the tests from my lib a few 
hundred times. No failures. Happy days!

> Offset regression between 0.8.0 and trunk
> -
>
> Key: COUCHDB-135
> URL: https://issues.apache.org/jira/browse/COUCHDB-135
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.9
> Environment: OSX 10.5
> Reporter: Paul Carey
> Priority: Blocker
> Fix For: 0.9
>
> Attachments: COUCHDB-135.patch, COUCHDB-135.patch, view_offsets.js, 
> view_offsets2.js
>
>
> The offset returned for certain map queries differs between 0.8.0 and 
> 0.9.0r702929.
> The attached test can be pasted into couch_tests.js. It passes in 0.8.0 and 
> fails in 0.9.
> I believe the skip query param must be passed for this bug to be exhibited. 




[jira] Updated: (COUCHDB-135) Offset regression between 0.8.0 and trunk

2009-02-05 Thread Paul Joseph Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Joseph Davis updated COUCHDB-135:
--

Attachment: COUCHDB-135.patch

I'm pretty sure that COUCHDB-135 was actually (at least) two different bugs: 
one with propagating row count reductions, the other occurring when a skip 
is specified and you skip out of the first KV node. Hopefully this fix works.

> Offset regression between 0.8.0 and trunk
> -
>
> Key: COUCHDB-135
> URL: https://issues.apache.org/jira/browse/COUCHDB-135
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.9
> Environment: OSX 10.5
> Reporter: Paul Carey
> Priority: Blocker
> Fix For: 0.9
>
> Attachments: COUCHDB-135.patch, COUCHDB-135.patch, view_offsets.js, 
> view_offsets2.js
>
>
> The offset returned for certain map queries differs between 0.8.0 and 
> 0.9.0r702929.
> The attached test can be pasted into couch_tests.js. It passes in 0.8.0 and 
> fails in 0.9.
> I believe the skip query param must be passed for this bug to be exhibited. 




Re: Transactional _bulk_docs

2009-02-05 Thread Chris Anderson
On Thu, Feb 5, 2009 at 5:34 AM, Damien Katz  wrote:
> We are going to discuss this on the ML. I was waiting until I got the patch
> working to talk about all the implications and how we'd set the flags and
> modes of operation. The code is going to get more
> powerful; the plan is for the feature to go away, not the capability. If we
> decide the feature is too important, we'll put it back. But as it stands,
> the changes to the code that I'm making now all need to be made regardless
> of whether we change the feature or not.

I agree that we should discuss this on the mailing list, and take a
formal vote when we're ready to. I'm glad we're talking about the
patch, but I can see why Damien would rather finish writing it before
we take it apart on here.

I opened my response to this thread by asking for interested
parties to discuss how one would implement the bulk_docs feature on
top of the capabilities that CouchDB will make available.

I'm still coming to understand those new capabilities. I think I'll
need to see Damien's patch before I can have any considered opinion of
it. For instance, I am not comfortable holding a vote until we've had
time to understand the code.

On Thu, Feb 5, 2009 at 5:46 AM, Jan Lehnardt  wrote:
> Hi,
>
> *pouring water over the fire*
>
> The progression of this is very unfortunate. There was no formal discussion,
> neither on IRC nor on a mailing list. We are all aware of the ASF ways of
> running a project and we didn't handle that one well.
>
> Apologies.

I agree that we as a PMC should strive to be more transparent in the
future. Making the transactional _bulk_docs API available in the first
place was a hard decision, and it's not clear that it was the right
decision (although it did make testing ACID transactions easier).

The CouchDB project came into the Incubator with a lot of momentum and
direction, and I consider it part of my role with the project to help
insulate Damien from the mailing-list chatter, especially when he's
deep in code. I acknowledge that this could be a mistake as well, if it
leads to community misapprehension.

> The whole thing started because I closed a bug with a comment that
> there must be an _upcoming discussion_.

I sympathize with Antony's predicament. He's been using bulk doc
transactions in a high-pressure environment, and it works for him.
It's understandable that he'd be upset, first hearing about the patch
like this.

I know that the long-standing vision of Couch doesn't include special
API exceptions for when you are running on a single node. And I'm a
little afraid that the transactional doc commits Antony wants us to
keep are only a mirage, which would lead to trouble anyway when
distributed systems are involved.

I think a consensus algorithm client library could provide the same
semantics as the current feature, even on a cluster. An implementation
would let Antony keep his feature, even on larger clusters. It could
easily be included as an Erlang plugin.

Couch has a way of forcing developers to rethink their applications in
order to make them fit into its mode of operation. I think if we
approach the problem from a technical angle, it will help everyone to
have an informed opinion about the patch.

I'd been hoping to hold off on this discussion until Damien makes his code
available, as I think that's when it'd be most appropriate.

Antony, maybe it would help for you to explain exactly what you
wouldn't be able to do without the bulk docs API. It will help to
inform people about the technical issue.

Sincerely,
Chris

-- 
Chris Anderson
http://jchris.mfdz.com


[jira] Updated: (COUCHDB-190) _uuid should respond to GET, not POST

2009-02-05 Thread Zachary Zolton (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zachary Zolton updated COUCHDB-190:
---

Attachment: COUCH-190.diff

Patch to fix COUCHDB-190:

  * Changed /_uuids action to GET instead of POST
  * Added broadly-compatible cache-busting headers to response
  * Added unit tests to the JavaScript suite
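
As a rough illustration of what the bullet points above describe, here is a small Python sketch of a GET-only UUID resource that marks its response as uncacheable. It is not the CouchDB patch (which is not shown in this mail): the function name handle_uuids, the exact header values, and the 405 answer for non-GET methods are all assumptions made for the example.

```python
import json
import uuid

# Assumed "cache-busting" headers; the values actually used by the patch
# are not given in this thread.
NO_CACHE_HEADERS = {
    "Cache-Control": "must-revalidate, no-cache",
    "Pragma": "no-cache",
    "Expires": "Fri, 01 Jan 1990 00:00:00 GMT",
}


def handle_uuids(method, count=1):
    """Return (status, headers, body) for a hypothetical /_uuids request."""
    if method != "GET":
        # Assumption: other methods are rejected rather than silently served.
        return 405, {"Allow": "GET"}, ""
    headers = dict(NO_CACHE_HEADERS)
    headers["Content-Type"] = "application/json"
    # Generating ids changes no server state, so GET is a safe method here.
    body = json.dumps({"uuids": [uuid.uuid4().hex for _ in range(count)]})
    return 200, headers, body
```

Calling handle_uuids("GET", 3) twice yields two different bodies, which is fine for GET as long as intermediaries are told not to cache the response.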

> _uuid should respond to GET, not POST
> -
>
> Key: COUCHDB-190
> URL: https://issues.apache.org/jira/browse/COUCHDB-190
> Project: CouchDB
> Issue Type: Improvement
> Components: Database Core
> Affects Versions: 0.9
> Reporter: Matt Goodall
> Priority: Blocker
> Fix For: 0.9
>
> Attachments: COUCH-190.diff
>
>
> The /_uuid resource can happily return a response to a GET without being 
> unresty. In fact, supporting POST is probably incorrect as it implies it 
> would change server state.
> Quick summary:
> * _uuid never changes server state
> * calling _uuid multiple times does not impact other clients
> * that the resource returns something different each time it is requested 
> does not mean it cannot be a GET
> * GET with proper cache control (i.e. don't cache it ever) will work equally 
> well
> Full discussion can be found on the user m.l., 
> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3c21939021.1440421230910477169.javamail.serv...@perfora%3e.




[jira] Updated: (COUCHDB-190) _uuid should respond to GET, not POST

2009-02-05 Thread Zachary Zolton (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zachary Zolton updated COUCHDB-190:
---

Attachment: (was: COUCHDB-190.diff)

> _uuid should respond to GET, not POST
> -
>
> Key: COUCHDB-190
> URL: https://issues.apache.org/jira/browse/COUCHDB-190
> Project: CouchDB
> Issue Type: Improvement
> Components: Database Core
> Affects Versions: 0.9
> Reporter: Matt Goodall
> Priority: Blocker
> Fix For: 0.9
>
>
> The /_uuid resource can happily return a response to a GET without being 
> unresty. In fact, supporting POST is probably incorrect as it implies it 
> would change server state.
> Quick summary:
> * _uuid never changes server state
> * calling _uuid multiple times does not impact other clients
> * that the resource returns something different each time it is requested 
> does not mean it cannot be a GET
> * GET with proper cache control (i.e. don't cache it ever) will work equally 
> well
> Full discussion can be found on the user m.l., 
> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3c21939021.1440421230910477169.javamail.serv...@perfora%3e.




Re: Transactional _bulk_docs

2009-02-05 Thread Ted Leung

On Feb 5, 2009, at 3:14 AM, Geir Magnusson Jr. wrote:

And unlike Ted, I don't agree that a pointer to an IRC log is  
sufficient to represent a "done decision", and he may not have meant  
that anyway.  Sure, I can see a chat starting on IRC about a topic,  
but I'd hope that one person would force the move from IRC to the  
mail list - and at that point, maybe posting a pointer to the  
*initial* discussion log would be useful.  And after that,  
discussion is on the mail list.


Ok, I see that I was unclear.  What I said was that the act of making  
the decision must happen on the list.   So the pointer to the IRC  
discussion is the background to a mailing list discussion to actually  
make the decision.   During that discussion I'd expect other voices to  
be heard, issues to be raised, etc.   I didn't mean that all you had  
to do was send the pointer to indicate that a decision had been made.


Ted


Re: Transactional _bulk_docs

2009-02-05 Thread Geir Magnusson Jr.

Sure, ideally.

But you can't have "everyone" together at the same time on IRC, where  
at the ASF, we define "everyone" to be, well, "everyone", not you and  
the 4 others on the PMC.


I see 579 people on the user list.  I see 294 people on the dev list.   
Just focusing on the dev list, that's 290 people, or 98.6% of people  
supposedly interested in CouchDB development, that had zero  
opportunity to see, review and participate in the discussion.   
Further, there's now zero chance that any future project participant  
can look back to understand design decisions and philosophy.  No  
institutional memory.  Your goal, besides building a great software  
project, should be to get the community to the point where you can  
step back and do other things w/o material effect on the community,  
and that requires information like this to be somewhere accessible.


And unlike Ted, I don't agree that a pointer to an IRC log is  
sufficient to represent a "done decision", and he may not have meant  
that anyway.  Sure, I can see a chat starting on IRC about a topic,  
but I'd hope that one person would force the move from IRC to the mail  
list - and at that point, maybe posting a pointer to the *initial*  
discussion log would be useful.  And after that, discussion is on the  
mail list.


I think IRC logs are a very poor substitute for mail traffic (and yes,  
I grok the downside of async communications).  One primary reason is  
that they are very "in the moment" - if you are in the conversation,  
it's easy to stay in, but afterwards, when things cool and the context of  
the moment isn't there, it's nigh impossible.  You also can't hit  
reply and quote a piece for others to see and discuss, further  
broadening the discussion.


What got me engaged on this wasn't the decision itself (only because  
it was a secret decision), but -like Ted - the mode of operation.  It  
seemed that a very dedicated, engaged and interested community member  
had to privately petition the PMC for redress on a technical decision  
that none of us had any awareness of, nor a chance to review.  And  
IMO, from a guy that probably should be a committer and PMC member to  
boot!


(By the way - from my count, not all PMC members are even on the PMC's  
private@ list, so I have *no clue* where private project discussions -  
like new committer candidates - are even held)


geir


On Feb 5, 2009, at 2:11 AM, Damien Katz wrote:

Ideally yes, but real time communication with everyone together is  
damn useful.


-Damien

On Feb 5, 2009, at 2:07 AM, Ted Leung wrote:

Uh, project decisions are supposed to be made in the public mailing  
lists...


Ted

On Feb 4, 2009, at 6:51 PM, Damien Katz wrote:


This decision was discussed and made on IRC.

-Damien

On Feb 4, 2009, at 9:26 PM, Geir Magnusson Jr. wrote:

can you point me to a reference to where the PMC made this  
decision?


I'm interested in the subject for its own sake, and I'm also  
interested in figuring out where decisions are made in this  
project, since I didn't see this one go by on a mail list.


geir

On Feb 4, 2009, at 9:13 PM, Damien Katz wrote:

Geir, there was a decision made by the PMCs to change the  
transaction model to support partitioned databases. It is a  
change I am currently working on.


-Damien

On Feb 4, 2009, at 8:46 PM, Geir Magnusson Jr. wrote:


and original question #2?

geir

On Feb 4, 2009, at 8:38 PM, Antony Blakey wrote:



On 05/02/2009, at 12:02 PM, Geir Magnusson Jr. wrote:


1) where is this being forwarded from ?


I sent it to the PMC.

Antony Blakey
-
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A Buddhist walks up to a hot-dog stand and says, "Make me one  
with everything". He then pays the vendor and asks for change.  
The vendor says, "Change comes from within".



[jira] Updated: (COUCHDB-135) Offset regression between 0.8.0 and trunk

2009-02-05 Thread Paul Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Carey updated COUCHDB-135:
---

Attachment: view_offsets2.js

This imaginatively titled test case does trigger what I believe is a race 
condition. The wrong offset is returned roughly one time in ten (on my machine).

Potentially of interest is that once the wrong offset is returned for a query, 
it's always returned for that same query - i.e. it doesn't appear that the 
query is being satisfied before the index has been fully built.

> Offset regression between 0.8.0 and trunk
> -
>
> Key: COUCHDB-135
> URL: https://issues.apache.org/jira/browse/COUCHDB-135
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.9
> Environment: OSX 10.5
> Reporter: Paul Carey
> Priority: Blocker
> Fix For: 0.9
>
> Attachments: COUCHDB-135.patch, view_offsets.js, view_offsets2.js
>
>
> The offset returned for certain map queries differs between 0.8.0 and 
> 0.9.0r702929.
> The attached test can be pasted into couch_tests.js. It passes in 0.8.0 and 
> fails in 0.9.
> I believe the skip query param must be passed for this bug to be exhibited. 




Re: Transactional _bulk_docs

2009-02-05 Thread Jan Lehnardt


On 5 Feb 2009, at 14:05, Robert Dionne wrote:
I'm not very familiar with the ASF "process", excuse my ignorance,  
but I find the IRC enormously useful and find mailing list threads  
can be too unwieldy.


Check out http://apache.org/foundation/how-it-works.html for
more about The ASF Way.

Cheers
Jan
--


I guess it's because I'm not a fan of top down design. I see the  
code itself as the design, and the debugging, reworking, and  
documenting of the code as the construction phase.


Best regards,

Bob

Robert Dionne
Chief Bittwiddler
dio...@dionne-associates.com
203.231.9961




Re: Transactional _bulk_docs

2009-02-05 Thread Noah Slater
On Thu, Feb 05, 2009 at 06:14:26AM -0500, Geir Magnusson Jr. wrote:
> What got me engaged on this wasn't the decision itself (only because it
> was a secret decision), but -like Ted - the mode of operation.  It
> seemed that a very dedicated, engaged and interested community member
> had to privately petition the PMC for redress on a technical decision
> that none of us had any awareness of, nor a chance to review.  And IMO,
> from a guy that probably should be a committer and PMC member to boot!

I think we dropped the ball with this one.

I certainly don't remember being involved in this discussion, though I'm sure
someone has logs to prove otherwise. This in itself should be indicative of a
larger problem here.

I think it was fine that this was discussed on IRC, but the moment it came to
the point of needing to do anything about it, or make any decisions based upon
it, it should have been written up as a formal proposal and sent to the public
mailing list for discussion. I hope that the community reaction to this event
will be enough to remind us all to do this in the future.

> (By the way - from my count, not all PMC members are even on the PMC's
> private@ list, so I have *no clue* where project private discussion -
> like new committer candidates - are even discussed)

Can you email the persons not on the private list as a reminder to join?

-- 
Noah Slater, http://tumbolia.org/nslater


Re: Transactional _bulk_docs

2009-02-05 Thread Jan Lehnardt

Hi,

*pouring water over the fire*

The progression of this is very unfortunate. There was no formal discussion,
neither on IRC nor on a mailing list. We are all aware of the ASF ways of
running a project and we didn't handle that one well.

Apologies.

Now: Damien discussed the bulk docs feature on IRC and noted that for
multi-node CouchDB and a consistent interface it has to go and we all
agreed that this is a good thing. This is effectively a PMC decision.
But that's not the Apache way of doing things. We deferred discussing all
details until Damien finished the patch.

Multi-node CouchDB was a day one design goal and well communicated
everywhere. We were also very vocal about breaking the API before 0.9.
Everybody investing in the API has been warned and has been doing so at
their own risk.

Now, the new behaviour is currently being worked on and has not been
discussed since Damien is heads down with the code and as usual, I
think, planned to introduce the code with the patch. Again, this is code
that has been planned from day one.

The discussion of keeping the current (in-flux-API) bulk feature is a
separate one and I think the voices here are loud enough that we
should look at a way to support them.

The whole thing started because I closed a bug with a comment that
there must be an _upcoming discussion_. This got taken up as THE
PMC IS DOING EVERYTHING BEHIND THE SCENES. Which we
don't.

Damien's latest mail is a little unfortunate. He gets the Apache way
and the ASF understands the virtues of IRC, and the middle ground is
that major discussions must be held on the mailing lists. The PMC
is simply waiting for the patch to land, so there's no need to get
nervous. Thanks.

(Aside, this came up on user@ last week and I hoped that this would
have been the end of that until the patch lands.)

Cheers
Jan
--





Re: Transactional _bulk_docs

2009-02-05 Thread Damien Katz


On Feb 5, 2009, at 6:14 AM, Geir Magnusson Jr. wrote:

[sending second time, as I see my first is stuck in moderation, and  
I want to reply in a timely manner]


Sure, ideally.

But you can't have "everyone" together at the same time on IRC,  
where at the ASF, we define "everyone" to be, well, "everyone", not  
you and the 4 others on the PMC.


I see 579 people on the user list.  I see 294 people on the dev  
list.  Just focusing on the dev list, that's 290 people, or 98.6% of  
people supposedly interested in CouchDB development, that had zero  
opportunity to see, review and participate in the discussion.   
Further, there's now zero chance that any future project participant  
can look back to understand design decision and philosophy.  No  
institutional memory.  Your goal, besides building a great software  
project, should be to get the community to the point where you can  
step back and do other things w/o material effect on the community,  
and that requires information like this to be somewhere accessible.


And unlike Ted, I don't agree that a pointer to an IRC log is  
sufficient to represent a "done decision", and he may not have meant  
that anyway.  Sure, I can see a chat starting on IRC about a topic,  
but I'd hope that one person would force the move from IRC to the  
mail list - and at that point, maybe posting a pointer to the  
*initial* discussion log would be useful.  And after that,  
discussion is on the mail list.


I think IRC logs are a very poor substitute for mail traffic (and  
yes, I grok the downside of async communications).  One primary  
reason is that they are very "in the moment" - if you are in the  
conversation, it's easy to stay in, but afterwards, when things cool and  
the context of the moment isn't there, it's nigh impossible.  You  
also can't hit reply and quote a piece for others to see and  
discuss, further broadening the discussion.




We get a lot of value out of IRC.

We are going to discuss this on the ML. I was waiting until I got the  
patch working to talk about all the implications and how we'd set the  
flags and modes of operation. The code is  
going to get more powerful; the plan is for the feature to go away,  
not the capability. If we decide the feature is too important, we'll  
put it back. But as it stands, the changes to the code that I'm making  
now all need to be made regardless of whether we change the feature or not.


What got me engaged on this wasn't the decision itself (only because  
it was a secret decision), but -like Ted - the mode of operation.   
It seemed that a very dedicated, engaged and interested community  
member had to privately petition the PMC for redress on a technical  
decision that none of us had any awareness of, nor a chance to  
review.  And IMO, from a guy that probably should be a committer and  
PMC member to boot!


He mailed us privately. Now he's mailed us publicly.

Any discussion about Antony being involved with the project should  
probably be private.


-Damien




(By the way - from my count, not all PMC members are even on the  
PMC's private@ list, so I have *no clue* where project private  
discussion - like new committer candidates - are even discussed)


geir

On Feb 5, 2009, at 2:11 AM, Damien Katz wrote:

Ideally yes, but real time communication with everyone together is  
damn useful.


-Damien

On Feb 5, 2009, at 2:07 AM, Ted Leung wrote:

Uh, project decisions are supposed to be made in the public  
mailing lists...


Ted

On Feb 4, 2009, at 6:51 PM, Damien Katz wrote:


This decision was discussed and made on IRC.

-Damien

On Feb 4, 2009, at 9:26 PM, Geir Magnusson Jr. wrote:

can you point me to a reference to where the PMC made this  
decision?


I'm interested in the subject for it's own sake, and I'm also  
interested in figuring out where decisions are made in this  
project, since I didn't see this one go by on a mail list.


geir

On Feb 4, 2009, at 9:13 PM, Damien Katz wrote:

Geir, there was a decision made by the PMCs to change the  
transaction model to support partitioned databases. It is a  
change I am currently working on.


-Damien

On Feb 4, 2009, at 8:46 PM, Geir Magnusson Jr. wrote:


and original question #2?

geir

On Feb 4, 2009, at 8:38 PM, Antony Blakey wrote:



On 05/02/2009, at 12:02 PM, Geir Magnusson Jr. wrote:


1) where is this being forwarded from ?


I sent it to the PMC.

Antony Blakey
-
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A Buddhist walks up to a hot-dog stand and says, "Make me one  
with everything". He then pays the vendor and asks for  
change. The vendor says, "Change comes from within".


Re: Transactional _bulk_docs

2009-02-05 Thread Robert Newson
fwiw, I'd like to see these decisions proposed, discussed and resolved
on the mailing list. I appreciate it's slower than IRC, though. I
thought using mailing lists was the mandated "Apache way" of doing
these things, it certainly appears to be on other projects I follow
(Lucene, for example). To restate, I didn't think it was a permitted
option to use IRC to make important project decisions. Is there at
least a transcript of the IRC decision(s)?

B.

On Thu, Feb 5, 2009 at 8:05 AM, Robert Dionne wrote:
> My sense is that the approach to design in CouchDB is very bottom-up. I
> applaud that and encourage it and wholeheartedly agree with Alan Perlis
> about building software top down *except* the first time. We all know that
> very little great software was ever built top down designed by boxologists
> armed with UML diagrams. I think CouchDB is at a key point where it needs to
> continue to be driven by a small core group of dedicated passionate
> programmers.
>
> Please note that I'm in no way commenting on the make up of that group.
>
> I'm not very familiar with the ASF "process", excuse my ignorance, but I
> find the IRC enormously useful and find mailing list threads can be too
> unwieldy.
>
> I guess it's because I'm not a fan of top down design. I see the code itself
> as the design, and the debugging, reworking, and documenting of the code as
> the construction phase.
>
> Best regards,
>
> Bob
>
> Robert Dionne
> Chief Bittwiddler
> dio...@dionne-associates.com
> 203.231.9961
>
>
>
> On Feb 5, 2009, at 6:14 AM, Geir Magnusson Jr. wrote:
>
>> [sending second time, as I see my first is stuck in moderation, and I want
>> to reply in a timely manner]
>>
>> Sure, ideally.
>>
>> But you can't have "everyone" together at the same time on IRC, where at
>> the ASF, we define "everyone" to be, well, "everyone", not you and the 4
>> others on the PMC.
>>
>> I see 579 people on the user list.  I see 294 people on the dev list.
>>  Just focusing on the dev list, that's 290 people, or 98.6% of people
>> supposedly interested in CouchDB development, that had zero opportunity to
>> see, review and participate in the discussion.  Further, there's now zero
>> chance that any future project participant can look back to understand
>> design decisions and philosophy.  No institutional memory.  Your goal,
>> besides building a great software project, should be to get the community to
>> the point where you can step back and do other things w/o material effect on
>> the community, and that requires information like this to be somewhere
>> accessible.
>>
>> And unlike Ted, I don't agree that a pointer to an IRC log is sufficient
>> to represent a "done decision", and he may not have meant that anyway.
>>  Sure, I can see a chat starting on IRC about a topic, but I'd hope that one
>> person would force the move from IRC to the mail list - and at that point,
>> maybe posting a pointer to the *initial* discussion log would be useful.
>>  And after that, discussion is on the mail list.
>>
>> I think IRC logs are a very poor substitute for mail traffic (and yes, I
>> grok the downside of async communications).  A primary reason is that they
>> are very "in the moment" - if you are in the conversation, it's easy to stay
>> in, but after, when things cool and the context of the moment isn't there,
>> it's nigh impossible.  You also can't hit reply and quote a piece for
>> others to see and discuss, further broadening the discussion.
>>
>> What got me engaged on this wasn't the decision itself (only because it
>> was a secret decision), but -like Ted - the mode of operation.  It seemed
>> that a very dedicated, engaged and interested community member had to
>> privately petition the PMC for redress on a technical decision that none of
>> us had any awareness of, nor a chance to review.  And IMO, from a guy that
>> probably should be a committer and PMC member to boot!
>>
>> (By the way - from my count, not all PMC members are even on the PMC's
>> private@ list, so I have *no clue* where project private discussion - like
>> new committer candidates - are even discussed)
>>
>> geir
>>
>> On Feb 5, 2009, at 2:11 AM, Damien Katz wrote:
>>
>>> Ideally yes, but real time communication with everyone together is damn
>>> useful.
>>>
>>> -Damien
>>>
>>> On Feb 5, 2009, at 2:07 AM, Ted Leung wrote:
>>>
>>>> Uh, project decisions are supposed to be made in the public mailing
>>>> lists...
>>>>
>>>> Ted
>>>>
>>>> On Feb 4, 2009, at 6:51 PM, Damien Katz wrote:
>>>>
>>>>> This decision was discussed and made on IRC.
>>>>>
>>>>> -Damien
>>>>>
>>>>> On Feb 4, 2009, at 9:26 PM, Geir Magnusson Jr. wrote:
>>>>>
>>>>>> can you point me to a reference to where the PMC made this decision?
>>>>>>
>>>>>> I'm interested in the subject for its own sake, and I'm also
>>>>>> interested in figuring out where decisions are made in this project,
>>>>>> since I didn't see this one go by on a mail list.
>>>>>>
>>>>>> geir
>>>>>>
>>>>>> On Feb 4, 2

Re: Transactional _bulk_docs

2009-02-05 Thread Robert Dionne
My sense is that the approach to design in CouchDB is very  
bottom-up. I applaud that and encourage it and wholeheartedly agree with  
Alan Perlis about building software top down *except* the first time.  
We all know that very little great software was ever built top down  
designed by boxologists armed with UML diagrams. I think CouchDB is  
at a key point where it needs to continue to be driven by a small  
core group of dedicated passionate programmers.


Please note that I'm in no way commenting on the make up of that group.

I'm not very familiar with the ASF "process", excuse my ignorance,  
but I find the IRC enormously useful and find mailing list threads  
can be too unwieldy.


I guess it's because I'm not a fan of top down design. I see the code  
itself as the design, and the debugging, reworking, and documenting  
of the code as the construction phase.


Best regards,

Bob

Robert Dionne
Chief Bittwiddler
dio...@dionne-associates.com
203.231.9961



On Feb 5, 2009, at 6:14 AM, Geir Magnusson Jr. wrote:

[sending second time, as I see my first is stuck in moderation, and  
I want to reply in a timely manner]


Sure, ideally.

But you can't have "everyone" together at the same time on IRC,  
where at the ASF, we define "everyone" to be, well, "everyone", not  
you and the 4 others on the PMC.


I see 579 people on the user list.  I see 294 people on the dev  
list.  Just focusing on the dev list, that's 290 people, or 98.6%  
of people supposedly interested in CouchDB development, that had  
zero opportunity to see, review and participate in the discussion.   
Further, there's now zero chance that any future project  
participant can look back to understand design decisions and  
philosophy.  No institutional memory.  Your goal, besides building  
a great software project, should be to get the community to the  
point where you can step back and do other things w/o material  
effect on the community, and that requires information like this to  
be somewhere accessible.


And unlike Ted, I don't agree that a pointer to an IRC log is  
sufficient to represent a "done decision", and he may not have  
meant that anyway.  Sure, I can see a chat starting on IRC about a  
topic, but I'd hope that one person would force the move from IRC  
to the mail list - and at that point, maybe posting a pointer to  
the *initial* discussion log would be useful.  And after that,  
discussion is on the mail list.


I think IRC logs are a very poor substitute for mail traffic (and  
yes, I grok the downside of async communications).  A primary  
reason is that they are very "in the moment" - if you are in the  
conversation, it's easy to stay in, but after, when things cool and  
the context of the moment isn't there, it's nigh impossible.  You  
also can't hit reply and quote a piece for others to see and  
discuss, further broadening the discussion.


What got me engaged on this wasn't the decision itself (only  
because it was a secret decision), but -like Ted - the mode of  
operation.  It seemed that a very dedicated, engaged and interested  
community member had to privately petition the PMC for redress on a  
technical decision that none of us had any awareness of, nor a  
chance to review.  And IMO, from a guy that probably should be a  
committer and PMC member to boot!


(By the way - from my count, not all PMC members are even on the  
PMC's private@ list, so I have *no clue* where project private  
discussion - like new committer candidates - are even discussed)


geir

On Feb 5, 2009, at 2:11 AM, Damien Katz wrote:

Ideally yes, but real time communication with everyone together is  
damn useful.


-Damien

On Feb 5, 2009, at 2:07 AM, Ted Leung wrote:

Uh, project decisions are supposed to be made in the public  
mailing lists...


Ted

On Feb 4, 2009, at 6:51 PM, Damien Katz wrote:


This decision was discussed and made on IRC.

-Damien

On Feb 4, 2009, at 9:26 PM, Geir Magnusson Jr. wrote:

can you point me to a reference to where the PMC made this  
decision?


I'm interested in the subject for its own sake, and I'm also  
interested in figuring out where decisions are made in this  
project, since I didn't see this one go by on a mail list.


geir

On Feb 4, 2009, at 9:13 PM, Damien Katz wrote:

Geir, there was a decision made by the PMCs to change the  
transaction model to support partitioned databases. It is a  
change I am currently working on.


-Damien

On Feb 4, 2009, at 8:46 PM, Geir Magnusson Jr. wrote:


and original question #2?

geir

On Feb 4, 2009, at 8:38 PM, Antony Blakey wrote:



On 05/02/2009, at 12:02 PM, Geir Magnusson Jr. wrote:


1) where is this being forwarded from ?


I sent it to the PMC.

Antony Blakey
-
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A Buddhist walks up to a hot-dog stand and says, "Make me  
one with everything". He then pays the vendor and asks for  
change. The vendor says, "Change comes from within".


[jira] Commented: (COUCHDB-135) Offset regression between 0.8.0 and trunk

2009-02-05 Thread Paul Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670736#action_12670736
 ] 

Paul Carey commented on COUCHDB-135:


Applying this patch put a big smile on my face - it does indeed fix the main 
offset calculation error. 

However, running my test suite, I now see very sporadic errors which make me 
think there's still a race condition lurking somewhere. Running my pagination 
test suite 100 times results in about 82k requests to CouchDB and 8k test 
assertions. I had 8 assertions fail over the 100 test runs. I'm fairly sure the 
issue doesn't lie with my test suite or lib.

I'll have a go at creating a test that reproduces the failure. 

> Offset regression between 0.8.0 and trunk
> -
>
> Key: COUCHDB-135
> URL: https://issues.apache.org/jira/browse/COUCHDB-135
> Project: CouchDB
>  Issue Type: Bug
>  Components: Database Core
>Affects Versions: 0.9
> Environment: OSX 10.5
>Reporter: Paul Carey
>Priority: Blocker
> Fix For: 0.9
>
> Attachments: COUCHDB-135.patch, view_offsets.js
>
>
> The offset returned for certain map queries differs between 0.8.0 and 
> 0.9.0r702929.
> The attached test can be pasted into couch_tests.js. It passes in 0.8.0 and 
> fails in 0.9.
> I believe the skip query param must be passed for this bug to be exhibited. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Transactional _bulk_docs

2009-02-05 Thread Geir Magnusson Jr.
[sending second time, as I see my first is stuck in moderation, and I  
want to reply in a timely manner]


Sure, ideally.

But you can't have "everyone" together at the same time on IRC, where  
at the ASF, we define "everyone" to be, well, "everyone", not you and  
the 4 others on the PMC.


I see 579 people on the user list.  I see 294 people on the dev list.   
Just focusing on the dev list, that's 290 people, or 98.6% of people  
supposedly interested in CouchDB development, that had zero  
opportunity to see, review and participate in the discussion.   
Further, there's now zero chance that any future project participant  
can look back to understand design decisions and philosophy.  No  
institutional memory.  Your goal, besides building a great software  
project, should be to get the community to the point where you can  
step back and do other things w/o material effect on the community,  
and that requires information like this to be somewhere accessible.


And unlike Ted, I don't agree that a pointer to an IRC log is  
sufficient to represent a "done decision", and he may not have meant  
that anyway.  Sure, I can see a chat starting on IRC about a topic,  
but I'd hope that one person would force the move from IRC to the mail  
list - and at that point, maybe posting a pointer to the *initial*  
discussion log would be useful.  And after that, discussion is on the  
mail list.


I think IRC logs are a very poor substitute for mail traffic (and yes,  
I grok the downside of async communications).  A primary reason is  
that they are very "in the moment" - if you are in the conversation,  
it's easy to stay in, but after, when things cool and the context of  
the moment isn't there, it's nigh impossible.  You also can't hit  
reply and quote a piece for others to see and discuss, further  
broadening the discussion.


What got me engaged on this wasn't the decision itself (only because  
it was a secret decision), but -like Ted - the mode of operation.  It  
seemed that a very dedicated, engaged and interested community member  
had to privately petition the PMC for redress on a technical decision  
that none of us had any awareness of, nor a chance to review.  And  
IMO, from a guy that probably should be a committer and PMC member to  
boot!


(By the way - from my count, not all PMC members are even on the PMC's  
private@ list, so I have *no clue* where project private discussion -  
like new committer candidates - are even discussed)


geir

On Feb 5, 2009, at 2:11 AM, Damien Katz wrote:

Ideally yes, but real time communication with everyone together is  
damn useful.


-Damien

On Feb 5, 2009, at 2:07 AM, Ted Leung wrote:

Uh, project decisions are supposed to be made in the public mailing  
lists...


Ted

On Feb 4, 2009, at 6:51 PM, Damien Katz wrote:


This decision was discussed and made on IRC.

-Damien

On Feb 4, 2009, at 9:26 PM, Geir Magnusson Jr. wrote:

can you point me to a reference to where the PMC made this  
decision?


I'm interested in the subject for its own sake, and I'm also  
interested in figuring out where decisions are made in this  
project, since I didn't see this one go by on a mail list.


geir

On Feb 4, 2009, at 9:13 PM, Damien Katz wrote:

Geir, there was a decision made by the PMCs to change the  
transaction model to support partitioned databases. It is a  
change I am currently working on.


-Damien

On Feb 4, 2009, at 8:46 PM, Geir Magnusson Jr. wrote:


and original question #2?

geir

On Feb 4, 2009, at 8:38 PM, Antony Blakey wrote:



On 05/02/2009, at 12:02 PM, Geir Magnusson Jr. wrote:


1) where is this being forwarded from ?


I sent it to the PMC.

Antony Blakey
-
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A Buddhist walks up to a hot-dog stand and says, "Make me one  
with everything". He then pays the vendor and asks for change.  
The vendor says, "Change comes from within".