Hi Scott,
thanks for raising your concerns. I share them, but I think Brian and
Adam are only suggesting an optional addition to the existing
API which leaves the existing case in place. Much like bulk docs
now has two modes, PUT can have two modes.
Something like
PUT /db/doc?rev=foo&allow_conflicts=true
{"json":"body"}
I wouldn't be opposed to adding this.
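
To make the two-mode idea concrete, here is a toy in-memory model of how such a PUT might behave. It is only a sketch, not CouchDB code: allow_conflicts is the parameter name suggested above, while the ToyDoc class and the "r1"-style revision ids are invented for illustration.

```python
class ToyDoc:
    """Hypothetical stand-in for one document: maps leaf revision id -> body."""

    def __init__(self):
        self.leaves = {}
        self.counter = 0

    def put(self, body, rev=None, allow_conflicts=False):
        """Return (status, new_rev), mimicking the two proposed PUT modes."""
        # The write is "current" if it names an existing leaf revision,
        # or if it is the very first write to an empty document.
        is_head = (rev in self.leaves) or (rev is None and not self.leaves)
        if not is_head and not allow_conflicts:
            return 409, None  # existing behaviour: reject the stale write
        self.counter += 1
        new_rev = f"r{self.counter}"
        if is_head and rev is not None:
            del self.leaves[rev]  # a normal update replaces the leaf it was based on
        self.leaves[new_rev] = body  # otherwise a new conflicting leaf is added
        return 201, new_rev


doc = ToyDoc()
_, r1 = doc.put({"v": 1})
_, r2 = doc.put({"v": 2}, rev=r1)
stale = doc.put({"v": 3}, rev=r1)                           # default mode
forced = doc.put({"v": 3}, rev=r1, allow_conflicts=True)    # opt-in mode
```

In this model the default mode keeps the 409, and the opt-in mode leaves two leaf revisions for the application to resolve later.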
Cheers
Jan
--
On 6 Apr 2009, at 20:40, Scott Shumaker wrote:
Just my $0.02, but I think CouchDB is moving in entirely the wrong
direction with conflicts in a misguided attempt to make multi-master
replication the 'only' way to do things.
Very frequently, you need to attempt to resolve a conflict as soon as
it occurs - and you often need user interaction to help you resolve
the conflict. Sometimes you may need to just refresh the user to the
latest version, other times you may be able to choose one of the
versions based on some criteria, sometimes you can automatically merge
the two versions, and occasionally you need to ask the user what to
do. This just won't work if the process is happening offline, in a
background job.
This isn't just true of CouchDB, but of other distributed systems like
Dynamo (read the paper, they talk about this exact issue. Amazon.com
has a "merge shopping carts" screen for this exact reason).
Getting rid of conflict handling greatly limits the utility of CouchDB
for real-world applications (it will certainly force us to adopt
another technology instead). And worse, this is all for the goal of
supporting multi-master replication, which really isn't a great
technology solution anyway. If you want durability and scalability,
CouchDB should really adopt the much more robust multiple write nodes
/ read nodes system (with quorum and reconciliation) in Dynamo or a
few other distributed key/value stores.
Scott
On Mon, Apr 6, 2009 at 12:40 AM, Brian Candler <[email protected]> wrote:
The following is part thought-experiment, part serious suggestion.
I propose the following: remove all concurrency control from PUT operations,
and hence also the 409 response. If you PUT a document where the _rev is not
the same as a 'head' revision, then a new conflicting version is inserted. [1]
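
Footnote [1] points out that this behaviour is already reachable through _bulk_docs. As a rough sketch, the request could be built as below. The database URL, the document contents and the helper name bulk_docs_request are assumptions made up for the example; the request is only constructed here, not sent.

```python
import json
import urllib.request


def bulk_docs_request(db_url, docs, all_or_nothing=True):
    """Build (but do not send) a POST to <db_url>/_bulk_docs."""
    payload = {"docs": docs}
    if all_or_nothing:
        # The flag named in footnote [1]; stale _revs then become conflicts.
        payload["all_or_nothing"] = True
    return urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical database and document, for illustration only.
req = bulk_docs_request(
    "http://localhost:5984/db",
    [{"_id": "doc", "_rev": "foo", "json": "body"}],
)
```

Passing req to urllib.request.urlopen would actually send it, assuming a server is listening at that address.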
The reasoning is as follows:
1. Any application which relies on the 409 PUT conflict behaviour is
   not going to work properly in a multi-master replication environment.
   That is: it is protected against concurrent changes on the same node,
   but not on a different node. This is arbitrary.
2. The same reasoning was used for getting rid of bulk non-conflicting
   updates. Paraphrasing: "a grown-up CouchDB app which runs on a replicated
   cluster won't be able to rely on these semantics, so removing this
   capability will encourage you to write your app in a more scalable way.
   You will thank us later."
3. A CouchDB app should be written so that it "treats edit conflicts as a
   common state, not an exceptional one" [2]
   This change will slightly increase the number of these normal conflicts,
   whilst forcing the app writer to deal with them.
4. By increasing the number of conflicting versions, it is likely to
   exercise the underlying code more and flush out bugs (for example, more
   fully testing what happens in views when multiple conflicting versions of
   a document are updated or removed).
5. It may highlight more clearly where API improvements are needed to help
   applications deal with and resolve conflicts. For example:
   - making it easier for applications to be aware of the existence of
     conflicts (Maybe a GET without _rev should fail if there are multiple
     conflicting revs, or return all of the versions)
   - given that multiple concurrent clients will see conflicts, and may
     attempt to resolve them at the same time, then it's likely that two
     clients will independently submit exactly the same document content
     after running the conflict-resolution algorithm. It could be helpful
     if these were treated as a single new rev, and not two new conflicts.
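
One way the last point could work, sketched under stated assumptions: if the new revision id were derived from the parent revisions and the resolved content, two clients running the same deterministic resolution would produce the same id. The content_rev and resolve functions below are hypothetical and do not describe how CouchDB computes revisions; the merge rule is a deliberately trivial placeholder.

```python
import hashlib
import json


def content_rev(parent_revs, body):
    """Hypothetical revision id derived from the parents and the body."""
    canonical = json.dumps(
        {"parents": sorted(parent_revs), "body": body}, sort_keys=True
    )
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def resolve(versions):
    """Toy deterministic merge: apply versions in sorted-rev order."""
    merged = {}
    for rev in sorted(versions):
        merged.update(versions[rev])
    return merged


# Two clients see the same conflicting revisions, in different orders.
client_a = {"2-a": {"x": 0, "z": 3}, "2-b": {"x": 1, "y": 2}}
client_b = {"2-b": {"x": 1, "y": 2}, "2-a": {"x": 0, "z": 3}}

rev_a = content_rev(client_a.keys(), resolve(client_a))
rev_b = content_rev(client_b.keys(), resolve(client_b))
```

Because both the merge and the id depend only on the inputs, rev_a and rev_b come out identical, so a store keyed on such ids could record a single new revision.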
Comments? I would be especially interested in hearing from core developers
who didn't want bulk non-conflicting updates, but *do* want to retain single
non-conflicting updates, as to why this is logical.
Regards,
Brian.
[1] You can get this behaviour on 0.9.0 by POSTing to _bulk_docs with
{"all_or_nothing":true}
[2] http://couchdb.apache.org/docs/overview.html under heading "Conflicts"