direction with conflicts in a misguided attempt to make multi-master
replication the 'only' way to do things.
Very frequently, you need to attempt to resolve a conflict as soon as
it occurs, and you often need user interaction to help you resolve it.
Sometimes you just need to refresh the user to the latest version;
other times you can choose one of the versions based on some criteria;
sometimes you can merge the two versions automatically; and
occasionally you need to ask the user what to do. This just won't work
if the process is happening offline, in a background job.
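To make those four outcomes concrete, here is a minimal Python sketch of resolving a conflict at write time. Everything in it is hypothetical and for illustration only: the names `Strategy`, `merge_fields` and `resolve`, the field-level merge rule, and the `pick` and `ask_user` callbacks are assumptions, not anything CouchDB provides.

```python
from enum import Enum


class Strategy(Enum):
    REFRESH = "refresh"    # nothing local to keep: show the stored version
    MERGE = "merge"        # both sides changed different fields
    PICK = "pick"          # choose one version by an application rule
    ASK_USER = "ask_user"  # hand both versions to a person


def merge_fields(base, mine, theirs):
    """Three-way merge on top-level fields.

    Returns the merged dict, or None if both sides changed the same
    field to different values. A value of None is treated as "field
    absent", which is a simplification of this sketch.
    """
    merged = {}
    for key in set(base) | set(mine) | set(theirs):
        b, m, t = base.get(key), mine.get(key), theirs.get(key)
        if m == t:
            value = m          # both sides agree
        elif m == b:
            value = t          # only the other side changed it
        elif t == b:
            value = m          # only we changed it
        else:
            return None        # genuine clash on this field
        if value is not None:
            merged[key] = value
    return merged


def resolve(base, mine, theirs, ask_user, pick=None):
    """Return (strategy_used, resulting_document)."""
    if mine == base:
        return Strategy.REFRESH, theirs
    merged = merge_fields(base, mine, theirs)
    if merged is not None:
        return Strategy.MERGE, merged
    if pick is not None:
        return Strategy.PICK, pick(mine, theirs)
    # Needs a person in front of the screen; a background job cannot do this.
    return Strategy.ASK_USER, ask_user(mine, theirs)
```

The last branch is the one that matters for the argument here: it only works while the user who made the edit is still present.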
This isn't just true of CouchDB, but of other distributed systems like
Dynamo (read the paper, they talk about this exact issue. Amazon.com
has a "merge shopping carts" screen for this exact reason).
Getting rid of conflict handling greatly limits the utility of CouchDB
for real-world applications (it will certainly force us to adopt
another technology instead). And worse, this is all for the goal of
supporting multi-master replication, which really isn't a great
technology solution anyway. If you want durability and scalability,
CouchDB should really adopt the much more robust multiple write nodes /
read nodes system (with quorum and reconciliation) used in Dynamo and a
few other distributed key/value stores.
Scott
On Mon, Apr 6, 2009 at 12:40 AM, Brian Candler <[email protected]> wrote:
The following is part thought-experiment, part serious suggestion.
I propose the following: remove all concurrency control from PUT
operations, and hence also the 409 response. If you PUT a document
where the _rev is not the same as a 'head' revision, then a new
conflicting version is inserted. [1]
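The difference between the two behaviours can be sketched with a toy in-memory model in Python. This is only an illustration of the semantics described above, not CouchDB's implementation: the `ToyDoc` class, the `strict` flag, the `Conflict` exception standing in for the 409 response, and the `r1`, `r2` revision ids are all made up for the example.

```python
import itertools


class Conflict(Exception):
    """Stands in for the HTTP 409 response."""


class ToyDoc:
    """One document, tracked as a set of 'head' (leaf) revisions."""

    def __init__(self):
        self.leaves = {}                 # rev id -> body
        self._ids = itertools.count(1)   # toy revision counter

    def put(self, body, parent_rev=None, strict=True):
        # The write is "on a head" if it names a current leaf, or if it
        # names nothing and the document does not exist yet.
        on_head = parent_rev in self.leaves or (
            parent_rev is None and not self.leaves
        )
        if not on_head and strict:
            # Behaviour with concurrency control: reject the stale write.
            raise Conflict("409: _rev is not a head revision")
        new_rev = "r%d" % next(self._ids)
        # A head parent is superseded. A stale parent is no longer a
        # leaf, so the new revision is simply added as a second head.
        self.leaves.pop(parent_rev, None)
        self.leaves[new_rev] = body
        return new_rev
```

With `strict=True` a second writer holding an old `_rev` gets the exception; with `strict=False` the same write succeeds and the document ends up with two heads, which is the state the proposal wants applications to handle.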
The reasoning is as follows:
1. Any application which relies on the 409 PUT conflict behaviour is
   not going to work properly in a multi-master replication
   environment. That is: it is protected against concurrent changes on
   the same node, but not on a different node. This is arbitrary.
2. The same reasoning was used for getting rid of bulk non-conflicting
   updates. Paraphrasing: "a grown-up CouchDB app which runs on a
   replicated cluster won't be able to rely on these semantics, so
   removing this capability will encourage you to write your app in a
   more scalable way. You will thank us later."
3. A CouchDB app should be written so that it "treats edit conflicts as
   a common state, not an exceptional one" [2]. This change will
   slightly increase the number of these normal conflicts, whilst
   forcing the app writer to deal with them.
4. By increasing the number of conflicting versions, it is likely to
   exercise the underlying code more and flush out bugs (for example,
   by more fully testing what happens in views when multiple
   conflicting versions of a document are updated or removed).
5. It may highlight more clearly where API improvements are needed to
   help applications deal with and resolve conflicts. For example:
   - making it easier for applications to be aware of the existence of
     conflicts (maybe a GET without _rev should fail if there are
     multiple conflicting revs, or return all of the versions)
   - given that multiple concurrent clients will see conflicts, and may
     attempt to resolve them at the same time, it's likely that two
     clients will independently submit exactly the same document
     content after running the conflict-resolution algorithm. It could
     be helpful if these were treated as a single new rev, and not two
     new conflicts.
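One hypothetical way to get the "single new rev" behaviour from the last point is to derive the revision id from the content being written, so that two clients submitting the same resolution produce the same id. The Python sketch below assumes a made-up scheme (SHA-256 over the sorted parent revs plus the canonical JSON body) and made-up helper names `content_rev` and `submit_resolution`; it is not a description of how CouchDB assigns revisions.

```python
import hashlib
import json


def content_rev(parent_revs, body):
    """Deterministic revision id: same parents + same body -> same id."""
    canonical = json.dumps(
        {"parents": sorted(parent_revs), "body": body},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def submit_resolution(leaves, parent_revs, body):
    """Replace the conflicting leaves with one resolved revision.

    `leaves` maps rev id -> body. Submitting the same resolution twice
    is idempotent, because the second call computes the same rev id and
    overwrites the entry instead of adding a new conflict.
    """
    rev = content_rev(parent_revs, body)
    for parent in parent_revs:
        leaves.pop(parent, None)   # already gone if someone else resolved first
    leaves[rev] = body
    return rev
```

Two clients that resolve the same pair of conflicting revisions to different content would still produce two heads under this scheme, which seems like the right outcome: that is a real disagreement, not a duplicate.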
Comments? I would be especially interested in hearing from core
developers who didn't want bulk non-conflicting updates, but *do* want
to retain single non-conflicting updates, as to why this is logical.
Regards,
Brian.
[1] You can get this behaviour on 0.9.0 by POSTing to _bulk_docs with
    {"all_or_nothing":true}
[2] http://couchdb.apache.org/docs/overview.html under heading
    "Conflicts"