I think discussion of this issue is complicated by the lack of a clear exposition of the different ways in which CouchDB may be used or deployed. I have the following in mind:

---------------------------------------

A. A single-node database engine embedded in a desktop application.
B. A single-node database server.
C. A multi-node clustered database server.

Furthermore it might have:

D. No replication, or replication from the app purely for backup. No conflict is possible.
E. Replication from a distinguished peer that accepts write operations, e.g. a content/query distribution mechanism. No conflict is possible.
F. Replication in a p2p mesh, e.g. collaborative content management. Conflict is possible.

Then, in a non-orthogonal way, conflicts are dealt with:

G. Not at all because they can't arise.
H. Replication is under user control, and exclusive with 'normal' operation. Conflict resolution is only caused by replication, said conflicts being resolved by the user using a specialized UI/Workflow. Normal operation sees no conflicts.
I. Replication is concurrent with normal operation, and may or may not be under user control. Normal operation sees conflicts.

---------------------------------------
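To make the combinations concrete, here is a minimal Python sketch of the taxonomy above. The enum names and the is_coherent() rule are purely illustrative and are not part of CouchDB; the rule encodes only my reading that G makes sense exactly when conflicts cannot arise (D or E), while H and I apply when they can (F).

```python
from enum import Enum


class Deployment(Enum):
    EMBEDDED = "A"        # single-node engine embedded in a desktop app
    SINGLE_SERVER = "B"   # single-node database server
    CLUSTER = "C"         # multi-node clustered database server


class Replication(Enum):
    NONE_OR_BACKUP = "D"  # no conflict possible
    SINGLE_WRITER = "E"   # one distinguished peer accepts writes
    P2P_MESH = "F"        # conflict possible


class ConflictHandling(Enum):
    NONE = "G"            # conflicts can't arise
    USER_RESOLVED = "H"   # resolved by the user, outside normal operation
    VISIBLE = "I"         # normal operation sees conflicts


def conflicts_possible(replication: Replication) -> bool:
    """Only the p2p mesh (F) admits conflicts in this taxonomy."""
    return replication is Replication.P2P_MESH


def is_coherent(replication: Replication, handling: ConflictHandling) -> bool:
    """Assumed rule: G pairs with D/E, and H or I pair with F."""
    if conflicts_possible(replication):
        return handling is not ConflictHandling.NONE
    return handling is ConflictHandling.NONE
```

Under that assumption, E/G and F/H are coherent pairings, while F/G is not.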

I have a pending deployment project of type A/E/G, and pending projects of types A/F/H and B+A/E/G. In all my cases, update and indexing throughput is not an issue, although replication efficiency, especially of incremental updates to attachments, is a concern.

I understand that there is a sense in which CouchDB was on a trajectory pre-Apache to be C/F/I, but I wonder if the desire to achieve that isn't *unnecessarily* at the expense of other deployment models. In particular, some of these sound like a Notes client, and I have heard CouchDB promoted as 'Notes done right', hence my focus on those kinds of use cases (as opposed to high-throughput db servers). IMO it would be a good thing to not burden these other use cases with the operational cost of supporting just one of them.

Obviously supporting transactions in a partition-based cluster can impose a cost (although only if the transaction spans the cluster in some way, the probability of which is potentially lessened by the partitioning), but what if one could turn them off via configuration?

From what Damien has said about replication, I'm getting the idea that it is possible to do replication on an MVCC boundary, in the same way that a view represents an MVCC boundary, although I hear loud and clear that CouchDB has never, ever, claimed that replication works in that manner.

The benefit of a transactional API vs. a conflict-based API, for local operations, is not only that certain models can only be implemented using a transactional API, but also that the transaction failure mode has a clear and simple reflection into the GUI. Users have an expectation of transactionality, and IMO domain-dependent conflict resolution (as opposed to domain-independent transactionality) is a leap into the unknown. I think it's both less natural and more work for the user.
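As a rough illustration of that failure mode, here is a toy in-memory sketch in Python. It is not CouchDB's API: Store, commit() and TransactionAborted are hypothetical names, and the optimistic revision check is just one assumed way of getting all-or-nothing behaviour on a single node.

```python
class TransactionAborted(Exception):
    """Raised when a document read by the transaction has since changed."""


class Store:
    def __init__(self):
        self.docs = {}  # doc_id -> (revision number, value)

    def get(self, doc_id):
        # Missing documents are reported as revision 0 with no value.
        return self.docs.get(doc_id, (0, None))

    def commit(self, reads, writes):
        """All-or-nothing update.

        reads:  doc_id -> revision the caller saw
        writes: doc_id -> new value
        """
        # Validate every read before touching anything.
        for doc_id, seen_rev in reads.items():
            if self.get(doc_id)[0] != seen_rev:
                raise TransactionAborted(doc_id)
        # Validation passed, so apply every write.
        for doc_id, value in writes.items():
            self.docs[doc_id] = (self.get(doc_id)[0] + 1, value)


store = Store()
store.commit({}, {"a": 10, "b": 0})

# The user opens a form that moves everything from "a" to "b".
rev_a, rev_b = store.get("a")[0], store.get("b")[0]

# Meanwhile another local writer changes "a".
store.commit({"a": rev_a}, {"a": 5})

try:
    store.commit({"a": rev_a, "b": rev_b}, {"a": 0, "b": 10})
    outcome = "saved"
except TransactionAborted:
    # The GUI only needs one domain-independent message here.
    outcome = "aborted: please reload and retry"
```

The point of the sketch is that the failed commit leaves both documents untouched, so the application reports a single retryable failure instead of asking the user to merge two versions of "a" by hand.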

IMO the tradeoff of user-interface model/complexity vs. single/multi-node deployment vs. transaction cost should be in the hands of the application developer.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

When I hear somebody sigh, 'Life is hard,' I am always tempted to ask, 'Compared to what?'
  -- Sydney Harris

