On 24/02/2009, at 12:15 PM, Chris Anderson wrote:
Would it be overly difficult to just add in the ability to keep a
full rev history based on a config setting?
This would be a pretty big change. As Antony says, once you go down
that path a little, you end up at something that is not really much
like Couch.
I don't want to re-open a dead issue, but to clarify this - there are
other models of replication that provide stronger weak-consistency
guarantees - I urge you to read a few Bayou papers if you are
interested. Using such replication would be very close to Couch. So I
don't agree with the strength of Chris's comment.
The issue however, is that Couch's identity is, and has always been,
largely determined by its replication model. There's so much more to
Couch that is independent of that, such as map/reduce views, forms,
futon, an HTTP API, JSON etc, that it's not immediately obvious that
it's the *replication model* that makes this product 'CouchDB'. The
project founder and the PMC are all committed to that replication
model, which is derived from Notes.
You can add all of the other Couch features, and in fact reuse all of
the Couch code, with a different replication model, but it's unlikely
it would be accepted into the Couch code base. If you want that, you
need to fork and call it something different (which is what I'm
doing). It's important to note however that the Couch replication
model has some characteristics that cannot be achieved using any
stronger form of consistency. In fact, technically speaking, Couch
provides coherence, but NO consistency.
Given all of that, it would be good to have a very clear 'What is
Couch' that emphasizes the primacy of the replication model (and its
implications, both pro and con), because none of the other things IMO
are as central to the identity, as consequential, or as confusing
(except maybe reduce/re-reduce) as the operational semantics of the
replication model.
As an aside to this (and I'm not being bolshy), looking further ahead,
Eventual Consistency, which seems to be promoted as an article of
faith, is not *strictly* achievable in a partial replication
environment. Achieving Eventual Consistency is also dependent on some
other constraints, so depending on your deployment model, it can be
more theoretical than practical. At the end of the day however,
dealing with non-Monotonic Writes subsumes dealing with Eventual
Consistency in all but asymptotic senses.
These are all points that I think should be made clearly and up front
in the documentation, because a failure to understand Couch's
replication model, and the implications for applications, both pro and
con, will IMO lead to failures that will be blamed on Couch, but are
in fact due to misunderstanding. You don't want a 'Couch is a piece of
shit' meme to establish. IMO the bulk of Couch users will not think
this through themselves, because they will be tool users, not tool
builders.
There's yet to be a really clear reference for how to do
application-versioned documents in CouchDB. Hopefully we'll address
the topic in the book, but we haven't gotten that far yet.
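
As one possible shape for such a reference, here is a minimal sketch of
application-level versioning in which every version is stored as its own
document, so nothing relies on _rev for history. The VersionedStore class,
its in-memory dicts and the "<doc_id>:v<n>" naming scheme are all
assumptions made up for illustration; none of them are CouchDB APIs.

```python
class VersionedStore:
    """Hypothetical in-memory stand-in for a database; not a CouchDB client."""

    def __init__(self):
        self._docs = {}    # "<doc_id>:v<n>" -> document body
        self._latest = {}  # doc_id -> highest version number saved so far

    def save(self, doc_id, body):
        """Store body as a new version and return its version number."""
        version = self._latest.get(doc_id, 0) + 1
        self._docs[f"{doc_id}:v{version}"] = dict(body)
        self._latest[doc_id] = version
        return version

    def load(self, doc_id, version=None):
        """Return the requested version, or the latest one if none is given."""
        if version is None:
            version = self._latest[doc_id]
        return self._docs[f"{doc_id}:v{version}"]

    def history(self, doc_id):
        """Return every stored version of doc_id, oldest first."""
        last = self._latest[doc_id]
        return [self._docs[f"{doc_id}:v{n}"] for n in range(1, last + 1)]
```

The sketch deliberately ignores replication: two replicas could each
allocate the same version number independently, so a real design would
still have to decide how such collisions are detected and merged.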
The way I see it, the salient options are:
A) leave it as _rev and answer the versioning question every week
forever
B) rename it to _mvcc or _lock or _token or something else that
doesn't confuse people
The main drawback of B is that when we start renaming _rev, someone
else comes along and tries to take the opportunity to change _id, or
otherwise change the whole system. If we can stick to just renaming to
something clearer, I'm happy to go ahead with this.
Orthogonally, I still think the id and rev should be wrapped in a
_meta tag, but modulo that ...
It's not a _lock. Saying it's a _token has nothing to do with its
function - it would be like calling a car a 'construct of metal'. It's
not _mvcc because that's the name of a technique, not a thing.
Maybe _mvcc_commit_id - although in the current implementation it
isn't, it philosophically is and could be implemented that way. But
really, it is a document version/revision identifier. Maybe put
'couch' in there to emphasize the internal nature of it, e.g.
'_couch_rev_id', i.e. something which, at the limit, might be
'_couch_private_revision_id_which_you_should_treat_as_opaque'.
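
To make the "opaque revision identifier" reading concrete, here is a toy
in-memory model of how such an identifier gets used: a writer must present
the revision it last read, and the store only ever compares it for
equality. ToyStore, Conflict and the uuid-based token format are
hypothetical names chosen for this sketch; this is not how Couch
generates or stores revisions.

```python
import uuid


class Conflict(Exception):
    """Raised when an update does not carry the current revision identifier."""


class ToyStore:
    """Toy store that keeps only the latest revision of each document."""

    def __init__(self):
        self._docs = {}  # doc id -> (revision identifier, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return {"_id": doc_id, "_rev": rev, **body}

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        # The identifier is compared for equality only; it is never parsed,
        # ordered, or used to look up an older version.
        if rev != current_rev:
            raise Conflict(doc_id)
        new_rev = uuid.uuid4().hex  # assumed format, deliberately meaningless
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev
```

Nothing is locked between a read and a write, and no earlier body can be
fetched by its identifier, which is why neither "lock" nor "version
history" describes the field in this model.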
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd