On 24/02/2009, at 9:32 AM, Dean Landolt wrote:
Can you suggest how we improve the wiki docs to satisfy this? In my
opinion, the docs are clear* and the term is overloaded and
confusing.
* http://wiki.apache.org/couchdb/Document_revisions has
"You cannot rely on document revisions for any other purpose
than concurrency control." in bold letters.
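To make the wiki's statement concrete, here is a minimal toy sketch of a revision used purely as a concurrency token. It is not CouchDB's code or API: `ToyStore`, `ConflictError`, and the integer revision numbers are all hypothetical names and simplifications invented for this illustration (real CouchDB revisions are opaque strings). The store keeps only the latest body for each document, so a revision says "this is the version I read" and nothing more.

```python
class ConflictError(Exception):
    """Raised when a write names a revision that is no longer current."""


class ToyStore:
    """Hypothetical in-memory store; NOT CouchDB's implementation."""

    def __init__(self):
        # doc_id -> (rev, body). Only the latest body is kept, so an
        # older revision number cannot be used to fetch older content.
        self._docs = {}

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        # The write is accepted only if the caller saw the current revision.
        if rev != current_rev:
            raise ConflictError(f"expected rev {current_rev!r}, got {rev!r}")
        new_rev = 1 if current_rev is None else current_rev + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = ToyStore()
first = store.put("doc", {"title": "draft"})

# Two clients both read revision `first`; only the first writer succeeds.
second = store.put("doc", {"title": "edit by A"}, rev=first)
try:
    store.put("doc", {"title": "edit by B"}, rev=first)
    conflicted = False
except ConflictError:
    conflicted = True
```

In this sketch client B has to re-read the document and retry. There is no call that returns the body as it was at `first`, which is the gap between this kind of revision and what people expect from a version-control "revision".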
I stated this in earlier discussions as well: Even if our
documentation were perfect, we don't control how people learn about
CouchDB. We only control the API and we should work hard to get it
right. The way it stands now, a lot of people new to CouchDB get it
wrong because "revision" is a familiar term and they attach to it the
behaviour they already associate with that word. That's how humans
learn. In this case we make the learning hard.
Firstly, I completely agree that one should consider the implications
of using certain terms; the baggage and context such terms bring with
them.
<flamesuit on>
OTOH, one should use the correct term and not redefine existing terms
to suit one's own purpose. In a tangentially related way, the use of
the term RESTful wrt CouchDB is a marketing abomination.
</flamesuit off>
The documentation about replication, the role of revisions, and the
lack of inter-document consistency guarantees (including, crucially to
the operation model, the lack of Monotonic Write guarantees) really
needs to be expanded.
The consequences of CouchDB's underlying model aren't immediately
obvious, and should be spelled out, as I started to do here: http://mail-archives.apache.org/mod_mbox/couchdb-dev/200902.mbox/%3c0fddc57c-db78-4241-86de-549fecc8b...@gmail.com%3e
- which was obviously in the context of changing that mechanism, but
still the explanation and references are useful.
I couldn't agree more with this sentiment, but revision still strikes
me as the right term. Perhaps the easiest way to fix this
misconception is for there to actually be a way to keep old revisions
around for good :)
Would it be overly difficult to just add in the ability to keep a full
rev history based on a config setting? The replication API would need
to accommodate this, of course, and if the machine you're replicating
from doesn't also keep old revisions around you're SOL, but is there
any other compelling reason to not offer this option? If it wouldn't
complicate the code base, this seems like a helpful feature. Sure, it
could be wasteful and should be off by default, but if your dataset is
relatively small, this config flag would be pretty nice to have, and
it could help clear up this confusion.
Danger Will Robinson!
The problem here is that you then need to make certain guarantees
about revisions to make them at all useful, and you get into a
discussion like the above email thread.
IMO, discussing these issues without having read the relevant
literature on replication models is a waste of time. Serious
research has been done into this, and (once again, IMO) it is more
productive to build on that understanding than to try (and possibly
fail) to reinvent the wheel.
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd