Please let's stop using the word 'transactional'. Monotonic Writes
requires nothing more 'transactional' than CouchDB already has, e.g.
stable storage. The word 'transaction' is commonly used to mean
user-level ACID semantics, which neither the Bayou nor PRACTI models
provide.
On 16/02/2009, at 3:55 PM, Chris Anderson wrote:
So it seems as though, when a long history is replicated under your
model (interleaving many different client updates) we would end up
sending a lot more data over the wire under your proposed model.
With the tradeoff that you get Monotonic Writes. Whether you see a lot
more data depends on the frequency of replication relative to writes,
and on the distribution of writes. Clustered writes with isolation group
optimization (i.e. protocol, not user-initiated) would end up sending
little, if any, more data than would currently be sent. Furthermore,
Monotonic Writes might allow you to do differential encoding of
subsequent revisions. This could be a fantastic win that would reduce
the amount of data sent, even compared to the current protocol.
Especially for attachments.
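
To make the differential-encoding idea concrete, here is a minimal sketch
in Python. It is not anything CouchDB does today; `make_delta` and
`apply_delta` are hypothetical names, and the sketch assumes the receiver
is guaranteed to hold the previous revision already, which is the property
Monotonic Writes would give you.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal inserts.

    Hypothetical helper: assumes the receiver already holds `old`.
    """
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The receiver already has these bytes; send only the range.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Only genuinely new bytes go on the wire.
            delta.append(("data", new[j1:j2]))
        # "delete" needs no entry: those bytes are simply never copied.
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new revision from the old one and a delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

For a large attachment with a small edit, the "data" entries carry only
the changed bytes, while the "copy" entries are just pairs of offsets.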
In order to ensure that the isolation group stays together, even
should replication fail before completion, we'd have to send the
latest doc-rev for every doc touched in each isolated doc group.
In order to get Monotonic Writes you need to do that, and it's
independent of isolation groups. Isolation groups are a feature that
allows you to send *less* data. Exposing them to the user is entirely
another question.
In the current system we just send the latest non-conflicted rev or
all the conflict revs if they exist. It makes for a lot less data on
the wire. (Correct me if I'm wrong.)
Correct, although incremental replication creates states that don't
provide a Monotonic Write guarantee.
Your story about comments being replicated without their associated
posts is a good example of the counter-intuitive things that can
happen when replication fails before completion. Thanks for that.
The current replication implementation, not replication per se.
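
The comment-without-post case can be reproduced in a toy model. The
sketch below is a simplification, not CouchDB's actual replicator: it
assumes an update index that keeps only each document's most recent
entry, and all function names are made up for illustration. `limit`
simulates replication failing after that many items.

```python
def latest_only_order(log):
    """Doc ids ordered by their most recent write, one entry per doc."""
    last_seq = {}
    for seq, (doc_id, _body) in enumerate(log):
        last_seq[doc_id] = seq
    return sorted(last_seq, key=last_seq.get)


def replicate_latest_only(log, limit):
    """Send only the latest body of each doc; stop after `limit` docs."""
    latest = dict(log)  # later writes overwrite earlier ones
    target = {}
    for doc_id in latest_only_order(log)[:limit]:
        target[doc_id] = latest[doc_id]
    return target


def replicate_write_order(log, limit):
    """Send every write in the order it was accepted; stop after `limit`."""
    target = {}
    for doc_id, body in log[:limit]:
        target[doc_id] = body
    return target


# The post is written, commented on, then edited.
log = [("post-1", "v1"), ("comment-1", "c"), ("post-1", "v2")]
```

With this log, an interrupted `replicate_latest_only(log, 1)` leaves the
target holding the comment but not the post, because the edit moved the
post behind the comment in the index. `replicate_write_order` always
leaves a prefix of the write history, so the comment never arrives
before its post.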
I think these questions are interesting, I really do. However, in my
mind, what makes CouchDB relaxing, is that we're not trying to be
ambitious on the transactional guarantees front. So far, we've tried
to give only the guarantees we know we can afford to give, and
concentrate on getting them right.
It isn't clear that the tradeoff needs to be forced. A system that
provides Monotonic Writes can easily optimize for bandwidth, either
adaptively or via configuration, but the reverse is not true.
One example of adaptive optimization is automatically increasing the
size of the isolation groups depending on the measured performance
characteristics of the channel, and the size of the data.
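
A rough sketch of what such adaptive sizing could look like, in Python.
Everything here is an assumption for illustration: the function name,
the one-second target, the size bounds and the halfway smoothing are not
taken from Bayou, PRACTI or CouchDB.

```python
def next_group_size(current, bytes_sent, seconds, avg_doc_bytes,
                    target_seconds=1.0, min_size=1, max_size=1000):
    """Pick the next isolation group size from the last transfer's numbers.

    Aims for groups that take roughly `target_seconds` to send.
    """
    if seconds <= 0 or avg_doc_bytes <= 0:
        return current  # no usable measurement; keep the current size
    throughput = bytes_sent / seconds  # measured bytes per second
    ideal = int(throughput * target_seconds / avg_doc_bytes)
    # Move halfway toward the ideal size to smooth out noisy measurements.
    proposed = (current + ideal) // 2
    return max(min_size, min(max_size, proposed))
```

On a fast channel the group size grows, so fewer group boundaries need
extra data; on a slow or flaky channel it shrinks, so less work is lost
when a transfer is interrupted.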
You can configure a system that can provide Monotonic Write guarantees
to not do so.
Robert's point that much of this can be implemented on top of CouchDB
is an interesting one. If it is indeed the case, then the question
becomes whether clients or the database should be responsible for
providing the transactional API.
I'm still processing Robert's point in the context of the papers, but
I'm not sure that it's true that it can be done without modification
to CouchDB. It may not be practical to carry session-long version
vectors in a light-weight client. I'm more certain that it can't be
done in the context of Partial Replication. In any case, this can be
done once, efficiently, in the server, rather than inefficiently (if at
all) in a lightweight client.
In the Bayou papers the sessions need to be persistent - the Bayou
context is an explicit client-server model with a persistent client.
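
For reference, this is roughly the state a session has to carry. The
sketch below follows the general shape of the Bayou session guarantees
(a server may accept a session's write only if it has already seen the
session's earlier writes), but the class, the dict-based version vectors
and the error handling are all simplifications of my own, not an API
from Bayou or CouchDB.

```python
def dominates(server_vv, session_vv):
    """True if the server has seen every write the session has made."""
    return all(server_vv.get(sid, 0) >= n for sid, n in session_vv.items())


class Session:
    """Client-side session state for a Monotonic Writes check."""

    def __init__(self):
        # Per-server count of this session's writes the client knows about.
        self.write_vv = {}

    def write(self, server_id, server_vv):
        """Perform a write at a server, given that server's version vector."""
        if not dominates(server_vv, self.write_vv):
            raise RuntimeError("server is missing this session's earlier writes")
        # The server accepts the write and advances its own entry.
        server_vv[server_id] = server_vv.get(server_id, 0) + 1
        # The session must remember it, and must keep remembering it.
        self.write_vv[server_id] = server_vv[server_id]
```

The vector grows with the number of servers written to and has to
survive for the life of the session, which is the part that sits
awkwardly in a lightweight, non-persistent client.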
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Reflecting on W.H. Auden's contemplation of 'necessary murders' in the
Spanish Civil War, George Orwell wrote that such amorality was only
really possible, 'if you are the kind of person who is always
somewhere else when the trigger is pulled'.
-- John Birmingham, "Appeasing Jakarta"