Hi,

Theo Schlossnagle wrote:
On Feb 4, 2007, at 1:36 PM, Jan Wieck wrote:
Obviously the counters will immediately drift apart based on the transaction load of the nodes as soon as the network goes down. And in order to avoid this "clock" confusion and wrong expectation, you'd rather have a system with such a simple, non-clock based counter and accept that it starts behaving totally wonky when the cluster reconnects after a network outage? I rather confuse a few people than having a last update wins conflict resolution that basically rolls dice to determine "last".

If your cluster partition and you have hours of independent action and upon merge you apply a conflict resolution algorithm that has enormous effect undoing portions of the last several hours of work on the nodes, you wouldn't call that "wonky?"

You are talking about different things. Async replication, as Jan is planning to do, is per se "wonky", because you have to cope with conflicts by definition. And you have to resolve them by late-aborting a transaction (i.e. after a commit). Or put it another way: async MM replication means continuing in disconnected mode (w/o quorum or some such) and trying to reconciliate later on. It should not matter if the delay is just some milliseconds of network latency or three days (except of course that you probably have more data to reconciliate).

For sane disconnected (or more generally, partitioned) operation in multi-master environments, a quorum for the dataset must be established. Now, one can consider the "database" to be the dataset. So, on network partitions those in "the" quorum are allowed to progress with data modification and others only read.

You can do this to *prevent* conflicts, but that clearly belongs to the world of sync replication. I'm doing this in Postgres-R: in case of network partitioning, only a primary partition may continue to process writing transactions. For async replication, it does not make sense to prevent conflicts when disconnected. Async is meant to cope with conflicts. So as to be independent of network latency.

However, there is no reason why the dataset _must_ be the database and that multiple datasets _must_ share the same quorum algorithm. You could easily classify certain tables or schema or partitions into a specific dataset and apply a suitable quorum algorithm to that and a different quorum algorithm to other disjoint data sets.

I call that partitioning (among nodes). And it's applicable to sync as well as async replication, while it makes more sense in sync replication.

What I'm more concerned about, with Jan's proposal, is the assumption that you always want to resolve conflicts by time (except for balances, for which we don't have much information, yet). I'd rather say that time does not matter much if your nodes are disconnected. And (especially in async replication) you should prevent your clients from committing to one node and then reading from another, expecting to find your data there. So why resolve by time? It only makes the user think you could guarantee that order, but you certainly cannot.

Regards

Markus


---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

               http://www.postgresql.org/about/donate

Reply via email to