On 2/7/2007 12:54 PM, Markus Schiltknecht wrote:
Hi,

Jan Wieck wrote:
Are we still discussing if the Postgres backend may provide support for
a commit timestamp, that follows the rules for Lamport timestamps in a
multi-node cluster?

No. And I think you know my opinion about that by now. ;-)

Then let me give you a little puzzle just for the fun of it.

A database containing customer contact information (among other things) is a two node multimaster system. One is serving the customer web portal, the other is used by the company staff including the call center. At 13:45 the two servers lose connectivity to each other, yet the internal staff can access the internal server while the web portal is accessible from the outside. At 13:50 customer A updates their credit card information through the web portal, while customer B does the same through the call center. At 13:55 both customers change their mind to use yet another credit card, now customer A phones the call center while customer B does it via the internet.

At 14:00 the two servers reconnect and go through the conflict resolution. How do you intend to solve both conflicts without using any "clock", because that seems to be a stopword causing instant rejection of whatever you propose. Needless to say, both customers will be dissatisfied if you charge the "wrong" credit card during your next billing cycle.



It seems more like we are drifting into what type of
replication system I should design to please most people.

Nobody is telling you what you should do. You're free to do whatever you want to.

I'm only trying to get a discussion going, because a) I'm interested in how you plan to solve these problems and b) in the past, most people were complaining that all the different replication efforts didn't try to work together. I'm slowly trying to open up and discuss what I'm doing with Postgres-R on the lists.

Which is a good discussion because one of the reasons why I stopped looking into Postgres-R is the fact that is based on the idea to push all the replication information through a system that generates a global serialized message queue. That by itself isn't the problem, but the fact that implementing a global serialized message queue has serious throughput issues that are (among other details) linked to the speed of light.

I am trying to start with a system, that doesn't rely on such a mechanism for everything. I do intend to add an option later, that allows to declare a UNIQUE NOT NULL constraint to be synchronous. What that means is, that any INSERT, UPDATE, DELETE and SELECT FOR UPDATE will require the node to currently be a member of the (quorum or priority defined) majority of the cluster. An advisory lock system, based on a total order group communication, will grant the lock to the unique key values on a first come, first serve base. Every node in the cluster will keep those keys as "locked" until the asynchronous replication stream reports the locking transaction as ended. If another remote transaction in the meantime requires updating such key, the incoming stream from that node will be on hold until the lock is cleared. This is to protect agains node B replicating a transaction from node A and a later update on node B arrives on C before C got the first event from A. A node that got disconnected from the cluster must rebuild the current advisory lock list upon reconnecting to the cluster.

I think that this will be a way to overcome Postgres-R's communication bottleneck, as well as allowing limited update activity even during a completely disconnected state of a node. Synchronous or group communication messages are reduced to the cases, where the application cannot be implemented in a conflict free way, like allocating a natural primary key. There is absolutely no need to synchronize for example creating a sales order. An application can use global unique ID's for the order number. And everything possibly referenced by an order (items, customers, ...) is stored in a way that the references are never updated. Deletes to those possibly referenced objects are implemented in a two step process, where they are first marked obsolete, and later on things that have been marked obsolete for X long are deleted. A REPLICA TRIGGER on inserting an order will simply reset the obsolete flag of referenced objects. If a node is disconnected longer than X, you have a problem - hunt down the guy who defined X.

Just yesterday at the SFPUG meeting, I've experienced how confusing it is for the users to have such a broad variety of (existing and upcoming) replication solutions. And I'm all for working together and probably even for merging different replication solutions.

Merging certain ideas to come up with an async/sync hybrid? Seems to me we have similar enough ideas to need conflict resolution, because we had them simultaneously but communicate them asynchronously.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

              http://archives.postgresql.org

Reply via email to