Re: [HACKERS] Proposal: Commit timestamp

Jan Wieck Wed, 07 Feb 2007 17:13:41 -0800

On 2/7/2007 12:54 PM, Markus Schiltknecht wrote:

Hi,


Jan Wieck wrote:

Are we still discussing if the Postgres backend may provide support for
a commit timestamp, that follows the rules for Lamport timestamps in a
multi-node cluster?


No. And I think you know my opinion about that by now. ;-)


Then let me give you a little puzzle just for the fun of it.

A database containing customer contact information (among other things) is a two node multimaster system. One node is serving the customer web portal, the other is used by the company staff including the call center. At 13:45 the two servers lose connectivity to each other, yet the internal staff can access the internal server while the web portal is accessible from the outside. At 13:50 customer A updates their credit card information through the web portal, while customer B does the same through the call center. At 13:55 both customers change their minds and decide to use yet another credit card. Now customer A phones the call center while customer B does it via the internet.

At 14:00 the two servers reconnect and go through the conflict resolution. How do you intend to solve both conflicts without using any "clock", because that seems to be a stopword causing instant rejection of whatever you propose. Needless to say, both customers will be dissatisfied if you charge the "wrong" credit card during your next billing cycle.
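
For illustration only, the puzzle can be worked through with commit timestamps that follow the Lamport rules mentioned at the top of this message. The sketch below is Python rather than backend code. The names CommitTs, CommitClock and resolve are hypothetical, the clock is counted in minutes since midnight, and "last update wins" is an assumed resolution policy, not something this message specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class CommitTs:
    ticks: int    # wall-clock derived counter (here: minutes since midnight)
    node_id: int  # tie-breaker so two nodes never produce equal timestamps


class CommitClock:
    """Hypothetical per-node clock that hands out Lamport-style commit timestamps."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.last = 0

    def next_commit(self, wall_clock):
        # Lamport rule: never go backwards and never repeat a value.
        self.last = max(wall_clock, self.last + 1)
        return CommitTs(self.last, self.node_id)

    def observe(self, remote):
        # On applying a replicated commit, make sure later local commits sort after it.
        self.last = max(self.last, remote.ticks)


def resolve(local, remote):
    """Assumed policy: of two (CommitTs, value) pairs, the later commit wins."""
    return local if local[0] > remote[0] else remote


# The puzzle: 13:50 is minute 830, 13:55 is minute 835.
web, office = CommitClock(1), CommitClock(2)
web_rows = {"A": (web.next_commit(830), "card A2"),
            "B": (web.next_commit(835), "card B3")}
office_rows = {"B": (office.next_commit(830), "card B2"),
               "A": (office.next_commit(835), "card A3")}

# At 14:00 both nodes run the same comparison and reach the same result.
merged = {c: resolve(web_rows[c], office_rows[c])[1] for c in web_rows}
```

Both nodes compare the same pairs, so they agree on the outcome without exchanging anything beyond the replicated rows and their timestamps.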

It seems more like we are drifting into what type of
replication system I should design to please most people.
Nobody is telling you what you should do. You're free to do whatever you want to.
I'm only trying to get a discussion going, because a) I'm interested in how you plan to solve these problems and b) in the past, most people were complaining that all the different replication efforts didn't try to work together. I'm slowly trying to open up and discuss what I'm doing with Postgres-R on the lists.

Which is a good discussion because one of the reasons why I stopped looking into Postgres-R is the fact that it is based on the idea to push all the replication information through a system that generates a global serialized message queue. That idea by itself isn't the problem. The problem is that implementing a global serialized message queue has serious throughput issues that are (among other details) linked to the speed of light.

I am trying to start with a system that doesn't rely on such a mechanism for everything. I do intend to add an option later that allows declaring a UNIQUE NOT NULL constraint to be synchronous. What that means is that any INSERT, UPDATE, DELETE and SELECT FOR UPDATE will require the node to currently be a member of the (quorum or priority defined) majority of the cluster. An advisory lock system, based on a total order group communication, will grant the lock on the unique key values on a first come, first served basis. Every node in the cluster will keep those keys as "locked" until the asynchronous replication stream reports the locking transaction as ended. If another remote transaction in the meantime requires updating such a key, the incoming stream from that node will be on hold until the lock is cleared. This is to protect against the case where node B replicates a transaction from node A, and a later update on node B arrives on C before C got the first event from A. A node that got disconnected from the cluster must rebuild the current advisory lock list upon reconnecting to the cluster.
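
A rough sketch of the lock bookkeeping described above, as one node would see it. This is Python for illustration, not a design for the backend. The class AdvisoryLocks and its method names are hypothetical, lock requests are assumed to arrive already ordered by the group communication layer, and queuing of waiting requests is left out.

```python
class AdvisoryLocks:
    """Hypothetical per-node view of locks on synchronous unique key values."""

    def __init__(self):
        self.holder = {}  # key value -> (node, txn) currently holding the lock

    def request(self, key, node, txn):
        # Requests are processed in the total order delivered to every node,
        # so first come, first served gives the same answer everywhere.
        owner = self.holder.setdefault(key, (node, txn))
        return owner == (node, txn)

    def transaction_ended(self, node, txn):
        # Called when the asynchronous replication stream from `node`
        # reports `txn` as committed or rolled back.
        for key in [k for k, o in self.holder.items() if o == (node, txn)]:
            del self.holder[key]

    def must_hold(self, key, origin_node):
        # True if a replicated change to `key` arriving from `origin_node`
        # has to wait because another node's transaction still holds the key.
        owner = self.holder.get(key)
        return owner is not None and owner[0] != origin_node


# Node C's view: A locked the key, but C has not yet seen A's transaction end.
locks_on_c = AdvisoryLocks()
granted_to_a = locks_on_c.request("cust-42", "A", "t1")
granted_to_b = locks_on_c.request("cust-42", "B", "t7")

# B's later update reaches C first, so B's stream is put on hold.
hold_b_before = locks_on_c.must_hold("cust-42", "B")

# Once C has applied A's transaction, the lock clears and B's stream resumes.
locks_on_c.transaction_ended("A", "t1")
hold_b_after = locks_on_c.must_hold("cust-42", "B")
```

A node rejoining after a disconnect would have to repopulate `holder` from the current cluster state before applying any incoming stream.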

I think that this will be a way to overcome Postgres-R's communication bottleneck, as well as allowing limited update activity even during a completely disconnected state of a node. Synchronous or group communication messages are reduced to the cases where the application cannot be implemented in a conflict free way, like allocating a natural primary key. There is absolutely no need to synchronize, for example, creating a sales order. An application can use globally unique IDs for the order number. And everything possibly referenced by an order (items, customers, ...) is stored in a way that the references are never updated. Deletes of those possibly referenced objects are implemented in a two step process, where they are first marked obsolete, and later on things that have been marked obsolete for X long are deleted. A REPLICA TRIGGER on inserting an order will simply reset the obsolete flag of referenced objects. If a node is disconnected longer than X, you have a problem - hunt down the guy who defined X.
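
The two step delete can be sketched as follows, again in Python for illustration only. ReferenceTable, its methods and new_order_number are hypothetical names, time is a plain number, and on_order_insert stands in for what the replica trigger would do when a replicated order arrives.

```python
import uuid


def new_order_number():
    # A globally unique ID, so no node has to ask another before creating an order.
    return str(uuid.uuid4())


class ReferenceTable:
    """Hypothetical table of objects (items, customers, ...) that orders may reference."""

    def __init__(self, retention):
        self.retention = retention  # the "X" from the text
        self.obsolete_since = {}    # key -> time marked obsolete, or None if live

    def add(self, key):
        self.obsolete_since[key] = None

    def mark_obsolete(self, key, now):
        # Step one of a delete: only flag the row, keep it around.
        if key in self.obsolete_since and self.obsolete_since[key] is None:
            self.obsolete_since[key] = now

    def on_order_insert(self, referenced_keys):
        # What the replica trigger does: an incoming order revives what it references.
        for key in referenced_keys:
            if key in self.obsolete_since:
                self.obsolete_since[key] = None

    def purge(self, now):
        # Step two: physically delete rows that stayed obsolete for at least X.
        expired = [k for k, t in self.obsolete_since.items()
                   if t is not None and now - t >= self.retention]
        for key in expired:
            del self.obsolete_since[key]
        return expired


items = ReferenceTable(retention=10)
items.add("item-1")
items.add("item-2")
items.mark_obsolete("item-1", now=0)
items.mark_obsolete("item-2", now=0)

# An order created on a disconnected node still references item-1.
items.on_order_insert(["item-1"])
purged = items.purge(now=10)
```

If the order had arrived after the purge, item-1 would already be gone, which is the "disconnected longer than X" problem.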

Just yesterday at the SFPUG meeting, I've experienced how confusing it is for the users to have such a broad variety of (existing and upcoming) replication solutions. And I'm all for working together and probably even for merging different replication solutions.

Merging certain ideas to come up with an async/sync hybrid? Seems to me we have similar enough ideas to need conflict resolution, because we had them simultaneously but communicate them asynchronously.



Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #

