Heikki Linnakangas wrote:
The possibilities are endless... Your proposal above covers a pretty good set of scenarios, but it's by no means complete. If we try to solve everything the configuration will need to be written in a Turing-complete Replication Description Language. We'll have to pick a useful, easy-to-understand subset that covers the common scenarios. To handle the more exotic scenarios, you can write a proxy that sits in front of the master, and implements whatever rules you wish, with the rules written in C.

I was thinking about this a bit recently. As I see it, there are three fundamental parts of this:

1) We have a transaction that is being committed. The rest of the computations here are all relative to it.

2) There is an (internal?) table that lists the state of each replication target relative to that transaction. It would include the node name, perhaps some metadata ('location' seems the one that's most likely to help with the remote data center issue), and a state code. The codes from http://wiki.postgresql.org/wiki/Streaming_Replication work fine for the last part (which is the only dynamic one--everything else is static data being joined against):

async = hasn't been received yet
recv  = received, but only in RAM so far
fsync = received and synced to disk
apply = applied to the database

These would need to be enums so they can be ordered from lesser to greater consistency.
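
For illustration, here's a minimal sketch of how that ordering could be declared in PostgreSQL; the type and table names (replication_state, replication_progress) are just my invention for this example:

    -- Enum whose declaration order matches increasing consistency,
    -- so comparisons like state >= 'fsync' do the right thing.
    CREATE TYPE replication_state AS ENUM ('async', 'recv', 'fsync', 'apply');

    -- Rough shape of the internal per-transaction progress table.
    CREATE TABLE replication_progress (
        node     text PRIMARY KEY,   -- standby node name
        location text,               -- static metadata, e.g. 'local' or 'remote'
        state    replication_state   -- the dynamic part, updated as acks arrive
    );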

So in a 3 node case, the internal state table might look like this after a bit of data had been committed:

 node | location | state
------+----------+--------
 a    | local    | fsync
 b    | remote   | recv
 c    | remote   | async

This means that the local node has a fully persistent copy, but the best either remote node has done is receive the data; nothing is on disk yet at the remote data center. The commit is still working its way through.
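
To reproduce that state in the sketch schema above (again, purely illustrative):

    INSERT INTO replication_progress VALUES
        ('a', 'local',  'fsync'),
        ('b', 'remote', 'recv'),
        ('c', 'remote', 'async');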

3) The decision about whether the data has been committed to enough places to be considered safe by the master is computed by a function that is passed this internal table, as something like a SRF, and returns a boolean. Once that returns true, saying the policy is satisfied, the transaction commits on the master and continues to percolate out from there. If it returns false, we wait for another state change to come in and return to (2).
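
As a sketch of the shape such a function might take, here is one of the policies listed below (fsync'd on at least one remote node) written against my hypothetical replication_progress table; this isn't meant as the actual mechanism, just what the decision could look like expressed in SQL:

    -- Commit is considered safe once at least one remote node
    -- has the data fsync'd to disk.
    CREATE FUNCTION commit_is_safe() RETURNS boolean AS $$
        SELECT EXISTS (
            SELECT 1 FROM replication_progress
            WHERE location = 'remote' AND state >= 'fsync'
        );
    $$ LANGUAGE sql;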

I would propose that most of the behaviors people have expressed as their desired implementation are possible using this scheme:

- Semi-sync commit: proceed as soon as somebody else has a copy and hope the copies all become consistent:
    EXISTS WHERE state >= 'recv'

- Don't proceed until there's an fsync'd commit on at least one of the remote nodes:
    EXISTS WHERE location = 'remote' AND state >= 'fsync'

- Look for a quorum of n commits of fsync quality:
    CASE WHEN (SELECT count(*) WHERE state >= 'fsync') > n THEN true ELSE false END

The syntax is obviously rough, but I think you can get the drift of what I'm suggesting.
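
As one worked example of what the quorum rule might look like once fleshed out, still against the hypothetical table from above:

    -- True once at least 2 nodes have an fsync-quality copy (n = 2 here).
    SELECT count(*) >= 2 AS safe
    FROM replication_progress
    WHERE state >= 'fsync';

With the example 3-node state above, only node a qualifies, so this returns false and the master would keep waiting.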

While exposing the local state and running this computation isn't free, in situations where there truly are remote nodes being communicated with, the network overhead is going to dwarf that cost. If there were a fast path for the simplest cases and this more complicated one for the rest, I think you could get the fully programmable behavior some people want using simple SQL, rather than having to write a new "Replication Description Language" or something so ambitious. This data about what's been replicated where looks an awful lot like a set of rows you can operate on using features already in the database to me.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us

