On 01/09/10 10:53, Fujii Masao wrote:
Before discussing that, we should determine whether registering
standbys in master is really required. It affects configuration a lot.
Heikki thinks that it's required, but I'm still unclear about why and
how.

Why do standbys need to be registered in master? What information
should be registered?

That requirement falls out from the handling of disconnected standbys. If a standby is not connected, what does the master do with commits? If the answer is anything other than acknowledging them to the client immediately, as if the standby never existed, the master needs to know what standby servers exist. Otherwise it can't know whether all the standbys are connected or not.
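
To illustrate, here is a minimal Python sketch (not PostgreSQL code) of why the check needs a registry. The function name and the standby names are made up for this example.

```python
def all_standbys_connected(connected, registered=None):
    """Return True or False when the master can tell, None when it cannot.

    `connected` is the set of standbys currently attached.
    `registered` is the hypothetical list of standbys the master was told about.
    """
    if registered is None:
        # Without registration the master only sees current connections, so a
        # disconnected standby looks the same as one that never existed.
        return None
    return set(registered) <= set(connected)


# No registry: the question cannot be answered at all.
unknown = all_standbys_connected({"standby_a"})

# With a registry the master can see that standby_b is missing.
missing = all_standbys_connected({"standby_a"}, {"standby_a", "standby_b"})
```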

What does synchronous replication mean, when is a transaction
acknowledged as committed?

I proposed four synchronization levels:

1. async
   doesn't make transaction commit wait for replication, i.e.,
   asynchronous replication. This mode has been already supported in
   9.0.

2. recv
   makes transaction commit wait until the standby has received WAL
   records.

3. fsync
   makes transaction commit wait until the standby has received and
   flushed WAL records to disk

4. replay
   makes transaction commit wait until the standby has replayed WAL
   records after receiving and flushing them to disk
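
The four levels above can be sketched roughly as follows. This is an illustration in Python, not the proposed implementation: WAL positions are modelled as plain integers, and `SyncLevel`, `StandbyProgress` and `commit_may_return` are names invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class SyncLevel(Enum):
    ASYNC = "async"
    RECV = "recv"
    FSYNC = "fsync"
    REPLAY = "replay"


@dataclass
class StandbyProgress:
    received: int  # last WAL position the standby has received
    flushed: int   # last WAL position the standby has flushed to disk
    replayed: int  # last WAL position the standby has replayed


def commit_may_return(level, commit_lsn, progress):
    """Decide whether a commit at commit_lsn may be acknowledged to the client."""
    if level is SyncLevel.ASYNC:
        return True  # never wait for the standby
    if level is SyncLevel.RECV:
        return progress.received >= commit_lsn
    if level is SyncLevel.FSYNC:
        return progress.flushed >= commit_lsn
    return progress.replayed >= commit_lsn  # SyncLevel.REPLAY


# A standby that has received up to 100, flushed up to 80, replayed up to 50.
progress = StandbyProgress(received=100, flushed=80, replayed=50)
recv_ok = commit_may_return(SyncLevel.RECV, 90, progress)
fsync_ok = commit_may_return(SyncLevel.FSYNC, 90, progress)
```

Each level is strictly stronger than the previous one, so the same commit can be acknowledged under recv while still waiting under fsync or replay.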

OTOH, Simon proposed the quorum commit feature. I think that both
are required for our various use cases. Thoughts?

I'd like to keep this as simple as possible, yet flexible so that with enough scripting and extensions, you can get all sorts of behavior. I think quorum commit falls into the "extension" category; if your setup is complex enough, it's going to be impossible to represent that in our config files no matter what. But if you write a little proxy, you can implement arbitrary rules there.

I think recv/fsync/replay should be specified in the standby. It has no direct effect on the master; the master would just relay the setting to the standby when it connects, or the standby would send multiple XLogRecPtrs and let the master decide when the WAL is persistent enough. And what if you write a proxy that has some other meaning of "persistent enough"? Like when it has been written to the OS buffers but not yet fsync'd, or when it has been fsync'd to at least one standby and received by at least three others. recv/fsync/replay is not going to represent that behavior well.
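
As a rough illustration of the kind of rule such a proxy could apply, here is a Python sketch of the "fsync'd to at least one standby and received by at least three others" example. Everything in it is hypothetical: positions are plain integers rather than XLogRecPtrs, and `Report` and `persistent_enough` are invented names. It assumes a standby never reports a flush position ahead of its receive position.

```python
from typing import NamedTuple


class Report(NamedTuple):
    received: int  # last WAL position the standby reports as received
    flushed: int   # last WAL position the standby reports as fsync'd


def persistent_enough(commit_lsn, reports, min_flushed=1, min_received_others=3):
    """True once enough standbys have flushed and enough others have received."""
    received = sum(1 for r in reports if r.received >= commit_lsn)
    flushed = sum(1 for r in reports if r.flushed >= commit_lsn)
    # A flushed standby has also received, so the "others" are counted on top
    # of the standbys that satisfy the flush requirement.
    return flushed >= min_flushed and received >= min_flushed + min_received_others


reports = [
    Report(received=100, flushed=100),  # one standby has fsync'd
    Report(received=100, flushed=0),
    Report(received=100, flushed=0),
    Report(received=100, flushed=0),    # three others have only received
]
ok = persistent_enough(100, reports)
```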

"sync vs async" on the other hand should be specified in the master, because it has a direct impact on the behavior of commits in the master.

I propose a configuration file standbys.conf, in the master:

# STANDBY NAME    SYNCHRONOUS   TIMEOUT
importantreplica  yes           100ms
tempcopy          no            10s

Or perhaps this should be stored in a system catalog.
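
For what it's worth, a file in that shape would be easy to read. Below is a small Python sketch of a parser for the three-column layout shown above. The format itself is only a proposal, and `StandbyEntry`, `parse_standbys_conf` and the set of accepted timeout units (ms and s) are assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class StandbyEntry:
    name: str
    synchronous: bool
    timeout_ms: int


_UNIT_TO_MS = {"ms": 1, "s": 1000}


def parse_standbys_conf(text):
    """Parse 'NAME SYNCHRONOUS TIMEOUT' lines, ignoring comments and blanks."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields: {raw!r}")
        name, sync, timeout = fields
        match = re.fullmatch(r"(\d+)(ms|s)", timeout)
        if sync not in ("yes", "no") or match is None:
            raise ValueError(f"bad value in line: {raw!r}")
        timeout_ms = int(match.group(1)) * _UNIT_TO_MS[match.group(2)]
        entries.append(StandbyEntry(name, sync == "yes", timeout_ms))
    return entries


sample = """\
# STANDBY NAME    SYNCHRONOUS   TIMEOUT
importantreplica  yes           100ms
tempcopy          no            10s
"""
entries = parse_standbys_conf(sample)
```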

What to do if a standby server dies and never
acknowledges a commit?

The master's reaction to that situation should be configurable. So
I'd propose new configuration parameter specifying the reaction.
Valid values are:

- standalone
   When the master has waited for the ACK much longer than the timeout
   (or detected the failure of the standby), it closes the connection
   to the standby and restarts transactions.

- down
   When that situation occurs, the master shuts down immediately.
   Though this is unsafe for the system requiring high availability,
   as far as I recall, some people wanted this mode in the previous
   discussion.

Yeah, though of course you might want to set that per-standby too.
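
A sketch of how the two reactions, with an optional per-standby override, might be modelled. This is illustrative Python only: `Reaction`, `reaction_for` and `on_ack_timeout` are invented names, and the returned strings stand in for whatever the master would really do.

```python
from enum import Enum


class Reaction(Enum):
    STANDALONE = "standalone"
    DOWN = "down"


def reaction_for(standby, default, overrides=None):
    """Pick the per-standby reaction if one is set, else the global default."""
    return (overrides or {}).get(standby, default)


def on_ack_timeout(standby, reaction):
    """Describe what the master does when `standby` fails to acknowledge in time."""
    if reaction is Reaction.STANDALONE:
        return [f"close connection to {standby}", "carry on without the standby"]
    return ["shut down the master"]


# Global default is standalone, but one (hypothetical) standby is marked "down".
overrides = {"importantreplica": Reaction.DOWN}
actions = on_ack_timeout(
    "importantreplica",
    reaction_for("importantreplica", Reaction.STANDALONE, overrides),
)
```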


Let's step back a bit and ask what would be the simplest thing that you could call "synchronous replication" in good conscience, and also be useful at least to some people. Let's leave out the "down" mode, because that requires registration. We'll probably have to do registration at some point, but let's take as small steps as possible.

Without the "down" mode in the master, frankly I don't see the point of the "recv" and "fsync" levels in the standby. Either way, when the master acknowledges a commit to the client, you don't know if it has made it to the standby yet because the replication connection might be down for some reason.

That leaves us the 'replay' mode, which *is* useful, because it gives you the guarantee that when the master acknowledges a commit, it will appear committed in all hot standby servers that are currently connected. With that guarantee you can build a reliable cluster with something like pgpool-II where all writes go to one node, and reads are distributed to multiple nodes.
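
In code terms the rule is very small. The Python sketch below is only an illustration of the guarantee, with WAL positions as plain integers and `can_acknowledge` a made-up name; note that only currently connected standbys are considered, since nothing is registered.

```python
def can_acknowledge(commit_lsn, replayed_by_connected):
    """True once every connected standby has replayed up to commit_lsn.

    `replayed_by_connected` maps standby name -> last replayed WAL position,
    for currently connected standbys only. A standby that is disconnected is
    simply absent, so it never holds up a commit.
    """
    return all(pos >= commit_lsn for pos in replayed_by_connected.values())


# One standby is still behind, so the commit at position 120 has to wait.
waiting = can_acknowledge(120, {"standby_a": 120, "standby_b": 110})

# With no standbys connected the master acknowledges immediately.
alone = can_acknowledge(120, {})
```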

I'm not sure what we should aim for in the first phase. But if you want as little code as possible yet have something useful, I think 'replay' mode with no standby registration is the way to go.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
