On Tue, May 25, 2010 at 1:10 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote:
>> Robert Haas <robertmh...@gmail.com> wrote:
>> > Simon Riggs <si...@2ndquadrant.com> wrote:
>> >> If we define robustness at the standby level then robustness
>> >> depends upon unseen administrators, as well as the current
>> >> up/down state of standbys. This is action-at-a-distance in its
>> >> worst form.
>> >
>> > Maybe, but I can't help thinking people are going to want some
>> > form of this. The case where someone wants to do sync rep to the
>> > machine in the next rack over and async rep to a server at a
>> > remote site seems too important to ignore.
>>
>> I think there may be a terminology issue here -- I took "configure
>> by standby" to mean that *at the master* you would specify rules for
>> each standby. I think Simon took it to mean that each standby would
>> define the rules for replication to it. Maybe this issue can
>> resolve gracefully with a bit of clarification?
>
> The use case of "machine in the next rack over and async rep to a server
> at a remote site" would require the settings
>
> server.nextrack = synch
> server.remotesite = async
>
> which leaves open the question of what happens when "nextrack" is down.
>
> In many cases, to give adequate performance in that situation people add
> an additional server, so the config becomes
>
> server.nextrack1 = synch
> server.nextrack2 = synch
> server.remotesite = async
>
> We then want to specify for performance reasons that we can get a reply
> from either nextrack1 or nextrack2, so it all still works safely and
> quickly if one of them is down. How can we express that rule concisely?
> With some difficulty.

Perhaps the difficulty here is that those still look like per-server
settings to me, just maybe with a different set of semantics.

> My suggestion is simply to have a single parameter (name unimportant)
>
> number_of_synch_servers_we_wait_for = N
>
> which is much easier to understand because it is phrased in terms of the
> guarantee given to the transaction, not in terms of what the admin
> thinks is the situation.

So I agree that we need to talk about whether or not we want to do
this. I'll give my opinion. I am not sure how useful this really is.
Consider a master with two standbys. The master commits a
transaction and waits for one of the two standbys, then acknowledges
the commit back to the user. Then the master crashes. Now what?
It's not immediately obvious which standby we should bring online as
the primary, and if we guess wrong we could lose transactions thought
to be committed. This is probably a solvable problem, with enough
work: we can write a script to check the last LSN received by each of
the two standbys and promote whichever one is further along.

But... what happens if the master and one standby BOTH crash
simultaneously? There's no way of knowing (until we get at least one
of them back up) whether it's safe to promote the other standby.

I like the idea of a "quorum commit" type feature where we promise the
user that things are committed when "enough" servers have acknowledged
the commit. But I think most people are not going to want that
configuration unless we also provide some really good management tools
that we don't have today.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
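
To make the promotion check a bit more concrete, here is a rough sketch in
Python of the kind of script I have in mind. None of this is an existing
tool: the function names, the dictionary of standby names, and the
"waited_for" argument (standing in for N above) are all made up for
illustration. It assumes each standby's last received LSN has already been
collected somehow as text of the form "hi/lo" (two hex numbers), with None
for a standby we cannot reach.

```python
from typing import Dict, Optional, Tuple


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' (two hex numbers) to one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def choose_promotion_candidate(
    last_received: Dict[str, Optional[str]],
    waited_for: int,
) -> Tuple[Optional[str], bool]:
    """Pick the reachable standby that is furthest along.

    last_received maps a (hypothetical) standby name to its last received
    LSN, or None if that standby is unreachable. waited_for is the number
    of standbys the master waited for before acknowledging a commit.

    Returns (candidate, safe). safe is True only if enough standbys
    reported that at least one of them must hold every acknowledged
    commit: with M standbys in total, that needs M - waited_for + 1
    reachable ones.
    """
    reachable = {
        name: parse_lsn(lsn)
        for name, lsn in last_received.items()
        if lsn is not None
    }
    if not reachable:
        return None, False
    candidate = max(reachable, key=reachable.get)
    needed = len(last_received) - waited_for + 1
    return candidate, len(reachable) >= needed
```

With two standbys and waited_for = 1, the result is only "safe" when both
standbys answer, which is exactly the problem case: if one standby went down
along with the master, the script can still name the survivor but cannot
say whether promoting it loses acknowledged commits.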