On 17/09/10 12:49, Simon Riggs wrote:
This isn't just about UI, there are significant and important
differences between the proposals in terms of the capability and control
they offer.

Sure. The point of focusing on the UI is that the UI demonstrates what capability and control a proposal offers.

So what should the user interface be like? Given the 1st and 2nd
requirement, we need standby registration. If some standbys are
important and others are not, the master needs to distinguish between
them to be able to determine that a transaction is safely delivered to
the important standbys.

My patch provides those two requirements without standby registration,
so we very clearly don't "need" standby registration.

It's still not clear to me how you would configure things like "wait for ack from the reporting slave, but not the other slaves" or "wait until replayed on the server on the west coast" in your proposal. Maybe it's possible, but it doesn't seem very intuitive, and it requires careful configuration in both the master and the slaves.
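
With registration, by contrast, both of those policies could be stated in one place on the master. For example, in a standby.conf along these lines (hypothetical syntax, not necessarily what Fujii-san's format looks like):

# standby.conf on the master -- one line per known standby
reporting    recv     # wait for ack from the reporting slave
westcoast    replay   # wait until replayed on the west coast server
testserver   async    # never wait for the test server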

In your proposal, you also need to be careful not to connect, say, a test slave with "synchronous_replication_service = apply" to the master, or it may shadow a real production slave, acknowledging transactions that the real slave has not yet received. It's certainly possible to screw up with standby registration too, but registration gives you direct control over the master's behavior in the master itself, instead of distributing it across all the slaves.
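
To spell out the hazard: with slave-side configuration, a single stray line in a test server's config is all it takes:

# postgresql.conf on a forgotten test slave
synchronous_replication_service = apply   # the master may now take this
                                          # server's ack in place of the
                                          # real production slave's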

The question is do we want standby registration on master and if so,
why?

Well, aside from how to configure synchronous replication, standby registration would help with retaining the right amount of WAL in the master. wal_keep_segments doesn't guarantee that enough is retained, and OTOH when all standbys are connected you retain much more than might be required.
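
For example:

# postgresql.conf on the master, today
wal_keep_segments = 256   # always pins 256 * 16 MB = 4 GB of WAL

# A standby that falls more than 4 GB behind can never catch up again,
# yet while every standby is connected and caught up, the 4 GB is
# retained for nothing. With registration, the master could instead
# keep WAL exactly until the last registered standby has fetched it.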

Giving names to slaves also allows you to view their status in the master in a more intuitive format. Something like:

postgres=# SELECT * FROM pg_slave_status ;
    name    | connected |  received  |   fsyncd   |  applied
------------+-----------+------------+------------+------------
 reporting  | t         | 0/26000020 | 0/26000020 | 0/25550020
 ha-standby | t         | 0/26000020 | 0/26000020 | 0/26000020
 testserver | f         |            | 0/15000020 |
(3 rows)

For the control between async/recv/fsync/replay, I like to think in
terms of
a) asynchronous vs synchronous
b) if it's synchronous, how synchronous is it? recv, fsync or replay?

I think it makes most sense to set sync vs. async in the master, and the
level of synchronicity in the slave. Although I have sympathy for the
argument that it's simpler if you configure it all from the master side
as well.

I have catered for such requests by suggesting a plugin that allows you
to implement that complexity without overburdening the core code.

Well, plugins are certainly one possibility, but then we need to design the plugin API. I've been thinking along the lines of a proxy, which could implement whatever logic you want to decide when to send the acknowledgment. Either way, if we push features people want out to a proxy or plugin, we need to make sure that the proxy/plugin has all the necessary information available.

This strikes me as an "ad absurdum" argument. Since the above
over-complexity would doubtless be seen as insane by Tom et al, it
attempts to persuade that we don't need recv, fsync and apply either.

Fujii has long talked about 4 levels of service also. Why change? I had
thought that part was pretty much agreed between all of us.

Now you lost me. I agree that we need 4 levels of service (at least ultimately, not necessarily in the first phase).

Without performance tests to demonstrate "why", these do sound hard to
understand. But we should note that DRBD offers recv ("B") and fsync
("C") as separate options. And Oracle implements all 3 of recv, fsync
and apply. Neither of them describe those options so simply and easily
as the way we are proposing with a 4 valued enum (with async as the
fourth option).

If we have only one option for sync_rep = 'on' which of recv | fsync |
apply would it implement? You don't mention that. Which do you choose?

You would choose between recv, fsync and apply in the slave, with a GUC.
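
Roughly like this; the slave-side GUC name is the one mentioned above, the master-side name follows your sync_rep example, and both should be read as illustrative:

# postgresql.conf on the master: is the commit synchronous at all?
sync_rep = on

# postgresql.conf on each slave: how synchronous are this slave's acks?
synchronous_replication_service = fsync   # one of: recv, fsync, apply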

I no longer seek to persuade by words alone. The existence of my patch
means that I think that only measurements and tests will show why I have
been saying these things. We need performance tests.

I don't expect any meaningful differences in terms of performance between any of the discussed options. The big question right now is what features we provide and how they're configured. Performance will depend primarily on the mode you use, and secondarily on the implementation of the mode. It would be completely premature to do performance testing yet IMHO.

Putting all of that together, I think Fujii-san's standby.conf is pretty
close.

What it needs is the additional GUC for transaction-level control.
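
Something along these lines, say (the GUC name, values and tables are made up for illustration; SET LOCAL scopes the setting to a single transaction):

-- This transaction must not be lost: wait until the synchronous
-- standby has replayed it before COMMIT returns.
BEGIN;
SET LOCAL synchronous_replication = apply;
UPDATE accounts SET balance = balance - 100 WHERE acct_id = 42;
COMMIT;

-- This one is low-value: don't wait for any standby.
BEGIN;
SET LOCAL synchronous_replication = async;
INSERT INTO page_views (viewed_at) VALUES (now());
COMMIT;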

The difference between the patches is not a simple matter of a GUC.

My proposal allows a single standby to provide efficient replies to
multiple requested durability levels all at the same time. With
efficient use of network resources. ISTM that because the other patch
cannot provide that you'd like to persuade us that we don't need that,
ever. You won't sell me on that point, cos I can see lots of uses for
it.

Simon, how the replies are sent is an implementation detail I haven't given much thought to yet. The reason we delved into that discussion earlier was that you seemed to contradict yourself with the claims that you don't need to send more than one reply per transaction, and that the standby doesn't need to know the synchronization level. Other than the curiosity about that contradiction, it doesn't seem like a very interesting detail to me right now. It's not a question that drives the rest of the design, but the other way round.

But FWIW, something like your proposal of sending 3 XLogRecPtrs in each reply seems like a good approach. I'm not sure about using walwriter. I can see that it helps with getting the 'recv' and 'replay' acknowledgments out faster, but I still have the scars from starting bgwriter during recovery.
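
To be concrete, I'm imagining the reply message roughly as follows (a sketch only; the struct and field names are made up):

typedef struct StandbyReplyMessage
{
    XLogRecPtr  received;   /* latest WAL position received and written */
    XLogRecPtr  fsynced;    /* latest WAL position fsync'd to disk */
    XLogRecPtr  applied;    /* latest WAL position replayed by recovery */
} StandbyReplyMessage;

The standby would fill in all three positions in every reply, and the master would wake up each waiting backend according to its requested level.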

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
