On 17/09/10 12:49, Simon Riggs wrote:
This isn't just about UI, there are significant and important
differences between the proposals in terms of the capability and control
they offer.

Sure. The point of focusing on the UI is that the UI demonstrates what capability and control a proposal offers.

So what should the user interface be like? Given the 1st and 2nd
requirement, we need standby registration. If some standbys are
important and others are not, the master needs to distinguish between
them to be able to determine that a transaction is safely delivered to
the important standbys.

My patch provides those two requirements without standby registration,
so we very clearly don't "need" standby registration.

It's still not clear to me how you would configure things like "wait for ack from the reporting slave, but not the other slaves" or "wait until replayed on the server on the west coast" in your proposal. Maybe it's possible, but it doesn't seem very intuitive, and it requires careful configuration in both the master and the slaves.
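
With registration, by contrast, both of those policies could be stated in one place on the master. For example, in a standby.conf along these lines (hypothetical syntax, not necessarily what Fujii-san's format looks like):

# standby.conf on the master -- one line per known standby
reporting    recv     # wait for ack from the reporting slave
westcoast    replay   # wait until replayed on the west coast server
testserver   async    # never wait for the test server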

In your proposal, you also need to be careful not to connect, say, a test slave with "synchronous_replication_service = apply" to the master, or it may shadow a real production slave, acknowledging transactions that the real slave has not yet received. It's certainly possible to screw up with standby registration too, but registration gives you direct control over the master's behavior in the master itself, instead of distributing it across all the slaves.
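
To spell out the hazard: with slave-side configuration, a single stray line in a test server's config is all it takes:

# postgresql.conf on a forgotten test slave
synchronous_replication_service = apply   # the master may now take this
                                          # server's ack in place of the
                                          # real production slave's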

The question is do we want standby registration on master and if so,
why?

Well, aside from how to configure synchronous replication, standby registration would help with retaining the right amount of WAL in the master. wal_keep_segments doesn't guarantee that enough is retained, and OTOH when all standbys are connected you retain much more than might be required.
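
For example:

# postgresql.conf on the master, today
wal_keep_segments = 256   # always pins 256 * 16 MB = 4 GB of WAL

# A standby that falls more than 4 GB behind can never catch up again,
# yet while every standby is connected and caught up, the 4 GB is
# retained for nothing. With registration, the master could instead
# keep WAL exactly until the last registered standby has fetched it.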

Giving names to slaves also allows you to view their status in the master in a more intuitive format. Something like:

postgres=# SELECT * FROM pg_slave_status ;
    name    | connected |  received  |   fsyncd   |  applied
------------+-----------+------------+------------+------------
 reporting  | t         | 0/26000020 | 0/26000020 | 0/25550020
 ha-standby | t         | 0/26000020 | 0/26000020 | 0/26000020
 testserver | f         |            | 0/15000020 |
(3 rows)

For the control between async/recv/fsync/replay, I like to think in
terms of
a) asynchronous vs synchronous
b) if it's synchronous, how synchronous is it? recv, fsync or replay?

I think it makes most sense to set sync vs. async in the master, and the
level of synchronicity in the slave. Although I have sympathy for the
argument that it's simpler if you configure it all from the master side
as well.

I have catered for such requests by suggesting a plugin that allows you
to implement that complexity without overburdening the core code.

Well, plugins are certainly one possibility, but then we need to design the plugin API. I've been thinking along the lines of a proxy, which could implement whatever logic you want to decide when to send the acknowledgment. Either way, if we push features people want out to a proxy or plugin, we need to make sure that the proxy/plugin has all the necessary information available.

This strikes me as an "ad absurdum" argument. Since the above
over-complexity would doubtless be seen as insane by Tom et al, it
attempts to persuade that we don't need recv, fsync and apply either.

Fujii has long talked about 4 levels of service also. Why change? I had
thought that part was pretty much agreed between all of us.

Now you lost me. I agree that we need 4 levels of service (at least ultimately, not necessarily in the first phase).

Without performance tests to demonstrate "why", these do sound hard to
understand. But we should note that DRBD offers recv ("B") and fsync
("C") as separate options. And Oracle implements all 3 of recv, fsync
and apply. Neither of them describe those options so simply and easily
as the way we are proposing with a 4 valued enum (with async as the
fourth option).

If we have only one option for sync_rep = 'on' which of recv | fsync |
apply would it implement? You don't mention that. Which do you choose?

You would choose between recv, fsync and apply in the slave, with a GUC.
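
Roughly like this; the slave-side GUC name is the one mentioned above, the master-side name follows your sync_rep example, and both should be read as illustrative:

# postgresql.conf on the master: is the commit synchronous at all?
sync_rep = on

# postgresql.conf on each slave: how synchronous are this slave's acks?
synchronous_replication_service = fsync   # one of: recv, fsync, apply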

I no longer seek to persuade by words alone. The existence of my patch
means that I think that only measurements and tests will show why I have
been saying these things. We need performance tests.

I don't expect any meaningful differences in terms of performance between any of the discussed options. The big question right now is what features we provide and how they're configured. Performance will depend primarily on the mode you use, and secondarily on the implementation of the mode. It would be completely premature to do performance testing yet IMHO.

Putting all of that together, I think Fujii-san's standby.conf is pretty
close.

What it needs is the additional GUC for transaction-level control.
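
Something along these lines, say (the GUC name, values and tables are made up for illustration; SET LOCAL scopes the setting to a single transaction):

-- This transaction must not be lost: wait until the synchronous
-- standby has replayed it before COMMIT returns.
BEGIN;
SET LOCAL synchronous_replication = apply;
UPDATE accounts SET balance = balance - 100 WHERE acct_id = 42;
COMMIT;

-- This one is low-value: don't wait for any standby.
BEGIN;
SET LOCAL synchronous_replication = async;
INSERT INTO page_views (viewed_at) VALUES (now());
COMMIT;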

The difference between the patches is not a simple matter of a GUC.

My proposal allows a single standby to provide efficient replies to
multiple requested durability levels all at the same time. With
efficient use of network resources. ISTM that because the other patch
cannot provide that you'd like to persuade us that we don't need that,
ever. You won't sell me on that point, cos I can see lots of uses for
it.

Simon, how the replies are sent is an implementation detail I haven't given much thought to yet. The reason we delved into that discussion earlier was that you seemed to contradict yourself with the claims that you don't need to send more than one reply per transaction, and that the standby doesn't need to know the synchronization level. Other than the curiosity about that contradiction, it doesn't seem like a very interesting detail to me right now. It's not a question that drives the rest of the design, but the other way round.

But FWIW, something like your proposal of sending 3 XLogRecPtrs in each reply seems like a good approach. I'm not sure about using walwriter. I can see that it helps with getting the 'recv' and 'replay' acknowledgments out faster, but I still have the scars from starting bgwriter during recovery.
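
To be concrete, I'm imagining the reply message roughly as follows (a sketch only; the struct and field names are made up):

typedef struct StandbyReplyMessage
{
    XLogRecPtr  received;   /* latest WAL position received and written */
    XLogRecPtr  fsynced;    /* latest WAL position fsync'd to disk */
    XLogRecPtr  applied;    /* latest WAL position replayed by recovery */
} StandbyReplyMessage;

The standby would fill in all three positions in every reply, and the master would wake up each waiting backend according to its requested level.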

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
