Hi,

On Wednesday 10 September 2008, Heikki Linnakangas wrote:
> Sure. That's the fundamental problem with synchronous replication.
> That's why many people choose asynchronous replication instead. Clearly
> at some point you'll want to give up and continue without the slave, or
> kill the master and fail over to the slave. I'm wondering how that's
> different than the lag between master and server in asynchronous
> replication from the client's point of view.

As a future user of these new facilities, the difference from the
client's POV is simple: in normal operation, we want a strong guarantee
that any COMMIT has made it to both the master and the slave at commit
time. No lag whatsoever.

You're considering lag as an option in case of failure, but I don't see
this as acceptable when you need sync commit. In case of a network
timeout, the cluster is down. So you want to either continue servicing
in degraded mode or take the service down while you repair the cluster,
but neither of those choices can be transparent to the admins, I'd
argue.

Of course, the main use case is high availability, which tends to say
you do not have the option to stop service, and seems to dictate
continuing in degraded mode: the slave can't keep up (whatever the
error domain), the master is alone, so "advertise" that to monitoring
solutions and continue servicing. And provide some way for the slave to
"rejoin", maybe, too.

> I'm not sure I understand that paragraph. Who's the user? Do we need to
> expose some new information to the client so that it can do something?

Maybe with some GUCs to set the acceptable "timeout" for the WAL sync
process, and whether reaching the timeout is a warning or an error.
With a userset GUC we could even have replication-error-level
transactions running concurrently with non-critical ones...

Now what to do exactly in case of error remains to be decided...

HTH, Regards,
-- 
dim
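
To make the timeout idea above a bit more concrete, here is a minimal
sketch of the policy in Python. It is only an illustration, not
PostgreSQL code: the names SyncSettings, timeout_s, on_timeout and
finish_commit are all hypothetical, and the slave's acknowledgement
delay is passed in as a plain number instead of being measured on a
real connection.

```python
import logging
from dataclasses import dataclass
from enum import Enum


class OnTimeout(Enum):
    """Hypothetical setting: what reaching the sync timeout means."""
    WARNING = "warning"
    ERROR = "error"


class ReplicationTimeout(Exception):
    """Raised when a transaction requires the slave and it did not ack in time."""


@dataclass
class SyncSettings:
    # Hypothetical per-session settings, standing in for the userset GUCs
    # suggested in the mail. These are not existing PostgreSQL parameters.
    timeout_s: float
    on_timeout: OnTimeout


def finish_commit(ack_delay_s, settings, log=logging.getLogger("repl")):
    """Decide the outcome of a COMMIT.

    ack_delay_s is how long the slave took to acknowledge the WAL,
    or None if it never acknowledged.
    """
    if ack_delay_s is not None and ack_delay_s <= settings.timeout_s:
        return "synced"  # normal mode: commit is on both master and slave
    if settings.on_timeout is OnTimeout.ERROR:
        # Critical transaction: fail loudly, never commit without the slave.
        raise ReplicationTimeout("slave did not acknowledge within timeout")
    # Non-critical transaction: continue alone, but tell monitoring about it.
    log.warning("slave timed out, continuing in degraded mode")
    return "degraded"


# Two sessions with different requirements can coexist.
critical = SyncSettings(timeout_s=1.0, on_timeout=OnTimeout.ERROR)
relaxed = SyncSettings(timeout_s=1.0, on_timeout=OnTimeout.WARNING)
```

With these settings, a slave that never answers makes
finish_commit(None, critical) raise ReplicationTimeout, while
finish_commit(None, relaxed) logs a warning and returns "degraded".
What the master should do after such an error is left open here, as in
the mail.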