Hi,

On Wednesday 10 September 2008, Heikki Linnakangas wrote:
> Sure. That's the fundamental problem with synchronous replication.
> That's why many people choose asynchronous replication instead. Clearly
> at some point you'll want to give up and continue without the slave, or
> kill the master and fail over to the slave. I'm wondering how that's
> different than the lag between master and server in asynchronous
> replication from the client's point of view.

As a future user of this new facility, the difference from the client's POV is 
simple: in normal mode of operation, we want a strong guarantee that any 
COMMIT has made it to both the master and the slave at commit time. No lag 
whatsoever.

You're considering lag as an option in case of failure, but I don't see this 
as acceptable when you need sync commit. In case of a network timeout, the 
cluster is down. So you want to either continue servicing in degraded mode or 
take the service down while you repair the cluster, but neither of those 
choices can be transparent to the admins, I'd argue.

Of course, the main use case is high availability, which tends to mean you do 
not have the option to stop service, and that seems to dictate continuing in 
degraded mode: the slave can't keep up (whatever the error domain), the master 
is alone, so "advertise" that to monitoring solutions and continue servicing.
And provide some way for the slave to "rejoin", maybe, too.
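
To make the degraded-mode idea a bit more concrete, here is a rough sketch in 
Python. It is purely illustrative and not tied to any existing patch: the 
Mode and Master names and the notify hook towards monitoring are all made up 
for the example.

```python
from enum import Enum


class Mode(Enum):
    SYNC = "sync"          # every COMMIT waits for the slave
    DEGRADED = "degraded"  # master is alone, commits are local only


class Master:
    """Hypothetical model of the master's replication state."""

    def __init__(self, notify):
        self.mode = Mode.SYNC
        self.notify = notify  # assumed callback towards a monitoring system

    def slave_lost(self, reason):
        # Advertise the switch once, then keep servicing clients.
        if self.mode is Mode.SYNC:
            self.mode = Mode.DEGRADED
            self.notify(f"degraded: {reason}")

    def slave_rejoined(self):
        # The slave has caught up again: go back to synchronous commits.
        if self.mode is Mode.DEGRADED:
            self.mode = Mode.SYNC
            self.notify("sync restored")
```

The point is only that both transitions are explicit and reported, so the 
admins are never left guessing which mode the cluster is in.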

> I'm not sure I understand that paragraph. Who's the user? Do we need to
> expose some new information to the client so that it can do something?

Maybe with some GUCs to set the acceptable "timeout" for the WAL sync 
process, and whether reaching the timeout is a warning or an error. With a 
userset GUC we could even have transactions that treat a replication failure 
as an error running concurrently with non-critical ones...
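
As a rough illustration of what I have in mind (Python, not PostgreSQL code; 
the commit function, the wait_for_slave_ack callback and the on_timeout 
setting are hypothetical stand-ins for the GUCs, not existing names):

```python
import warnings


class ReplicationTimeout(Exception):
    """Raised when the slave did not acknowledge the commit in time."""


def commit(wait_for_slave_ack, timeout_s, on_timeout="error"):
    """Commit locally, then wait up to timeout_s for the slave.

    wait_for_slave_ack(timeout_s) is an assumed callback returning True
    if the slave confirmed the WAL within the timeout, False otherwise.
    on_timeout plays the role of a per-session setting: "warning" or "error".
    """
    if on_timeout not in ("warning", "error"):
        raise ValueError("on_timeout must be 'warning' or 'error'")
    if wait_for_slave_ack(timeout_s):
        return "replicated"
    if on_timeout == "error":
        raise ReplicationTimeout(f"no slave ack within {timeout_s}s")
    # Non-critical transaction: report the problem and carry on.
    warnings.warn(f"no slave ack within {timeout_s}s, commit is local only")
    return "local-only"
```

A critical session would keep on_timeout="error", while a non-critical one 
could run with on_timeout="warning" at the same time.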

Now what to do exactly in case of error remains to be decided...

HTH, Regards,
-- 
dim
