Simon Riggs wrote:
You scare me that you see failover as sufficiently frequent that you are
worried that being without one of the servers for an extra 60 seconds
during a failover is a problem. And then say you're not going to add the
feature after all. I really don't understand. If it's important, add the
feature, the whole feature that is. If not, don't.

My expectation is that most failovers are serious ones, that the primary
system is down and not coming back very fast. Your worries seem to come
from a scenario where the primary system is still up but Postgres
bounces/crashes, we can diagnose the cause of the crash, decide the
crashed server is safe and then wish to recommence operations on it
again as quickly as possible, where seconds count in doing so.

Are failovers going to be common? Why?

Hi Simon:

I agree with most of your criticism of the "fail over only" approach - but I don't agree that failover frequency should really impact expectations for the failed system to return to service. I see "soft" fails (*not* serious ones) as potentially common: somewhere on the network, something went down or some packet was lost, and the system took a few too many seconds to respond.

My expectation is that the cluster can quickly detect that the node is out of service and remove it from the pool; then, when the situation is resolved (often automatically, outside of my control), the node can automatically "catch up" and be put back into the pool. Having to run some other process such as rsync seems unreliable when we already have a mechanism for streaming the data. All that is missing is streaming from an earlier point in time to catch up efficiently and reliably.
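To make the cycle I have in mind concrete, here is a minimal sketch (hypothetical Python, not PostgreSQL code - the Node/Pool names and the integer "WAL position" are illustrative assumptions, not any real API): a node that fails a health check is dropped from the pool, later catches up by streaming from the point where it fell behind rather than re-copying everything, and is then put back into the pool.

```python
class Node:
    """A replica, tracked by the last WAL position it has applied."""
    def __init__(self, name):
        self.name = name
        self.applied_pos = 0   # last WAL position applied by this node
        self.in_pool = True

class Pool:
    """Hypothetical pool manager for the detect/remove/catch-up/rejoin cycle."""
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.master_pos = 0    # current WAL position on the primary

    def advance(self, n):
        # Primary generates n units of WAL; nodes still in the pool keep up.
        self.master_pos += n
        for node in self.nodes:
            if node.in_pool:
                node.applied_pos = self.master_pos

    def health_check(self, node, reachable):
        # A "soft" fail (timeout, dropped packet): remove the node from
        # the pool, but keep its last applied position.
        if not reachable:
            node.in_pool = False

    def rejoin(self, node):
        # Catch up by streaming from node.applied_pos - an earlier point
        # in time - instead of running rsync over the whole data directory.
        if not node.in_pool:
            node.applied_pos = self.master_pos
            node.in_pool = True
```

The point of the sketch is only the last step: because the node remembers where it stopped, rejoining is a stream from that position forward, not a full re-sync.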

I think I'm talking more about the complete solution though which is in line with what you are saying? :-)

Cheers,
mark

--
Mark Mielke <m...@mielke.cc>


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
