On Wed, Apr 10, 2013 at 7:44 PM, Shaun Thomas <stho...@optionshouse.com> wrote:
> On 04/10/2013 11:40 AM, Fujii Masao wrote:
>
>> Strange. If this is really true, shared disk failover solution is
>> fundamentally broken because the standby needs to start up with the
>> shared "corrupted" database at the failover.
>
> How so? Shared disk doesn't use replication. The point I was trying to make
> is that replication requires synchronization between two disparate servers,
> and verifying they have exactly the same data is a non-trivial exercise.
> Even a single transaction after a failover (effectively) negates the old
> server because there's no easy "catch up" mechanism yet.
>
> Even if this isn't necessarily true, it's the safest approach IMO.

We already rely on WAL-before-data to ensure correct recovery. What is
proposed here is to slightly redefine that rule so that WAL must be
replicated before it is considered flushed. This ensures that no data page
on disk is ahead of the WAL that the slave has. The machinery to do this is
mostly there already: we already wait for WAL flushes, and we know the
write location on the slave.

The second requirement is that we never start up as master and never trust
any local WAL. This is actually how Pacemaker clusters work; you would only
need to amend the RA to wipe the WAL and configure PostgreSQL with
restart_after_crash = false.

It would be very helpful in restoring HA capability after failover if we
didn't have to read through the whole database after a VM goes down and is
migrated with the shared disk onto a new host.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
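
P.S. To make the redefined WAL-before-data rule concrete, here is a rough
sketch in Python. It is only an illustration of the idea, not PostgreSQL
code: the names WalState, effective_flush_lsn and can_write_page are
hypothetical, and LSNs are modelled as plain integers. The assumption is
that a data page may be written out only once the WAL covering it is both
flushed locally and reported as written by the slave.

```python
from dataclasses import dataclass


@dataclass
class WalState:
    # Hypothetical model; LSNs are plain integers for illustration.
    local_flush_lsn: int    # WAL durably flushed on the master
    standby_write_lsn: int  # WAL the slave reports as written


def effective_flush_lsn(state: WalState) -> int:
    """WAL counts as flushed only up to the point the slave also has it."""
    return min(state.local_flush_lsn, state.standby_write_lsn)


def can_write_page(page_lsn: int, state: WalState) -> bool:
    """WAL-before-data under the redefined rule: a data page may reach
    disk only if the WAL record that last touched it lies at or below
    the effective flush point."""
    return page_lsn <= effective_flush_lsn(state)
```

With a local flush at 200 and the slave at 150, a page stamped with LSN 180
has to wait until the slave catches up, so the on-disk data never gets
ahead of the slave's WAL.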