i don't think i've explained things very clearly. the implied contradiction is
that i'd be using asynchronous replication to catch up a slave after a slave
failure and thus i'm losing the transactional consistency that i suggest i
need. if a slave fails and is brought back online, i am indeed proposing that
it catch up with the master asynchronously; however, the slave wouldn't be
promoted to a hot standby until it is completely caught up and could be
reestablished as a synchronous replica (at least that is what i'd like to do in
theory). so i'm proposing that a slave would never be a candidate for an HA
failover unless it is completely in sync with the master: if no slave is in
sync with the master at the time the master fails, then the master would have
to be recovered from the filesystem via traditional recovery. the fact that i
envision 'catching up' a slave to a master using asynchronous replication is
not particularly relevant to the transactional guarantees of the system as a
whole if the slave is effectively unavailable while catching up.
similarly, any slave that isn't caught up to its master would also not be
eligible for queries.
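The eligibility rule above can be sketched as follows. This is a minimal illustration, assuming each slave can be classified into one of three hypothetical states (`FAILED`, `CATCHING_UP`, `SYNC`); the names `ReplicaState`, `Replica`, `eligible_replicas` and `master_failure_plan` are made up for the example and are not part of PostgreSQL. It shows that a slave replaying its backlog asynchronously is never offered for failover or reads, so its asynchronous catch-up does not weaken the guarantees of the slaves that are offered.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicaState(Enum):
    # Hypothetical states for illustration; not PostgreSQL's own names.
    FAILED = "failed"            # slave is down or unreachable
    CATCHING_UP = "catching_up"  # replaying the backlog asynchronously
    SYNC = "sync"                # caught up and acknowledging every commit


@dataclass
class Replica:
    name: str
    state: ReplicaState


def eligible_replicas(replicas):
    """Slaves usable for both HA failover and read queries: fully in sync only."""
    return [r.name for r in replicas if r.state is ReplicaState.SYNC]


def master_failure_plan(replicas):
    """Promote an in-sync slave if one exists, else fall back to filesystem recovery."""
    candidates = eligible_replicas(replicas)
    if candidates:
        return ("promote", candidates[0])
    return ("recover_from_filesystem", None)


if __name__ == "__main__":
    replicas = [Replica("s1", ReplicaState.CATCHING_UP),
                Replica("s2", ReplicaState.FAILED)]
    print(master_failure_plan(replicas))  # ('recover_from_filesystem', None)
```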
i can understand why the master might hang when there is no reachable replica
during synchronous commit; this is exactly the right thing to do if you want to
guarantee that you have at least 2 distinct spheres of durability. but i'd
prefer to sacrifice the extra durability guarantee in favor of availability in
this case, given that recovery from the file system is still an option should
the master subsequently fail. my availability concern is that the master would
be hung/unavailable for an unbounded amount of time, with no strong guarantee
about how long it might take to bring a replica back up, which is not
acceptable in my case.
if the master blocks commits because there is no active slave, i believe that an
administrator would have to:
1. detect that there are no active slaves
2. shut the master down
3. disable synchronous replication
4. bring the master back up
or, alternatively:
1. detect that there are no active slaves
2. interrupt any connections that are blocking on commit
3. set synchronous_commit = local or off on all connections, effectively
disabling synchronous replication going forward.
but i'd prefer a more automated approach that wouldn't be perceived as an
outage by the client.
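The second manual procedure can be sketched as a small decision function. This assumes the input rows are shaped like the master's `pg_stat_replication` view, whose `sync_state` column reports `sync`, `potential` or `async` for each connected standby, and that some external tool would execute the returned SQL. `plan_degrade`, `has_sync_standby` and the database name are hypothetical, and how the blocked backends are identified is deliberately left open. It uses `synchronous_commit`, the PostgreSQL 9.1 setting that accepts `local` and `off`; setting it through `ALTER DATABASE ... SET` only changes the default for sessions started afterwards, which is one reason a manual procedure like this is hard to make transparent to clients.

```python
def has_sync_standby(replication_rows):
    """Step 1: does any connected standby currently have sync_state 'sync'?"""
    return any(row.get("sync_state") == "sync" for row in replication_rows)


def plan_degrade(replication_rows, blocked_pids, dbname):
    """Return SQL for steps 2 and 3, or an empty list if a sync standby exists.

    replication_rows: dicts shaped like rows of the master's pg_stat_replication.
    blocked_pids: backend PIDs assumed to be waiting on commit (how they are
        identified is left open in this sketch).
    dbname: hypothetical database (a plain identifier) whose sessions should
        stop waiting for a standby.
    """
    if has_sync_standby(replication_rows):
        return []
    # Step 2: interrupt backends blocked waiting for a standby acknowledgement.
    statements = [f"SELECT pg_cancel_backend({int(pid)});" for pid in blocked_pids]
    # Step 3: make 'local' the default for new sessions in this database;
    # sessions that are already connected keep their current setting.
    statements.append(f"ALTER DATABASE {dbname} SET synchronous_commit = local;")
    return statements


if __name__ == "__main__":
    rows = [{"application_name": "s1", "state": "catchup", "sync_state": "async"}]
    for sql in plan_degrade(rows, [4242], "appdb"):
        print(sql)
```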
i envision some kind of timeout after which the slave is removed from the
master's synchronous replica set. and of course i'd need to work out the
mechanics of bringing the slave back up to sync with the master and adding it
back to the replica set, which would clearly require some additional machinery.
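One way to picture the timeout and re-admission machinery is the master-side bookkeeping below. It is a minimal sketch, not anything PostgreSQL provides: `SyncReplicaSet`, its methods, and the idea of a per-slave "lag in bytes" report are all hypothetical. A slave that has not acknowledged within `timeout` seconds is demoted to a catching-up set; it rejoins the synchronous set only once it reports zero lag; and commits wait only while the synchronous set is non-empty, which is exactly the window in which durability falls back to that of the master alone.

```python
from dataclasses import dataclass, field


@dataclass
class SyncReplicaSet:
    timeout: float                                # seconds without an ack before demotion
    members: dict = field(default_factory=dict)   # synchronous slaves: name -> time of last ack
    catching_up: set = field(default_factory=set) # slaves replaying asynchronously

    def register(self, name):
        """A new or restarted slave always starts outside the synchronous set."""
        self.members.pop(name, None)
        self.catching_up.add(name)

    def ack(self, name, now):
        """Record a commit acknowledgement from a synchronous member."""
        if name in self.members:
            self.members[name] = now

    def expire(self, now):
        """Demote members whose last acknowledgement is older than the timeout."""
        stale = [n for n, t in self.members.items() if now - t > self.timeout]
        for n in stale:
            del self.members[n]
            self.catching_up.add(n)
        return stale

    def report_lag(self, name, lag_bytes, now):
        """Re-admit a catching-up slave only once it has fully caught up."""
        if name in self.catching_up and lag_bytes == 0:
            self.catching_up.discard(name)
            self.members[name] = now

    def commit_must_wait(self):
        """Commits wait for an acknowledgement only while a sync member exists."""
        return bool(self.members)
```

For example, with `timeout=5.0`, a slave whose last acknowledgement was at t=3 is still a member at t=7 but is demoted at t=9, after which commits on the master proceed without waiting until the slave reports zero lag again.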
i hope that clears it up.
thanks.
________________________________
From: Adrian Klaver
To: Jameison Martin
Sent: Tuesday, February 28, 2012 7:32 AM
Subject: Re: [GENERAL] synchronous replication: blocking commit on the master
On Monday, February 27, 2012 10:21:24 pm Jameison Martin wrote:
> I have specific needs for wanting synchronous replication instead of
> asynchronous replication, notwithstanding my desire to continue processing
> work on the master if there are no active slaves. I would like to use
> replication for both HA and for query scaling. I'd like replication to be
> synchronous to ensure that any slaves are up to date, and I cannot afford
> even the small data potential loss implied by asynchronous replication.
> However, should there be a situation where no slaves are alive (e.g.
> there is a single slave and it fails for whatever reason), I do not want
> to compromise the availability of the master while the slave is being
> restored. Instead, I'd like to be able to continue processing transactions
> on the master unimpeded until a slave can be brought back online. Once a
> slave is caught back up to the master I'd like to switch back to
> synchronous replication and again be able to use the slave to scale reads
> and as a failover target should the master fail.
>
> Does that make sense?
No, not really :)
The two statements below seem to be at odds with each other:
"I'd like replication to be synchronous to ensure that any slaves are up to
date, and I cannot afford even the small data potential loss implied by
asynchronous replication."
"Instead, I'd like to be able to continue processing transactions on the
master unimpeded until a slave can be brought back online."
It seems you want async sync replication and, under the observation that a
chain is only as strong as its weakest link, you are really getting async
replication.
That being said, it is your setup and you have the option to run it the
way you want.
--
Adrian Klaver