Re: [HACKERS] Some problems about cascading replication

Simon Riggs Tue, 16 Aug 2011 06:27:45 -0700

On Tue, Aug 16, 2011 at 9:55 AM, Fujii Masao <[email protected]> wrote:


> When I tested PITR on git master with max_wal_senders > 0,
> I found that the following inappropriate log message was always
> output even though cascading replication was not in progress. The
> attached patch fixes this problem.
>
>    LOG:  terminating all walsender processes to force cascaded standby(s) to update timeline and reconnect
>
> When making the patch, I found another problem with cascading
> replication: when promoting a cascading standby, postmaster sends
> SIGUSR2 to any cascading walsenders to kill them. But there is a
> corner case where such a walsender fails to receive SIGUSR2 and
> unexpectedly survives a standby promotion. This happens when
> postmaster sends SIGUSR2 before the walsender marks itself as
> a WAL sender, because postmaster sends SIGUSR2 only to the
> processes marked as WAL senders.
>
> To avoid the corner case, I changed walsender so that it checks
> again whether recovery is in progress after marking itself
> as a WAL sender. If recovery is not in progress even though the
> walsender is a cascading one, it does the same thing as the SIGUSR2
> signal handler does, and then exits later. The attached patch also
> includes this fix.
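
For anyone following the thread, the race and the re-check described above
can be modelled roughly as below. This is only a simplified standalone
sketch, not the attached patch: WalSenderModel, postmaster_promote,
walsender_mark and survives_race are hypothetical names, a plain flag stands
in for delivery of SIGUSR2, and nothing here is PostgreSQL's real shared
memory layout.

```c
#include <stdbool.h>

/* Hypothetical stand-in for shared state; false once the standby is promoted. */
static bool recovery_in_progress = true;

/* Hypothetical model of one walsender; not a real PostgreSQL structure. */
typedef struct {
    bool marked_as_walsender;  /* visible to the postmaster's signal loop */
    bool cascading;            /* started while recovery was in progress */
    bool stop_requested;       /* what the SIGUSR2 handler would set */
} WalSenderModel;

/* Postmaster side: signals only processes already marked as WAL senders. */
static void postmaster_promote(WalSenderModel *ws)
{
    recovery_in_progress = false;
    if (ws->marked_as_walsender)
        ws->stop_requested = true;   /* models delivery of SIGUSR2 */
}

/* Walsender side: mark itself, then optionally re-check the recovery state. */
static void walsender_mark(WalSenderModel *ws, bool recheck)
{
    ws->marked_as_walsender = true;
    if (recheck && ws->cascading && !recovery_in_progress)
        ws->stop_requested = true;   /* same effect as the signal handler */
}

/* Returns true if the walsender survives a promotion that raced with startup. */
static bool survives_race(bool recheck)
{
    WalSenderModel ws = { false, true, false };

    recovery_in_progress = true;
    postmaster_promote(&ws);         /* promotion happens before marking */
    walsender_mark(&ws, recheck);
    return !ws.stop_requested;
}
```

In this model survives_race(false) returns true, which is the unexpected
survival described above, and survives_race(true) returns false because the
re-check after marking closes the window.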

Looks like valid problems and appropriate fixes to me. Will commit.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

