Re: [HACKERS] Primary not sending to synchronous standby

Andres Freund Mon, 23 Feb 2015 07:41:16 -0800

Hi,

On 2015-02-23 15:25:57 +0000, Thom Brown wrote:
> I've noticed that if the primary is started and then a base backup is
> immediately taken from it and started as a synchronous standby, it
> doesn't replicate and the primary hangs indefinitely when trying to run any
> WAL-generating statements.  It only recovers when either the primary is
> restarted (which has to use a fast shutdown otherwise it also hangs
> forever), or the standby is restarted.
> 
> Here's a way of reproducing it:
> ...
> Note that if you run the commands one by one, there isn't a problem.  If
> you run it as a script, the standby doesn't connect to the primary.  There
> aren't any errors reported by either the standby or the primary.  The
> primary's wal sender process reports the following:
> 
> wal sender process rep_user 127.0.0.1(45243) startup waiting for 0/3000158
> 
> Anyone know why this would be happening?  And if this could be a problem in
> other scenarios?


Given that a walsender normally doesn't wait for syncrep, I guess the
backend above has only just done authentication. If you attach gdb to
the walsender, what's the backtrace?
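
In case it helps with collecting that, here is a rough sketch of one way
to find the walsender and build a non-interactive gdb invocation. It
assumes `ps -eo pid,args` style output and that the process title
contains "wal sender process" as in the line quoted above; the helper
names are made up for illustration.

```python
def walsender_pids(ps_output):
    """Return the PIDs of processes whose title mentions a wal sender.

    Expects text shaped like `ps -eo pid,args` output (an assumption):
    a PID column followed by the command line / process title.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)  # PID, then the rest of the line
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skips the header row and blank lines
        if "wal sender process" in fields[1]:
            pids.append(int(fields[0]))
    return pids


def gdb_backtrace_command(pid):
    """Build a gdb command that attaches, prints a backtrace, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]
```

Feeding the output of `ps -eo pid,args` to walsender_pids() and running
the resulting command (for example with subprocess.run) as the postgres
OS user should print the stack of the waiting process. Debug symbols
need to be installed for the backtrace to be readable.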

We previously had discussions about that being rather annoying; I
unfortunately don't remember enough of the thread to reference it
here. If it really is this, I think we should add some more smarts about
only enabling syncrep once a backend is fully up and maybe even remove
it from more scenarios during commits generally (e.g. if no xid was
assigned and we just pruned something).

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
