Hi,

On 2015-02-23 15:25:57 +0000, Thom Brown wrote:
> I've noticed that if the primary is started and then a base backup is
> immediately taken from it and started as a synchronous standby, it
> doesn't replicate and the primary hangs indefinitely when trying to run any
> WAL-generating statements. It only recovers when either the primary is
> restarted (which has to use a fast shutdown otherwise it also hangs
> forever), or the standby is restarted.
>
> Here's a way of reproducing it:
> ...
> Note that if you run the commands one by one, there isn't a problem. If
> you run it as a script, the standby doesn't connect to the primary. There
> aren't any errors reported by either the standby or the primary. The
> primary's wal sender process reports the following:
>
> wal sender process rep_user 127.0.0.1(45243) startup waiting for 0/3000158
>
> Anyone know why this would be happening? And if this could be a problem in
> other scenarios?

Given that a walsender normally doesn't wait for syncrep, I guess this is
because the above backend has only just done authentication. If you gdb into
the walsender, what's the backtrace?

We previously had discussions about that being rather annoying;
unfortunately I don't remember enough of the thread to reference it here.
If it really is this, I think we should add some more smarts about only
enabling syncrep once a backend is fully up, and maybe even remove it from
more scenarios during commits generally (e.g. if no xid was assigned and we
just pruned something).

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services