At Wed, 20 Apr 2016 16:16:40 +0900, Fujii Masao <masao.fu...@gmail.com> wrote in <cahgqgwhvzv2j0qoda8x1xcx3cbabmjtveqeolfzx8hq5g25...@mail.gmail.com>
> On Thu, Mar 31, 2016 at 9:15 AM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
> > Hi hackers,
> >
> > If you shut down a primary server, a standby that is streaming from it
> > says:
> >
> > LOG: replication terminated by primary server
> > DETAIL: End of WAL reached on timeline 1 at 0/14F4B68.
> > FATAL: could not send end-of-streaming message to primary: no COPY in progress
> >
> > Isn't that FATAL ereport a bug?
>
> ISTM that the cause is that walsender exits and replication connection is
> closed just after "COPY 0" is sent. That is, then after receiving "COPY 0",
> walreceiver tries to send an end-of-copy message to the primary, but fails
> because the connection has been already closed.

Though the message is followed by repetitions of other FATAL
messages, the message above itself seems a bit alarming.

> > How is clean server shutdown supposed to work?
>
> One option is to make walsender wait for end-of-copy message from walreceiver
> before it closes the connection and exits, after sending "COPY 0" message.
> But one question is; how should walsender behave when walreceiver gets stuck
> and cannot reply an end-of-copy message to walsender? Probably we need
> the timeout (maybe we can use wal_sender_timeout here but not sure yet
> if it's appropriate or not).

-1. It is totally useless other than to avoid the FATAL message.

> Another option is to prevent walreceiver from sending an end-of-copy message.
> If "COPY 0" always means the exit of walsender and the termination of
> the connection, there seems to be no need to send back an end-of-copy message.
> I've not checked yet how this interferes with other replication logics,
> though.

Looking into walsender.c, walsender treats "COPY 0" as a signal
that its own exit, that is, proc_exit(0), follows immediately. On
the other hand, the comment at the beginning of walreceiver.c says:

 * If the primary server ends streaming, but doesn't disconnect, walreceiver
 * goes into "waiting" mode, and waits for the startup process to give new
 * instructions. The startup process will treat that the same as
 * disconnection, and will rescan the archive/pg_xlog directory. But when the
 * startup process wants to try streaming replication again, it will just
 * nudge the existing walreceiver process that's waiting, instead of launching
 * a new one.

If we assume this is a useful behavior and want to keep it, a
termination after the end of XLOG streaming is just the same as
what psql sees:

| FATAL: terminating connection due to administrator command
| server closed the connection unexpectedly
| This probably means the server terminated abnormally
| before or while processing the request.

Or we should provide another command to signal a termination.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
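
As a footnote to the thread, the two options quoted above can be compared with a toy model of the shutdown handshake. This is only a sketch under stated assumptions: it takes Fujii's explanation (the connection is already closed when walreceiver replies) as given, and every name in it (toy_shutdown, the Toy* enums) is hypothetical and does not exist in PostgreSQL or libpq. It does not model timeouts or the "waiting" mode described in walreceiver.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in PostgreSQL or libpq. */
typedef enum {
    SENDER_EXITS_IMMEDIATELY,   /* behavior described in the report */
    SENDER_WAITS_FOR_REPLY      /* option 1: wait for the receiver's end-of-copy */
} ToySenderPolicy;

typedef enum {
    RECEIVER_ALWAYS_REPLIES,    /* behavior described in the report */
    RECEIVER_SKIPS_REPLY        /* option 2: do not send end-of-copy back */
} ToyReceiverPolicy;

typedef enum {
    TOY_CLEAN,                  /* shutdown completed without an error */
    TOY_REPLY_FAILED            /* receiver wrote to a closed connection */
} ToyOutcome;

/* Simulate one shutdown handshake and report how it ends. */
ToyOutcome
toy_shutdown(ToySenderPolicy sender, ToyReceiverPolicy receiver)
{
    bool conn_open = true;

    /* Step 1: the sender announces the end of streaming ("COPY 0"). */

    /* Step 2: unless it waits, the sender exits and the connection closes. */
    if (sender == SENDER_EXITS_IMMEDIATELY)
        conn_open = false;

    /* Step 3: the receiver sees the end of streaming and may skip the reply. */
    if (receiver == RECEIVER_SKIPS_REPLY)
        return TOY_CLEAN;

    /* Step 4: the reply succeeds only if the connection is still open. */
    return conn_open ? TOY_CLEAN : TOY_REPLY_FAILED;
}
```

In this model, toy_shutdown(SENDER_EXITS_IMMEDIATELY, RECEIVER_ALWAYS_REPLIES) yields TOY_REPLY_FAILED, which corresponds to the reported FATAL, while changing either policy yields TOY_CLEAN. The model says nothing about which change is preferable; that is the question the thread is debating.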