Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol
On 06/06/17 23:42, Andres Freund wrote: > On 2017-06-06 23:24:50 +0200, Petr Jelinek wrote: >> On 06/06/17 23:17, Andres Freund wrote: >>> Right. I found a couple more instance of similarly iffy, although not >>> quite as broken, patterns in launcher.c. It's easy to get this wrong, >>> but it's a lot easy if you do it differently everywhere you use a >>> latch. It's not good if code in the same file, by the same author(s), >>> has different ways of using latches. >> >> Huh? I see same pattern everywhere in launcher.c, what am I missing? > > WaitForReplicationWorkerAttach: > while (...) > CHECK_FOR_INTERRUPTS(); > /* other stuff including returns */ > WaitLatch() > WL_POSTMASTER_DEATH > ResetLatch() > > logicalrep_worker_stop loop 1: > while (...) > /* other stuff */ > CHECK_FOR_INTERRUPTS() > WaitLatch() > POSTMASTER_DEATH > ResetLatch() > /* other stuff including returns */ > logicalrep_worker_stop loop 1: > while (...) > /* other stuff including returns */ > CHECK_FOR_INTERRUPTS(); > WaitLatch() > WL_POSTMASTER_DEATH > ResetLatch() > > ApplyLauncherMain: > while (!got_SIGTERM) > /* lots other stuff */ > WaitLatch() > WL_POSTMASTER_DEATH > /* some other stuff */ > ResetLatch() > (note no CFI) > > they're not hugely different, but subtely there are differences. > Sometimes you're guaranteed to check for interrupts after resetting the > latch, in other cases not. Sometimes expensive-ish things happen before > a CFI... > Ah that's because signals in launcher are broken, see https://www.postgresql.org/message-id/fe072153-babd-3b5d-8052-73527a6eb...@2ndquadrant.com which also includes patch to fix it. We originally had custom signal handling everywhere, then I realized it was mistake for workers because of locking interaction but missed same issue with launcher (the CFI in current launcher doesn't work). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol
On 2017-06-06 23:24:50 +0200, Petr Jelinek wrote: > On 06/06/17 23:17, Andres Freund wrote: > > Right. I found a couple more instance of similarly iffy, although not > > quite as broken, patterns in launcher.c. It's easy to get this wrong, > > but it's a lot easy if you do it differently everywhere you use a > > latch. It's not good if code in the same file, by the same author(s), > > has different ways of using latches. > > Huh? I see same pattern everywhere in launcher.c, what am I missing? WaitForReplicationWorkerAttach: while (...) CHECK_FOR_INTERRUPTS(); /* other stuff including returns */ WaitLatch() WL_POSTMASTER_DEATH ResetLatch() logicalrep_worker_stop loop 1: while (...) /* other stuff */ CHECK_FOR_INTERRUPTS() WaitLatch() POSTMASTER_DEATH ResetLatch() /* other stuff including returns */ logicalrep_worker_stop loop 1: while (...) /* other stuff including returns */ CHECK_FOR_INTERRUPTS(); WaitLatch() WL_POSTMASTER_DEATH ResetLatch() ApplyLauncherMain: while (!got_SIGTERM) /* lots other stuff */ WaitLatch() WL_POSTMASTER_DEATH /* some other stuff */ ResetLatch() (note no CFI) they're not hugely different, but subtely there are differences. Sometimes you're guaranteed to check for interrupts after resetting the latch, in other cases not. Sometimes expensive-ish things happen before a CFI... Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol
On 06/06/17 23:17, Andres Freund wrote: > On 2017-06-06 17:14:59 -0400, Tom Lane wrote: >> Andres Freundwrites: >>> The function in $subject does: >> >>> ResetLatch(>procLatch); >>> rc = WaitLatchOrSocket(>procLatch, >>>WL_POSTMASTER_DEATH | WL_SOCKET_READABLE >>> | >>>WL_LATCH_SET, >>>PQsocket(streamConn), >>>0, >>>WAIT_EVENT_LIBPQWALRECEIVER); >> >> Yeah, this is certainly broken. >> >>> Afaict, the ResetLatch() really should just instead be in the if (rc & >>> WL_LATCH_SET) block. >> >> And, to be specific, it should be before the CHECK_FOR_INTERRUPTS call, >> since that is the useful work that we want to be sure occurs after >> any latch-setting event. > > Right. I found a couple more instance of similarly iffy, although not > quite as broken, patterns in launcher.c. It's easy to get this wrong, > but it's a lot easy if you do it differently everywhere you use a > latch. It's not good if code in the same file, by the same author(s), > has different ways of using latches. Huh? I see same pattern everywhere in launcher.c, what am I missing? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol
On 2017-06-06 17:14:59 -0400, Tom Lane wrote: > Andres Freundwrites: > > The function in $subject does: > > > ResetLatch(>procLatch); > > rc = WaitLatchOrSocket(>procLatch, > >WL_POSTMASTER_DEATH | WL_SOCKET_READABLE > > | > >WL_LATCH_SET, > >PQsocket(streamConn), > >0, > >WAIT_EVENT_LIBPQWALRECEIVER); > > Yeah, this is certainly broken. > > > Afaict, the ResetLatch() really should just instead be in the if (rc & > > WL_LATCH_SET) block. > > And, to be specific, it should be before the CHECK_FOR_INTERRUPTS call, > since that is the useful work that we want to be sure occurs after > any latch-setting event. Right. I found a couple more instance of similarly iffy, although not quite as broken, patterns in launcher.c. It's easy to get this wrong, but it's a lot easy if you do it differently everywhere you use a latch. It's not good if code in the same file, by the same author(s), has different ways of using latches. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol
Andres Freundwrites: > The function in $subject does: > ResetLatch(>procLatch); > rc = WaitLatchOrSocket(>procLatch, >WL_POSTMASTER_DEATH | WL_SOCKET_READABLE | >WL_LATCH_SET, >PQsocket(streamConn), >0, >WAIT_EVENT_LIBPQWALRECEIVER); Yeah, this is certainly broken. > Afaict, the ResetLatch() really should just instead be in the if (rc & > WL_LATCH_SET) block. And, to be specific, it should be before the CHECK_FOR_INTERRUPTS call, since that is the useful work that we want to be sure occurs after any latch-setting event. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] libpqrcv_PQexec() seems to violate latch protocol
Hi, The function in $subject does: while (PQisBusy(streamConn)) { int rc; /* * We don't need to break down the sleep into smaller increments, * since we'll get interrupted by signals and can either handle * interrupts here or elog(FATAL) within SIGTERM signal handler if * the signal arrives in the middle of establishment of * replication connection. */ ResetLatch(>procLatch); rc = WaitLatchOrSocket(>procLatch, WL_POSTMASTER_DEATH | WL_SOCKET_READABLE | WL_LATCH_SET, PQsocket(streamConn), 0, WAIT_EVENT_LIBPQWALRECEIVER); if (rc & WL_POSTMASTER_DEATH) exit(1); /* interrupted */ if (rc & WL_LATCH_SET) { CHECK_FOR_INTERRUPTS(); continue; } Doing ResetLatch();WaitLatch() like that makes it possible to miss a the latch being set, e.g. if it happens just after WaitLatchOrSocket() returns. Afaict, the ResetLatch() really should just instead be in the if (rc & WL_LATCH_SET) block. Unless somebody protests, I'll make it so. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers