Re: Reducing WaitEventSet syscall churn
On Sat, Feb 27, 2021 at 2:48 PM Thomas Munro wrote:
> Here's the alternative patch set, with no change to existing error
> message behaviour. I'm going to commit this version and close this CF
> item soon if there are no objections.

Pushed. That leaves just walreceiver and postgres_fdw in need of
WaitEventSet improvements along these lines. The raw ingredients for
that were present in earlier patches, and I think Horiguchi-san had the
right idea: we should use the libpq event system to adjust our WES as
appropriate when it changes the socket underneath us. I will leave this
CF entry open a bit longer in case he would like to post an updated
version of that part (considering Tom's feedback[1]). If not, we can
close this and try again in the next cycle.

[1] https://www.postgresql.org/message-id/2446176.1594495351%40sss.pgh.pa.us
Re: Reducing WaitEventSet syscall churn
On Tue, Jan 5, 2021 at 6:10 PM Thomas Munro wrote:
> For point 2, the question I am raising is: why should users get a
> special FATAL message in some places and not others, for PM death?
> However, if people are attached to that behaviour, we could still
> achieve goal 1 without goal 2 by handling PM death explicitly in
> walsender.c and I'd be happy to post an alternative patch set like
> that.

Here's the alternative patch set, with no change to existing error
message behaviour. I'm going to commit this version and close this CF
item soon if there are no objections.

From 8f563edefdc2f276b3d8abeb019af1ff172bf7d9 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Sat, 27 Feb 2021 13:34:37 +1300
Subject: [PATCH v8 1/2] Introduce symbolic names for FeBeWaitSet positions.

Previously we used 0 and 1 to refer to the socket and latch in far
flung parts of the tree, without any explanation. Also use
PGINVALID_SOCKET rather than -1 in the place where we weren't already.

Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c     |  4 ++--
 src/backend/libpq/pqcomm.c        | 18 +++++++++++++-----
 src/backend/utils/init/miscinit.c |  6 ++++--
 src/include/libpq/libpq.h         |  3 +++
 4 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index d1545a2ad6..bb603ad209 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -180,7 +180,7 @@ retry:
 		Assert(waitfor);
 
-		ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL);
 
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 						 WAIT_EVENT_CLIENT_READ);
@@ -292,7 +292,7 @@ retry:
 		Assert(waitfor);
 
-		ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL);
 
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 						 WAIT_EVENT_CLIENT_WRITE);
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 1e6b6db540..27a298f110 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -191,6 +191,9 @@ WaitEventSet *FeBeWaitSet;
 void
 pq_init(void)
 {
+	int			socket_pos PG_USED_FOR_ASSERTS_ONLY;
+	int			latch_pos PG_USED_FOR_ASSERTS_ONLY;
+
 	/* initialize state variables */
 	PqSendBufferSize = PQ_SEND_BUFFER_SIZE;
 	PqSendBuffer = MemoryContextAlloc(TopMemoryContext, PqSendBufferSize);
@@ -219,10 +222,19 @@ pq_init(void)
 #endif
 
 	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, 3);
-	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
-					  NULL, NULL);
-	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
-	AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
+	socket_pos = AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE,
+								   MyProcPort->sock, NULL, NULL);
+	latch_pos = AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, PGINVALID_SOCKET,
+								  MyLatch, NULL);
+	AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
+					  NULL, NULL);
+
+	/*
+	 * The event positions match the order we added them, but let's sanity
+	 * check them to be sure.
+	 */
+	Assert(socket_pos == FeBeWaitSetSocketPos);
+	Assert(latch_pos == FeBeWaitSetLatchPos);
 }
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 734c66d4e8..8de70eb46d 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -202,7 +202,8 @@ SwitchToSharedLatch(void)
 	MyLatch = &MyProc->procLatch;
 
 	if (FeBeWaitSet)
-		ModifyWaitEvent(FeBeWaitSet, 1, WL_LATCH_SET, MyLatch);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET,
+						MyLatch);
 
 	/*
 	 * Set the shared latch as the local one might have been set. This
@@ -221,7 +222,8 @@ SwitchBackToLocalLatch(void)
 	MyLatch = &LocalLatchData;
 
 	if (FeBeWaitSet)
-		ModifyWaitEvent(FeBeWaitSet, 1, WL_LATCH_SET, MyLatch);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET,
+						MyLatch);
 
 	SetLatch(MyLatch);
 }
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index b41b10620a..e4e5c21565 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -55,6 +55,9 @@ extern const PGDLLIMPORT PQcommMethods *PqCommMethods;
  */
 extern WaitEventSet *FeBeWaitSet;
 
+#define FeBeWaitSetSocketPos 0
+#define FeBeWaitSetLatchPos 1
+
 extern int	StreamServerPort(int family, const char *hostName,
 							 unsigned short portNumber, const char *unixSocketDir,
 							 pgsocket ListenSocket[], int MaxListen);
-- 
2.30.0

From 22b2c474b476ca91b579408f73ebdc014bb76083 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 24 Feb 2020 23:48:29 +1300
Subject: [PATCH v8 2/2] Use FeBeWaitSet for walsender.c.

This avoids the need to set up and tear down a new WaitEventSet every
time we
Re: Reducing WaitEventSet syscall churn
On Fri, Jul 31, 2020 at 9:42 AM Thomas Munro wrote:
> On Thu, Jul 30, 2020 at 5:50 PM Thomas Munro wrote:
> > I pushed those three patches, but will wait for more discussion on
> > the rest.
>
> And here's a rebase, to make cfbot happy.

And again. To restate the two goals of the remaining patches:

1. Use FeBeWaitSet for walsender, instead of setting up and tearing
   down temporary WaitEventSets all the time.

2. Use the standard WL_EXIT_ON_PM_DEATH flag for FeBeWaitSet, instead
   of the special case ereport(FATAL, ... "terminating connection due
   to unexpected postmaster exit" ...).

For point 2, the question I am raising is: why should users get a
special FATAL message in some places and not others, for PM death?
However, if people are attached to that behaviour, we could still
achieve goal 1 without goal 2 by handling PM death explicitly in
walsender.c and I'd be happy to post an alternative patch set like
that.

From 72fa3b6858b3ab121fd60533ed7d6a4dcdbc4a5c Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 24 Feb 2020 23:51:09 +1300
Subject: [PATCH v7 1/3] Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet.

Previously, we'd give either a FATAL message or a silent exit() when we
detected postmaster death, depending on which wait point we were at.
Make the exit more uniform, by using the standard exit facility. This
will allow a later patch to use FeBeWaitSet in a new location without
having to add more explicit handling for postmaster death, in line with
our policy of reducing such clutter.

Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c | 28 ----------------------------
 src/backend/libpq/pqcomm.c    |  2 +-
 2 files changed, 1 insertion(+), 29 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 4cf139a223..746fce8288 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -184,28 +184,6 @@ retry:
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 						 WAIT_EVENT_CLIENT_READ);
 
-		/*
-		 * If the postmaster has died, it's not safe to continue running,
-		 * because it is the postmaster's job to kill us if some other backend
-		 * exits uncleanly.  Moreover, we won't run very well in this state;
-		 * helper processes like walwriter and the bgwriter will exit, so
-		 * performance may be poor.  Finally, if we don't exit, pg_ctl will be
-		 * unable to restart the postmaster without manual intervention, so no
-		 * new connections can be accepted.  Exiting clears the deck for a
-		 * postmaster restart.
-		 *
-		 * (Note that we only make this check when we would otherwise sleep on
-		 * our latch.  We might still continue running for a while if the
-		 * postmaster is killed in mid-query, or even through multiple queries
-		 * if we never have to wait for read.  We don't want to burn too many
-		 * cycles checking for this very rare condition, and this should cause
-		 * us to exit quickly in most cases.)
-		 */
-		if (event.events & WL_POSTMASTER_DEATH)
-			ereport(FATAL,
-					(errcode(ERRCODE_ADMIN_SHUTDOWN),
-					 errmsg("terminating connection due to unexpected postmaster exit")));
-
 		/* Handle interrupt. */
 		if (event.events & WL_LATCH_SET)
 		{
@@ -296,12 +274,6 @@ retry:
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 						 WAIT_EVENT_CLIENT_WRITE);
 
-		/* See comments in secure_read. */
-		if (event.events & WL_POSTMASTER_DEATH)
-			ereport(FATAL,
-					(errcode(ERRCODE_ADMIN_SHUTDOWN),
-					 errmsg("terminating connection due to unexpected postmaster exit")));
-
 		/* Handle interrupt. */
 		if (event.events & WL_LATCH_SET)
 		{
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 1e6b6db540..289d9bd2da 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -222,7 +222,7 @@ pq_init(void)
 	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
 					  NULL, NULL);
 	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
-	AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
+	AddWaitEventToSet(FeBeWaitSet, WL_EXIT_ON_PM_DEATH, -1, NULL, NULL);
 }
 
-- 
2.20.1

From ec7348f4fc794eed6cfd6bb3897086ef29c213ae Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 24 Feb 2020 23:48:29 +1300
Subject: [PATCH v7 2/3] Introduce symbolic names for FeBeWaitSet positions.

Previously we used hard coded 0 and 1 to refer to the socket and latch.

Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c     |  4 ++--
 src/backend/libpq/pqcomm.c        | 18 +++++++++++++-----
 src/backend/utils/init/miscinit.c |  4 ++--
 src/include/libpq/libpq.h         |  3 +++
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure.c
Re: Reducing WaitEventSet syscall churn
On Thu, Jul 30, 2020 at 5:50 PM Thomas Munro wrote:
> I pushed those three patches, but will wait for more discussion on the
> rest.

And here's a rebase, to make cfbot happy.

From a760053ac6fea1b9a829e9a170902a555863ce66 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 24 Feb 2020 23:51:09 +1300
Subject: [PATCH v6 1/4] Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet.

Previously, we'd give either a FATAL message or a silent exit() when we
detected postmaster death, depending on which wait point we were at.
Make the exit more uniform, by using the standard exit facility. This
will allow a later patch to use FeBeWaitSet in a new location without
having to add more explicit handling for postmaster death, in line with
our policy of reducing such clutter.

Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c | 28 ----------------------------
 src/backend/libpq/pqcomm.c    |  2 +-
 2 files changed, 1 insertion(+), 29 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 2ae507a902..aec0926d93 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -184,28 +184,6 @@ retry:
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 						 WAIT_EVENT_CLIENT_READ);
 
-		/*
-		 * If the postmaster has died, it's not safe to continue running,
-		 * because it is the postmaster's job to kill us if some other backend
-		 * exits uncleanly.  Moreover, we won't run very well in this state;
-		 * helper processes like walwriter and the bgwriter will exit, so
-		 * performance may be poor.  Finally, if we don't exit, pg_ctl will be
-		 * unable to restart the postmaster without manual intervention, so no
-		 * new connections can be accepted.  Exiting clears the deck for a
-		 * postmaster restart.
-		 *
-		 * (Note that we only make this check when we would otherwise sleep on
-		 * our latch.  We might still continue running for a while if the
-		 * postmaster is killed in mid-query, or even through multiple queries
-		 * if we never have to wait for read.  We don't want to burn too many
-		 * cycles checking for this very rare condition, and this should cause
-		 * us to exit quickly in most cases.)
-		 */
-		if (event.events & WL_POSTMASTER_DEATH)
-			ereport(FATAL,
-					(errcode(ERRCODE_ADMIN_SHUTDOWN),
-					 errmsg("terminating connection due to unexpected postmaster exit")));
-
 		/* Handle interrupt. */
 		if (event.events & WL_LATCH_SET)
 		{
@@ -296,12 +274,6 @@ retry:
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 						 WAIT_EVENT_CLIENT_WRITE);
 
-		/* See comments in secure_read. */
-		if (event.events & WL_POSTMASTER_DEATH)
-			ereport(FATAL,
-					(errcode(ERRCODE_ADMIN_SHUTDOWN),
-					 errmsg("terminating connection due to unexpected postmaster exit")));
-
 		/* Handle interrupt. */
 		if (event.events & WL_LATCH_SET)
 		{
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index ac986c0505..6e91581c0b 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -222,7 +222,7 @@ pq_init(void)
 	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
 					  NULL, NULL);
 	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
-	AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
+	AddWaitEventToSet(FeBeWaitSet, WL_EXIT_ON_PM_DEATH, -1, NULL, NULL);
 }
 
-- 
2.20.1

From 91c7a5a8a04363274ea0e74a69d89e261d7a0e4f Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 24 Feb 2020 23:48:29 +1300
Subject: [PATCH v6 2/4] Introduce symbolic names for FeBeWaitSet positions.

Previously we used hard coded 0 and 1 to refer to the socket and latch.

Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c     |  4 ++--
 src/backend/libpq/pqcomm.c        | 18 +++++++++++++-----
 src/backend/utils/init/miscinit.c |  4 ++--
 src/include/libpq/libpq.h         |  3 +++
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index aec0926d93..2d5cfbd6f8 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -179,7 +179,7 @@ retry:
 		Assert(waitfor);
 
-		ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL);
 
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 						 WAIT_EVENT_CLIENT_READ);
@@ -269,7 +269,7 @@ retry:
 		Assert(waitfor);
 
-		ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL);
 
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 						 WAIT_EVENT_CLIENT_WRITE);
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index
Re: Reducing WaitEventSet syscall churn
On Tue, Jul 14, 2020 at 6:51 PM Thomas Munro wrote:
> In the meantime, here's a rebase of the more straightforward patches
> in the stack.  These are the ones that deal only with fixed sets of
> file descriptors, and they survive check-world on Linux,
> Linux+EXEC_BACKEND (with ASLR disabled) and FreeBSD, and at least
> check on macOS and Windows (my CI recipes need more work to get
> check-world working on those two).  There's one user-visible change
> that I'd appreciate feedback on: I propose to drop the FATAL error
> when the postmaster goes away, to make things more consistent.  See
> below for more on that.

Here's the effect of patches 0001-0003 on the number of relevant system
calls generated by "make check" on Linux and FreeBSD, according to
strace/truss -f -c:

Linux:
  epoll_create1:  4,825 ->   865
  epoll_ctl:     12,454 -> 2,721
  epoll_wait:      ~45k ->  ~45k
  close:           ~81k ->  ~77k

FreeBSD:
  kqueue:         4,618 ->   866
  kevent:          ~54k ->  ~46k
  close:           ~65k ->  ~61k

I pushed those three patches, but will wait for more discussion on the
rest.
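[Editor's note: the measurement technique mentioned above ("strace/truss -f -c") looks roughly like the commands below. The exact syscall list is illustrative; the relevant names differ between the epoll (Linux) and kqueue (FreeBSD) builds, and counts will vary by machine and PostgreSQL version.]

```shell
# Linux: follow child processes (-f) and print a per-syscall summary (-c),
# restricted to the wait-related calls being optimized here.
strace -f -c -e trace=epoll_create1,epoll_ctl,epoll_wait,close -- make check

# FreeBSD: truss has no syscall filter, but -f -c gives the same style of
# summary, including the kqueue/kevent counts quoted above.
truss -f -c make check
```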
Re: Reducing WaitEventSet syscall churn
On Sun, Jul 12, 2020 at 7:22 AM Tom Lane wrote:
> [...complaints about 0005 and 0006...]  We already have
> PGEVT_CONNRESET and PGEVT_CONNDESTROY as application-visible connection
> state change events, and I don't see why those aren't sufficient.

I think Horiguchi-san's general idea of using event callbacks for this
sounds much more promising than my earlier idea of exposing a change
counter (that really was terrible). If it can be done with existing
events then that's even better. Perhaps he and/or I can look into that
for the next CF.

In the meantime, here's a rebase of the more straightforward patches in
the stack. These are the ones that deal only with fixed sets of file
descriptors, and they survive check-world on Linux, Linux+EXEC_BACKEND
(with ASLR disabled) and FreeBSD, and at least check on macOS and
Windows (my CI recipes need more work to get check-world working on
those two). There's one user-visible change that I'd appreciate feedback
on: I propose to drop the FATAL error when the postmaster goes away, to
make things more consistent. See below for more on that.

Responding to earlier review from Horiguchi-san:

On Tue, Mar 10, 2020 at 12:20 PM Kyotaro Horiguchi wrote:
> > 0002: "Use a long lived WaitEventSet for WaitLatch()."
> >
> > In the last version, I had a new function WaitMyLatch(), but that
> > didn't help with recoveryWakeupLatch.  Switching between latches
> > doesn't require a syscall, so I didn't want to have a separate WES and
> > function just for that.  So I went back to using plain old
> > WaitLatch(), and made it "modify" the latch every time before waiting
> > on CommonWaitSet.  An alternative would be to get rid of the concept
> > of latches other than MyLatch, and change the function to
> > WaitMyLatch().  A similar thing happens for exit_on_postmaster_death,
> > for which I didn't want to have a separate WES, so I just set that
> > flag every time.  Thoughts?
>
> It is surely an improvement over creating a full-fledged WES every
> time. The name CommonWaitSet gives an impression as if it is used for
> a variety of wait event sets, but it is used only by WaitLatch. So I
> would name it LatchWaitSet. With that name I won't be surprised that
> the function is pointing at WL_LATCH_SET by index 0 without any
> explanation when calling ModifyWaitEvent.

Ok, I changed it to LatchWaitSet. I also replaced the index 0 with a
symbolic name LatchWaitSetLatchPos, to make that clearer.

> @@ -700,7 +739,11 @@ FreeWaitEventSet(WaitEventSet *set)
> 		ReleaseExternalFD();
> #elif defined(WAIT_USE_KQUEUE)
> 	close(set->kqueue_fd);
> -	ReleaseExternalFD();
> +	if (set->kqueue_fd >= 0)
> +	{
> +		close(set->kqueue_fd);
> +		ReleaseExternalFD();
> +	}
>
> Did you forget to remove the close() outside the if block?
> Don't we need the same amendment for epoll_fd as for kqueue_fd?

Hmm, maybe I screwed that up when resolving a conflict with the
ReleaseExternalFD() stuff. Fixed.

> WaitLatch is defined as "If the latch is already set (and WL_LATCH_SET
> is given), the function returns immediately.". But now the function
> reacts to the latch even if WL_LATCH_SET is not given. I think
> actually it is always set, so I think we need to modify the Assert and
> function comment following the change.

It seems a bit silly to call WaitLatch() if you don't want to wait for
a latch, but I think we can keep that comment and logic by assigning
set->latch = NULL when you wait without WL_LATCH_SET. I tried that in
the attached.

> > 0004: "Introduce RemoveWaitEvent()."
> >
> > We'll need that to be able to handle connections that are closed and
> > reopened under the covers by libpq (replication, postgres_fdw).  We
> > also wanted this on a couple of other threads for multiplexing FDWs,
> > to be able to adjust the wait set dynamically for a proposed async
> > Append feature.
> >
> > The implementation is a little naive, and leaves holes in the
> > "pollfds" and "handles" arrays (for poll() and win32 implementations).
> > That could be improved with a bit more footwork in a later patch.
> >
> > XXX The Windows version is based on reading documentation.  I'd be
> > very interested to know if check-world passes (especially
> > contrib/postgres_fdw and src/test/recovery).  Unfortunately my
> > appveyor.yml fu is not yet strong enough.
>
> I didn't find the documentation about INVALID_HANDLE_VALUE in
> lpHandles. Could you give me the URL for that?

I was looking for how you do the equivalent of Unix file descriptor -1
in a call to poll(), and somewhere I read that INVALID_HANDLE_VALUE has
the right effect. I can't find that reference now. Apparently it works
because that's the pseudo-handle value -1 that is returned by
GetCurrentProcess(), and waiting for your own process waits forever so
it's a suitable value for holes in an array of event handles. We should
probably call GetCurrentProcess() instead, but really that is just
stand-in code: we
Re: Reducing WaitEventSet syscall churn
Daniel Gustafsson writes:
>> On 13 Mar 2020, at 08:21, Kyotaro Horiguchi wrote:
>> The attached are:
>> 0001-0004 Not changed
>> 0005 Fix interface of PQregisterEventProc
>> 0006 Add new libpq event for this use.
>> 0007 Another version of "0006 Reuse a WaitEventSet in
>> libpqwalreceiver.c" based on libpq event.
>> 0008-0011 Not changed (old 0007-0010, blindly appended)

> Since 0001 has been applied already in 9b8aa0929390a, the patchtester
> is unable to make heads or tails with trying this patchset.  Can you
> please submit a rebased version without the already applied changes?

While I've not really looked at this patchset, I did happen to notice
0005, and I think that's utterly unacceptable as written. You can't
break API/ABI on a released libpq function. I suppose the patch is
assuming that an enum return value is ABI-equivalent to int, but that
assumption is faulty. Moreover, what's the point? None of the later
patches require this AFAICS.

(The patch could probably be salvaged ABI-wise by making the error
codes be integer #define's not an enum, but I fail to see the point of
changing this at all. I also don't much like the idea of allowing
callers to assume that there is a fixed set of possible failure
conditions for PQregisterEventProc. If we wanted to return error
details, it'd likely be better to say that an error message is left in
conn->errorMessage.)

0006 is an even more egregious ABI break; you can't renumber existing
enum values without breaking applications. That in itself could be
fixed by putting the new value at the end. But I'd still object very
strongly to 0006, because I do not think that it's acceptable to have
pqDropConnection correspond to an application-visible event. We use
that all over the place for cases that should not be
application-visible, for example when deciding that a connection
attempt has failed and moving on to the next candidate host.

We already have PGEVT_CONNRESET and PGEVT_CONNDESTROY as
application-visible connection state change events, and I don't see why
those aren't sufficient.

(BTW, even if these weren't ABI breaks, where are the documentation
changes to go with them?)

			regards, tom lane
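[Editor's note: Tom's suggestion — hooking the existing connection-lifecycle events rather than inventing new ones — could look roughly like the sketch below. This is untested and illustrative only: my_forget_socket() is a hypothetical stand-in for the caller's wait-set bookkeeping (e.g. a RemoveWaitEvent() call); only PQregisterEventProc, PQsocket, and the PGEVT_CONNRESET/PGEVT_CONNDESTROY events are real libpq API.]

```c
#include <libpq-fe.h>
#include <libpq-events.h>

/*
 * Hypothetical bookkeeping hook: drop a socket from our long-lived wait
 * set.  In a real implementation you would track each connection's last
 * known socket yourself, since by the time these events fire the socket
 * may already be closed (PQsocket can return -1).
 */
static void
my_forget_socket(int sock)
{
	(void) sock;				/* e.g. RemoveWaitEvent(my_wait_set, ...) */
}

/*
 * Event procedure: libpq calls this when it resets or destroys a
 * connection, which is exactly when the underlying socket can change or
 * disappear underneath a long-lived wait set.
 */
static int
my_event_proc(PGEventId evtId, void *evtInfo, void *passThrough)
{
	if (evtId == PGEVT_CONNRESET)
		my_forget_socket(PQsocket(((PGEventConnReset *) evtInfo)->conn));
	else if (evtId == PGEVT_CONNDESTROY)
		my_forget_socket(PQsocket(((PGEventConnDestroy *) evtInfo)->conn));
	return 1;					/* nonzero reports success to libpq */
}

/*
 * Registration, once per connection, e.g. right after PQconnectdb():
 *
 *     PQregisterEventProc(conn, my_event_proc, "wes-tracker", NULL);
 */
```

The next wait on the set would then re-add whatever socket the connection has at that point, with no new libpq API required.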
Re: Reducing WaitEventSet syscall churn
> On 13 Mar 2020, at 08:21, Kyotaro Horiguchi wrote:
> The attached are:
> 0001-0004 Not changed
> 0005 Fix interface of PQregisterEventProc
> 0006 Add new libpq event for this use.
> 0007 Another version of "0006 Reuse a WaitEventSet in
> libpqwalreceiver.c" based on libpq event.
> 0008-0011 Not changed (old 0007-0010, blindly appended)

Since 0001 has been applied already in 9b8aa0929390a, the patchtester
is unable to make heads or tails with trying this patchset. Can you
please submit a rebased version without the already applied changes?

cheers ./daniel
Re: Reducing WaitEventSet syscall churn
At Fri, 13 Mar 2020 16:21:13 +0900 (JST), Kyotaro Horiguchi wrote in
> > 0007: "Use a WaitEventSet for postgres_fdw."
>
> Continues..

At Thu, 27 Feb 2020 12:17:45 +1300, Thomas Munro wrote in
> 0007: "Use a WaitEventSet for postgres_fdw."
>
> Create a single WaitEventSet and use it for all FDW connections.  By
> having our own dedicated WES, we can do the bookkeeping required to
> know when sockets have been closed or need to be removed from kernel
> wait sets explicitly (which would be much harder to achieve if
> CommonWaitSet were to be used like that; you need to know when sockets
> are closed by other code, so you can provide fd_closed to
> RemoveWaitEvent()).

It is straightforward and looks fine. If we use the libpq-event-based
solution instead of using the changecount, pgfdw_wait_for_socket and
disconnect_pg_server would be a bit simpler.

> Concretely, if you use just one postgres_fdw connection, you'll see
> just epoll_wait()/kevent() calls for waits, but whenever you switch
> between different connections, you'll see eg EPOLL_DEL/EV_DELETE
> followed by EPOLL_ADD/EV_ADD when the set is adjusted (in the kqueue
> implementation these could be collapsed into the following wait, but I
> haven't done the work for that).

Such syscall sequences are shorter than or equal to what we issue now,
so this patch is still an improvement.

> An alternative would be to have one WES per FDW connection, but that
> seemed wasteful of file descriptors.

In the multi-connection case, we can save both fds and syscalls by
having one WaitEventSet containing all connection fds together.
WaitEventSetWait should take an event mask parameter, or
ModifyWaitEvent should set the mask, in that case. The events are now
completely level-triggered, so we can safely ignore unwanted events.

> 0008: "Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet."
>
> The FATAL message you get if you happen to be waiting for IO rather
> than waiting somewhere else seems arbitrarily different.  By switching
> to a standard automatic exit, it opens the possibility of using
> FeBeWaitSet in a couple more places that would otherwise need to
> create their own WES (see also [1]).  Thoughts?

Don't we really need any message on backend disconnection due to
postmaster death? As for the authentication code, it seems to me the
rationale for not writing the log is that the connection has not been
established at that time. (That depends on what we think the
"connection" is.) And I suppose that the reason the authentication
logic ignores signals is that clients expect authentication to be
completed as far as possible.

> 0009: "Use FeBeWaitSet for walsender.c."
>
> Enabled by 0008.

It works and doesn't change behavior. But I found it a bit difficult to
see which events each WaitEventSetWait waits for. Maybe a comment at
the caller sites would be sufficient. I think a comment about the bare
number "0" as the event position of ModifyWaitEvent is also needed.

> 0010: "Introduce a WaitEventSet for the stats collector."

This looks fine. The variable "event" is defined outside its effective
scope, but almost all of this function's variables are done the same
way. By the way, I found that pqcomm.c uses -1 instead of
PGINVALID_SOCKET for AddWaitEventToSet.

> [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGK%3Dm9dLrq42oWQ4XfK9iDjGiZVwpQ1HkHrAPfG7Kh681g%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Reducing WaitEventSet syscall churn
Hello.

At Tue, 10 Mar 2020 08:19:24 +0900 (JST), Kyotaro Horiguchi wrote in
me> I'll continue reviewing in later mail.
me>
me> > 0005: "libpq: Add PQsocketChangeCount to advertise socket changes."

At Thu, 27 Feb 2020 12:17:45 +1300, Thomas Munro wrote in
> On Sat, Feb 8, 2020 at 10:15 AM Thomas Munro wrote:
>
> 0005: "libpq: Add PQsocketChangeCount to advertise socket changes."
>
> To support a long lived WES, libpq needs a way to tell us when the
> socket changes underneath our feet.  This is the simplest thing I
> could think of; better ideas welcome.

I think at least Windows is the reason for not just detecting a change
in the value of the fd. Instead of counting disconnections, we could
use a libpq event. PQregisterEventProc returns false for all of
bad-parameter, already-registered, out-of-memory and proc-rejection. I
don't think that is a usable interface, so the attached 0005 patch
fixes that. (But I found it not necessarily needed after making 0007;
I include it as a proposal separate from this patch set. It does not
include the corresponding doc fix.)

> 0006: "Reuse a WaitEventSet in libpqwalreceiver.c."
>
> Rather than having all users of libpqwalreceiver.c deal with the
> complicated details of wait set management, have libpqwalreceiver
> expose a waiting interface that understands socket changes.

Looks reasonable. The attached 0006 and 0007 are a possible replacement
if we use libpq events.

> Unfortunately, I couldn't figure out how to use CommonWaitSet for this
> (ie adding and removing sockets to that as required), due to
> complications with the bookkeeping required to provide the fd_closed
> flag to RemoveWaitEvent().  So it creates its own internal long lived
> WaitEventSet.

Agreed, since they are used in different ways. But with the attached, a
closed connection is marked as wes_socket_position = -1.

> 0007: "Use a WaitEventSet for postgres_fdw."

Continues..

The attached are:
0001-0004 Not changed
0005 Fix interface of PQregisterEventProc
0006 Add new libpq event for this use.
0007 Another version of "0006 Reuse a WaitEventSet in
     libpqwalreceiver.c" based on libpq event.
0008-0011 Not changed (old 0007-0010, blindly appended)

Passed the regression tests (including the TAP recovery tests) up to
here.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 1a80abd292ff0d6a47274eb188a5576b2ef3cf6e Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Tue, 25 Feb 2020 02:53:35 +1300
Subject: [PATCH 01/11] Don't use EV_CLEAR for kqueue events.

For the semantics to match the epoll implementation, we need a socket
to continue to appear readable/writable if you wait multiple times
without doing I/O in between (in Linux terminology, level-triggered
rather than edge-triggered). Similar to commit 3b790d256f8 for Windows.

Author: Thomas Munro
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 046ca5c6c7..3b6acfb251 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -991,7 +991,7 @@ WaitEventAdjustKqueueAdd(struct kevent *k_ev, int filter, int action,
 {
 	k_ev->ident = event->fd;
 	k_ev->filter = filter;
-	k_ev->flags = action | EV_CLEAR;
+	k_ev->flags = action;
 	k_ev->fflags = 0;
 	k_ev->data = 0;
 	AccessWaitEvent(k_ev) = event;
@@ -1003,7 +1003,7 @@ WaitEventAdjustKqueueAddPostmaster(struct kevent *k_ev, WaitEvent *event)
 	/* For now postmaster death can only be added, not removed. */
 	k_ev->ident = PostmasterPid;
 	k_ev->filter = EVFILT_PROC;
-	k_ev->flags = EV_ADD | EV_CLEAR;
+	k_ev->flags = EV_ADD;
 	k_ev->fflags = NOTE_EXIT;
 	k_ev->data = 0;
 	AccessWaitEvent(k_ev) = event;
-- 
2.18.2

>From f93551a42b9e9ad0da05197d48fac5790474249d Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 24 Feb 2020 15:39:24 +1300
Subject: [PATCH 02/11] Use a long lived WaitEventSet for WaitLatch().

Create CommonWaitSet at backend startup time, and use it to implement
WaitLatch(). This avoids a bunch of epoll/kqueue system calls, and
makes sure we don't run into EMFILE later due to lack of file
descriptors.

Reorder SubPostmasterMain() slightly so that we restore the postmaster
pipe and Windows signal before we reach InitPostmasterChild(), to make
this work in EXEC_BACKEND builds.

Author: Thomas Munro
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/postmaster/postmaster.c | 24 +++++++++++++--------
 src/backend/storage/ipc/latch.c     | 49 ++++++++++++++++++++++++++++-----
 src/backend/utils/init/miscinit.c   |  2 ++
 src/include/storage/latch.h         |  1 +
 4 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 46be78aadb..c472971ce0 100644
---
Re: Reducing WaitEventSet syscall churn
Hello. I have looked at this.

At Thu, 27 Feb 2020 12:17:45 +1300, Thomas Munro wrote in
> On Sat, Feb 8, 2020 at 10:15 AM Thomas Munro wrote:
> >
> > Here are some patches to get rid of frequent system calls.
>
> Here's a new version of this patch set. It gets rid of all temporary
> WaitEventSets except a couple I mentioned in another thread[1].
> WaitLatch() uses CommonWaitSet, and calls to WaitLatchOrSocket() are
> replaced by either the existing FeBeWaitSet (walsender, gssapi/openssl
> auth are also candidates) or a special purpose long lived WaitEventSet
> (replication, postgres_fdw, pgstats). It passes make check-world with
> WAIT_USE_POLL, WAIT_USE_KQUEUE, WAIT_USE_EPOLL, all with and without
> -DEXEC_BACKEND, and make check with WAIT_USE_WIN32 (Appveyor).
>
> 0001: "Don't use EV_CLEAR for kqueue events."
>
> This fixes a problem in the kqueue implementation that only shows up
> once you switch to long lived WaitEventSets. It needs to be
> level-triggered like the other implementations, for example because
> there's a place in the recovery tests where we wait twice in a row
> without trying to do I/O in between. (This is a bit like commit
> 3b790d256f8 that fixed a similar problem on Windows.)

It looks fine in light of the kqueue documentation.

> 0002: "Use a long lived WaitEventSet for WaitLatch()."
>
> In the last version, I had a new function WaitMyLatch(), but that
> didn't help with recoveryWakeupLatch. Switching between latches
> doesn't require a syscall, so I didn't want to have a separate WES and
> function just for that. So I went back to using plain old
> WaitLatch(), and made it "modify" the latch every time before waiting
> on CommonWaitSet. An alternative would be to get rid of the concept
> of latches other than MyLatch, and change the function to
> WaitMyLatch(). A similar thing happens for exit_on_postmaster_death,
> for which I didn't want to have a separate WES, so I just set that
> flag every time. Thoughts?
It is surely an improvement over creating a full-fledged WES every time. The name CommonWaitSet gives the impression that it is used for a variety of wait event sets, but it is used only by WaitLatch, so I would name it LatchWaitSet. With that name, I wouldn't be surprised that the function refers to WL_LATCH_SET by index 0, without any explanation, when calling ModifyWaitEvent.

@@ -700,7 +739,11 @@ FreeWaitEventSet(WaitEventSet *set)
 	ReleaseExternalFD();
 #elif defined(WAIT_USE_KQUEUE)
 	close(set->kqueue_fd);
-	ReleaseExternalFD();
+	if (set->kqueue_fd >= 0)
+	{
+		close(set->kqueue_fd);
+		ReleaseExternalFD();
+	}

Did you forget to remove the close() outside the if block? And don't we need the same amendment for epoll_fd as for kqueue_fd?

WaitLatch is documented as "If the latch is already set (and WL_LATCH_SET is given), the function returns immediately." But now the function reacts to the latch even if WL_LATCH_SET is not given. I think it is actually always given, so we need to modify the Assert and the function comment to match the change.

> 0003: "Use regular WaitLatch() for condition variables."
>
> That mechanism doesn't need its own WES anymore.

Looks fine.

> 0004: "Introduce RemoveWaitEvent()."
>
> We'll need that to be able to handle connections that are closed and
> reopened under the covers by libpq (replication, postgres_fdw). We
> also wanted this on a couple of other threads for multiplexing FDWs,
> to be able to adjust the wait set dynamically for a proposed async
> Append feature.
>
> The implementation is a little naive, and leaves holes in the
> "pollfds" and "handles" arrays (for poll() and win32 implementations).
> That could be improved with a bit more footwork in a later patch.
>
> XXX The Windows version is based on reading documentation. I'd be
> very interested to know if check-world passes (especially
> contrib/postgres_fdw and src/test/recovery). Unfortunately my
> appveyor.yml fu is not yet strong enough.
I didn't find the documentation about INVALID_HANDLE_VALUE in lpHandles. Could you give me the URL for that?

I didn't run recoverycheck, because I couldn't install IPC::Run for ActivePerl, but contribcheck succeeded.

+	for (int i = 0; i < nevents; ++i)
+		set->handles[i + 1] = INVALID_HANDLE_VALUE;

This accesses set->handles[nevents], which is an overrun.

+	/* Set up the free list. */
+	for (int i = 0; i < nevents; ++i)
+		set->events[i].next_free = i + 1;
+	set->events[nevents - 1].next_free = -1;

This sets the last element twice (harmless, but useless).

	set->handles = (HANDLE *) data;
	data += MAXALIGN(sizeof(HANDLE) * nevents);

This is not an issue with this patch, but don't handles need nevents + 1 elements? Also, WaitEventSetSize does not check its parameter range.

I'll continue reviewing in a later mail.

> 0005: "libpq: Add PQsocketChangeCount to advertise socket changes."

regards.

-- 
Kyotaro
Re: Reducing WaitEventSet syscall churn
On Sat, Feb 8, 2020 at 10:15 AM Thomas Munro wrote: > > > Here are some patches to get rid of frequent system calls. Here's a new version of this patch set. It gets rid of all temporary WaitEventSets except a couple I mentioned in another thread[1]. WaitLatch() uses CommonWaitSet, and calls to WaitLatchOrSocket() are replaced by either the existing FeBeWaitSet (walsender, gssapi/openssl auth are also candidates) or a special purpose long lived WaitEventSet (replication, postgres_fdw, pgstats). It passes make check-world with WAIT_USE_POLL, WAIT_USE_KQUEUE, WAIT_USE_EPOLL, all with and without -DEXEC_BACKEND, and make check with WAIT_USE_WIN32 (Appveyor). 0001: "Don't use EV_CLEAR for kqueue events." This fixes a problem in the kqueue implementation that only shows up once you switch to long lived WaitEventSets. It needs to be level-triggered like the other implementations, for example because there's a place in the recovery tests where we wait twice in a row without trying to do I/O in between. (This is a bit like commit 3b790d256f8 that fixed a similar problem on Windows.) 0002: "Use a long lived WaitEventSet for WaitLatch()." In the last version, I had a new function WaitMyLatch(), but that didn't help with recoveryWakeupLatch. Switching between latches doesn't require a syscall, so I didn't want to have a separate WES and function just for that. So I went back to using plain old WaitLatch(), and made it "modify" the latch every time before waiting on CommonWaitSet. An alternative would be to get rid of the concept of latches other than MyLatch, and change the function to WaitMyLatch(). A similar thing happens for exit_on_postmaster_death, for which I didn't want to have a separate WES, so I just set that flag every time. Thoughts? 0003: "Use regular WaitLatch() for condition variables." That mechanism doesn't need its own WES anymore. 0004: "Introduce RemoveWaitEvent()." 
We'll need that to be able to handle connections that are closed and reopened under the covers by libpq (replication, postgres_fdw). We also wanted this on a couple of other threads for multiplexing FDWs, to be able to adjust the wait set dynamically for a proposed async Append feature.

The implementation is a little naive, and leaves holes in the "pollfds" and "handles" arrays (for poll() and win32 implementations). That could be improved with a bit more footwork in a later patch.

XXX The Windows version is based on reading documentation. I'd be very interested to know if check-world passes (especially contrib/postgres_fdw and src/test/recovery). Unfortunately my appveyor.yml fu is not yet strong enough.

0005: "libpq: Add PQsocketChangeCount to advertise socket changes."

To support a long lived WES, libpq needs a way to tell us when the socket changes underneath our feet. This is the simplest thing I could think of; better ideas welcome.

0006: "Reuse a WaitEventSet in libpqwalreceiver.c."

Rather than having all users of libpqwalreceiver.c deal with the complicated details of wait set management, have libpqwalreceiver expose a waiting interface that understands socket changes. Unfortunately, I couldn't figure out how to use CommonWaitSet for this (i.e. adding and removing sockets as required), due to complications with the bookkeeping required to provide the fd_closed flag to RemoveWaitEvent(). So it creates its own internal long lived WaitEventSet.

0007: "Use a WaitEventSet for postgres_fdw."

Create a single WaitEventSet and use it for all FDW connections. By having our own dedicated WES, we can do the bookkeeping required to know when sockets have been closed or need to be removed from kernel wait sets explicitly (which would be much harder to achieve if CommonWaitSet were to be used like that; you need to know when sockets are closed by other code, so you can provide fd_closed to RemoveWaitEvent()).
Concretely, if you use just one postgres_fdw connection, you'll see just epoll_wait()/kevent() calls for waits, but whenever you switch between different connections, you'll see e.g. EPOLL_DEL/EV_DELETE followed by EPOLL_ADD/EV_ADD when the set is adjusted (in the kqueue implementation these could be collapsed into the following wait, but I haven't done the work for that). An alternative would be to have one WES per FDW connection, but that seemed wasteful of file descriptors.

0008: "Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet."

The FATAL message you get if you happen to be waiting for IO rather than waiting somewhere else seems arbitrarily different. By switching to a standard automatic exit, it opens the possibility of using FeBeWaitSet in a couple more places that would otherwise need to create their own WES (see also [1]). Thoughts?

0009: "Use FeBeWaitSet for walsender.c."

Enabled by 0008.

0010: "Introduce a WaitEventSet for the stats collector."

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGK%3Dm9dLrq42oWQ4XfK9iDjGiZVwpQ1HkHrAPfG7Kh681g%40mail.gmail.com From
Re: Reducing WaitEventSet syscall churn
On Sat, Feb 8, 2020 at 10:00 AM Thomas Munro wrote: > On Tue, Jan 21, 2020 at 1:45 PM Thomas Munro wrote: > > Here are some patches to get rid of frequent system calls. > > Here is one more case that I was sitting on because I wasn't sure how > to do it: walreceiver.c. To make that work, libpq needs to be able to > tell you when the socket has changed, which I did with a counter that > is exposed to client code in patch 0004. The walreceiver change in > 0005 works (trace the system calls on walreceiver to see the > difference), but perhaps we can come up with a better way to code it > so that eg logical/worker.c doesn't finish up duplicating the logic. > Thoughts? (To be clear: I know the 0005 patch doesn't clean up after itself in various cases, it's for discussion only to see if others have ideas about how to structure things to suit various potential users of libpqwalreceiver.so.)
Re: Reducing WaitEventSet syscall churn
On Tue, Jan 21, 2020 at 1:45 PM Thomas Munro wrote: > Here are some patches to get rid of frequent system calls. Here is one more case that I was sitting on because I wasn't sure how to do it: walreceiver.c. To make that work, libpq needs to be able to tell you when the socket has changed, which I did with a counter that is exposed to client code in patch 0004. The walreceiver change in 0005 works (trace the system calls on walreceiver to see the difference), but perhaps we can come up with a better way to code it so that eg logical/worker.c doesn't finish up duplicating the logic. Thoughts? 0001-Add-WaitMyLatch-to-replace-many-WaitLatch-calls.patch Description: Binary data 0002-Use-WaitMyLatch-for-condition-variables.patch Description: Binary data 0003-Introduce-a-reusable-WaitEventSet-for-the-stats-coll.patch Description: Binary data 0004-libpq-Add-PQsocketChangeCount-to-advertise-socket-ch.patch Description: Binary data 0005-Reuse-a-WaitEventSet-in-walreceiver.patch Description: Binary data
Reducing WaitEventSet syscall churn
Hi, Here are some patches to get rid of frequent system calls. 0001 changes all qualifying WaitLatch() calls to use a new function WaitMyLatch() that reuses a common WaitEventSet. That's for callers that only want to wait for their own latch or an optional timeout, with automatic exit-on-postmaster-death. 0002 changes condition_variable.c to use WaitMyLatch(), instead of its own local thing like that. Perhaps this makes up for the use of the extra fd consumed by 0001. 0003 changes pgstat.c to use its own local reusable WaitEventSet. To see what I'm talking about, try tracing a whole cluster with eg strace/truss/dtruss -f postgres -D pgdata. This applies to Linux systems, or BSD/macOS systems with the nearby kqueue patch applied. On systems that fall back to poll(), there aren't any setup/teardown syscalls. 0001-Add-WaitMyLatch-to-replace-many-WaitLatch-calls.patch Description: Binary data 0002-Use-WaitMyLatch-for-condition-variables.patch Description: Binary data 0003-Introduce-a-reusable-WaitEventSet-for-the-stats-coll.patch Description: Binary data