Re: Reducing WaitEventSet syscall churn

2021-02-28 Thread Thomas Munro
On Sat, Feb 27, 2021 at 2:48 PM Thomas Munro  wrote:
> Here's the alternative patch set, with no change to existing error
> message behaviour.  I'm going to commit this version and close this CF
> item soon if there are no objections.

Pushed.

That leaves just walreceiver and postgres_fdw in need of WaitEventSet
improvements along these lines.  The raw ingredients for that were
present in earlier patches, and I think Horiguchi-san had the right
idea: we should use the libpq event system to adjust our WES as
appropriate when it changes the socket underneath us.  I will leave
this CF entry open a bit longer in case he would like to post an
updated version of that part (considering Tom's feedback[1]).  If not,
we can close this and try again in the next cycle.

[1] https://www.postgresql.org/message-id/2446176.1594495351%40sss.pgh.pa.us




Re: Reducing WaitEventSet syscall churn

2021-02-26 Thread Thomas Munro
On Tue, Jan 5, 2021 at 6:10 PM Thomas Munro  wrote:
> For point 2, the question I am raising is: why should users get a
> special FATAL message in some places and not others, for PM death?
> However, if people are attached to that behaviour, we could still
> achieve goal 1 without goal 2 by handling PM death explicitly in
> walsender.c and I'd be happy to post an alternative patch set like
> that.

Here's the alternative patch set, with no change to existing error
message behaviour.  I'm going to commit this version and close this CF
item soon if there are no objections.
From 8f563edefdc2f276b3d8abeb019af1ff172bf7d9 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 27 Feb 2021 13:34:37 +1300
Subject: [PATCH v8 1/2] Introduce symbolic names for FeBeWaitSet positions.

Previously we used 0 and 1 to refer to the socket and latch in far flung
parts of the tree, without any explanation.  Also use PGINVALID_SOCKET
rather than -1 in the place where we weren't already.

Reviewed-by: Kyotaro Horiguchi 
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c |  4 ++--
 src/backend/libpq/pqcomm.c| 18 +++---
 src/backend/utils/init/miscinit.c |  6 --
 src/include/libpq/libpq.h |  3 +++
 4 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index d1545a2ad6..bb603ad209 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -180,7 +180,7 @@ retry:
 
 		Assert(waitfor);
 
-		ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL);
 
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 		 WAIT_EVENT_CLIENT_READ);
@@ -292,7 +292,7 @@ retry:
 
 		Assert(waitfor);
 
-		ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL);
 
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 		 WAIT_EVENT_CLIENT_WRITE);
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 1e6b6db540..27a298f110 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -191,6 +191,9 @@ WaitEventSet *FeBeWaitSet;
 void
 pq_init(void)
 {
+	int		socket_pos PG_USED_FOR_ASSERTS_ONLY;
+	int		latch_pos PG_USED_FOR_ASSERTS_ONLY;
+
 	/* initialize state variables */
 	PqSendBufferSize = PQ_SEND_BUFFER_SIZE;
 	PqSendBuffer = MemoryContextAlloc(TopMemoryContext, PqSendBufferSize);
@@ -219,10 +222,19 @@ pq_init(void)
 #endif
 
 	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, 3);
-	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
+	socket_pos = AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE,
+   MyProcPort->sock, NULL, NULL);
+	latch_pos = AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, PGINVALID_SOCKET,
+  MyLatch, NULL);
+	AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
 	  NULL, NULL);
-	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
-	AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
+
+	/*
+	 * The event positions match the order we added them, but let's sanity
+	 * check them to be sure.
+	 */
+	Assert(socket_pos == FeBeWaitSetSocketPos);
+	Assert(latch_pos == FeBeWaitSetLatchPos);
 }
 
 /* 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 734c66d4e8..8de70eb46d 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -202,7 +202,8 @@ SwitchToSharedLatch(void)
 	MyLatch = &MyProc->procLatch;
 
 	if (FeBeWaitSet)
-		ModifyWaitEvent(FeBeWaitSet, 1, WL_LATCH_SET, MyLatch);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET,
+		MyLatch);
 
 	/*
 	 * Set the shared latch as the local one might have been set. This
@@ -221,7 +222,8 @@ SwitchBackToLocalLatch(void)
 	MyLatch = &LocalLatchData;
 
 	if (FeBeWaitSet)
-		ModifyWaitEvent(FeBeWaitSet, 1, WL_LATCH_SET, MyLatch);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET,
+		MyLatch);
 
 	SetLatch(MyLatch);
 }
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index b41b10620a..e4e5c21565 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -55,6 +55,9 @@ extern const PGDLLIMPORT PQcommMethods *PqCommMethods;
  */
 extern WaitEventSet *FeBeWaitSet;
 
+#define FeBeWaitSetSocketPos 0
+#define FeBeWaitSetLatchPos 1
+
 extern int	StreamServerPort(int family, const char *hostName,
 			 unsigned short portNumber, const char *unixSocketDir,
 			 pgsocket ListenSocket[], int MaxListen);
-- 
2.30.0

From 22b2c474b476ca91b579408f73ebdc014bb76083 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 24 Feb 2020 23:48:29 +1300
Subject: [PATCH v8 2/2] Use FeBeWaitSet for walsender.c.

This avoids the need to set up and tear down a new WaitEventSet every
time we need to wait.

Re: Reducing WaitEventSet syscall churn

2021-01-04 Thread Thomas Munro
On Fri, Jul 31, 2020 at 9:42 AM Thomas Munro  wrote:
> On Thu, Jul 30, 2020 at 5:50 PM Thomas Munro  wrote:
> > I pushed those three patches, but will wait for more discussion on the rest.
>
> And here's a rebase, to make cfbot happy.

And again.

To restate the two goals of the remaining patches:

1.  Use FeBeWaitSet for walsender, instead of setting up and tearing
down temporary WaitEventSets all the time.
2.  Use the standard WL_EXIT_ON_PM_DEATH flag for FeBeWaitSet, instead
of the special case ereport(FATAL, ... "terminating connection due to
unexpected postmaster exit" ...).

For point 2, the question I am raising is: why should users get a
special FATAL message in some places and not others, for PM death?
However, if people are attached to that behaviour, we could still
achieve goal 1 without goal 2 by handling PM death explicitly in
walsender.c and I'd be happy to post an alternative patch set like
that.
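
For concreteness, here is a hedged sketch of what that alternative
could look like (an illustration only, not the attached patches:
WalSndWaitForIO is a hypothetical helper, and WAIT_EVENT_WAL_SENDER_MAIN
is just one plausible wait event to report).  It keeps
WL_POSTMASTER_DEATH in the set and re-raises the message that
be-secure.c used:

static void
WalSndWaitForIO(uint32 waitfor)
{
	WaitEvent	event;

	ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL);
	WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
					 WAIT_EVENT_WAL_SENDER_MAIN);

	/* Explicit handling, instead of relying on WL_EXIT_ON_PM_DEATH. */
	if (event.events & WL_POSTMASTER_DEATH)
		ereport(FATAL,
				(errcode(ERRCODE_ADMIN_SHUTDOWN),
				 errmsg("terminating connection due to unexpected postmaster exit")));
}
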
From 72fa3b6858b3ab121fd60533ed7d6a4dcdbc4a5c Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 24 Feb 2020 23:51:09 +1300
Subject: [PATCH v7 1/3] Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet.

Previously, we'd give either a FATAL message or a silent exit() when
we detected postmaster death, depending on which wait point we were at.
Make the exit more uniform, by using the standard exit facility.  This
will allow a later patch to use FeBeWaitSet in a new location without
having to add more explicit handling for postmaster death, in line with
our policy of reducing such clutter.

Reviewed-by: Kyotaro Horiguchi 
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c | 28 
 src/backend/libpq/pqcomm.c|  2 +-
 2 files changed, 1 insertion(+), 29 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 4cf139a223..746fce8288 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -184,28 +184,6 @@ retry:
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 		 WAIT_EVENT_CLIENT_READ);
 
-		/*
-		 * If the postmaster has died, it's not safe to continue running,
-		 * because it is the postmaster's job to kill us if some other backend
-		 * exits uncleanly.  Moreover, we won't run very well in this state;
-		 * helper processes like walwriter and the bgwriter will exit, so
-		 * performance may be poor.  Finally, if we don't exit, pg_ctl will be
-		 * unable to restart the postmaster without manual intervention, so no
-		 * new connections can be accepted.  Exiting clears the deck for a
-		 * postmaster restart.
-		 *
-		 * (Note that we only make this check when we would otherwise sleep on
-		 * our latch.  We might still continue running for a while if the
-		 * postmaster is killed in mid-query, or even through multiple queries
-		 * if we never have to wait for read.  We don't want to burn too many
-		 * cycles checking for this very rare condition, and this should cause
-		 * us to exit quickly in most cases.)
-		 */
-		if (event.events & WL_POSTMASTER_DEATH)
-			ereport(FATAL,
-	(errcode(ERRCODE_ADMIN_SHUTDOWN),
-	 errmsg("terminating connection due to unexpected postmaster exit")));
-
 		/* Handle interrupt. */
 		if (event.events & WL_LATCH_SET)
 		{
@@ -296,12 +274,6 @@ retry:
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 		 WAIT_EVENT_CLIENT_WRITE);
 
-		/* See comments in secure_read. */
-		if (event.events & WL_POSTMASTER_DEATH)
-			ereport(FATAL,
-	(errcode(ERRCODE_ADMIN_SHUTDOWN),
-	 errmsg("terminating connection due to unexpected postmaster exit")));
-
 		/* Handle interrupt. */
 		if (event.events & WL_LATCH_SET)
 		{
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 1e6b6db540..289d9bd2da 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -222,7 +222,7 @@ pq_init(void)
 	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
 	  NULL, NULL);
 	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
-	AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
+	AddWaitEventToSet(FeBeWaitSet, WL_EXIT_ON_PM_DEATH, -1, NULL, NULL);
 }
 
 /* 
-- 
2.20.1

From ec7348f4fc794eed6cfd6bb3897086ef29c213ae Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 24 Feb 2020 23:48:29 +1300
Subject: [PATCH v7 2/3] Introduce symbolic names for FeBeWaitSet positions.

Previously we used hard coded 0 and 1 to refer to the socket and latch.

Reviewed-by: Kyotaro Horiguchi 
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c |  4 ++--
 src/backend/libpq/pqcomm.c| 18 +++---
 src/backend/utils/init/miscinit.c |  4 ++--
 src/include/libpq/libpq.h |  3 +++
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure.c 

Re: Reducing WaitEventSet syscall churn

2020-07-30 Thread Thomas Munro
On Thu, Jul 30, 2020 at 5:50 PM Thomas Munro  wrote:
> I pushed those three patches, but will wait for more discussion on the rest.

And here's a rebase, to make cfbot happy.
From a760053ac6fea1b9a829e9a170902a555863ce66 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 24 Feb 2020 23:51:09 +1300
Subject: [PATCH v6 1/4] Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet.

Previously, we'd give either a FATAL message or a silent exit() when
we detected postmaster death, depending on which wait point we were at.
Make the exit more uniform, by using the standard exit facility.  This
will allow a later patch to use FeBeWaitSet in a new location without
having to add more explicit handling for postmaster death, in line with
our policy of reducing such clutter.

Reviewed-by: Kyotaro Horiguchi 
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c | 28 
 src/backend/libpq/pqcomm.c|  2 +-
 2 files changed, 1 insertion(+), 29 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 2ae507a902..aec0926d93 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -184,28 +184,6 @@ retry:
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 		 WAIT_EVENT_CLIENT_READ);
 
-		/*
-		 * If the postmaster has died, it's not safe to continue running,
-		 * because it is the postmaster's job to kill us if some other backend
-		 * exits uncleanly.  Moreover, we won't run very well in this state;
-		 * helper processes like walwriter and the bgwriter will exit, so
-		 * performance may be poor.  Finally, if we don't exit, pg_ctl will be
-		 * unable to restart the postmaster without manual intervention, so no
-		 * new connections can be accepted.  Exiting clears the deck for a
-		 * postmaster restart.
-		 *
-		 * (Note that we only make this check when we would otherwise sleep on
-		 * our latch.  We might still continue running for a while if the
-		 * postmaster is killed in mid-query, or even through multiple queries
-		 * if we never have to wait for read.  We don't want to burn too many
-		 * cycles checking for this very rare condition, and this should cause
-		 * us to exit quickly in most cases.)
-		 */
-		if (event.events & WL_POSTMASTER_DEATH)
-			ereport(FATAL,
-	(errcode(ERRCODE_ADMIN_SHUTDOWN),
-	 errmsg("terminating connection due to unexpected postmaster exit")));
-
 		/* Handle interrupt. */
 		if (event.events & WL_LATCH_SET)
 		{
@@ -296,12 +274,6 @@ retry:
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 		 WAIT_EVENT_CLIENT_WRITE);
 
-		/* See comments in secure_read. */
-		if (event.events & WL_POSTMASTER_DEATH)
-			ereport(FATAL,
-	(errcode(ERRCODE_ADMIN_SHUTDOWN),
-	 errmsg("terminating connection due to unexpected postmaster exit")));
-
 		/* Handle interrupt. */
 		if (event.events & WL_LATCH_SET)
 		{
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index ac986c0505..6e91581c0b 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -222,7 +222,7 @@ pq_init(void)
 	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
 	  NULL, NULL);
 	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
-	AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
+	AddWaitEventToSet(FeBeWaitSet, WL_EXIT_ON_PM_DEATH, -1, NULL, NULL);
 }
 
 /* 
-- 
2.20.1

From 91c7a5a8a04363274ea0e74a69d89e261d7a0e4f Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 24 Feb 2020 23:48:29 +1300
Subject: [PATCH v6 2/4] Introduce symbolic names for FeBeWaitSet positions.

Previously we used hard coded 0 and 1 to refer to the socket and latch.

Reviewed-by: Kyotaro Horiguchi 
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c |  4 ++--
 src/backend/libpq/pqcomm.c| 18 +++---
 src/backend/utils/init/miscinit.c |  4 ++--
 src/include/libpq/libpq.h |  3 +++
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index aec0926d93..2d5cfbd6f8 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -179,7 +179,7 @@ retry:
 
 		Assert(waitfor);
 
-		ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL);
 
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 		 WAIT_EVENT_CLIENT_READ);
@@ -269,7 +269,7 @@ retry:
 
 		Assert(waitfor);
 
-		ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL);
+		ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL);
 
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
 		 WAIT_EVENT_CLIENT_WRITE);
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 

Re: Reducing WaitEventSet syscall churn

2020-07-29 Thread Thomas Munro
On Tue, Jul 14, 2020 at 6:51 PM Thomas Munro  wrote:
> In the meantime, here's a rebase of the more straightforward patches
> in the stack.  These are the ones that deal only with fixed sets of
> file descriptors, and they survive check-world on Linux,
> Linux+EXEC_BACKEND (with ASLR disabled) and FreeBSD, and at least
> check on macOS and Windows (my CI recipes need more work to get
> check-world working on those two).  There's one user-visible change
> that I'd appreciate feedback on: I propose to drop the FATAL error
> when the postmaster goes away, to make things more consistent.  See
> below for more on that.

Here's the effect of patches 0001-0003 on the number of relevant
system calls generated by "make check" on Linux and FreeBSD, according
to strace/truss -f -c:

epoll_create1:  4,825 ->   865
epoll_ctl:     12,454 -> 2,721
epoll_wait:      ~45k ->  ~45k
close:           ~81k ->  ~77k

kqueue:         4,618 ->   866
kevent:          ~54k ->  ~46k
close:           ~65k ->  ~61k

I pushed those three patches, but will wait for more discussion on the rest.




Re: Reducing WaitEventSet syscall churn

2020-07-14 Thread Thomas Munro
On Sun, Jul 12, 2020 at 7:22 AM Tom Lane  wrote:
> [...complaints about 0005 and 0006...] We already have
> PGEVT_CONNRESET and PGEVT_CONNDESTROY as application-visible connection
> state change events, and I don't see why those aren't sufficient.

I think Horiguchi-san's general idea of using event callbacks for this
sounds much more promising than my earlier idea of exposing a change
counter (that really was terrible).  If it can be done with existing
events then that's even better.  Perhaps he and/or I can look into
that for the next CF.
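
As a rough sketch of that idea, assuming the existing events suffice
(wes_event_proc, socket_may_have_changed and track_socket_changes are
illustrative names, not proposed code), the owner of the long lived
wait set would check the flag before each wait and rebuild the socket
event when it is set:

#include "libpq-fe.h"
#include "libpq-events.h"

static bool socket_may_have_changed = false;	/* checked before waiting */

static int
wes_event_proc(PGEventId evtId, void *evtInfo, void *passThrough)
{
	switch (evtId)
	{
		case PGEVT_CONNRESET:
		case PGEVT_CONNDESTROY:
			/* The socket under the WaitEventSet is gone or replaced. */
			socket_may_have_changed = true;
			break;
		default:
			break;
	}
	return 1;					/* nonzero reports success */
}

/* Call once after connecting; returns false if registration failed. */
static bool
track_socket_changes(PGconn *conn)
{
	return PQregisterEventProc(conn, wes_event_proc,
							   "wes-socket-tracker", NULL) != 0;
}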

In the meantime, here's a rebase of the more straightforward patches
in the stack.  These are the ones that deal only with fixed sets of
file descriptors, and they survive check-world on Linux,
Linux+EXEC_BACKEND (with ASLR disabled) and FreeBSD, and at least
check on macOS and Windows (my CI recipes need more work to get
check-world working on those two).  There's one user-visible change
that I'd appreciate feedback on: I propose to drop the FATAL error
when the postmaster goes away, to make things more consistent.  See
below for more on that.

Responding to earlier review from Horiguchi-san:

On Tue, Mar 10, 2020 at 12:20 PM Kyotaro Horiguchi
 wrote:
> > 0002: "Use a long lived WaitEventSet for WaitLatch()."
> >
> > In the last version, I had a new function WaitMyLatch(), but that
> > didn't help with recoveryWakeupLatch.  Switching between latches
> > doesn't require a syscall, so I didn't want to have a separate WES and
> > function just for that.  So I went back to using plain old
> > WaitLatch(), and made it "modify" the latch every time before waiting
> > on CommonWaitSet.  An alternative would be to get rid of the concept
> > of latches other than MyLatch, and change the function to
> > WaitMyLatch().  A similar thing happens for exit_on_postmaster_death,
> > for which I didn't want to have a separate WES, so I just set that
> > flag every time.  Thoughts?
>
> It is surely an improvement over creating a full-fledged WES every
> time. The name CommonWaitSet gives the impression that it is used for
> a variety of wait event sets, but it is used only by WaitLatch. So I
> would name it LatchWaitSet. With that name I won't be surprised that
> the function points at WL_LATCH_SET by index 0 without any explanation
> when calling ModifyWaitEvent.

Ok, I changed it to LatchWaitSet.  I also replaced the index 0 with a
symbolic name LatchWaitSetLatchPos, to make that clearer.

> @@ -700,7 +739,11 @@ FreeWaitEventSet(WaitEventSet *set)
> ReleaseExternalFD();
>  #elif defined(WAIT_USE_KQUEUE)
> close(set->kqueue_fd);
> -   ReleaseExternalFD();
> +   if (set->kqueue_fd >= 0)
> +   {
> +   close(set->kqueue_fd);
> +   ReleaseExternalFD();
> +   }
>
> Did you forget to remove the close() outside the if block?
> Don't we need the same amendment for epoll_fd as for kqueue_fd?

Hmm, maybe I screwed that up when resolving a conflict with the
ReleaseExternalFD() stuff.  Fixed.

> WaitLatch is defined as "If the latch is already set (and WL_LATCH_SET
> is given), the function returns immediately.". But now the function
> reacts to the latch even if WL_LATCH_SET is not given. I think it is
> actually always set, so we need to modify the Assert and the function
> comment to follow the change.

It seems a bit silly to call WaitLatch() if you don't want to wait for
a latch, but I think we can keep that comment and logic by assigning
set->latch = NULL when you wait without WL_LATCH_SET.  I tried that in
the attached.
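
A minimal sketch of that arrangement (not necessarily the final code;
the postmaster-death handling and error checks are omitted), i.e. pass
a NULL latch to ModifyWaitEvent when WL_LATCH_SET isn't requested,
using the LatchWaitSet/LatchWaitSetLatchPos names from above:

int
WaitLatch(Latch *latch, int wakeEvents, long timeout, uint32 wait_event_info)
{
	WaitEvent	event;

	/*
	 * Point the long lived set at the caller's latch, or at no latch at
	 * all if WL_LATCH_SET wasn't requested, so a set latch can't wake a
	 * caller who didn't ask for it.  Switching latches needs no syscall.
	 */
	ModifyWaitEvent(LatchWaitSet, LatchWaitSetLatchPos, WL_LATCH_SET,
					(wakeEvents & WL_LATCH_SET) ? latch : NULL);

	if (WaitEventSetWait(LatchWaitSet,
						 (wakeEvents & WL_TIMEOUT) ? timeout : -1,
						 &event, 1, wait_event_info) == 0)
		return WL_TIMEOUT;

	return event.events;
}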

> > 0004: "Introduce RemoveWaitEvent()."
> >
> > We'll need that to be able to handle connections that are closed and
> > reopened under the covers by libpq (replication, postgres_fdw).  We
> > also wanted this on a couple of other threads for multiplexing FDWs,
> > to be able to adjust the wait set dynamically for a proposed async
> > Append feature.
> >
> > The implementation is a little naive, and leaves holes in the
> > "pollfds" and "handles" arrays (for poll() and win32 implementations).
> > That could be improved with a bit more footwork in a later patch.
> >
> > XXX The Windows version is based on reading documentation.  I'd be
> > very interested to know if check-world passes (especially
> > contrib/postgres_fdw and src/test/recovery).  Unfortunately my
> > appveyor.yml fu is not yet strong enough.
>
> I didn't find the documentation about INVALID_HANDLE_VALUE in
> lpHandles. Could you give me the URL for that?

I was looking for how you do the equivalent of Unix file descriptor -1
in a call to poll(), and somewhere I read that INVALID_HANDLE_VALUE
has the right effect.  I can't find that reference now.  Apparently it
works because that's the pseudo-handle value -1 that is returned by
GetCurrentProcess(), and waiting for your own process waits forever so
it's a suitable value for holes in an array of event handles.  We
should probably call GetCurrentProcess() instead, but really that is
just stand-in code: we 

Re: Reducing WaitEventSet syscall churn

2020-07-11 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 13 Mar 2020, at 08:21, Kyotaro Horiguchi  wrote:
>> The attached are:
>> 0001-0004 Not changed
>> 0005 Fix interface of PQregisterEventProc
>> 0006 Add new libpq event for this use.
>> 0007 Another version of "0006 Reuse a WaitEventSet in
>> libpqwalreceiver.c" based on libpq event.
>> 0008-0011 Not changed (old 0007-0010, blindly appended)

> Since 0001 has been applied already in 9b8aa0929390a, the patchtester is
> unable to make heads or tails of this patchset.  Can you please submit a
> rebased version without the already applied changes?

While I've not really looked at this patchset, I did happen to notice
0005, and I think that's utterly unacceptable as written.  You can't
break API/ABI on a released libpq function.  I suppose the patch is
assuming that an enum return value is ABI-equivalent to int, but that
assumption is faulty.  Moreover, what's the point?  None of the later
patches require this AFAICS.  (The patch could probably be salvaged
ABI-wise by making the error codes be integer #define's not an enum,
but I fail to see the point of changing this at all.  I also don't
much like the idea of allowing callers to assume that there is a fixed
set of possible failure conditions for PQregisterEventProc.  If we
wanted to return error details, it'd likely be better to say that an
error message is left in conn->errorMessage.)

0006 is an even more egregious ABI break; you can't renumber existing
enum values without breaking applications.  That in itself could be
fixed by putting the new value at the end.  But I'd still object very
strongly to 0006, because I do not think that it's acceptable to
have pqDropConnection correspond to an application-visible event.
We use that all over the place for cases that should not be
application-visible, for example when deciding that a connection attempt
has failed and moving on to the next candidate host.  We already have
PGEVT_CONNRESET and PGEVT_CONNDESTROY as application-visible connection
state change events, and I don't see why those aren't sufficient.

(BTW, even if these weren't ABI breaks, where are the documentation
changes to go with them?)

regards, tom lane




Re: Reducing WaitEventSet syscall churn

2020-07-02 Thread Daniel Gustafsson
> On 13 Mar 2020, at 08:21, Kyotaro Horiguchi  wrote:

> The attached are:
> 0001-0004 Not changed
> 0005 Fix interface of PQregisterEventProc
> 0006 Add new libpq event for this use.
> 0007 Another version of "0006 Reuse a WaitEventSet in
> libpqwalreceiver.c" based on libpq event.
> 0008-0011 Not changed (old 0007-0010, blindly appended)

Since 0001 has been applied already in 9b8aa0929390a, the patchtester is unable
to make heads or tails of this patchset.  Can you please submit a
rebased version without the already applied changes?

cheers ./daniel



Re: Reducing WaitEventSet syscall churn

2020-03-29 Thread Kyotaro Horiguchi
At Fri, 13 Mar 2020 16:21:13 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> > 0007: "Use a WaitEventSet for postgres_fdw."
> 
> Continues..

At Thu, 27 Feb 2020 12:17:45 +1300, Thomas Munro  wrote 
in 
> 0007: "Use a WaitEventSet for postgres_fdw."
> 
> Create a single WaitEventSet and use it for all FDW connections.  By
> having our own dedicated WES, we can do the bookkeeping required to
> know when sockets have been closed or need to be removed from kernel wait
> sets explicitly (which would be much harder to achieve if
> CommonWaitSet were to be used like that; you need to know when sockets
> are closed by other code, so you can provide fd_closed to
> RemoveWaitEvent()).

It is straightforward and looks fine. If we use the libpq-event-based
solution instead of using the changecount, pgfdw_wait_for_socket and
disconnect_pg_server would be a bit simpler.

> Concretely, if you use just one postgres_fdw connection, you'll see
> just epoll_wait()/kevent() calls for waits, but whenever you switch
> between different connections, you'll see eg EPOLL_DEL/EV_DELETE
> followed by EPOLL_ADD/EV_ADD when the set is adjusted (in the kqueue
> implementation these could be collapsed into the following wait, but I

Such syscall sequences are no longer than what we issue now, so this
patch is still an improvement.

> haven't done the work for that).  An alternative would be to have one
> WES per FDW connection, but that seemed wasteful of file descriptors.

In the multi-connection case, we can save both fds and syscalls by
having one WaitEventSet contain all connection fds together.
WaitEventSetWait should take an event mask parameter, or
ModifyWaitEvent should set the mask, in that case.  The events are now
completely level-triggered, so we can safely ignore unwanted events.
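
To make the mask idea concrete, here is a hedged sketch of it as a
caller-side wrapper rather than a signature change, assuming one event
per call; for brevity it does not shorten the timeout across retries:

/* Wait until an event in 'mask' fires; returns 0 on timeout, 1 otherwise. */
static int
WaitOneMasked(WaitEventSet *set, long timeout, WaitEvent *ev,
			  uint32 mask, uint32 wait_event_info)
{
	for (;;)
	{
		if (WaitEventSetWait(set, timeout, ev, 1, wait_event_info) == 0)
			return 0;			/* timed out */
		if (ev->events & mask)
			return 1;
		/* Unwanted event: level-triggered, so it is safe to drop it. */
	}
}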

> 0008: "Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet."
> 
> The FATAL message you get if you happen to be waiting for IO rather
> than waiting somewhere else seems arbitrarily different.  By switching
> to a standard automatic exit, it opens the possibility of using
> FeBeWaitSet in a couple more places that would otherwise need to
> create their own WES (see also [1]).  Thoughts?

Do we really not need any message on backend disconnection due to
postmaster death?  As for the authentication code, it seems to me the
rationale for not writing to the log is that the connection has not
been established at that point.  (That depends on what we think the
"connection" is.)

And I suppose the reason the authentication logic ignores signals is
that clients expect authentication to be completed if at all possible.

> 0009: "Use FeBeWaitSet for walsender.c."
> 
> Enabled by 0008.

It works and doesn't change behavior.  But I found it a bit difficult
to tell which events the WaitEventSetWait calls wait for.  Maybe a
comment at the call sites would be sufficient.  I think a comment about
the bare number "0" used as the event position in ModifyWaitEvent is
also needed.

> 0010: "Introduce a WaitEventSet for the stats collector."

This looks fine.  The variable event is defined outside its effective
scope, but almost all of the function's variables are declared the
same way.



By the way I found that pqcomm.c uses -1 instead of PGINVALID_SOCKET
for AddWaitEventToSet.

> [1] 
> https://www.postgresql.org/message-id/flat/CA%2BhUKGK%3Dm9dLrq42oWQ4XfK9iDjGiZVwpQ1HkHrAPfG7Kh681g%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Reducing WaitEventSet syscall churn

2020-03-13 Thread Kyotaro Horiguchi
Hello.

At Tue, 10 Mar 2020 08:19:24 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
me> I'll continue reviewing in a later mail.
me> 
me> > 0005: "libpq: Add PQsocketChangeCount to advertise socket changes."
me>  

At Thu, 27 Feb 2020 12:17:45 +1300, Thomas Munro  wrote
in 
> On Sat, Feb 8, 2020 at 10:15 AM Thomas Munro  wrote:
> 0005: "libpq: Add PQsocketChangeCount to advertise socket changes."
> 
> To support a long lived WES, libpq needs a way to tell us when the socket
> changes underneath our feet.  This is the simplest thing I could think
> of; better ideas welcome.

I think Windows, at least, is the reason we can't just detect a change
in the value of the fd.  Instead of counting disconnections, we could
use libpq events.

PQregisterEventProc returns false for all of bad-parameter,
already-registered, out-of-memory and proc-rejection.  I don't think
that is a usable interface, so the attached 0005 patch fixes it.  (But
I found it is not strictly needed after making 0007; I include it as a
proposal separate from this patch set.  It does not include the
corresponding doc fix.)

> 0006: "Reuse a WaitEventSet in libpqwalreceiver.c."
> 
> Rather than having all users of libpqwalreceiver.c deal with the
> complicated details of wait set management, have libpqwalreceiver
> expose a waiting interface that understands socket changes.

Looks reasonable. The attached 0006 and 0007 are a possible
replacement if we use libpq-event.

> Unfortunately, I couldn't figure out how to use CommonWaitSet for this
> (ie adding and removing sockets to that as required), due to
> complications with the bookkeeping required to provide the fd_closed
> flag to RemoveWaitEvent().  So it creates its own internal long lived
> WaitEventSet.

Agreed, since they are used in different ways.  But with the attached,
a closed connection is marked with wes_socket_position = -1.

> 0007: "Use a WaitEventSet for postgres_fdw."

Continues..

The attached are:
0001-0004 Not changed
0005 Fix interface of PQregisterEventProc
0006 Add new libpq event for this use.
0007 Another version of "0006 Reuse a WaitEventSet in
 libpqwalreceiver.c" based on libpq event.
0008-0011 Not changed (old 0007-0010, blindly appended)

Passed the regression tests (including the TAP recovery test) up to here.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 1a80abd292ff0d6a47274eb188a5576b2ef3cf6e Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 25 Feb 2020 02:53:35 +1300
Subject: [PATCH 01/11] Don't use EV_CLEAR for kqueue events.

For the semantics to match the epoll implementation, we need a
socket to continue to appear readable/writable if you wait
multiple times without doing I/O in between (in Linux
terminology, level-triggered rather than edge-triggered).
Similar to commit 3b790d256f8 for Windows.

Author: Thomas Munro
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 046ca5c6c7..3b6acfb251 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -991,7 +991,7 @@ WaitEventAdjustKqueueAdd(struct kevent *k_ev, int filter, int action,
 {
 	k_ev->ident = event->fd;
 	k_ev->filter = filter;
-	k_ev->flags = action | EV_CLEAR;
+	k_ev->flags = action;
 	k_ev->fflags = 0;
 	k_ev->data = 0;
 	AccessWaitEvent(k_ev) = event;
@@ -1003,7 +1003,7 @@ WaitEventAdjustKqueueAddPostmaster(struct kevent *k_ev, WaitEvent *event)
 	/* For now postmaster death can only be added, not removed. */
 	k_ev->ident = PostmasterPid;
 	k_ev->filter = EVFILT_PROC;
-	k_ev->flags = EV_ADD | EV_CLEAR;
+	k_ev->flags = EV_ADD;
 	k_ev->fflags = NOTE_EXIT;
 	k_ev->data = 0;
 	AccessWaitEvent(k_ev) = event;
-- 
2.18.2

>From f93551a42b9e9ad0da05197d48fac5790474249d Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 24 Feb 2020 15:39:24 +1300
Subject: [PATCH 02/11] Use a long lived WaitEventSet for WaitLatch().

Create CommonWaitSet at backend startup time, and use it to
implement WaitLatch().  This avoids a bunch of epoll/kqueue
system calls, and makes sure we don't run into EMFILE later
due to lack of file descriptors.

Reorder SubPostmasterMain() slightly so that we restore the
postmaster pipe and Windows signal before we reach
InitPostmasterChild(), to make this work in EXEC_BACKEND
builds.

Author: Thomas Munro
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/postmaster/postmaster.c | 24 +++---
 src/backend/storage/ipc/latch.c | 49 +++--
 src/backend/utils/init/miscinit.c   |  2 ++
 src/include/storage/latch.h |  1 +
 4 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 46be78aadb..c472971ce0 100644
--- 

Re: Reducing WaitEventSet syscall churn

2020-03-09 Thread Kyotaro Horiguchi
Hello.

I looked this.

At Thu, 27 Feb 2020 12:17:45 +1300, Thomas Munro  wrote 
in 
> On Sat, Feb 8, 2020 at 10:15 AM Thomas Munro  wrote:
> > > > Here are some patches to get rid of frequent system calls.
> 
> Here's a new version of this patch set.  It gets rid of all temporary
> WaitEventSets except a couple I mentioned in another thread[1].
> WaitLatch() uses CommonWaitSet, and calls to WaitLatchOrSocket() are
> replaced by either the existing FeBeWaitSet (walsender, gssapi/openssl
> auth are also candidates) or a special purpose long lived WaitEventSet
> (replication, postgres_fdw, pgstats).  It passes make check-world with
> WAIT_USE_POLL, WAIT_USE_KQUEUE, WAIT_USE_EPOLL, all with and without
> -DEXEC_BACKEND, and make check with WAIT_USE_WIN32 (Appveyor).
> 
> 0001: "Don't use EV_CLEAR for kqueue events."
> 
> This fixes a problem in the kqueue implementation that only shows up
> once you switch to long lived WaitEventSets.  It needs to be
> level-triggered like the other implementations, for example because
> there's a place in the recovery tests where we wait twice in a row
> without trying to do I/O in between.  (This is a bit like commit
> 3b790d256f8 that fixed a similar problem on Windows.)

It looks fine in light of the kqueue documentation.

> 0002: "Use a long lived WaitEventSet for WaitLatch()."
> 
> In the last version, I had a new function WaitMyLatch(), but that
> didn't help with recoveryWakeupLatch.  Switching between latches
> doesn't require a syscall, so I didn't want to have a separate WES and
> function just for that.  So I went back to using plain old
> WaitLatch(), and made it "modify" the latch every time before waiting
> on CommonWaitSet.  An alternative would be to get rid of the concept
> of latches other than MyLatch, and change the function to
> WaitMyLatch().  A similar thing happens for exit_on_postmaster_death,
> for which I didn't want to have a separate WES, so I just set that
> flag every time.  Thoughts?

It is surely an improvement over creating a full-fledged WES every
time. The name CommonWaitSet gives the impression that it is used for
a variety of wait event sets, but it is used only by WaitLatch. So I
would name it LatchWaitSet. With that name I won't be surprised that
the function points at WL_LATCH_SET by index 0 without any explanation
when calling ModifyWaitEvent.

@@ -700,7 +739,11 @@ FreeWaitEventSet(WaitEventSet *set)
ReleaseExternalFD();
 #elif defined(WAIT_USE_KQUEUE)
close(set->kqueue_fd);
-   ReleaseExternalFD();
+   if (set->kqueue_fd >= 0)
+   {
+   close(set->kqueue_fd);
+   ReleaseExternalFD();
+   }

Did you forget to remove the close() outside the if block?
Don't we need the same amendment for epoll_fd as for kqueue_fd?

WaitLatch is defined as "If the latch is already set (and WL_LATCH_SET
is given), the function returns immediately.". But now the function
reacts to the latch even if WL_LATCH_SET is not given. I think it is
actually always set, so we need to modify the Assert and the function
comment to follow the change.

> 0003: "Use regular WaitLatch() for condition variables."
> 
> That mechanism doesn't need its own WES anymore.

Looks fine.

> 0004: "Introduce RemoveWaitEvent()."
> 
> We'll need that to be able to handle connections that are closed and
> reopened under the covers by libpq (replication, postgres_fdw).  We
> also wanted this on a couple of other threads for multiplexing FDWs,
> to be able to adjust the wait set dynamically for a proposed async
> Append feature.
> 
> The implementation is a little naive, and leaves holes in the
> "pollfds" and "handles" arrays (for poll() and win32 implementations).
> That could be improved with a bit more footwork in a later patch.
> 
> XXX The Windows version is based on reading documentation.  I'd be
> very interested to know if check-world passes (especially
> contrib/postgres_fdw and src/test/recovery).  Unfortunately my
> appveyor.yml fu is not yet strong enough.

I didn't find the documentation about INVALID_HANDLE_VALUE in
lpHandles. Could you give me the URL for that?

I didn't run recoverycheck because I couldn't install IPC::Run for
ActivePerl, but contribcheck succeeded.

+   for (int i = 0; i < nevents; ++i)
+   set->handles[i + 1] = INVALID_HANDLE_VALUE;

It accesses set->handles[nevents], which overruns the array.

+   /* Set up the free list. */
+   for (int i = 0; i < nevents; ++i)
+   set->events[i].next_free = i + 1;
+   set->events[nevents - 1].next_free = -1;

It sets the last element twice (harmless but useless).

set->handles = (HANDLE) data;
data += MAXALIGN(sizeof(HANDLE) * nevents);

It is not an issue of this patch, but does handles need nevents + 1
elements?

WaitEventSetSize does not check its parameter range.

I'll continue reviewing in a later mail.

> 0005: "libpq: Add PQsocketChangeCount to advertise socket changes."
 

regards.

-- 
Kyotaro Horiguchi

Re: Reducing WaitEventSet syscall churn

2020-02-26 Thread Thomas Munro
On Sat, Feb 8, 2020 at 10:15 AM Thomas Munro  wrote:
> > > Here are some patches to get rid of frequent system calls.

Here's a new version of this patch set.  It gets rid of all temporary
WaitEventSets except a couple I mentioned in another thread[1].
WaitLatch() uses CommonWaitSet, and calls to WaitLatchOrSocket() are
replaced by either the existing FeBeWaitSet (walsender, gssapi/openssl
auth are also candidates) or a special purpose long lived WaitEventSet
(replication, postgres_fdw, pgstats).  It passes make check-world with
WAIT_USE_POLL, WAIT_USE_KQUEUE, WAIT_USE_EPOLL, all with and without
-DEXEC_BACKEND, and make check with WAIT_USE_WIN32 (Appveyor).

0001: "Don't use EV_CLEAR for kqueue events."

This fixes a problem in the kqueue implementation that only shows up
once you switch to long lived WaitEventSets.  It needs to be
level-triggered like the other implementations, for example because
there's a place in the recovery tests where we wait twice in a row
without trying to do I/O in between.  (This is a bit like commit
3b790d256f8 that fixed a similar problem on Windows.)

0002: "Use a long lived WaitEventSet for WaitLatch()."

In the last version, I had a new function WaitMyLatch(), but that
didn't help with recoveryWakeupLatch.  Switching between latches
doesn't require a syscall, so I didn't want to have a separate WES and
function just for that.  So I went back to using plain old
WaitLatch(), and made it "modify" the latch every time before waiting
on CommonWaitSet.  An alternative would be to get rid of the concept
of latches other than MyLatch, and change the function to
WaitMyLatch().  A similar thing happens for exit_on_postmaster_death,
for which I didn't want to have a separate WES, so I just set that
flag every time.  Thoughts?

0003: "Use regular WaitLatch() for condition variables."

That mechanism doesn't need its own WES anymore.

0004: "Introduce RemoveWaitEvent()."

We'll need that to be able to handle connections that are closed and
reopened under the covers by libpq (replication, postgres_fdw).  We
also wanted this on a couple of other threads for multiplexing FDWs,
to be able to adjust the wait set dynamically for a proposed async
Append feature.

The implementation is a little naive, and leaves holes in the
"pollfds" and "handles" arrays (for poll() and win32 implementations).
That could be improved with a bit more footwork in a later patch.

XXX The Windows version is based on reading documentation.  I'd be
very interested to know if check-world passes (especially
contrib/postgres_fdw and src/test/recovery).  Unfortunately my
appveyor.yml fu is not yet strong enough.

0005: "libpq: Add PQsocketChangeCount to advertise socket changes."

To support a long lived WES, libpq needs a way to tell us when the socket
changes underneath our feet.  This is the simplest thing I could think
of; better ideas welcome.
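
For illustration, a sketch of the intended usage as I read it
(PQsocketChangeCount() is the API proposed in 0005; RemoveWaitEvent()
is from 0004, and its exact signature here is assumed from the
fd_closed description rather than quoted from the patch):

static int	last_change_count = -1;	/* count seen when we last (re)added */
static int	sock_pos = -1;			/* socket's position in the wait set */

static void
maybe_readjust_socket(WaitEventSet *set, PGconn *conn)
{
	int			count = PQsocketChangeCount(conn);

	if (count == last_change_count)
		return;					/* socket unchanged, nothing to do */

	/* The old fd is gone; drop its event and register the new socket. */
	if (sock_pos >= 0)
		RemoveWaitEvent(set, sock_pos, true /* fd_closed */ );
	sock_pos = AddWaitEventToSet(set, WL_SOCKET_READABLE, PQsocket(conn),
								 NULL, NULL);
	last_change_count = count;
}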

0006: "Reuse a WaitEventSet in libpqwalreceiver.c."

Rather than having all users of libpqwalreceiver.c deal with the
complicated details of wait set management, have libpqwalreceiver
expose a waiting interface that understands socket changes.

Unfortunately, I couldn't figure out how to use CommonWaitSet for this
(ie adding and removing sockets to that as required), due to
complications with the bookkeeping required to provide the fd_closed
flag to RemoveWaitEvent().  So it creates its own internal long lived
WaitEventSet.

0007: "Use a WaitEventSet for postgres_fdw."

Create a single WaitEventSet and use it for all FDW connections.  By
having our own dedicated WES, we can do the bookkeeping required to
know when sockets have been closed or need to be removed from kernel wait
sets explicitly (which would be much harder to achieve if
CommonWaitSet were to be used like that; you need to know when sockets
are closed by other code, so you can provide fd_closed to
RemoveWaitEvent()).

Concretely, if you use just one postgres_fdw connection, you'll see
just epoll_wait()/kevent() calls for waits, but whenever you switch
between different connections, you'll see eg EPOLL_DEL/EV_DELETE
followed by EPOLL_ADD/EV_ADD when the set is adjusted (in the kqueue
implementation these could be collapsed into the following wait, but I
haven't done the work for that).  An alternative would be to have one
WES per FDW connection, but that seemed wasteful of file descriptors.

0008: "Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet."

The FATAL message you get if you happen to be waiting for IO rather
than waiting somewhere else seems arbitrarily different.  By switching
to a standard automatic exit, it opens the possibility of using
FeBeWaitSet in a couple more places that would otherwise need to
create their own WES (see also [1]).  Thoughts?

0009: "Use FeBeWaitSet for walsender.c."

Enabled by 0008.

0010: "Introduce a WaitEventSet for the stats collector."

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGK%3Dm9dLrq42oWQ4XfK9iDjGiZVwpQ1HkHrAPfG7Kh681g%40mail.gmail.com
From 

Re: Reducing WaitEventSet syscall churn

2020-02-07 Thread Thomas Munro
On Sat, Feb 8, 2020 at 10:00 AM Thomas Munro  wrote:
> On Tue, Jan 21, 2020 at 1:45 PM Thomas Munro  wrote:
> > Here are some patches to get rid of frequent system calls.
>
> Here is one more case that I was sitting on because I wasn't sure how
> to do it: walreceiver.c.  To make that work, libpq needs to be able to
> tell you when the socket has changed, which I did with a counter that
> is exposed to client code in patch 0004.  The walreceiver change in
> 0005 works (trace the system calls on walreceiver to see the
> difference), but perhaps we can come up with a better way to code it
> so that eg logical/worker.c doesn't finish up duplicating the logic.
> Thoughts?

(To be clear: I know the 0005 patch doesn't clean up after itself in
various cases, it's for discussion only to see if others have ideas
about how to structure things to suit various potential users of
libpqwalreceiver.so.)




Re: Reducing WaitEventSet syscall churn

2020-02-07 Thread Thomas Munro
On Tue, Jan 21, 2020 at 1:45 PM Thomas Munro  wrote:
> Here are some patches to get rid of frequent system calls.

Here is one more case that I was sitting on because I wasn't sure how
to do it: walreceiver.c.  To make that work, libpq needs to be able to
tell you when the socket has changed, which I did with a counter that
is exposed to client code in patch 0004.  The walreceiver change in
0005 works (trace the system calls on walreceiver to see the
difference), but perhaps we can come up with a better way to code it
so that eg logical/worker.c doesn't finish up duplicating the logic.
Thoughts?


0001-Add-WaitMyLatch-to-replace-many-WaitLatch-calls.patch
Description: Binary data


0002-Use-WaitMyLatch-for-condition-variables.patch
Description: Binary data


0003-Introduce-a-reusable-WaitEventSet-for-the-stats-coll.patch
Description: Binary data


0004-libpq-Add-PQsocketChangeCount-to-advertise-socket-ch.patch
Description: Binary data


0005-Reuse-a-WaitEventSet-in-walreceiver.patch
Description: Binary data


Reducing WaitEventSet syscall churn

2020-01-20 Thread Thomas Munro
Hi,

Here are some patches to get rid of frequent system calls.

0001 changes all qualifying WaitLatch() calls to use a new function
WaitMyLatch() that reuses a common WaitEventSet.  That's for callers
that only want to wait for their own latch or an optional timeout,
with automatic exit-on-postmaster-death.
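
A minimal sketch of that shape, under stated assumptions (MyLatchWaitSet
and the lazy initialization are illustrative, not necessarily how 0001
arranges setup):

static WaitEventSet *MyLatchWaitSet = NULL;

int
WaitMyLatch(long timeout, uint32 wait_event_info)
{
	WaitEvent	event;

	if (MyLatchWaitSet == NULL)
	{
		/* Created once, in a long lived context, at first use. */
		MyLatchWaitSet = CreateWaitEventSet(TopMemoryContext, 2);
		AddWaitEventToSet(MyLatchWaitSet, WL_LATCH_SET, PGINVALID_SOCKET,
						  MyLatch, NULL);
		AddWaitEventToSet(MyLatchWaitSet, WL_EXIT_ON_PM_DEATH,
						  PGINVALID_SOCKET, NULL, NULL);
	}

	if (WaitEventSetWait(MyLatchWaitSet, timeout, &event, 1,
						 wait_event_info) == 0)
		return WL_TIMEOUT;
	return event.events;
}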

0002 changes condition_variable.c to use WaitMyLatch(), instead of its
own local thing like that.  Perhaps this makes up for the use of the
extra fd consumed by 0001.

0003 changes pgstat.c to use its own local reusable WaitEventSet.

To see what I'm talking about, try tracing a whole cluster with eg
strace/truss/dtruss -f postgres -D pgdata.  This applies to Linux
systems, or BSD/macOS systems with the nearby kqueue patch applied.
On systems that fall back to poll(), there aren't any setup/teardown
syscalls.


0001-Add-WaitMyLatch-to-replace-many-WaitLatch-calls.patch
Description: Binary data


0002-Use-WaitMyLatch-for-condition-variables.patch
Description: Binary data


0003-Introduce-a-reusable-WaitEventSet-for-the-stats-coll.patch
Description: Binary data