On Thu, Mar 2, 2023 at 9:57 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Thu, Mar 2, 2023 at 9:49 AM Gregory Stark (as CFM)
> <stark....@gmail.com> wrote:
> > On Mon, 20 Feb 2023 at 23:04, Thomas Munro <thomas.mu...@gmail.com> wrote:
> > > Done like that in this version.  This is the version I'm thinking of
> > > committing, unless someone wants to argue for another level.
> >
> > FWIW the cfbot doesn't understand this patch series. I'm not sure why
> > but it's only trying to apply the first (the MacOS one) and it's
> > failing to apply even that.
>
> Ah, it's because I committed one patch in the series.  I'll commit one
> more, and then repost the rest, shortly.

I pushed the main patch, "Don't leak descriptors into subprograms.".
Here's a rebase of the POSIX-next stuff, but I'll sit on these for a
bit longer to see if the build farm agrees with my claim about the
ubiquity of O_CLOEXEC, and if anyone has comments on this stuff.
From 1d03c0b2811b53a20eeff781918ef99d4c9b9df9 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 6 Feb 2023 13:51:19 +1300
Subject: [PATCH v5 1/3] Use accept4() to accept connections, where available.

The postmaster previously accepted connections and then made extra
system calls in child processes to make them non-blocking and (as of a
recent commit) close-on-exit.

On all target Unixes except macOS and AIX, that can be done atomically
with flags to the newer accept4() variant (expected in the next POSIX
revision, but around in the wild for many years now), saving a couple of
system calls.

The exceptions are Windows, where we didn't have to worry about that
problem anyway, EXEC_BACKEND builds on Unix (used by developers only for
slightly-more-like-Windows testing), and the straggler Unixes that
haven't implemented it yet (at the time of writing: macOS and AIX).

Suggested-by: Andres Freund <and...@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
---
 configure                  | 12 ++++++++++++
 configure.ac               |  1 +
 meson.build                |  1 +
 src/backend/libpq/pqcomm.c | 34 +++++++++++++++++++++++++++-------
 src/include/pg_config.h.in |  4 ++++
 5 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/configure b/configure
index e35769ea73..d06f64cdbb 100755
--- a/configure
+++ b/configure
@@ -16219,6 +16219,18 @@ _ACEOF
 
 # We can't use AC_REPLACE_FUNCS to replace these functions, because it
 # won't handle deployment target restrictions on macOS
+ac_fn_c_check_decl "$LINENO" "accept4" "ac_cv_have_decl_accept4" "#include <sys/socket.h>
+"
+if test "x$ac_cv_have_decl_accept4" = xyes; then :
+  ac_have_decl=1
+else
+  ac_have_decl=0
+fi
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE_DECL_ACCEPT4 $ac_have_decl
+_ACEOF
+
 ac_fn_c_check_decl "$LINENO" "preadv" "ac_cv_have_decl_preadv" "#include <sys/uio.h>
 "
 if test "x$ac_cv_have_decl_preadv" = xyes; then :
diff --git a/configure.ac b/configure.ac
index af23c15cb2..070a0b33db 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1843,6 +1843,7 @@ AC_CHECK_DECLS([strlcat, strlcpy, strnlen])
 
 # We can't use AC_REPLACE_FUNCS to replace these functions, because it
 # won't handle deployment target restrictions on macOS
+AC_CHECK_DECLS([accept4], [], [], [#include <sys/socket.h>])
 AC_CHECK_DECLS([preadv], [], [AC_LIBOBJ(preadv)], [#include <sys/uio.h>])
 AC_CHECK_DECLS([pwritev], [], [AC_LIBOBJ(pwritev)], [#include <sys/uio.h>])
 
diff --git a/meson.build b/meson.build
index 26be83afb6..c91fb05133 100644
--- a/meson.build
+++ b/meson.build
@@ -2097,6 +2097,7 @@ decl_checks = [
 # checking for library symbols wouldn't handle deployment target
 # restrictions on macOS
 decl_checks += [
+  ['accept4', 'sys/socket.h'],
   ['preadv', 'sys/uio.h'],
   ['pwritev', 'sys/uio.h'],
 ]
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index da5bb5fc5d..8b97d1a7e3 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -184,6 +184,14 @@ pq_init(void)
 	/* set up process-exit hook to close the socket */
 	on_proc_exit(socket_close, 0);
 
+#ifndef WIN32
+	/*
+	 * On most systems, the socket has already been made non-blocking and
+	 * close-on-exec by StreamConnection().  On systems that don't have
+	 * accept4() yet, and EXEC_BACKEND builds that need the socket to survive
+	 * exec*(), we do that separately here in the child process.
+	 */
+#if !HAVE_DECL_ACCEPT4 || defined(EXEC_BACKEND)
 	/*
 	 * In backends (as soon as forked) we operate the underlying socket in
 	 * nonblocking mode and use latches to implement blocking semantics if
@@ -194,17 +202,14 @@ pq_init(void)
 	 * the client, which might require changing the mode again, leading to
 	 * infinite recursion.
 	 */
-#ifndef WIN32
 	if (!pg_set_noblock(MyProcPort->sock))
 		ereport(COMMERROR,
 				(errmsg("could not set socket to nonblocking mode: %m")));
-#endif
-
-#ifndef WIN32
 
 	/* Don't give the socket to any subprograms we execute. */
 	if (fcntl(MyProcPort->sock, F_SETFD, FD_CLOEXEC) < 0)
 		elog(FATAL, "fcntl(F_SETFD) failed on socket: %m");
+#endif
 #endif
 
 	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, FeBeWaitSetNEvents);
@@ -699,9 +704,24 @@ StreamConnection(pgsocket server_fd, Port *port)
 {
 	/* accept connection and fill in the client (remote) address */
 	port->raddr.salen = sizeof(port->raddr.addr);
-	if ((port->sock = accept(server_fd,
-							 (struct sockaddr *) &port->raddr.addr,
-							 &port->raddr.salen)) == PGINVALID_SOCKET)
+
+#if HAVE_DECL_ACCEPT4 && !defined(EXEC_BACKEND)
+	/*
+	 * If we have accept4(), we can avoid a couple of fcntl() calls in
+	 * pq_init().  We can't use this optimization in EXEC_BACKEND builds,
+	 * though, because they need the socket to survive exec().
+	 */
+	port->sock = accept4(server_fd,
+						 (struct sockaddr *) &port->raddr.addr,
+						 &port->raddr.salen,
+						 SOCK_CLOEXEC | SOCK_NONBLOCK);
+#else
+	port->sock = accept(server_fd,
+						(struct sockaddr *) &port->raddr.addr,
+						&port->raddr.salen);
+#endif
+
+	if (port->sock == PGINVALID_SOCKET)
 	{
 		ereport(LOG,
 				(errcode_for_socket_access(),
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 20c82f5979..e21d0f05f5 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -91,6 +91,10 @@
 /* Define to 1 if you have the `CRYPTO_lock' function. */
 #undef HAVE_CRYPTO_LOCK
 
+/* Define to 1 if you have the declaration of `accept4', and to 0 if you
+   don't. */
+#undef HAVE_DECL_ACCEPT4
+
 /* Define to 1 if you have the declaration of `fdatasync', and to 0 if you
    don't. */
 #undef HAVE_DECL_FDATASYNC
-- 
2.39.1

From 51772e0b4b47981056d874c5c170ca5e3945c789 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 6 Feb 2023 14:54:03 +1300
Subject: [PATCH v5 2/3] Use newer client socket options where available.

As a recent commit did for the server side, we can now also try to set
client sockets to nonblocking and close-on-exit atomically when creating
the socket.  This uses flags expected in the next POSIX revision, but
already widely available on common platforms.

This saves a couple of fcntl() calls, and plugs a tiny window for
resource leaks in multi-threaded programs that might fork-and-exec while
another thread would otherwise have to run socket() and then fcntl().

Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
---
 src/interfaces/libpq/fe-connect.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 8f80c35c89..c5907eb786 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2464,6 +2464,7 @@ keep_going:						/* We will come back to here until there is
 				{
 					struct addrinfo *addr_cur = conn->addr_cur;
 					char		host_addr[NI_MAXHOST];
+					int			sock_type;
 
 					/*
 					 * Advance to next possible host, if we've tried all of
@@ -2494,7 +2495,23 @@ keep_going:						/* We will come back to here until there is
 						conn->connip = strdup(host_addr);
 
 					/* Try to create the socket */
-					conn->sock = socket(addr_cur->ai_family, SOCK_STREAM, 0);
+					sock_type = SOCK_STREAM;
+#ifdef SOCK_CLOEXEC
+					/*
+					 * Atomically mark close-on-exec, if possible on this
+					 * platform, for the benefit of multi-threaded programs.
+					 * Otherwise see below.
+					 */
+					sock_type |= SOCK_CLOEXEC;
+#endif
+#ifdef SOCK_NONBLOCK
+					/*
+					 * We might as well skip a system call for nonblocking
+					 * mode too, if we can.
+					 */
+					sock_type |= SOCK_NONBLOCK;
+#endif
+					conn->sock = socket(addr_cur->ai_family, sock_type, 0);
 					if (conn->sock == PGINVALID_SOCKET)
 					{
 						int			errorno = SOCK_ERRNO;
@@ -2540,6 +2557,7 @@ keep_going:						/* We will come back to here until there is
 							goto keep_going;
 						}
 					}
+#ifndef SOCK_NONBLOCK
 					if (!pg_set_noblock(conn->sock))
 					{
 						libpq_append_conn_error(conn, "could not set socket to nonblocking mode: %s",
@@ -2547,7 +2565,9 @@ keep_going:						/* We will come back to here until there is
 						conn->try_next_addr = true;
 						goto keep_going;
 					}
+#endif
 
+#ifndef SOCK_CLOEXEC
 #ifdef F_SETFD
 					if (fcntl(conn->sock, F_SETFD, FD_CLOEXEC) == -1)
 					{
@@ -2557,6 +2577,7 @@ keep_going:						/* We will come back to here until there is
 						goto keep_going;
 					}
 #endif							/* F_SETFD */
+#endif
 
 					if (addr_cur->ai_family != AF_UNIX)
 					{
-- 
2.39.1

From 92b0b27733f0f22e230a9139657d9b432ebf6ee0 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Tue, 21 Feb 2023 11:50:08 +1300
Subject: [PATCH v5 3/3] Use pipe2() for postmaster pipe, where available.

The 2-argument variant of pipe() (expected in the next POSIX revision)
saves the need to make a separate fcntl() call later.  It's already
available on all targeted Unixes except macOS and AIX.

Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
---
 configure                           | 12 ++++++++++++
 configure.ac                        |  1 +
 meson.build                         |  1 +
 src/backend/postmaster/postmaster.c | 10 ++++++++++
 src/backend/utils/init/miscinit.c   |  9 ++++++++-
 src/include/pg_config.h.in          |  4 ++++
 6 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index d06f64cdbb..0aff9f5732 100755
--- a/configure
+++ b/configure
@@ -16231,6 +16231,18 @@ cat >>confdefs.h <<_ACEOF
 #define HAVE_DECL_ACCEPT4 $ac_have_decl
 _ACEOF
 
+ac_fn_c_check_decl "$LINENO" "pipe2" "ac_cv_have_decl_pipe2" "#include <unistd.h>
+"
+if test "x$ac_cv_have_decl_pipe2" = xyes; then :
+  ac_have_decl=1
+else
+  ac_have_decl=0
+fi
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE_DECL_PIPE2 $ac_have_decl
+_ACEOF
+
 ac_fn_c_check_decl "$LINENO" "preadv" "ac_cv_have_decl_preadv" "#include <sys/uio.h>
 "
 if test "x$ac_cv_have_decl_preadv" = xyes; then :
diff --git a/configure.ac b/configure.ac
index 070a0b33db..d75eb76897 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1844,6 +1844,7 @@ AC_CHECK_DECLS([strlcat, strlcpy, strnlen])
 # We can't use AC_REPLACE_FUNCS to replace these functions, because it
 # won't handle deployment target restrictions on macOS
 AC_CHECK_DECLS([accept4], [], [], [#include <sys/socket.h>])
+AC_CHECK_DECLS([pipe2], [], [], [#include <unistd.h>])
 AC_CHECK_DECLS([preadv], [], [AC_LIBOBJ(preadv)], [#include <sys/uio.h>])
 AC_CHECK_DECLS([pwritev], [], [AC_LIBOBJ(pwritev)], [#include <sys/uio.h>])
 
diff --git a/meson.build b/meson.build
index c91fb05133..ad2faefff4 100644
--- a/meson.build
+++ b/meson.build
@@ -2098,6 +2098,7 @@ decl_checks = [
 # restrictions on macOS
 decl_checks += [
   ['accept4', 'sys/socket.h'],
+  ['pipe2', 'unistd.h'],
   ['preadv', 'sys/uio.h'],
   ['pwritev', 'sys/uio.h'],
 ]
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 2552327d90..fdcfdfd558 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -6506,12 +6506,22 @@ InitPostmasterDeathWatchHandle(void)
 	 * close the write end as soon as possible after forking, because EOF
 	 * won't be signaled in the read end until all processes have closed the
 	 * write fd. That is taken care of in ClosePostmasterPorts().
+	 *
+	 * If this platform has pipe2(), and we're not in an EXEC_BACKEND build,
+	 * then we can avoid a later fcntl() call by asking for O_CLOEXEC now.
 	 */
 	Assert(MyProcPid == PostmasterPid);
+#if HAVE_DECL_PIPE2 && !defined(EXEC_BACKEND)
+	if (pipe2(postmaster_alive_fds, O_CLOEXEC) < 0)
+		ereport(FATAL,
+				(errcode_for_file_access(),
+				 errmsg_internal("could not create pipe to monitor postmaster death: %m")));
+#else
 	if (pipe(postmaster_alive_fds) < 0)
 		ereport(FATAL,
 				(errcode_for_file_access(),
 				 errmsg_internal("could not create pipe to monitor postmaster death: %m")));
+#endif
 
 	/* Notify fd.c that we've eaten two FDs for the pipe. */
 	ReserveExternalFD();
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 7eb7fe87f6..f706b73e91 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -164,13 +164,20 @@ InitPostmasterChild(void)
 	/* Request a signal if the postmaster dies, if possible. */
 	PostmasterDeathSignalInit();
 
-	/* Don't give the pipe to subprograms that we execute. */
+	/*
+	 * Don't give the pipe to subprograms that we execute.  We only need to
+	 * do this explicitly here if the platform lacks pipe2(), or we're in an
+	 * EXEC_BACKEND build so the pipe had to survive the first level of
+	 * exec*() to get here.
+	 */
 #ifndef WIN32
+#if !HAVE_DECL_PIPE2 || defined(EXEC_BACKEND)
 	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFD, FD_CLOEXEC) < 0)
 		ereport(FATAL,
 				(errcode_for_socket_access(),
 				 errmsg_internal("could not set postmaster death monitoring pipe to FD_CLOEXEC mode: %m")));
 #endif
+#endif
 }
 
 /*
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index e21d0f05f5..5cd5acf98e 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -123,6 +123,10 @@
    to 0 if you don't. */
 #undef HAVE_DECL_LLVMORCGETSYMBOLADDRESSIN
 
+/* Define to 1 if you have the declaration of `pipe2', and to 0 if you don't.
+   */
+#undef HAVE_DECL_PIPE2
+
 /* Define to 1 if you have the declaration of `posix_fadvise', and to 0 if you
    don't. */
 #undef HAVE_DECL_POSIX_FADVISE
-- 
2.39.1

Reply via email to