Re: File descriptors in exec'd subprocesses

2023-03-17 Thread Thomas Munro
I pushed the libpq changes.  I'll leave the pipe2 and accept4 changes
on ice for now, maybe for a later cycle (unlike the committed patches,
they don't currently fix a known problem, they just avoid some
syscalls that are already fairly rare).  For the libpq change, the
build farm seems happy so far.  I was a little worried that there
could be ways that #ifdef SOCK_CLOEXEC could be true for a build that
might encounter a too-old kernel and break, but it looks like you'd
have to go so far back into EOL'd releases that even our zombie build
farm animals have it.  Only macOS and AIX don't have it yet, and this
should be fine with Apple's availability guards, which leaves just
AIX.  (AFAIK headers and kernels are always in sync at the major
version level on AIX, but if not presumably there'd have to be some
similar guard system?)




Re: File descriptors in exec'd subprocesses

2023-03-02 Thread Thomas Munro
On Thu, Mar 2, 2023 at 9:57 AM Thomas Munro  wrote:
> On Thu, Mar 2, 2023 at 9:49 AM Gregory Stark (as CFM)
>  wrote:
> > On Mon, 20 Feb 2023 at 23:04, Thomas Munro  wrote:
> > > Done like that in this version.  This is the version I'm thinking of
> > > committing, unless someone wants to argue for another level.
> >
> > FWIW the cfbot doesn't understand this patch series. I'm not sure why
> > but it's only trying to apply the first (the MacOS one) and it's
> > failing to apply even that.
>
> Ah, it's because I committed one patch in the series.  I'll commit one
> more, and then repost the rest, shortly.

I pushed the main patch, "Don't leak descriptors into subprograms.".
Here's a rebase of the POSIX-next stuff, but I'll sit on these for a
bit longer to see if the build farm agrees with my claim about the
ubiquity of O_CLOEXEC, and if anyone has comments on this stuff.
From 1d03c0b2811b53a20eeff781918ef99d4c9b9df9 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 6 Feb 2023 13:51:19 +1300
Subject: [PATCH v5 1/3] Use accept4() to accept connections, where available.

The postmaster previously accepted connections and then made extra
system calls in child processes to make them non-blocking and (as of a
recent commit) close-on-exit.

On all target Unixes except macOS and AIX, that can be done atomically
with flags to the newer accept4() variant (expected in the next POSIX
revision, but around in the wild for many years now), saving a couple of
system calls.

The exceptions are Windows, where we didn't have to worry about that
problem anyway, EXEC_BACKEND builds on Unix (used by developers only for
slightly-more-like-Windows testing), and the straggler Unixes that
haven't implemented it yet (at the time of writing: macOS and AIX).

Suggested-by: Andres Freund 
Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
---
 configure  | 12 
 configure.ac   |  1 +
 meson.build|  1 +
 src/backend/libpq/pqcomm.c | 34 +++---
 src/include/pg_config.h.in |  4 
 5 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/configure b/configure
index e35769ea73..d06f64cdbb 100755
--- a/configure
+++ b/configure
@@ -16219,6 +16219,18 @@ _ACEOF
 
 # We can't use AC_REPLACE_FUNCS to replace these functions, because it
 # won't handle deployment target restrictions on macOS
+ac_fn_c_check_decl "$LINENO" "accept4" "ac_cv_have_decl_accept4" "#include 
+"
+if test "x$ac_cv_have_decl_accept4" = xyes; then :
+  ac_have_decl=1
+else
+  ac_have_decl=0
+fi
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE_DECL_ACCEPT4 $ac_have_decl
+_ACEOF
+
 ac_fn_c_check_decl "$LINENO" "preadv" "ac_cv_have_decl_preadv" "#include 
 "
 if test "x$ac_cv_have_decl_preadv" = xyes; then :
diff --git a/configure.ac b/configure.ac
index af23c15cb2..070a0b33db 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1843,6 +1843,7 @@ AC_CHECK_DECLS([strlcat, strlcpy, strnlen])
 
 # We can't use AC_REPLACE_FUNCS to replace these functions, because it
 # won't handle deployment target restrictions on macOS
+AC_CHECK_DECLS([accept4], [], [], [#include ])
 AC_CHECK_DECLS([preadv], [], [AC_LIBOBJ(preadv)], [#include ])
 AC_CHECK_DECLS([pwritev], [], [AC_LIBOBJ(pwritev)], [#include ])
 
diff --git a/meson.build b/meson.build
index 26be83afb6..c91fb05133 100644
--- a/meson.build
+++ b/meson.build
@@ -2097,6 +2097,7 @@ decl_checks = [
 # checking for library symbols wouldn't handle deployment target
 # restrictions on macOS
 decl_checks += [
+  ['accept4', 'sys/socket.h'],
   ['preadv', 'sys/uio.h'],
   ['pwritev', 'sys/uio.h'],
 ]
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index da5bb5fc5d..8b97d1a7e3 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -184,6 +184,14 @@ pq_init(void)
 	/* set up process-exit hook to close the socket */
 	on_proc_exit(socket_close, 0);
 
+#ifndef WIN32
+	/*
+	 * On most systems, the socket has already been made non-blocking and
+	 * close-on-exec by StreamConnection().  On systems that don't have
+	 * accept4() yet, and EXEC_BACKEND builds that need the socket to survive
+	 * exec*(), we do that separately here in the child process.
+	 */
+#if !HAVE_DECL_ACCEPT4 || defined(EXEC_BACKEND)
 	/*
 	 * In backends (as soon as forked) we operate the underlying socket in
 	 * nonblocking mode and use latches to implement blocking semantics if
@@ -194,17 +202,14 @@ pq_init(void)
 	 * the client, which might require changing the mode again, leading to
 	 * infinite recursion.
 	 */
-#ifndef WIN32
 	if (!pg_set_noblock(MyProcPort->sock))
 		ereport(COMMERROR,
 (errmsg("could not set socket to nonblocking mode: %m")));
-#endif
-
-#ifndef WIN32
 
 	/* Don't give the socket to any subprograms we execute. */
 	if (fcntl(MyProcPort->sock, F_SETFD, FD_CLOEXEC) < 0)
 		elog(FATAL, "fcntl(F_SETFD) failed on socket: %m");
+#endif
 #endif
 
 	FeBeWaitSet = 

Re: File descriptors in exec'd subprocesses

2023-03-01 Thread Thomas Munro
On Thu, Mar 2, 2023 at 9:49 AM Gregory Stark (as CFM)
 wrote:
> On Mon, 20 Feb 2023 at 23:04, Thomas Munro  wrote:
> >
> > Done like that in this version.  This is the version I'm thinking of
> > committing, unless someone wants to argue for another level.
>
> FWIW the cfbot doesn't understand this patch series. I'm not sure why
> but it's only trying to apply the first (the MacOS one) and it's
> failing to apply even that.

Ah, it's because I committed one patch in the series.  I'll commit one
more, and then repost the rest, shortly.




Re: File descriptors in exec'd subprocesses

2023-03-01 Thread Gregory Stark (as CFM)
On Mon, 20 Feb 2023 at 23:04, Thomas Munro  wrote:
>
> Done like that in this version.  This is the version I'm thinking of
> committing, unless someone wants to argue for another level.

FWIW the cfbot doesn't understand this patch series. I'm not sure why
but it's only trying to apply the first (the MacOS one) and it's
failing to apply even that.


-- 
Gregory Stark
As Commitfest Manager




Re: File descriptors in exec'd subprocesses

2023-02-20 Thread Thomas Munro
Something bothered me about the previous versions:  Which layer should
add O_CLOEXEC, given that we have md.c -> PathNameOpenXXX() ->
BasicOpenFile() -> open()?  Previously I had md.c adding it, but on
reflection, it makes no sense to open a "File" (virtual file
descriptor) that is *not* O_CLOEXEC.  The client doesn't really know
when the raw descriptor is currently open and shouldn't really access
it; it would be strange to want it to survive a call to exec*().  I
think the 'highest' level API that we could consider requiring
O_CLOEXEC to be passed in explicitly, or not, is BasicOpenFile().
Done like that in this version.  This is the version I'm thinking of
committing, unless someone wants to argue for another level.

Another good choice would be to do it inside BasicOpenFile(), and then
the patch would be smaller again (xlog.c wouldn't need to mention it,
and there would perhaps be less risk that some long-lived descriptor
somewhere else has failed to request it), but perhaps that would be
presumptuous.  That function returns raw descriptors, and the caller,
perhaps an extension, might legitimately want to make an inheritable
descriptor for some reason, I guess?  Does anyone think I should move
it in there instead?

I realised that if we're going to use accept4() to cut down on
syscalls, we could also do the same for the postmaster pipe with
pipe2().

Here also is a tiny archeological cleanup to avoid creating
contradictory claims about whether all computers have O_CLOEXEC.

I toyed with the idea of a tiny Linux-only regression test using "COPY
fds FROM PROGRAM 'ls /proc/self/fd'" expecting 0, 1, 2, 3 (3 being
ls's opendir()), but that's probably a little too cute; and also
showed me that pg_regress.c leaks its log file, the fix for which is
to add "e" to its fdopen(), but that's another POSIX-next feature[1]
that seems a little harder to detect, and I gave up on that.

[1] https://wiki.freebsd.org/AtomicCloseOnExec
From 86bf540c917b49a60ecd7f170ee5e8f1c7df6802 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 21 Feb 2023 14:47:19 +1300
Subject: [PATCH v4 1/5] Remove obsolete coding for macOS < 10.7.

Commits 04cad8f7 and 0c088568 supported old macOS systems that didn't
define O_CLOEXEC or O_DSYNC yet, but those arrived in macOS releases
10.7 and 10.6 (respectively), which reached EOL around a decade ago.
We've already made use of other POSIX features that early macOS vintages
can't compile (for example commits 623cc673, d2e15083).

A proposed future commit will use O_CLOEXEC on POSIX systems so it would
seem strange to pretend here that it's optional, and we might as well
give O_DSYNC the same treatment since the reference is also guarded by a
test for a macOS-specific macro, and we know that current Macs have it.

Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
---
 src/backend/storage/file/fd.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 926d000f2e..ea690f05c6 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1025,7 +1025,9 @@ tryAgain:
 	 */
 	StaticAssertStmt((PG_O_DIRECT &
 	  (O_APPEND |
+	   O_CLOEXEC |
 	   O_CREAT |
+	   O_DSYNC |
 	   O_EXCL |
 	   O_RDWR |
 	   O_RDONLY |
@@ -1033,15 +1035,6 @@ tryAgain:
 	   O_TRUNC |
 	   O_WRONLY)) == 0,
 	 "PG_O_DIRECT value collides with standard flag");
-#if defined(O_CLOEXEC)
-	StaticAssertStmt((PG_O_DIRECT & O_CLOEXEC) == 0,
-	 "PG_O_DIRECT value collides with O_CLOEXEC");
-#endif
-#if defined(O_DSYNC)
-	StaticAssertStmt((PG_O_DIRECT & O_DSYNC) == 0,
-	 "PG_O_DIRECT value collides with O_DSYNC");
-#endif
-
 	fd = open(fileName, fileFlags & ~PG_O_DIRECT, fileMode);
 #else
 	fd = open(fileName, fileFlags, fileMode);
-- 
2.39.1

From 1f8265b2971ae7bfc686aafb9d3fdeae2a0dcc1d Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 20 Feb 2023 23:26:36 +1300
Subject: [PATCH v4 2/5] Don't leak descriptors into subprograms.

Open long-lived data and WAL file descriptors with O_CLOEXEC.  This flag
was introduced by SUSv4 (POSIX.1-2008), and by now all of our target
Unix systems have it.  Our open() implementation for Windows already had
that behavior, so provide a dummy O_CLOEXEC flag on that platform.

Do the same for sockets and the postmaster pipe with FD_CLOEXEC.  Later
commits might use modern interfaces to remove these extra fcntl() calls
where possible, but we'll need them as a fallback for a couple of
systems, so do it that way in this initial commit.

With this change, subprograms executed for archiving, copying etc will
no longer have access to the server's descriptors, other than the ones
that we decide to pass down.

Reviewed-by: Andres Freund 
Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
---
 src/backend/access/transam/xlog.c | 9 ++---

Re: File descriptors in exec'd subprocesses

2023-02-20 Thread Thomas Munro
I had missed one: the "watch" end of the postmaster pipe also needs FD_CLOEXEC.
From 36f8ed2406307f8b1578ae85510c5a07718e1ea8 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 20 Feb 2023 23:26:36 +1300
Subject: [PATCH v3 1/3] Don't leak descriptors into subprograms.

Open data and WAL files with O_CLOEXEC, from SUSv4 (POSIX.1-2008).  All
of our target systems have it, except Windows.  Our open()
implementation on Windows already has that behavior, so provide a dummy
O_CLOEXEC flag on that platform.

Do the same for sockets with FD_CLOEXEC.  (Later commits might improve
this with SOCK_CLOEXEC on systems that have it, but we'll need the
FD_CLOEXEC path as a fallback anyway, so do it this way in this initial
commit.)

Likewise for the postmaster death pipe.

This means that programs executed by COPY and archiving commands etc,
will not inherit access to those descriptors.

Reviewed-by: Andres Freund 
Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
---
 src/backend/access/transam/xlog.c |  9 ++---
 src/backend/libpq/pqcomm.c| 10 ++
 src/backend/storage/file/fd.c |  2 --
 src/backend/storage/smgr/md.c |  9 +
 src/backend/utils/init/miscinit.c |  8 
 src/include/port/win32_port.h |  7 +++
 6 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f9f0f6db8d..87af608d15 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2936,7 +2936,8 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	 * Try to use existent file (checkpoint maker may have created it already)
 	 */
 	*added = false;
-	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method));
+	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
+	   get_sync_bit(sync_method));
 	if (fd < 0)
 	{
 		if (errno != ENOENT)
@@ -3097,7 +3098,8 @@ XLogFileInit(XLogSegNo logsegno, TimeLineID logtli)
 		return fd;
 
 	/* Now open original target segment (might not be file I just made) */
-	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method));
+	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
+	   get_sync_bit(sync_method));
 	if (fd < 0)
 		ereport(ERROR,
 (errcode_for_file_access(),
@@ -3328,7 +3330,8 @@ XLogFileOpen(XLogSegNo segno, TimeLineID tli)
 
 	XLogFilePath(path, tli, segno, wal_segment_size);
 
-	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method));
+	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
+	   get_sync_bit(sync_method));
 	if (fd < 0)
 		ereport(PANIC,
 (errcode_for_file_access(),
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 864c9debe8..0ccec33af4 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -200,6 +200,16 @@ pq_init(void)
 (errmsg("could not set socket to nonblocking mode: %m")));
 #endif
 
+#ifndef WIN32
+
+	/*
+	 * Also make sure that any subprocesses that execute a new program don't
+	 * inherit the socket.
+	 */
+	if (fcntl(MyProcPort->sock, F_SETFD, FD_CLOEXEC) < 0)
+		elog(FATAL, "fcntl(F_SETFD) failed on socket: %m");
+#endif
+
 	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, FeBeWaitSetNEvents);
 	socket_pos = AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE,
    MyProcPort->sock, NULL, NULL);
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 926d000f2e..933f17b398 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1033,10 +1033,8 @@ tryAgain:
 	   O_TRUNC |
 	   O_WRONLY)) == 0,
 	 "PG_O_DIRECT value collides with standard flag");
-#if defined(O_CLOEXEC)
 	StaticAssertStmt((PG_O_DIRECT & O_CLOEXEC) == 0,
 	 "PG_O_DIRECT value collides with O_CLOEXEC");
-#endif
 #if defined(O_DSYNC)
 	StaticAssertStmt((PG_O_DIRECT & O_DSYNC) == 0,
 	 "PG_O_DIRECT value collides with O_DSYNC");
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 8da813600c..2070eba4a2 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -205,14 +205,15 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 
 	path = relpath(reln->smgr_rlocator, forknum);
 
-	fd = PathNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
+	fd = PathNameOpenFile(path,
+		  O_RDWR | O_CREAT | O_EXCL | PG_BINARY | O_CLOEXEC);
 
 	if (fd < 0)
 	{
 		int			save_errno = errno;
 
 		if (isRedo)
-			fd = PathNameOpenFile(path, O_RDWR | PG_BINARY);
+			fd = PathNameOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC);
 		if (fd < 0)
 		{
 			/* be sure to report the error reported by create, not open */
@@ -523,7 +524,7 @@ mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior)
 
 	path = relpath(reln->smgr_rlocator, forknum);
 
-	fd = PathNameOpenFile(path, O_RDWR | PG_BINARY);
+	fd = PathNameOpenFile(path, O_RDWR | 

Re: File descriptors in exec'd subprocesses

2023-02-05 Thread Thomas Munro
On Mon, Feb 6, 2023 at 3:29 AM Andres Freund  wrote:
> On February 5, 2023 1:00:50 AM GMT+01:00, Thomas Munro 
>  wrote:
> >Are there any more descriptors we need to think about?
>
> Postmaster's listen sockets? Saves a bunch of syscalls, at least.

Assuming you mean accepted sockets, yeah, I see how two save two
syscalls there, and since you nerd-sniped me into looking into the
SOCK_CLOEXEC landscape, I like it even more now that I've understood
that accept4() is rubber-stamped for the next revision of POSIX[1] and
is already accepted almost everywhere.  It's not just window dressing,
you need it to write multi-threaded programs that fork/exec without
worrying about the window between fd creation and fcntl(FD_CLOEXEC) in
another thread; hopefully one day we will care about that sort of
thing in some places too!  Here's a separate patch for that.

I *guess* we need HAVE_DECL_ACCEPT4 for the guarded availability
system (cf pwritev) when Apple gets the memo, but see below.  Hard to
say if AIX is still receiving memos (cf recent speculation in the
Register).  All other target OSes seem to have had this stuff for a
while.

Since client connections already do fcntl(FD_CLOEXEC), postgres_fdw
connections didn't have this problem.  It seems reasonable to want to
skip a couple of system calls there too; also, client programs might
also be interested in future-POSIX's atomic race-free close-on-exec
socket fabrication.  So here also is a patch to use SOCK_CLOEXEC on
that end too, if available.

But ... hmph, all we can do here is test for the existence of
SOCK_NONBLOCK and SOCK_CLOEXEC, since there is no new function to test
for.  Maybe we should just assume accept4() also exists if these exist
(it's hard to imagine that Apple or IBM would address atomicity on one
end but not the other of a socket), but predictions are so difficult,
especially about the future!  Anyone want to guess if it's better to
leave the meson/configure probe in for the accept4 end or just roll
with the macros?

> Logging collector pipe write end, in backends?

The write end of the logging pipe is already closed, after dup2'ing it
to STDOUT_FILENO to STDERR_FILENO, so archive commands and suchlike do
receive the handle, but they want them.  It's the intended and
documented behaviour that anything written to that will finish up in
the log.

As for pipe2(O_CLOEXEC), I see the point of it in a multi-threaded
application.  It's not terribly useful for us though, because we
usually want to close only one end, except in the case of the
self-pipe.  But the self-pipe is no longer used on the systems that
have pipe2()-from-the-future.

I haven't tested this under EXEC_BACKEND yet.

[1] https://www.austingroupbugs.net/view.php?id=411
From c9a61ba8eaddb197c31095a163f7efdf2e3ea797 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sun, 5 Feb 2023 12:18:14 +1300
Subject: [PATCH v2 1/3] Don't leak descriptors into subprograms.

Open data and WAL files with O_CLOEXEC (POSIX 2018, SUSv4).  All of our
target systems have it, except Windows.  Windows already has that
behavior, because we wrote our own open() implementation.

Do the same for sockets with FD_CLOEXEC.  (A separate commit will
improve this with SOCK_CLOEXEC on some systems, but we'll need the
FD_CLOEXEC path as a fallback.)

This means that programs executed by COPY and archiving commands etc,
will not be able to mess with those descriptors (of course nothing stops
them from opening files, so this is more about tidyness and preventing
accidental problems than security).

Reviewed-by: Tom Lane 
Reviewed-by: Andres Freund 
Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
---
 src/backend/access/transam/xlog.c | 9 ++---
 src/backend/libpq/pqcomm.c| 9 +
 src/backend/storage/file/fd.c | 2 --
 src/backend/storage/smgr/md.c | 9 +
 src/include/port/win32_port.h | 7 +++
 5 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fb4c860bde..3b44a0f237 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2937,7 +2937,8 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	 * Try to use existent file (checkpoint maker may have created it already)
 	 */
 	*added = false;
-	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method));
+	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
+			 get_sync_bit(sync_method));
 	if (fd < 0)
 	{
 		if (errno != ENOENT)
@@ -3098,7 +3099,8 @@ XLogFileInit(XLogSegNo logsegno, TimeLineID logtli)
 		return fd;
 
 	/* Now open original target segment (might not be file I just made) */
-	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method));
+	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
+			 get_sync_bit(sync_method));
 	if (fd < 0)
 		ereport(ERROR,
 (errcode_for_file_access(),
@@ 

Re: File descriptors in exec'd subprocesses

2023-02-05 Thread Andres Freund
Hi,

On 2023-02-05 11:06:13 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On February 5, 2023 1:00:50 AM GMT+01:00, Thomas Munro 
> >  wrote:
> >> Are there any more descriptors we need to think about?
> 
> > Postmaster's listen sockets?
> 
> I wonder whether O_CLOEXEC on that would be inherited by the
> client-communication sockets, though.

I'd be very suprised if it were.



Nope, at least not on linux. Verified by looking at /proc/*/fdinfo/n
after adding SOCK_CLOEXEC to just the socket() call. 'flags' changes
from 02 -> 0202 for the listen socket, but stays at 04002 for the
client socket. If I add SOCK_CLOEXEC to accept() (well, accept4()), it
does change from 04002 to 02004002.

Greetings,

Andres Freund




Re: File descriptors in exec'd subprocesses

2023-02-05 Thread Tom Lane
Andres Freund  writes:
> On February 5, 2023 1:00:50 AM GMT+01:00, Thomas Munro 
>  wrote:
>> Are there any more descriptors we need to think about?

> Postmaster's listen sockets?

I wonder whether O_CLOEXEC on that would be inherited by the
client-communication sockets, though.  That's fine ... unless you
are doing EXEC_BACKEND.

regards, tom lane




Re: File descriptors in exec'd subprocesses

2023-02-05 Thread Andres Freund
Hi, 

Unsurprisingly I'm in favor of this. 

On February 5, 2023 1:00:50 AM GMT+01:00, Thomas Munro  
wrote:
>Are there any more descriptors we need to think about?

Postmaster's listen sockets? Saves a bunch of syscalls, at least. Logging 
collector pipe write end, in backends?

Greetings,

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: File descriptors in exec'd subprocesses

2023-02-04 Thread Tom Lane
Thomas Munro  writes:
> On Sun, Feb 5, 2023 at 1:00 PM Thomas Munro  wrote:
>> SUSv3 (POSIX 2008)

> Oh, oops, 2008 actually corresponds to SUSv4.  Hmm.

Worst case, if we come across some allegedly-supported platform without
O_CLOEXEC, we #define that to zero.  Said platform is no worse off
than it was before.

regards, tom lane




Re: File descriptors in exec'd subprocesses

2023-02-04 Thread Thomas Munro
On Sun, Feb 5, 2023 at 1:00 PM Thomas Munro  wrote:
> SUSv3 (POSIX 2008)

Oh, oops, 2008 actually corresponds to SUSv4.  Hmm.