Re: Refactoring backend fork+exec code

2024-05-17 Thread Thomas Munro
On Mon, Mar 18, 2024 at 10:41 PM Heikki Linnakangas  wrote:
> Committed, with some final cosmetic cleanups. Thanks everyone!

Nitpicking from UBSan with EXEC_BACKEND on Linux (line numbers may be
a bit off, from a branch of mine):

../src/backend/postmaster/launch_backend.c:772:2: runtime error: null
pointer passed as argument 2, which is declared to never be null
==13303==Using libbacktrace symbolizer.
#0 0x564b0202 in save_backend_variables
../src/backend/postmaster/launch_backend.c:772
#1 0x564b0242 in internal_forkexec
../src/backend/postmaster/launch_backend.c:311
#2 0x564b0bdd in postmaster_child_launch
../src/backend/postmaster/launch_backend.c:244
#3 0x564b3121 in StartChildProcess
../src/backend/postmaster/postmaster.c:3928
#4 0x564b933a in PostmasterMain
../src/backend/postmaster/postmaster.c:1357
#5 0x562de4ad in main ../src/backend/main/main.c:197
#6 0x7667ad09 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
#7 0x55e34279 in _start
(/tmp/cirrus-ci-build/build/tmp_install/usr/local/pgsql/bin/postgres+0x8e0279)

This silences it:

-   memcpy(param->startup_data, startup_data, startup_data_len);
+   if (startup_data_len > 0)
+   memcpy(param->startup_data, startup_data, startup_data_len);

(I found that out by testing EXEC_BACKEND on CI.  I also learned that
the Mac and FreeBSD tasks fail with EXEC_BACKEND because of SysV shmem
bleating.  We probably should go and crank up the relevant sysctls in
the .cirrus.tasks.yml...)




Re: Refactoring backend fork+exec code

2024-05-01 Thread Anton A. Melnikov



On 28.04.2024 22:36, Heikki Linnakangas wrote:

Peter E noticed and Michael fixed them in commit 768ceeeaa1 already.


Didn't check that is already fixed in the current master. Sorry!
Thanks for pointing this out!

With the best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Refactoring backend fork+exec code

2024-04-28 Thread Heikki Linnakangas

On 27/04/2024 11:27, Anton A. Melnikov wrote:

Hello!

Maybe add PGDLLIMPORT to
extern bool LoadedSSL;
and
extern struct ClientSocket *MyClientSocket;
definitions in the src/include/postmaster/postmaster.h ?

Peter E noticed and Michael fixed them in commit 768ceeeaa1 already.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2024-04-27 Thread Anton A. Melnikov

Hello!

Maybe add PGDLLIMPORT to
extern bool LoadedSSL;
and
extern struct ClientSocket *MyClientSocket;
definitions in the src/include/postmaster/postmaster.h ?

With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Refactoring backend fork+exec code

2024-03-21 Thread Jelte Fennema-Nio
On Wed, 20 Mar 2024 at 08:16, Heikki Linnakangas  wrote:
> Yeah, it's not a very valuable assertion. Removed, thanks!

How about we add it as a static assert instead of removing it, like we
have for many other similar arrays.


v1-0001-Add-child_process_kinds-static-assert.patch
Description: Binary data


Re: Refactoring backend fork+exec code

2024-03-20 Thread Heikki Linnakangas

On 20/03/2024 07:37, Tom Lane wrote:

A couple of buildfarm animals don't like these tests:

Assert(child_type >= 0 && child_type < lengthof(child_process_kinds));

for example

  ayu   | 2024-03-19 13:08:05 | launch_backend.c:211:39: warning: 
comparison of constant 16 with expression of type 'BackendType' (aka 'enum 
BackendType') is always true [-Wtautological-constant-out-of-range-compare]
  ayu   | 2024-03-19 13:08:05 | launch_backend.c:233:39: warning: 
comparison of constant 16 with expression of type 'BackendType' (aka 'enum 
BackendType') is always true [-Wtautological-constant-out-of-range-compare]

I'm not real sure why it's moaning about the "<" check but not the
">= 0" check, which ought to be equally tautological given the
assumption that the input is a valid member of the enum.  But
in any case, exactly how much value do these assertions carry?
If you're intent on keeping them, perhaps casting child_type to
int here would suppress the warnings.  But personally I think
I'd lose the Asserts.


Yeah, it's not a very valuable assertion. Removed, thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2024-03-19 Thread Tom Lane
Heikki Linnakangas  writes:
> Committed, with some final cosmetic cleanups. Thanks everyone!

A couple of buildfarm animals don't like these tests:

Assert(child_type >= 0 && child_type < lengthof(child_process_kinds));

for example

 ayu   | 2024-03-19 13:08:05 | launch_backend.c:211:39: warning: 
comparison of constant 16 with expression of type 'BackendType' (aka 'enum 
BackendType') is always true [-Wtautological-constant-out-of-range-compare]
 ayu   | 2024-03-19 13:08:05 | launch_backend.c:233:39: warning: 
comparison of constant 16 with expression of type 'BackendType' (aka 'enum 
BackendType') is always true [-Wtautological-constant-out-of-range-compare]

I'm not real sure why it's moaning about the "<" check but not the
">= 0" check, which ought to be equally tautological given the
assumption that the input is a valid member of the enum.  But
in any case, exactly how much value do these assertions carry?
If you're intent on keeping them, perhaps casting child_type to
int here would suppress the warnings.  But personally I think
I'd lose the Asserts.

regards, tom lane




Re: Refactoring backend fork+exec code

2024-03-18 Thread Heikki Linnakangas

On 13/03/2024 09:30, Heikki Linnakangas wrote:

Attached is a new version of the remaining patches.


Committed, with some final cosmetic cleanups. Thanks everyone!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2024-03-05 Thread Tristan Partin

On Mon Mar 4, 2024 at 3:05 AM CST, Heikki Linnakangas wrote:
I've now completed many of the side-quests, here are the patches that 
remain.


The first three patches form a logical unit. They move the 
initialization of the Port struct from postmaster to the backend 
process. Currently, that work is split between the postmaster and the 
backend process so that postmaster fills in the socket and some other 
fields, and the backend process fills the rest after reading the startup 
packet. With these patches, there is a new much smaller ClientSocket 
struct that is passed from the postmaster to the child process, which 
contains just the fields that postmaster initializes. The Port struct is 
allocated in the child process. That makes the backend startup easier to 
understand. I plan to commit those three patches next if there are no 
objections.


That leaves the rest of the patches. I think they're in pretty good 
shape too, and I've gotten some review on those earlier and have 
addressed the comments I got so far, but would still appreciate another 
round of review.



- * *MyProcPort, because ConnCreate() allocated that space with malloc()
- * ... else we'd need to copy the Port data first.  Also, subsidiary 
data
- * such as the username isn't lost either; see ProcessStartupPacket().
+ * *MyProcPort, because that space is allocated in stack ... else we'd
+ * need to copy the Port data first.  Also, subsidiary data such as the
+ * username isn't lost either; see ProcessStartupPacket().


s/allocated in/allocated on the

The first 3 patches seem good to go, in my opinion.


@@ -225,14 +331,13 @@ internal_forkexec(int argc, char *argv[], ClientSocket 
*client_sock, BackgroundW
 return -1;
 }

-/* Make sure caller set up argv properly */
-Assert(argc >= 3);
-Assert(argv[argc] == NULL);
-Assert(strncmp(argv[1], "--fork", 6) == 0);
-Assert(argv[2] == NULL);
-
-/* Insert temp file name after --fork argument */
+/* set up argv properly */
+argv[0] = "postgres";
+snprintf(forkav, MAXPGPATH, "--forkchild=%s", child_kind);
+argv[1] = forkav;
+/* Insert temp file name after --forkchild argument */
 argv[2] = tmpfilename;
+argv[3] = NULL;


Should we use postgres_exec_path instead of the naked "postgres" here?


+/* in postmaster, fork failed ... */
+ereport(LOG,
+(errmsg("could not fork worker process: %m")));
+/* undo what assign_backendlist_entry did */
+ReleasePostmasterChildSlot(rw->rw_child_slot);
+rw->rw_child_slot = 0;
+pfree(rw->rw_backend);
+rw->rw_backend = NULL;
+/* mark entry as crashed, so we'll try again later */
+rw->rw_crashed_at = GetCurrentTimestamp();
+return false;


I think the error message should include the word "background." It would 
be more consistent with the log message above it.



+typedef struct
+{
+intsyslogFile;
+intcsvlogFile;
+intjsonlogFile;
+} syslogger_startup_data;


It would be nice if all of these startup data structs were named 
similarly. For instance, a previous one was BackendStartupInfo. It would 
help with greppability.


I noticed there were a few XXX comments left that you created. I'll 
highlight them here for more visibility.



+/* XXX: where does this belong? */
+extern bool LoadedSSL;


Perhaps near the My* variables or maybe in the Port struct?


+#ifdef EXEC_BACKEND
+
+/*
+ * Need to reinitialize the SSL library in the backend, since the 
context
+ * structures contain function pointers and cannot be passed through 
the
+ * parameter file.
+ *
+ * If for some reason reload fails (maybe the user installed broken key
+ * files), soldier on without SSL; that's better than all connections
+ * becoming impossible.
+ *
+ * XXX should we do this in all child processes?  For the moment it's
+ * enough to do it in backend children. XXX good question indeed
+ */
+#ifdef USE_SSL
+if (EnableSSL)
+{
+if (secure_initialize(false) == 0)
+LoadedSSL = true;
+else
+ereport(LOG,
+(errmsg("SSL configuration could not be 
loaded in child process")));
+}
+#endif
+#endif


Here you added the "good question indeed." I am not sure what the best 
answer is either! :)



+/* XXX: translation? */
+ereport(LOG,
+(errmsg("could not fork %s process: %m", 
PostmasterChildName(type;


I assume you are referring to the child name 

Re: Refactoring backend fork+exec code

2024-03-05 Thread Heikki Linnakangas

On 05/03/2024 11:44, Richard Guo wrote:

I noticed that there are still three places in backend_status.c where
pgstat_get_beentry_by_backend_id() is referenced.  I think we should
replace them with pgstat_get_beentry_by_proc_number().


Fixed, thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2024-03-05 Thread Richard Guo
On Mon, Mar 4, 2024 at 1:40 AM Heikki Linnakangas  wrote:

> On 22/02/2024 02:37, Heikki Linnakangas wrote:
> > Here's another patch version that does that. Yeah, I agree it's nicer in
> > the end.
> >
> > I'm pretty happy with this now. I'll read through these patches myself
> > again after sleeping over it and try to get this committed by the end of
> > the week, but another pair of eyes wouldn't hurt.
>
> And pushed. Thanks for the reviews!


I noticed that there are still three places in backend_status.c where
pgstat_get_beentry_by_backend_id() is referenced.  I think we should
replace them with pgstat_get_beentry_by_proc_number().

Thanks
Richard


Re: Refactoring backend fork+exec code

2024-03-03 Thread Heikki Linnakangas

On 22/02/2024 02:37, Heikki Linnakangas wrote:

On 15/02/2024 07:09, Robert Haas wrote:

On Thu, Feb 15, 2024 at 3:07 AM Andres Freund  wrote:

I think the last remaining question here is about the 0- vs 1-based indexing
of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do
it, should we reserve PGPROC 0. I'm on the fence on this one.


I lean towards it being a good idea. Having two internal indexing schemes was
bad enough so far, but at least one would fairly quickly notice if one used
the wrong one. If they're just offset by 1, it might end up taking longer,
because that'll often also be a valid id.


Yeah, I think making everything 0-based is probably the best way
forward long term. It might require more cleanup work to get there,
but it's just a lot simpler in the end, IMHO.


Here's another patch version that does that. Yeah, I agree it's nicer in
the end.

I'm pretty happy with this now. I'll read through these patches myself
again after sleeping over it and try to get this committed by the end of
the week, but another pair of eyes wouldn't hurt.


And pushed. Thanks for the reviews!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2024-02-14 Thread Robert Haas
On Thu, Feb 15, 2024 at 3:07 AM Andres Freund  wrote:
> > I think the last remaining question here is about the 0- vs 1-based indexing
> > of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do
> > it, should we reserve PGPROC 0. I'm on the fence on this one.
>
> I lean towards it being a good idea. Having two internal indexing schemes was
> bad enough so far, but at least one would fairly quickly notice if one used
> the wrong one. If they're just offset by 1, it might end up taking longer,
> because that'll often also be a valid id.

Yeah, I think making everything 0-based is probably the best way
forward long term. It might require more cleanup work to get there,
but it's just a lot simpler in the end, IMHO.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Refactoring backend fork+exec code

2024-02-14 Thread Andres Freund
Hi,

On 2024-02-08 13:19:53 +0200, Heikki Linnakangas wrote:
> > > - /*
> > > -  * Assign the ProcSignalSlot for an auxiliary process.  Since it doesn't
> > > -  * have a BackendId, the slot is statically allocated based on the
> > > -  * auxiliary process type (MyAuxProcType).  Backends use slots indexed 
> > > in
> > > -  * the range from 1 to MaxBackends (inclusive), so we use MaxBackends +
> > > -  * AuxProcType + 1 as the index of the slot for an auxiliary process.
> > > -  *
> > > -  * This will need rethinking if we ever want more than one of a 
> > > particular
> > > -  * auxiliary process type.
> > > -  */
> > > - ProcSignalInit(MaxBackends + MyAuxProcType + 1);
> > > + ProcSignalInit();
> >
> > Now that we don't need the offset here, we could move ProcSignalInit() into
> > BsaeInit() I think?
>
> Hmm, doesn't feel right to me. BaseInit() is mostly concerned with setting
> up backend-private structures, and it's also called for a standalone
> backend.

It already initializes a lot of shared subsystems (pgstat, replication slots
and arguable things like the buffer pool, temporary file access and WAL). And
note that it already requires that MyProc is already set (but it's not yet
"added" to the procarray, i.e. doesn't do visibility stuff at that stage).

I don't think that BaseInit() being called by standalone backends really poses
a problem? So is InitPostgres(), which does call ProcSignalInit() in
standalone processes.

My mental model is that BaseInit() is for stuff that's shared between
processes that do attach to databases and those that don't. Right now the
initialization flow is something like this ascii diagram:

standalone: \   
 /-> StartupXLOG() \
 -> InitProcess()  -\ /-> ProcArrayAdd() -> 
SharedInvalBackendInit() -> ProcSignalInit()-   -> 
pgstat_beinit() -> attach to db -> pgstat_bestart()
normal backend: /\   /
  -> BaseInit() -
aux process:InitAuxiliaryProcess()  -/   \--
 -> ProcSignalInit()-> 
pgstat_beinit() -> pgstat_bestart()


The only reason ProcSignalInit() happens kinda late is that historically we
used BackendIds as the index, which were only assigned in
SharedInvalBackendInit() for normal processes. But that doesn't make sense
anymore after your changes.

Similarly, we do pgstat_beinit() quite late, but that's again only because it
uses MyBackendId, which today is only assigned during
SharedInvalBackendInit().  I don't think we can do pgstat_bestart() earlier
though, which is a shame, given the four calls to it inside InitPostgres().


> I feel the process initialization codepaths could use some cleanup in
> general. Not sure what exactly.

Very much agreed.


> > > +/*
> > > + * BackendIdGetProc -- get a backend's PGPROC given its backend ID
> > > + *
> > > + * The result may be out of date arbitrarily quickly, so the caller
> > > + * must be careful about how this information is used.  NULL is
> > > + * returned if the backend is not active.
> > > + */
> > > +PGPROC *
> > > +BackendIdGetProc(int backendID)
> > > +{
> > > + PGPROC *result;
> > > +
> > > + if (backendID < 1 || backendID > ProcGlobal->allProcCount)
> > > + return NULL;
> >
> > Hm, doesn't calling BackendIdGetProc() with these values a bug? That's not
> > about being out of date or such.
>
> Perhaps. I just followed the example of the old implementation, which also
> returns NULL on bogus inputs.

Fair enough. Makes it harder to not notice bugs, but that's not on this 
patchset to fix...



> I think the last remaining question here is about the 0- vs 1-based indexing
> of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do
> it, should we reserve PGPROC 0. I'm on the fence on this one.

I lean towards it being a good idea. Having two internal indexing schemes was
bad enough so far, but at least one would fairly quickly notice if one used
the wrong one. If they're just offset by 1, it might end up taking longer,
because that'll often also be a valid id.  But I think you have the author's
prerogative on this one.

If we do so, I think it might be better to standardize on MyProcNumber instead
of MyBackendId. That'll force looking at code where indexing shifts by 1 - and
it also seems more descriptive, as inside postgres it's imo clearer what a
"proc number" is than what a "backend id" is. Particularly because the latter
is also used for things that aren't backends...


The only exception are SQL level users, for those I think it might make sense
to keep the current 1 based indexing, there's just a few functions where we'd
need to translate.



> @@ -791,6 +792,7 @@ ProcArrayEndTransactionInternal(PGPROC *proc, 
> TransactionId latestXid)
>  static 

Re: Refactoring backend fork+exec code

2024-02-08 Thread Heikki Linnakangas

On 07/02/2024 20:25, Andres Freund wrote:

On 2024-01-30 02:08:36 +0200, Heikki Linnakangas wrote:

 From 54f22231bb2540fc5957c14005956161e6fc9dac Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 24 Jan 2024 23:15:55 +0200
Subject: [PATCH v8 1/5] Remove superfluous 'pgprocno' field from PGPROC

It was always just the index of the PGPROC entry from the beginning of
the proc array. Introduce a macro to compute it from the pointer
instead.


Hm. The pointer math here is bit more expensive than in some other cases, as
the struct is fairly large and sizeof(PGPROC) isn't a power of two. Adding
more math into loops like in TransactionGroupUpdateXidStatus() might end up
showing up.


I added a MyProcNumber global variable that is set to 
GetNumberFromPGProc(MyProc). I'm not really concerned about the extra 
math, but with MyProcNumber it should definitely not be an issue. The 
few GetNumberFromPGProc() invocations that remain are in less 
performance-critical paths.


(In later patch, I switch backend ids to 0-based indexing, which 
replaces MyProcNumber references with MyBackendId)



Is this really related to the rest of the series?


It's not strictly necessary, but it felt prudent to remove it now, since 
I'm removing the backendID field too.



You can now convert a backend ID into an index into the PGPROC array
simply by subtracting 1. We still use 0-based "pgprocnos" in various
places, for indexes into the PGPROC array, but the only difference now
is that backend IDs start at 1 while pgprocnos start at 0.


Why aren't we using 0-based indexing for both? InvalidBackendId is -1, so
there'd not be a conflict, right?


Correct. I was being conservative and didn't dare to change the old 
convention. The backend ids are visible in a few places like "pg_temp_0" 
schema names, and pg_stat_get_*() functions.


One alternative would be to reserve and waste allProcs[0]. Then pgprocno 
and backend ID could both be direct indexes to the array, but 0 would 
not be used.


If we switch to 0-based indexing, it begs the question: why don't we 
merge the concepts of "pgprocno" and "BackendId" completely and call it 
the same thing everywhere? It probably would be best in the long run. It 
feels like a lot of churn though.


Anyway, I switched to 0-based indexing in the attached new version, to 
see what it looks like.



@@ -457,7 +442,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId 
xid, const char *gid,
Assert(LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE));

Assert(gxact != NULL);
-   proc = >allProcs[gxact->pgprocno];
+   proc = GetPGProcByNumber(gxact->pgprocno);

/* Initialize the PGPROC entry */
MemSet(proc, 0, sizeof(PGPROC));


This set of changes is independent of this commit, isn't it?


Yes. It's just for symmetry, now that we use GetNumberFromPGProc() to 
get the pgprocno.



diff --git a/src/backend/postmaster/auxprocess.c 
b/src/backend/postmaster/auxprocess.c
index ab86e802f21..39171fea06b 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -107,17 +107,7 @@ AuxiliaryProcessMain(AuxProcType auxtype)

BaseInit();

-   /*
-* Assign the ProcSignalSlot for an auxiliary process.  Since it doesn't
-* have a BackendId, the slot is statically allocated based on the
-* auxiliary process type (MyAuxProcType).  Backends use slots indexed 
in
-* the range from 1 to MaxBackends (inclusive), so we use MaxBackends +
-* AuxProcType + 1 as the index of the slot for an auxiliary process.
-*
-* This will need rethinking if we ever want more than one of a 
particular
-* auxiliary process type.
-*/
-   ProcSignalInit(MaxBackends + MyAuxProcType + 1);
+   ProcSignalInit();


Now that we don't need the offset here, we could move ProcSignalInit() into
BsaeInit() I think?


Hmm, doesn't feel right to me. BaseInit() is mostly concerned with 
setting up backend-private structures, and it's also called for a 
standalone backend.


I feel the process initialization codepaths could use some cleanup in 
general. Not sure what exactly.



+/*
+ * BackendIdGetProc -- get a backend's PGPROC given its backend ID
+ *
+ * The result may be out of date arbitrarily quickly, so the caller
+ * must be careful about how this information is used.  NULL is
+ * returned if the backend is not active.
+ */
+PGPROC *
+BackendIdGetProc(int backendID)
+{
+   PGPROC *result;
+
+   if (backendID < 1 || backendID > ProcGlobal->allProcCount)
+   return NULL;


Hm, doesn't calling BackendIdGetProc() with these values a bug? That's not
about being out of date or such.


Perhaps. I just followed the example of the old implementation, which 
also returns NULL on bogus inputs.



+/*
+ * BackendIdGetTransactionIds -- get a backend's transaction status
+ *
+ * Get the xid, xmin, nsubxid and overflow status of the backend.  The
+ * 

Re: Refactoring backend fork+exec code

2024-02-07 Thread Andres Freund
Hi,

On 2024-01-30 02:08:36 +0200, Heikki Linnakangas wrote:
> I spent some more time on the 'lastBackend' optimization in sinvaladt.c. I
> realized that it became very useless with these patches, because aux
> processes are allocated pgprocno's after all the slots for regular backends.
> There are always aux processes running, so lastBackend would always have a
> value close to the max anyway. I replaced that with a dense 'pgprocnos'
> array that keeps track of the exact slots that are in use. I'm not 100% sure
> this is worth the effort; does any real world workload send shared
> invalidations so frequently that this matters? In any case, this should
> avoid the regression if such a workload exists.
>
> New patch set attached. I think these are ready to be committed, but would
> appreciate a final review.


> From 54f22231bb2540fc5957c14005956161e6fc9dac Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Wed, 24 Jan 2024 23:15:55 +0200
> Subject: [PATCH v8 1/5] Remove superfluous 'pgprocno' field from PGPROC
>
> It was always just the index of the PGPROC entry from the beginning of
> the proc array. Introduce a macro to compute it from the pointer
> instead.

Hm. The pointer math here is bit more expensive than in some other cases, as
the struct is fairly large and sizeof(PGPROC) isn't a power of two. Adding
more math into loops like in TransactionGroupUpdateXidStatus() might end up
showing up.

I've been thinking that we likely should pad PGPROC to some more convenient
boundary, but...


Is this really related to the rest of the series?


> From 4e0121e064804b73ef8a5dc10be27b85968ea1af Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Mon, 29 Jan 2024 23:50:34 +0200
> Subject: [PATCH v8 2/5] Redefine backend ID to be an index into the proc
>  array.
>
> Previously, backend ID was an index into the ProcState array, in the
> shared cache invalidation manager (sinvaladt.c). The entry in the
> ProcState array was reserved at backend startup by scanning the array
> for a free entry, and that was also when the backend got its backend
> ID. Things becomes slightly simpler if we redefine backend ID to be
> the index into the PGPROC array, and directly use it also as an index
> to the ProcState array. This uses a little more memory, as we reserve
> a few extra slots in the ProcState array for aux processes that don't
> need them, but the simplicity is worth it.

> Aux processes now also have a backend ID. This simplifies the
> reservation of BackendStatusArray and ProcSignal slots.
>
> You can now convert a backend ID into an index into the PGPROC array
> simply by subtracting 1. We still use 0-based "pgprocnos" in various
> places, for indexes into the PGPROC array, but the only difference now
> is that backend IDs start at 1 while pgprocnos start at 0.

Why aren't we using 0-based indexing for both? InvalidBackendId is -1, so
there'd not be a conflict, right?


> One potential downside of this patch is that the ProcState array might
> get less densely packed, as we we don't try so hard to assign
> low-numbered backend ID anymore. If it's less densely packed,
> lastBackend will stay at a higher value, and SIInsertDataEntries() and
> SICleanupQueue() need to scan over more unused entries. I think that's
> fine. They are performance critical enough to matter, and there was no
> guarantee on dense packing before either: If you launched a lot of
> backends concurrently, and kept the last one open, lastBackend would
> also stay at a high value.

It's perhaps worth noting here that there's a future patch that also addresses
this to some degree?


> @@ -457,7 +442,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, 
> TransactionId xid, const char *gid,
>   Assert(LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE));
>
>   Assert(gxact != NULL);
> - proc = >allProcs[gxact->pgprocno];
> + proc = GetPGProcByNumber(gxact->pgprocno);
>
>   /* Initialize the PGPROC entry */
>   MemSet(proc, 0, sizeof(PGPROC));

This set of changes is independent of this commit, isn't it?


> diff --git a/src/backend/postmaster/auxprocess.c 
> b/src/backend/postmaster/auxprocess.c
> index ab86e802f21..39171fea06b 100644
> --- a/src/backend/postmaster/auxprocess.c
> +++ b/src/backend/postmaster/auxprocess.c
> @@ -107,17 +107,7 @@ AuxiliaryProcessMain(AuxProcType auxtype)
>
>   BaseInit();
>
> - /*
> -  * Assign the ProcSignalSlot for an auxiliary process.  Since it doesn't
> -  * have a BackendId, the slot is statically allocated based on the
> -  * auxiliary process type (MyAuxProcType).  Backends use slots indexed 
> in
> -  * the range from 1 to MaxBackends (inclusive), so we use MaxBackends +
> -  * AuxProcType + 1 as the index of the slot for an auxiliary process.
> -  *
> -  * This will need rethinking if we ever want more than one of a 
> particular
> -  * auxiliary process type.
> -  */
> - ProcSignalInit(MaxBackends + MyAuxProcType + 1);
> 

Re: Refactoring backend fork+exec code

2024-02-01 Thread Heikki Linnakangas

On 30/01/2024 02:08, Heikki Linnakangas wrote:

On 29/01/2024 17:54, reid.thomp...@crunchydata.com wrote:

On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote:


And here we go. BackendID is now a 1-based index directly into the
PGPROC array.


Would it be worthwhile to also note in this comment FIRST_AUX_PROC's
and IsAuxProcess()'s dependency on B_ARCHIVER and it's location in the
enum table?


Yeah, that might be in order. Although looking closer, it's only used in
IsAuxProcess, which is only used in one sanity check in
AuxProcessMain(). And even that gets refactored away by the later
patches in this thread. So on second thoughts, I'll just remove it
altogether.

I spent some more time on the 'lastBackend' optimization in sinvaladt.c.
I realized that it became very useless with these patches, because aux
processes are allocated pgprocno's after all the slots for regular
backends. There are always aux processes running, so lastBackend would
always have a value close to the max anyway. I replaced that with a
dense 'pgprocnos' array that keeps track of the exact slots that are in
use. I'm not 100% sure this is worth the effort; does any real world
workload send shared invalidations so frequently that this matters? In
any case, this should avoid the regression if such a workload exists.

New patch set attached. I think these are ready to be committed, but
would appreciate a final review.


contrib/amcheck 003_cic_2pc.pl test failures revealed a bug that 
required some reworking:


In a PGPROC entry for a prepared xact, the PGPROC's backendID needs to 
be the original backend's ID, because the prepared xact is holding the 
lock on the original virtual transaction id. When a transaction's 
ownership is moved from the original backend's PGPROC entry to the 
prepared xact PGPROC entry, the backendID needs to be copied over. My 
patch removed the field altogether, so it was not copied over, which 
made it look like it the original VXID lock was released at prepare.


I fixed that by adding back the backendID field. For regular backends, 
it's always equal to pgprocno + 1, but for prepared xacts, it's the 
original backend's ID. To make that less confusing, I moved the 
backendID and lxid fields together under a 'vxid' struct. The two fields 
together form the virtual transaction ID, and that's the only context 
where the 'backendID' field should now be looked at.


I also squashed the 'lastBackend' changes in sinvaladt.c to the main patch.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 96c583b32db843fb07d38fd78f1e205882a78b01 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 24 Jan 2024 23:15:55 +0200
Subject: [PATCH v9 1/4] Remove superfluous 'pgprocno' field from PGPROC

It was always just the index of the PGPROC entry from the beginning of
the proc array. Introduce a macro to compute it from the pointer
instead.

Reviewed-by: XXX
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407%40iki.fi
---
 src/backend/access/transam/clog.c |  4 ++--
 src/backend/access/transam/twophase.c | 11 +--
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/postmaster/bgwriter.c |  2 +-
 src/backend/postmaster/pgarch.c   |  2 +-
 src/backend/postmaster/walsummarizer.c|  2 +-
 src/backend/storage/buffer/bufmgr.c   |  6 +++---
 src/backend/storage/ipc/procarray.c   |  6 +++---
 src/backend/storage/lmgr/condition_variable.c | 12 ++--
 src/backend/storage/lmgr/lwlock.c |  6 +++---
 src/backend/storage/lmgr/predicate.c  |  2 +-
 src/backend/storage/lmgr/proc.c   |  1 -
 src/include/storage/lock.h|  2 +-
 src/include/storage/proc.h|  6 +-
 14 files changed, 29 insertions(+), 35 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc9..7550309c25a 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -458,7 +458,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		 * less efficiently.
 		 */
 		if (nextidx != INVALID_PGPROCNO &&
-			ProcGlobal->allProcs[nextidx].clogGroupMemberPage != proc->clogGroupMemberPage)
+			GetPGProcByNumber(nextidx)->clogGroupMemberPage != proc->clogGroupMemberPage)
 		{
 			/*
 			 * Ensure that this proc is not a member of any clog group that
@@ -473,7 +473,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 
 		if (pg_atomic_compare_exchange_u32(>clogGroupFirst,
 		   ,
-		   (uint32) proc->pgprocno))
+		   (uint32) GetNumberFromPGProc(proc)))
 			break;
 	}
 
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 8426458f7f5..234c8d08ebc 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -284,7 +284,7 @@ 

Re: Refactoring backend fork+exec code

2024-01-29 Thread Heikki Linnakangas

On 29/01/2024 17:54, reid.thomp...@crunchydata.com wrote:

On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote:


And here we go. BackendID is now a 1-based index directly into the
PGPROC array.


Would it be worthwhile to also note in this comment FIRST_AUX_PROC's
and IsAuxProcess()'s dependency on B_ARCHIVER and it's location in the
enum table?


Yeah, that might be in order. Although looking closer, it's only used in 
IsAuxProcess, which is only used in one sanity check in 
AuxProcessMain(). And even that gets refactored away by the later 
patches in this thread. So on second thoughts, I'll just remove it 
altogether.


I spent some more time on the 'lastBackend' optimization in sinvaladt.c. 
I realized that it became very useless with these patches, because aux 
processes are allocated pgprocno's after all the slots for regular 
backends. There are always aux processes running, so lastBackend would 
always have a value close to the max anyway. I replaced that with a 
dense 'pgprocnos' array that keeps track of the exact slots that are in 
use. I'm not 100% sure this is worth the effort; does any real world 
workload send shared invalidations so frequently that this matters? In 
any case, this should avoid the regression if such a workload exists.


New patch set attached. I think these are ready to be committed, but 
would appreciate a final review.


--
Heikki Linnakangas
Neon (https://neon.tech)
From 54f22231bb2540fc5957c14005956161e6fc9dac Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 24 Jan 2024 23:15:55 +0200
Subject: [PATCH v8 1/5] Remove superfluous 'pgprocno' field from PGPROC

It was always just the index of the PGPROC entry from the beginning of
the proc array. Introduce a macro to compute it from the pointer
instead.

Reviewed-by: XXX
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407%40iki.fi
---
 src/backend/access/transam/clog.c |  4 ++--
 src/backend/access/transam/twophase.c | 11 +--
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/postmaster/bgwriter.c |  2 +-
 src/backend/postmaster/pgarch.c   |  2 +-
 src/backend/postmaster/walsummarizer.c|  2 +-
 src/backend/storage/buffer/bufmgr.c   |  6 +++---
 src/backend/storage/ipc/procarray.c   |  6 +++---
 src/backend/storage/lmgr/condition_variable.c | 12 ++--
 src/backend/storage/lmgr/lwlock.c |  6 +++---
 src/backend/storage/lmgr/predicate.c  |  2 +-
 src/backend/storage/lmgr/proc.c   |  1 -
 src/include/storage/lock.h|  2 +-
 src/include/storage/proc.h|  6 +-
 14 files changed, 29 insertions(+), 35 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc9..7550309c25a 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -458,7 +458,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		 * less efficiently.
 		 */
 		if (nextidx != INVALID_PGPROCNO &&
-			ProcGlobal->allProcs[nextidx].clogGroupMemberPage != proc->clogGroupMemberPage)
+			GetPGProcByNumber(nextidx)->clogGroupMemberPage != proc->clogGroupMemberPage)
 		{
 			/*
 			 * Ensure that this proc is not a member of any clog group that
@@ -473,7 +473,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 
 		if (pg_atomic_compare_exchange_u32(>clogGroupFirst,
 		   ,
-		   (uint32) proc->pgprocno))
+		   (uint32) GetNumberFromPGProc(proc)))
 			break;
 	}
 
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 8426458f7f5..234c8d08ebc 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -284,7 +284,7 @@ TwoPhaseShmemInit(void)
 			TwoPhaseState->freeGXacts = [i];
 
 			/* associate it with a PGPROC assigned by InitProcGlobal */
-			gxacts[i].pgprocno = PreparedXactProcs[i].pgprocno;
+			gxacts[i].pgprocno = GetNumberFromPGProc([i]);
 
 			/*
 			 * Assign a unique ID for each dummy proc, so that the range of
@@ -461,7 +461,6 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
 
 	/* Initialize the PGPROC entry */
 	MemSet(proc, 0, sizeof(PGPROC));
-	proc->pgprocno = gxact->pgprocno;
 	dlist_node_init(>links);
 	proc->waitStatus = PROC_WAIT_STATUS_OK;
 	if (LocalTransactionIdIsValid(MyProc->lxid))
@@ -780,7 +779,7 @@ pg_prepared_xact(PG_FUNCTION_ARGS)
 	while (status->array != NULL && status->currIdx < status->ngxacts)
 	{
 		GlobalTransaction gxact = >array[status->currIdx++];
-		PGPROC	   *proc = >allProcs[gxact->pgprocno];
+		PGPROC	   *proc = GetPGProcByNumber(gxact->pgprocno);
 		Datum		values[5] = {0};
 		bool		nulls[5] = {0};
 		HeapTuple	tuple;
@@ -935,7 +934,7 @@ TwoPhaseGetDummyProc(TransactionId xid, bool lock_held)
 {
 	GlobalTransaction gxact = 

Re: Refactoring backend fork+exec code

2024-01-29 Thread reid . thompson
On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote:
> 
> And here we go. BackendID is now a 1-based index directly into the 
> PGPROC array.
> 

Would it be worthwhile to also note in this comment FIRST_AUX_PROC's
and IsAuxProcess()'s dependency on B_ARCHIVER and it's location in the
enum table?

  /*
  
  ¦* Auxiliary processes. These have PGPROC entries, but they are not   
  
  ¦* attached to any particular database. There can be only one of each of  
  
  ¦* these running at a time.   
  
  ¦*
  
  ¦* If you modify these, make sure to update NUM_AUXILIARY_PROCS and the   
  
  ¦* glossary in the docs.  
  
  ¦*/   
  
  B_ARCHIVER,   
  
  B_BG_WRITER,  
  
  B_CHECKPOINTER,   
  
  B_STARTUP,
  
  B_WAL_RECEIVER,   
  
  B_WAL_SUMMARIZER, 
  
  B_WAL_WRITER, 
  
  } BackendType;
  


  #define BACKEND_NUM_TYPES (B_WAL_WRITER + 1)

  
  extern PGDLLIMPORT BackendType MyBackendType;   

  
  #define FIRST_AUX_PROC B_ARCHIVER   
  #define IsAuxProcess(type)  (MyBackendType >= FIRST_AUX_PROC)




Re: Refactoring backend fork+exec code

2024-01-24 Thread Heikki Linnakangas

On 23/01/2024 21:50, Andres Freund wrote:

On 2024-01-23 21:07:08 +0200, Heikki Linnakangas wrote:

On 22/01/2024 23:07, Andres Freund wrote:

diff --git a/src/backend/utils/activity/backend_status.c 
b/src/backend/utils/activity/backend_status.c
index 1a1050c8da1..92f24db4e18 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -257,17 +257,16 @@ pgstat_beinit(void)
else
{
/* Must be an auxiliary process */
-   Assert(MyAuxProcType != NotAnAuxProcess);
+   Assert(IsAuxProcess(MyBackendType));
/*
 * Assign the MyBEEntry for an auxiliary process.  Since it 
doesn't
 * have a BackendId, the slot is statically allocated based on 
the
-* auxiliary process type (MyAuxProcType).  Backends use slots 
indexed
-* in the range from 0 to MaxBackends (exclusive), so we use
-* MaxBackends + AuxProcType as the index of the slot for an 
auxiliary
-* process.
+* auxiliary process type.  Backends use slots indexed in the 
range
+* from 0 to MaxBackends (exclusive), and aux processes use the 
slots
+* after that.
 */
-   MyBEEntry = [MaxBackends + MyAuxProcType];
+   MyBEEntry = [MaxBackends + MyBackendType - 
FIRST_AUX_PROC];
}


Hm, this seems less than pretty. It's probably ok for now, but it seems like a
better fix might be to just start assigning backend ids to aux procs or switch
to indexing by pgprocno?


Using pgprocno is a good idea. Come to think of it, why do we even have a
concept of backend ID that's separate from pgprocno? backend ID is used to
index the ProcState array, but AFAICS we could use pgprocno as the index to
that, too.


I think we should do that. There are a few processes not participating in
sinval, but it doesn't make enough of a difference to make sinval slower. And
I think there'd be far bigger efficiency improvements to sinvaladt than not
having a handful more entries.


And here we go. BackendID is now a 1-based index directly into the 
PGPROC array.


--
Heikki Linnakangas
Neon (https://neon.tech)
From 87db35c877f4492b98848dfcd0baa49cc8100c1b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 24 Jan 2024 23:15:55 +0200
Subject: [PATCH v7 1/4] Remove superfluous 'pgprocno' field from PGPROC

It was always just the index of the PGPROC entry from the beginning of
the proc array. Introduce a macro to compute it from the pointer
instead.

Reviewed-by: XXX
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407%40iki.fi
---
 src/backend/access/transam/clog.c |  4 ++--
 src/backend/access/transam/twophase.c | 11 +--
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/postmaster/bgwriter.c |  2 +-
 src/backend/postmaster/pgarch.c   |  2 +-
 src/backend/postmaster/walsummarizer.c|  2 +-
 src/backend/storage/buffer/bufmgr.c   |  6 +++---
 src/backend/storage/ipc/procarray.c   |  6 +++---
 src/backend/storage/lmgr/condition_variable.c | 12 ++--
 src/backend/storage/lmgr/lwlock.c |  6 +++---
 src/backend/storage/lmgr/predicate.c  |  2 +-
 src/backend/storage/lmgr/proc.c   |  1 -
 src/include/storage/lock.h|  2 +-
 src/include/storage/proc.h|  6 +-
 14 files changed, 29 insertions(+), 35 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc9..7550309c25a 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -458,7 +458,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		 * less efficiently.
 		 */
 		if (nextidx != INVALID_PGPROCNO &&
-			ProcGlobal->allProcs[nextidx].clogGroupMemberPage != proc->clogGroupMemberPage)
+			GetPGProcByNumber(nextidx)->clogGroupMemberPage != proc->clogGroupMemberPage)
 		{
 			/*
 			 * Ensure that this proc is not a member of any clog group that
@@ -473,7 +473,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 
 		if (pg_atomic_compare_exchange_u32(>clogGroupFirst,
 		   ,
-		   (uint32) proc->pgprocno))
+		   (uint32) GetNumberFromPGProc(proc)))
 			break;
 	}
 
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 8426458f7f5..234c8d08ebc 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -284,7 +284,7 @@ TwoPhaseShmemInit(void)
 			TwoPhaseState->freeGXacts = [i];
 
 			/* associate it with a PGPROC assigned by InitProcGlobal */
-			gxacts[i].pgprocno = PreparedXactProcs[i].pgprocno;
+			gxacts[i].pgprocno = GetNumberFromPGProc([i]);
 
 			/*
 			 * Assign a unique ID for each dummy 

Re: Refactoring backend fork+exec code

2024-01-23 Thread Andres Freund
Hi,

On 2024-01-23 21:07:08 +0200, Heikki Linnakangas wrote:
> On 22/01/2024 23:07, Andres Freund wrote:
> > > diff --git a/src/backend/utils/activity/backend_status.c 
> > > b/src/backend/utils/activity/backend_status.c
> > > index 1a1050c8da1..92f24db4e18 100644
> > > --- a/src/backend/utils/activity/backend_status.c
> > > +++ b/src/backend/utils/activity/backend_status.c
> > > @@ -257,17 +257,16 @@ pgstat_beinit(void)
> > >   else
> > >   {
> > >   /* Must be an auxiliary process */
> > > - Assert(MyAuxProcType != NotAnAuxProcess);
> > > + Assert(IsAuxProcess(MyBackendType));
> > >   /*
> > >* Assign the MyBEEntry for an auxiliary process.  
> > > Since it doesn't
> > >* have a BackendId, the slot is statically allocated 
> > > based on the
> > > -  * auxiliary process type (MyAuxProcType).  Backends use slots 
> > > indexed
> > > -  * in the range from 0 to MaxBackends (exclusive), so we use
> > > -  * MaxBackends + AuxProcType as the index of the slot for an 
> > > auxiliary
> > > -  * process.
> > > +  * auxiliary process type.  Backends use slots indexed in the 
> > > range
> > > +  * from 0 to MaxBackends (exclusive), and aux processes use the 
> > > slots
> > > +  * after that.
> > >*/
> > > - MyBEEntry = [MaxBackends + MyAuxProcType];
> > > + MyBEEntry = [MaxBackends + MyBackendType - 
> > > FIRST_AUX_PROC];
> > >   }
> > 
> > Hm, this seems less than pretty. It's probably ok for now, but it seems 
> > like a
> > better fix might be to just start assigning backend ids to aux procs or 
> > switch
> > to indexing by pgprocno?
> 
> Using pgprocno is a good idea. Come to think of it, why do we even have a
> concept of backend ID that's separate from pgprocno? backend ID is used to
> index the ProcState array, but AFAICS we could use pgprocno as the index to
> that, too.

I think we should do that. There are a few processes not participating in
sinval, but it doesn't make enough of a difference to make sinval slower. And
I think there'd be far bigger efficiency improvements to sinvaladt than not
having a handful more entries.

Greetings,

Andres Freund




Re: Refactoring backend fork+exec code

2024-01-23 Thread Heikki Linnakangas

On 22/01/2024 23:07, Andres Freund wrote:

On 2024-01-10 14:35:52 +0200, Heikki Linnakangas wrote:

@@ -5344,31 +5344,31 @@ StartChildProcess(AuxProcType type)
errno = save_errno;
switch (type)
{
-   case StartupProcess:
+   case B_STARTUP:
ereport(LOG,
(errmsg("could not fork startup 
process: %m")));
break;
-   case ArchiverProcess:
+   case B_ARCHIVER:
ereport(LOG,
(errmsg("could not fork archiver 
process: %m")));
break;
-   case BgWriterProcess:
+   case B_BG_WRITER:
ereport(LOG,
(errmsg("could not fork background 
writer process: %m")));
break;
-   case CheckpointerProcess:
+   case B_CHECKPOINTER:
ereport(LOG,
(errmsg("could not fork checkpointer 
process: %m")));
break;
-   case WalWriterProcess:
+   case B_WAL_WRITER:
ereport(LOG,
(errmsg("could not fork WAL writer 
process: %m")));
break;
-   case WalReceiverProcess:
+   case B_WAL_RECEIVER:
ereport(LOG,
(errmsg("could not fork WAL receiver 
process: %m")));
break;
-   case WalSummarizerProcess:
+   case B_WAL_SUMMARIZER:
ereport(LOG,
(errmsg("could not fork WAL 
summarizer process: %m")));
break;


Seems we should replace this with something slightly more generic one of these
days...


The later patches in this thread will turn these into

ereport(LOG,
(errmsg("could not fork %s process: %m", 
PostmasterChildName(type;



diff --git a/src/backend/utils/activity/backend_status.c 
b/src/backend/utils/activity/backend_status.c
index 1a1050c8da1..92f24db4e18 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -257,17 +257,16 @@ pgstat_beinit(void)
else
{
/* Must be an auxiliary process */
-   Assert(MyAuxProcType != NotAnAuxProcess);
+   Assert(IsAuxProcess(MyBackendType));
  
  		/*

 * Assign the MyBEEntry for an auxiliary process.  Since it 
doesn't
 * have a BackendId, the slot is statically allocated based on 
the
-* auxiliary process type (MyAuxProcType).  Backends use slots 
indexed
-* in the range from 0 to MaxBackends (exclusive), so we use
-* MaxBackends + AuxProcType as the index of the slot for an 
auxiliary
-* process.
+* auxiliary process type.  Backends use slots indexed in the 
range
+* from 0 to MaxBackends (exclusive), and aux processes use the 
slots
+* after that.
 */
-   MyBEEntry = [MaxBackends + MyAuxProcType];
+   MyBEEntry = [MaxBackends + MyBackendType - 
FIRST_AUX_PROC];
}


Hm, this seems less than pretty. It's probably ok for now, but it seems like a
better fix might be to just start assigning backend ids to aux procs or switch
to indexing by pgprocno?


Using pgprocno is a good idea. Come to think of it, why do we even have 
a concept of backend ID that's separate from pgprocno? backend ID is 
used to index the ProcState array, but AFAICS we could use pgprocno as 
the index to that, too.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2024-01-22 Thread Andres Freund
Hi,

On 2024-01-10 14:35:52 +0200, Heikki Linnakangas wrote:
> Here's a patch that gets rid of AuxProcType. It's independent of the other
> patches in this thread; if this is committed, I'll rebase the rest of the
> patches over this and get rid of the new PMC_* enum.
> 
> Three patches, actually. The first one fixes an existing comment that I
> noticed to be incorrect while working on this. I'll push that soon, barring
> objections. The second one gets rid of AuxProcType, and the third one
> replaces IsBackgroundWorker, IsAutoVacuumLauncherProcess() and
> IsAutoVacuumWorkerProcess() with checks on MyBackendType. So MyBackendType
> is now the primary way to check what kind of a process the current process
> is.
> 
> 'am_walsender' would also be fairly straightforward to remove and replace
> with AmWalSenderProcess(). I didn't do that because we also have
> am_db_walsender and am_cascading_walsender which cannot be directly replaced
> with MyBackendType. Given that, it might be best to keep am_walsender for
> symmetry.


> @@ -561,13 +561,13 @@ static void ShmemBackendArrayAdd(Backend *bn);
>  static void ShmemBackendArrayRemove(Backend *bn);
>  #endif   /* EXEC_BACKEND 
> */
>  
> -#define StartupDataBase()StartChildProcess(StartupProcess)
> -#define StartArchiver()  
> StartChildProcess(ArchiverProcess)
> -#define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
> -#define StartCheckpointer()  StartChildProcess(CheckpointerProcess)
> -#define StartWalWriter() StartChildProcess(WalWriterProcess)
> -#define StartWalReceiver()   StartChildProcess(WalReceiverProcess)
> -#define StartWalSummarizer() StartChildProcess(WalSummarizerProcess)
> +#define StartupDataBase()StartChildProcess(B_STARTUP)
> +#define StartArchiver()  StartChildProcess(B_ARCHIVER)
> +#define StartBackgroundWriter() StartChildProcess(B_BG_WRITER)
> +#define StartCheckpointer()  StartChildProcess(B_CHECKPOINTER)
> +#define StartWalWriter() StartChildProcess(B_WAL_WRITER)
> +#define StartWalReceiver()   StartChildProcess(B_WAL_RECEIVER)
> +#define StartWalSummarizer() StartChildProcess(B_WAL_SUMMARIZER)

Not for this commit, but we ought to rip out these macros - all they do is to
make it harder to understand the code.




> @@ -5344,31 +5344,31 @@ StartChildProcess(AuxProcType type)
>   errno = save_errno;
>   switch (type)
>   {
> - case StartupProcess:
> + case B_STARTUP:
>   ereport(LOG,
>   (errmsg("could not fork startup 
> process: %m")));
>   break;
> - case ArchiverProcess:
> + case B_ARCHIVER:
>   ereport(LOG,
>   (errmsg("could not fork 
> archiver process: %m")));
>   break;
> - case BgWriterProcess:
> + case B_BG_WRITER:
>   ereport(LOG,
>   (errmsg("could not fork 
> background writer process: %m")));
>   break;
> - case CheckpointerProcess:
> + case B_CHECKPOINTER:
>   ereport(LOG,
>   (errmsg("could not fork 
> checkpointer process: %m")));
>   break;
> - case WalWriterProcess:
> + case B_WAL_WRITER:
>   ereport(LOG,
>   (errmsg("could not fork WAL 
> writer process: %m")));
>   break;
> - case WalReceiverProcess:
> + case B_WAL_RECEIVER:
>   ereport(LOG,
>   (errmsg("could not fork WAL 
> receiver process: %m")));
>   break;
> - case WalSummarizerProcess:
> + case B_WAL_SUMMARIZER:
>   ereport(LOG,
>   (errmsg("could not fork WAL 
> summarizer process: %m")));
>   break;

Seems we should replace this with something slightly more generic one of these
days...


> diff --git a/src/backend/utils/activity/backend_status.c 
> b/src/backend/utils/activity/backend_status.c
> index 1a1050c8da1..92f24db4e18 100644
> --- a/src/backend/utils/activity/backend_status.c
> +++ b/src/backend/utils/activity/backend_status.c
> @@ -257,17 +257,16 @@ pgstat_beinit(void)
>   else
>   {
>   /* Must be an auxiliary process */
> - 

Re: Refactoring backend fork+exec code

2024-01-10 Thread Heikki Linnakangas

On 08/12/2023 14:33, Heikki Linnakangas wrote:

+   [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
+   [PMC_AV_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
+   [PMC_BGWORKER] = {"bgworker", BackgroundWorkerMain, true},
+   [PMC_SYSLOGGER] = {"syslogger", SysLoggerMain, false},
+
+   [PMC_STARTUP] = {"startup", StartupProcessMain, true},
+   [PMC_BGWRITER] = {"bgwriter", BackgroundWriterMain, true},
+   [PMC_ARCHIVER] = {"archiver", PgArchiverMain, true},
+   [PMC_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true},
+   [PMC_WAL_WRITER] = {"wal_writer", WalWriterMain, true},
+   [PMC_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true},
+};


It feels like we have too many different ways of documenting the type of a
process. This new PMC_ stuff, enum AuxProcType, enum BackendType.

Agreed. And "am_walsender" and such variables.


Here's a patch that gets rid of AuxProcType. It's independent of the 
other patches in this thread; if this is committed, I'll rebase the rest 
of the patches over this and get rid of the new PMC_* enum.


Three patches, actually. The first one fixes an existing comment that I 
noticed to be incorrect while working on this. I'll push that soon, 
barring objections. The second one gets rid of AuxProcType, and the 
third one replaces IsBackgroundWorker, IsAutoVacuumLauncherProcess() and 
IsAutoVacuumWorkerProcess() with checks on MyBackendType. So 
MyBackendType is now the primary way to check what kind of a process the 
current process is.


'am_walsender' would also be fairly straightforward to remove and 
replace with AmWalSenderProcess(). I didn't do that because we also have 
am_db_walsender and am_cascading_walsender which cannot be directly 
replaced with MyBackendType. Given that, it might be best to keep 
am_walsender for symmetry.


--
Heikki Linnakangas
Neon (https://neon.tech)
From a0ecc54763252de74907fd0e4aed30fdb753ff86 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 10 Jan 2024 13:46:23 +0200
Subject: [PATCH v6 1/3] Fix incorrect comment on how BackendStatusArray is
 indexed

The comment was copy-pasted from the call to ProcSignalInit() in
AuxiliaryProcessMain(), which uses a similar scheme of having reserved
slots for aux processes after MaxBackends slots for backends. However,
ProcSignalInit() indexing starts from 1, whereas BackendStatusArray
starts from 0. The code is correct, but the comment was wrong.

Reviewed-by: XXX
Discussion: XXX
---
 src/backend/utils/activity/backend_status.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 3dac79c30eb..1a1050c8da1 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -263,9 +263,9 @@ pgstat_beinit(void)
 		 * Assign the MyBEEntry for an auxiliary process.  Since it doesn't
 		 * have a BackendId, the slot is statically allocated based on the
 		 * auxiliary process type (MyAuxProcType).  Backends use slots indexed
-		 * in the range from 1 to MaxBackends (inclusive), so we use
-		 * MaxBackends + AuxBackendType + 1 as the index of the slot for an
-		 * auxiliary process.
+		 * in the range from 0 to MaxBackends (exclusive), so we use
+		 * MaxBackends + AuxProcType as the index of the slot for an auxiliary
+		 * process.
 		 */
 		MyBEEntry = [MaxBackends + MyAuxProcType];
 	}
-- 
2.39.2

From 68b8c662af229f93244b45282f3112d9a5d809b9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 10 Jan 2024 14:00:09 +0200
Subject: [PATCH v6 2/3] Remove MyAuxProcType, use MyBackendType instead

MyAuxProcType was redundant with MyBackendType.

Reviewed-by: XXX
Discussion: XXX
---
 src/backend/postmaster/auxprocess.c | 68 +---
 src/backend/postmaster/postmaster.c | 36 +--
 src/backend/utils/activity/backend_status.c | 11 ++--
 src/include/miscadmin.h | 71 ++---
 src/include/postmaster/auxprocess.h |  2 +-
 src/tools/pgindent/typedefs.list|  1 -
 6 files changed, 74 insertions(+), 115 deletions(-)

diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index ab86e802f21..bd3815b63d0 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -38,14 +38,6 @@
 static void ShutdownAuxiliaryProcess(int code, Datum arg);
 
 
-/* 
- *		global variables
- * 
- */
-
-AuxProcType MyAuxProcType = NotAnAuxProcess;	/* declared in miscadmin.h */
-
-
 /*
  *	 AuxiliaryProcessMain
  *
@@ -55,39 +47,13 @@ AuxProcType MyAuxProcType = NotAnAuxProcess;	/* declared in miscadmin.h */
  *	 This code is here just because of historical reasons.
  */
 void
-AuxiliaryProcessMain(AuxProcType auxtype)
+AuxiliaryProcessMain(BackendType auxtype)
 {
 	Assert(IsUnderPostmaster);
 
-	

Re: Refactoring backend fork+exec code

2023-12-03 Thread Heikki Linnakangas

On 02/12/2023 05:00, Alexander Lakhin wrote:

What bothered me additionally, is an error detected after server start. I
couldn't see it without the patches applied. I mean, on HEAD I now see
`make check` passing, but with the patches it fails:
...
# parallel group (20 tests):  interval date numerology polygon box macaddr8 
macaddr multirangetypes line timestamp
timetz timestamptz time circle strings lseg inet md5 path point
not ok 22    + strings  1048 ms
# (test process exited with exit code 2)
not ok 23    + md5  1052 ms
# (test process exited with exit code 2)
...
src/test/regress/log/postmaster.log contains:
==00:00:00:30.730 1713480== Syscall param write(buf) points to uninitialised 
byte(s)
==00:00:00:30.730 1713480==    at 0x5245A37: write (write.c:26)
==00:00:00:30.730 1713480==    by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 
(fileops.c:1180)
==00:00:00:30.730 1713480==    by 0x51BC84F: new_do_write (fileops.c:448)
==00:00:00:30.730 1713480==    by 0x51BC84F: _IO_new_file_xsputn 
(fileops.c:1254)
==00:00:00:30.730 1713480==    by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 
(fileops.c:1196)
==00:00:00:30.730 1713480==    by 0x51B1056: fwrite (iofwrite.c:39)
==00:00:00:30.730 1713480==    by 0x5540CF: internal_forkexec 
(postmaster.c:4526)
==00:00:00:30.730 1713480==    by 0x5543C0: bgworker_forkexec 
(postmaster.c:5624)
==00:00:00:30.730 1713480==    by 0x555477: do_start_bgworker 
(postmaster.c:5665)
==00:00:00:30.730 1713480==    by 0x555738: maybe_start_bgworkers 
(postmaster.c:5928)
==00:00:00:30.730 1713480==    by 0x556072: process_pm_pmsignal 
(postmaster.c:5080)
==00:00:00:30.730 1713480==    by 0x556610: ServerLoop (postmaster.c:1761)
==00:00:00:30.730 1713480==    by 0x557BE2: PostmasterMain (postmaster.c:1469)
==00:00:00:30.730 1713480==    by 0x47216B: main (main.c:198)
==00:00:00:30.730 1713480==  Address 0x1ffeffd8c0 is on thread 1's stack
==00:00:00:30.730 1713480==  in frame #4, created by internal_forkexec 
(postmaster.c:4482)
==00:00:00:30.730 1713480==
...


Ack, I see this too. I fixed it by adding MCXT_ALLOC_ZERO to the 
allocation of the BackendWorker struct. That's a little heavy-handed, 
like with the previous failures the uninitialized padding bytes are 
written to the file and read back, but not accessed after that. But it 
seems like the simplest fix. This isn't performance critical after all.


I also renamed the 'has_worker' field to 'has_bgworker', per Tristan's 
suggestion. Pushed with those changes.


Thanks for the reviews!


2023-12-02 05:14:30.751 MSK client backend[1713740] pg_regress/rangetypes 
FATAL:  terminating connection due to
unexpected postmaster exit
2023-12-02 05:14:31.033 MSK client backend[1713734] pg_regress/numeric FATAL:  
postmaster exited during a parallel
transaction
TRAP: failed Assert("!IsTransactionOrTransactionBlock()"), File: "pgstat.c", 
Line: 591, PID: 1713734

I haven't looked deeper yet, but it seems that we see two issues here (and
Assert is not directly caused by the patches set.)


I have not been able to reproduce this one.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2023-12-01 Thread Alexander Lakhin

Hello Heikki,

01.12.2023 23:44, Heikki Linnakangas wrote:



With memset(param, 0, sizeof(*param)); added at the beginning of
save_backend_variables(), server starts successfully, but then
`make check` fails with other Valgrind error.


That's actually a pre-existing issue, I'm seeing that even on unpatched 
'master'.


Thank you for fixing that!

Yes, I had discovered it before, but yesterday I decided to check whether
your patches improve the situation...

What bothered me additionally, is an error detected after server start. I
couldn't see it without the patches applied. I mean, on HEAD I now see
`make check` passing, but with the patches it fails:
...
# parallel group (20 tests):  interval date numerology polygon box macaddr8 macaddr multirangetypes line timestamp 
timetz timestamptz time circle strings lseg inet md5 path point

not ok 22    + strings  1048 ms
# (test process exited with exit code 2)
not ok 23    + md5  1052 ms
# (test process exited with exit code 2)
...
src/test/regress/log/postmaster.log contains:
==00:00:00:30.730 1713480== Syscall param write(buf) points to uninitialised 
byte(s)
==00:00:00:30.730 1713480==    at 0x5245A37: write (write.c:26)
==00:00:00:30.730 1713480==    by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 
(fileops.c:1180)
==00:00:00:30.730 1713480==    by 0x51BC84F: new_do_write (fileops.c:448)
==00:00:00:30.730 1713480==    by 0x51BC84F: _IO_new_file_xsputn 
(fileops.c:1254)
==00:00:00:30.730 1713480==    by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 
(fileops.c:1196)
==00:00:00:30.730 1713480==    by 0x51B1056: fwrite (iofwrite.c:39)
==00:00:00:30.730 1713480==    by 0x5540CF: internal_forkexec 
(postmaster.c:4526)
==00:00:00:30.730 1713480==    by 0x5543C0: bgworker_forkexec 
(postmaster.c:5624)
==00:00:00:30.730 1713480==    by 0x555477: do_start_bgworker 
(postmaster.c:5665)
==00:00:00:30.730 1713480==    by 0x555738: maybe_start_bgworkers 
(postmaster.c:5928)
==00:00:00:30.730 1713480==    by 0x556072: process_pm_pmsignal 
(postmaster.c:5080)
==00:00:00:30.730 1713480==    by 0x556610: ServerLoop (postmaster.c:1761)
==00:00:00:30.730 1713480==    by 0x557BE2: PostmasterMain (postmaster.c:1469)
==00:00:00:30.730 1713480==    by 0x47216B: main (main.c:198)
==00:00:00:30.730 1713480==  Address 0x1ffeffd8c0 is on thread 1's stack
==00:00:00:30.730 1713480==  in frame #4, created by internal_forkexec 
(postmaster.c:4482)
==00:00:00:30.730 1713480==
...
2023-12-02 05:14:30.751 MSK client backend[1713740] pg_regress/rangetypes FATAL:  terminating connection due to 
unexpected postmaster exit
2023-12-02 05:14:31.033 MSK client backend[1713734] pg_regress/numeric FATAL:  postmaster exited during a parallel 
transaction

TRAP: failed Assert("!IsTransactionOrTransactionBlock()"), File: "pgstat.c", 
Line: 591, PID: 1713734

I haven't looked deeper yet, but it seems that we see two issues here (and
Assert is not directly caused by the patches set.)

Best regards,
Alexander




Re: Refactoring backend fork+exec code

2023-12-01 Thread Tom Lane
"Tristan Partin"  writes:
> On Fri Dec 1, 2023 at 2:44 PM CST, Heikki Linnakangas wrote:
>> Committed a fix with memset(). I'm not sure what our policy with 
>> backpatching this kind of issues is. This goes back to all supported 
>> versions, but given the lack of complaints, I chose to not backpatch.

> Seems like a harmless think to backpatch. It is conceivable that someone 
> might want to run Valgrind on something other than HEAD.

FWIW, I agree with Heikki's conclusion.  EXEC_BACKEND on non-Windows
is already a niche developer-only setup, and given the lack of complaints,
nobody's that interested in running Valgrind with it.  Fixing it on HEAD
seems like plenty.

regards, tom lane




Re: Refactoring backend fork+exec code

2023-12-01 Thread Tristan Partin

On Fri Dec 1, 2023 at 2:44 PM CST, Heikki Linnakangas wrote:

On 01/12/2023 16:00, Alexander Lakhin wrote:
> Maybe you could look at issues with running `make check` under Valgrind
> when server built with CPPFLAGS="-DUSE_VALGRIND -DEXEC_BACKEND":
> # +++ regress check in src/test/regress +++
> # initializing database system by copying initdb template
> # postmaster failed, examine ".../src/test/regress/log/postmaster.log" for 
the reason
> Bail out!make[1]: ***
> 
> ...

> 2023-12-01 16:48:39.136 MSK postmaster[1307988] LOG:  listening on Unix socket 
"/tmp/pg_regress-pPFNk0/.s.PGSQL.55312"
> ==00:00:00:01.692 1259396== Syscall param write(buf) points to uninitialised 
byte(s)
> ==00:00:00:01.692 1259396==    at 0x5245A37: write (write.c:26)
> ==00:00:00:01.692 1259396==    by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 
(fileops.c:1180)
> ==00:00:00:01.692 1259396==    by 0x51BC84F: new_do_write (fileops.c:448)
> ==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_new_file_xsputn 
(fileops.c:1254)
> ==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 
(fileops.c:1196)
> ==00:00:00:01.692 1259396==    by 0x51B1056: fwrite (iofwrite.c:39)
> ==00:00:00:01.692 1259396==    by 0x552E21: internal_forkexec 
(postmaster.c:4518)
> ==00:00:00:01.692 1259396==    by 0x5546A1: postmaster_forkexec 
(postmaster.c:)
> ==00:00:00:01.692 1259396==    by 0x55471C: StartChildProcess 
(postmaster.c:5275)
> ==00:00:00:01.692 1259396==    by 0x557B61: PostmasterMain (postmaster.c:1454)
> ==00:00:00:01.692 1259396==    by 0x472136: main (main.c:198)
> ==00:00:00:01.692 1259396==  Address 0x1ffeffdc11 is on thread 1's stack
> ==00:00:00:01.692 1259396==  in frame #4, created by internal_forkexec 
(postmaster.c:4482)
> ==00:00:00:01.692 1259396==
> 
> With memset(param, 0, sizeof(*param)); added at the beginning of

> save_backend_variables(), server starts successfully, but then
> `make check` fails with other Valgrind error.

That's actually a pre-existing issue, I'm seeing that even on unpatched 
'master'.


In a nutshell, the problem is that the uninitialized padding bytes in 
BackendParameters are written to the file. When we read the file back, 
we don't access the padding bytes, so that's harmless. But Valgrind 
doesn't know that.


On Windows, the file is created with 
CreateFileMapping(INVALID_HANDLE_VALUE, ...) and we write the variables 
directly to the mapping. If I understand the Windows API docs correctly, 
it is guaranteed to be initialized to zeros, so we don't have this 
problem on Windows, only on other platforms with EXEC_BACKEND. I think 
it makes sense to clear the memory on other platforms too, since that's 
what we do on Windows.


Committed a fix with memset(). I'm not sure what our policy with 
backpatching this kind of issues is. This goes back to all supported 
versions, but given the lack of complaints, I chose to not backpatch.


Seems like a harmless think to backpatch. It is conceivable that someone 
might want to run Valgrind on something other than HEAD.


--
Tristan Partin
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

2023-12-01 Thread Heikki Linnakangas

On 01/12/2023 16:00, Alexander Lakhin wrote:

Maybe you could look at issues with running `make check` under Valgrind
when server built with CPPFLAGS="-DUSE_VALGRIND -DEXEC_BACKEND":
# +++ regress check in src/test/regress +++
# initializing database system by copying initdb template
# postmaster failed, examine ".../src/test/regress/log/postmaster.log" for the 
reason
Bail out!make[1]: ***

...
2023-12-01 16:48:39.136 MSK postmaster[1307988] LOG:  listening on Unix socket 
"/tmp/pg_regress-pPFNk0/.s.PGSQL.55312"
==00:00:00:01.692 1259396== Syscall param write(buf) points to uninitialised 
byte(s)
==00:00:00:01.692 1259396==    at 0x5245A37: write (write.c:26)
==00:00:00:01.692 1259396==    by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 
(fileops.c:1180)
==00:00:00:01.692 1259396==    by 0x51BC84F: new_do_write (fileops.c:448)
==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_new_file_xsputn 
(fileops.c:1254)
==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 
(fileops.c:1196)
==00:00:00:01.692 1259396==    by 0x51B1056: fwrite (iofwrite.c:39)
==00:00:00:01.692 1259396==    by 0x552E21: internal_forkexec 
(postmaster.c:4518)
==00:00:00:01.692 1259396==    by 0x5546A1: postmaster_forkexec 
(postmaster.c:)
==00:00:00:01.692 1259396==    by 0x55471C: StartChildProcess 
(postmaster.c:5275)
==00:00:00:01.692 1259396==    by 0x557B61: PostmasterMain (postmaster.c:1454)
==00:00:00:01.692 1259396==    by 0x472136: main (main.c:198)
==00:00:00:01.692 1259396==  Address 0x1ffeffdc11 is on thread 1's stack
==00:00:00:01.692 1259396==  in frame #4, created by internal_forkexec 
(postmaster.c:4482)
==00:00:00:01.692 1259396==

With memset(param, 0, sizeof(*param)); added at the beginning of
save_backend_variables(), server starts successfully, but then
`make check` fails with other Valgrind error.


That's actually a pre-existing issue, I'm seeing that even on unpatched 
'master'.


In a nutshell, the problem is that the uninitialized padding bytes in 
BackendParameters are written to the file. When we read the file back, 
we don't access the padding bytes, so that's harmless. But Valgrind 
doesn't know that.


On Windows, the file is created with 
CreateFileMapping(INVALID_HANDLE_VALUE, ...) and we write the variables 
directly to the mapping. If I understand the Windows API docs correctly, 
it is guaranteed to be initialized to zeros, so we don't have this 
problem on Windows, only on other platforms with EXEC_BACKEND. I think 
it makes sense to clear the memory on other platforms too, since that's 
what we do on Windows.


Committed a fix with memset(). I'm not sure what our policy with 
backpatching this kind of issues is. This goes back to all supported 
versions, but given the lack of complaints, I chose to not backpatch.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2023-12-01 Thread Tristan Partin

On Fri Dec 1, 2023 at 6:10 AM CST, Heikki Linnakangas wrote:

On 30/11/2023 20:44, Tristan Partin wrote:
> Patches 1-3 seem committable as-is.

Thanks for the review! I'm focusing on patches 1-3 now, and will come 
back to the rest after committing 1-3.


There was one test failure with EXEC_BACKEND from patch 2, in 
'test_shm_mq'. In restore_backend_variables() I checked if 'bgw_name' is 
empty to decide if the BackgroundWorker struct is filled in or not, but 
it turns out that 'test_shm_mq' doesn't fill in bgw_name. It probably 
should, I think that's an oversight in 'test_shm_mq', but that's a 
separate issue.


I did some more refactoring of patch 2, to fix that and to improve it in 
general. The BackgroundWorker struct is now passed through the 
fork-related functions similarly to the Port struct. That seems more 
consistent.


Attached is new version of these patches. For easier review, I made the 
new refactorings compared in a new commit 0003. I will squash that 
before pushing, but this makes it easier to see what changed.


Barring any new feedback or issues, I will commit these.


My only thought is that perhaps has_bg_worker is a better name than 
has_worker, but I agree that having a flag is better than checking 
bgw_name.


--
Tristan Partin
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

2023-12-01 Thread Alexander Lakhin

Hello Heikki,

01.12.2023 15:10, Heikki Linnakangas wrote:
Attached is new version of these patches. For easier review, I made the new refactorings compared in a new commit 
0003. I will squash that before pushing, but this makes it easier to see what changed.


Barring any new feedback or issues, I will commit these.



Maybe you could look at issues with running `make check` under Valgrind
when server built with CPPFLAGS="-DUSE_VALGRIND -DEXEC_BACKEND":
# +++ regress check in src/test/regress +++
# initializing database system by copying initdb template
# postmaster failed, examine ".../src/test/regress/log/postmaster.log" for the 
reason
Bail out!make[1]: ***

...
2023-12-01 16:48:39.136 MSK postmaster[1307988] LOG:  listening on Unix socket 
"/tmp/pg_regress-pPFNk0/.s.PGSQL.55312"
==00:00:00:01.692 1259396== Syscall param write(buf) points to uninitialised 
byte(s)
==00:00:00:01.692 1259396==    at 0x5245A37: write (write.c:26)
==00:00:00:01.692 1259396==    by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 
(fileops.c:1180)
==00:00:00:01.692 1259396==    by 0x51BC84F: new_do_write (fileops.c:448)
==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_new_file_xsputn 
(fileops.c:1254)
==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 
(fileops.c:1196)
==00:00:00:01.692 1259396==    by 0x51B1056: fwrite (iofwrite.c:39)
==00:00:00:01.692 1259396==    by 0x552E21: internal_forkexec 
(postmaster.c:4518)
==00:00:00:01.692 1259396==    by 0x5546A1: postmaster_forkexec 
(postmaster.c:)
==00:00:00:01.692 1259396==    by 0x55471C: StartChildProcess 
(postmaster.c:5275)
==00:00:00:01.692 1259396==    by 0x557B61: PostmasterMain (postmaster.c:1454)
==00:00:00:01.692 1259396==    by 0x472136: main (main.c:198)
==00:00:00:01.692 1259396==  Address 0x1ffeffdc11 is on thread 1's stack
==00:00:00:01.692 1259396==  in frame #4, created by internal_forkexec 
(postmaster.c:4482)
==00:00:00:01.692 1259396==

With memset(param, 0, sizeof(*param)); added at the beginning of
save_backend_variables(), server starts successfully, but then
`make check` fails with other Valgrind error.

Best regards,
Alexander




Re: Refactoring backend fork+exec code

2023-12-01 Thread Heikki Linnakangas

On 30/11/2023 20:44, Tristan Partin wrote:

Patches 1-3 seem committable as-is.


Thanks for the review! I'm focusing on patches 1-3 now, and will come 
back to the rest after committing 1-3.


There was one test failure with EXEC_BACKEND from patch 2, in 
'test_shm_mq'. In restore_backend_variables() I checked if 'bgw_name' is 
empty to decide if the BackgroundWorker struct is filled in or not, but 
it turns out that 'test_shm_mq' doesn't fill in bgw_name. It probably 
should, I think that's an oversight in 'test_shm_mq', but that's a 
separate issue.


I did some more refactoring of patch 2, to fix that and to improve it in 
general. The BackgroundWorker struct is now passed through the 
fork-related functions similarly to the Port struct. That seems more 
consistent.


Attached is new version of these patches. For easier review, I made the 
new refactorings compared in a new commit 0003. I will squash that 
before pushing, but this makes it easier to see what changed.


Barring any new feedback or issues, I will commit these.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 6dd44d7057e535eb441c9ab8fda1bbdd8079f2fb Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 30 Nov 2023 00:01:25 +0200
Subject: [PATCH v4 1/4] Refactor CreateSharedMemoryAndSemaphores

For clarity, have separate functions for *creating* the shared memory
and semaphores at postmaster or single-user backend startup, and
for *attaching* to existing shared memory structures in EXEC_BACKEND
case. CreateSharedMemoryAndSemaphores() is now called only at
postmaster startup, and a new AttachSharedMemoryStructs() function is
called at backend startup in EXEC_BACKEND mode.

Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce...@iki.fi
---
 src/backend/postmaster/postmaster.c   |  20 +--
 src/backend/replication/walreceiver.c |   3 +-
 src/backend/storage/ipc/ipci.c| 178 +++---
 src/backend/storage/lmgr/proc.c   |   2 +-
 src/include/storage/ipc.h |   3 +
 5 files changed, 117 insertions(+), 89 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 7a5cd06c5c9..b03e88e6702 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4909,11 +4909,11 @@ SubPostmasterMain(int argc, char *argv[])
 		/* Restore basic shared memory pointers */
 		InitShmemAccess(UsedShmemSegAddr);
 
-		/* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
+		/* Need a PGPROC to run AttachSharedMemoryStructs */
 		InitProcess();
 
 		/* Attach process to shared data structures */
-		CreateSharedMemoryAndSemaphores();
+		AttachSharedMemoryStructs();
 
 		/* And run the backend */
 		BackendRun();		/* does not return */
@@ -4927,11 +4927,11 @@ SubPostmasterMain(int argc, char *argv[])
 		/* Restore basic shared memory pointers */
 		InitShmemAccess(UsedShmemSegAddr);
 
-		/* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
+		/* Need a PGPROC to run AttachSharedMemoryStructs */
 		InitAuxiliaryProcess();
 
 		/* Attach process to shared data structures */
-		CreateSharedMemoryAndSemaphores();
+		AttachSharedMemoryStructs();
 
 		auxtype = atoi(argv[3]);
 		AuxiliaryProcessMain(auxtype);	/* does not return */
@@ -4941,11 +4941,11 @@ SubPostmasterMain(int argc, char *argv[])
 		/* Restore basic shared memory pointers */
 		InitShmemAccess(UsedShmemSegAddr);
 
-		/* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
+		/* Need a PGPROC to run AttachSharedMemoryStructs */
 		InitProcess();
 
 		/* Attach process to shared data structures */
-		CreateSharedMemoryAndSemaphores();
+		AttachSharedMemoryStructs();
 
 		AutoVacLauncherMain(argc - 2, argv + 2);	/* does not return */
 	}
@@ -4954,11 +4954,11 @@ SubPostmasterMain(int argc, char *argv[])
 		/* Restore basic shared memory pointers */
 		InitShmemAccess(UsedShmemSegAddr);
 
-		/* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
+		/* Need a PGPROC to run AttachSharedMemoryStructs */
 		InitProcess();
 
 		/* Attach process to shared data structures */
-		CreateSharedMemoryAndSemaphores();
+		AttachSharedMemoryStructs();
 
 		AutoVacWorkerMain(argc - 2, argv + 2);	/* does not return */
 	}
@@ -4972,11 +4972,11 @@ SubPostmasterMain(int argc, char *argv[])
 		/* Restore basic shared memory pointers */
 		InitShmemAccess(UsedShmemSegAddr);
 
-		/* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
+		/* Need a PGPROC to run AttachSharedMemoryStructs */
 		InitProcess();
 
 		/* Attach process to shared data structures */
-		CreateSharedMemoryAndSemaphores();
+		AttachSharedMemoryStructs();
 
 		/* Fetch MyBgworkerEntry from shared memory */
 		shmem_slot = atoi(argv[1] + 15);
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 2398167f495..26ded928a71 100644
--- a/src/backend/replication/walreceiver.c
+++ 

Re: Refactoring backend fork+exec code

2023-11-30 Thread Andres Freund
Hi,

On 2023-12-01 01:36:13 +0200, Heikki Linnakangas wrote:
> On 30/11/2023 22:26, Andres Freund wrote:
> > On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote:
> > >  From a96b6e92fdeaa947bf32774c425419b8f987b8e2 Mon Sep 17 00:00:00 2001
> > > From: Heikki Linnakangas 
> > > Date: Thu, 30 Nov 2023 00:01:25 +0200
> > > Subject: [PATCH v3 1/7] Refactor CreateSharedMemoryAndSemaphores
> > >
> > > For clarity, have separate functions for *creating* the shared memory
> > > and semaphores at postmaster or single-user backend startup, and
> > > for *attaching* to existing shared memory structures in EXEC_BACKEND
> > > case. CreateSharedMemoryAndSemaphores() is now called only at
> > > postmaster startup, and a new AttachSharedMemoryStructs() function is
> > > called at backend startup in EXEC_BACKEND mode.
> >
> > I assume CreateSharedMemoryAndSemaphores() is still called during crash
> > restart?
>
> Yes.
>
> >  I wonder if it shouldn't split three ways:
> > 1) create
> > 2) initialize
> > 3) attach
>
> Why? What would be the difference between create and initialize phases?

Mainly because I somehow mis-remembered how we deal with the shared memory
allocation when crashing. I somehow had remembered that we reused the same
allocation across restarts, but reinitialized it from scratch. There's a
kernel of truth to that, because we can end up re-attaching to an existing
sysv shared memory segment. But not more. Perhaps I was confusing things with
the listen sockets?

Andres




Re: Refactoring backend fork+exec code

2023-11-30 Thread Heikki Linnakangas

On 30/11/2023 22:26, Andres Freund wrote:

On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote:

 From a96b6e92fdeaa947bf32774c425419b8f987b8e2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 30 Nov 2023 00:01:25 +0200
Subject: [PATCH v3 1/7] Refactor CreateSharedMemoryAndSemaphores

For clarity, have separate functions for *creating* the shared memory
and semaphores at postmaster or single-user backend startup, and
for *attaching* to existing shared memory structures in EXEC_BACKEND
case. CreateSharedMemoryAndSemaphores() is now called only at
postmaster startup, and a new AttachSharedMemoryStructs() function is
called at backend startup in EXEC_BACKEND mode.


I assume CreateSharedMemoryAndSemaphores() is still called during crash
restart?


Yes.


 I wonder if it shouldn't split three ways:
1) create
2) initialize
3) attach


Why? What would be the difference between create and initialize phases?

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2023-11-30 Thread Heikki Linnakangas

On 30/11/2023 22:26, Andres Freund wrote:

Aside: Somewhat odd that InitAuxiliaryProcess() doesn't call
InitLWLockAccess().


Yeah that caught my eye too.

It seems to have been an oversight in commit 1c6821be31f. Before that, 
in 9.4, the lwlock stats were printed for aux processes too, on shutdown.


Committed a fix for that to master.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Refactoring backend fork+exec code

2023-11-30 Thread Thomas Munro
On Fri, Dec 1, 2023 at 9:31 AM Andres Freund  wrote:
> On 2023-11-30 12:44:33 -0600, Tristan Partin wrote:
> > >  +/*
> > >  + * Set reference point for stack-depth checking.  This might 
> > > seem
> > >  + * redundant in !EXEC_BACKEND builds; but it's not because the 
> > > postmaster
> > >  + * launches its children from signal handlers, so we might be 
> > > running on
> > >  + * an alternative stack. XXX still true?
> > >  + */
> > >  +(void) set_stack_base();
> >
> > Looks like there is still this XXX left. Can't say I completely understand
> > the second sentence either.
>
> We used to start some child processes of postmaster in signal handlers. That
> was fixed in
>
> commit 7389aad6366
> Author: Thomas Munro 
> Date:   2023-01-12 12:34:23 +1300
>
> Use WaitEventSet API for postmaster's event loop.
>
>
> In some cases signal handlers run on a separate stack, which meant that the
> set_stack_base() we did in postmaster would yield a completely bogus stack
> depth estimation.  So this comment should likely have been removed. Thomas?

Right, I should delete that comment in master and 16.  While wondering
what to write instead, my first thought is that it is better to leave
the actual call there though, because otherwise there is a small
difference in stack reference point between EXEC_BACKEND and
!EXEC_BACKEND builds, consumed by a few postmaster stack frames.  So
the new comment would just say that.

(I did idly wonder if there is a longjmp trick to zap those frames
post-fork, not looked into and probably doesn't really improve
anything we care about...)




Re: Refactoring backend fork+exec code

2023-11-30 Thread Andres Freund
Hi,

On 2023-11-30 12:44:33 -0600, Tristan Partin wrote:
> >  +/*
> >  + * Set reference point for stack-depth checking.  This might seem
> >  + * redundant in !EXEC_BACKEND builds; but it's not because the 
> > postmaster
> >  + * launches its children from signal handlers, so we might be 
> > running on
> >  + * an alternative stack. XXX still true?
> >  + */
> >  +(void) set_stack_base();
> 
> Looks like there is still this XXX left. Can't say I completely understand
> the second sentence either.

We used to start some child processes of postmaster in signal handlers. That
was fixed in

commit 7389aad6366
Author: Thomas Munro 
Date:   2023-01-12 12:34:23 +1300
 
Use WaitEventSet API for postmaster's event loop.


In some cases signal handlers run on a separate stack, which meant that the
set_stack_base() we did in postmaster would yield a completely bogus stack
depth estimation.  So this comment should likely have been removed. Thomas?

Greetings,

Andres Freund




Re: Refactoring backend fork+exec code

2023-11-30 Thread Andres Freund
Hi,

On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote:
> - patch 1 splits CreateSharedMemoryAndSemaphores into two functions:
> CreateSharedMemoryAndSemaphores is now only called at postmaster startup,
> and a new function called AttachSharedMemoryStructs() is called in backends
> in EXEC_BACKEND mode. I extracted the common parts of those functions to a
> new static function. (Some of this refactoring used to be part of the 3rd
> patch in the series, but it seems useful on its own, so I split it out.)

I like that idea.



> From a96b6e92fdeaa947bf32774c425419b8f987b8e2 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Thu, 30 Nov 2023 00:01:25 +0200
> Subject: [PATCH v3 1/7] Refactor CreateSharedMemoryAndSemaphores
>
> For clarity, have separate functions for *creating* the shared memory
> and semaphores at postmaster or single-user backend startup, and
> for *attaching* to existing shared memory structures in EXEC_BACKEND
> case. CreateSharedMemoryAndSemaphores() is now called only at
> postmaster startup, and a new AttachSharedMemoryStructs() function is
> called at backend startup in EXEC_BACKEND mode.

I assume CreateSharedMemoryAndSemaphores() is still called during crash
restart?  I wonder if it shouldn't split three ways:
1) create
2) initialize
3) attach


> From 3478cafcf74a5c8d649e0287e6c72669a29c0e70 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Thu, 30 Nov 2023 00:02:03 +0200
> Subject: [PATCH v3 2/7] Pass BackgroundWorker entry in the parameter file in
>  EXEC_BACKEND mode
>
> This makes it possible to move InitProcess later in SubPostmasterMain
> (in next commit), as we no longer need to access shared memory to read
> background worker entry.

>  static void read_backend_variables(char *id, Port *port);
> @@ -4831,7 +4833,7 @@ SubPostmasterMain(int argc, char *argv[])
>   strcmp(argv[1], "--forkavlauncher") == 0 ||
>   strcmp(argv[1], "--forkavworker") == 0 ||
>   strcmp(argv[1], "--forkaux") == 0 ||
> - strncmp(argv[1], "--forkbgworker=", 15) == 0)
> + strncmp(argv[1], "--forkbgworker", 14) == 0)
>   PGSharedMemoryReAttach();
>   else
>   PGSharedMemoryNoReAttach();
> @@ -4962,10 +4964,8 @@ SubPostmasterMain(int argc, char *argv[])
>
>   AutoVacWorkerMain(argc - 2, argv + 2);  /* does not return */
>   }
> - if (strncmp(argv[1], "--forkbgworker=", 15) == 0)
> + if (strncmp(argv[1], "--forkbgworker", 14) == 0)


Now that we don't need to look at parameters anymore, these should probably be
just a strcmp(), like the other cases?


> From 0d071474e12a70ff8113c7b0731c5b97fec45007 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Wed, 29 Nov 2023 23:47:25 +0200
> Subject: [PATCH v3 3/7] Refactor how InitProcess is called
>
> The order of process initialization steps is now more consistent
> between !EXEC_BACKEND and EXEC_BACKEND modes. InitProcess() is called
> at the same place in either mode. We can now also move the
> AttachSharedMemoryStructs() call into InitProcess() itself. This
> reduces the number of "#ifdef EXEC_BACKEND" blocks.

Yay.


> diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
> index cdfdd6fbe1d..6c708777dde 100644
> --- a/src/backend/storage/lmgr/proc.c
> +++ b/src/backend/storage/lmgr/proc.c
> @@ -461,6 +461,12 @@ InitProcess(void)
>*/
>   InitLWLockAccess();
>   InitDeadLockChecking();
> +
> +#ifdef EXEC_BACKEND
> + /* Attach process to shared data structures */
> + if (IsUnderPostmaster)
> + AttachSharedMemoryStructs();
> +#endif
>  }
>
>  /*
> @@ -614,6 +620,12 @@ InitAuxiliaryProcess(void)
>* Arrange to clean up at process exit.
>*/
>   on_shmem_exit(AuxiliaryProcKill, Int32GetDatum(proctype));
> +
> +#ifdef EXEC_BACKEND
> + /* Attach process to shared data structures */
> + if (IsUnderPostmaster)
> + AttachSharedMemoryStructs();
> +#endif
>  }

Aside: Somewhat odd that InitAuxiliaryProcess() doesn't call
InitLWLockAccess().


I think a short comment explaining why we can attach to shmem structs after
already accessing shared memory earlier in the function would be worthwhile.


> From ce51876f87f1e4317e25baf64184749448fcd033 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Thu, 30 Nov 2023 00:07:34 +0200
> Subject: [PATCH v3 7/7] Refactor postmaster child process launching
>
> - Move code related to launching backend processes to new source file,
>   launch_backend.c
>
> - Introduce new postmaster_child_launch() function that deals with the
>   differences between EXEC_BACKEND and fork mode.
>
> - Refactor the mechanism of passing information from the parent to
>   child process. Instead of using different command-line arguments when
>   launching the child process in EXEC_BACKEND mode, pass a
>   variable-length blob of data along with all the global variables. The
>   contents of that blob depend 

Re: Refactoring backend fork+exec code

2023-11-30 Thread Tristan Partin

On Wed Nov 29, 2023 at 5:36 PM CST, Heikki Linnakangas wrote:

On 11/10/2023 14:12, Heikki Linnakangas wrote:
Here's another rebased patch set. Compared to previous version, I did a 
little more refactoring around CreateSharedMemoryAndSemaphores and 
InitProcess:


- patch 1 splits CreateSharedMemoryAndSemaphores into two functions: 
CreateSharedMemoryAndSemaphores is now only called at postmaster 
startup, and a new function called AttachSharedMemoryStructs() is called 
in backends in EXEC_BACKEND mode. I extracted the common parts of those 
functions to a new static function. (Some of this refactoring used to be 
part of the 3rd patch in the series, but it seems useful on its own, so 
I split it out.)


- patch 3 moves the call to AttachSharedMemoryStructs() to 
InitProcess(), reducing the boilerplate code a little.



The patches are incrementally useful, so if you don't have time to 
review all of them, a review on a subset would be useful too.



 From 8886db1ed6bae21bf6d77c9bb1230edbb55e24f9 Mon Sep 17 00:00:00 2001
 From: Heikki Linnakangas 
 Date: Thu, 30 Nov 2023 00:04:22 +0200
 Subject: [PATCH v3 4/7] Pass CAC as argument to backend process


For me, being new to the code, it would be nice to have more of an 
explanation as to why this is "better." I don't doubt it; it would just 
help me and future readers of this commit in the future. More of an 
explanation in the commit message would suffice.


My other comment on this commit is that we now seem to have lost the 
context on what CAC stands for. Before we had the member variable to 
explain it. A comment on the enum would be great or changing cac named 
variables to canAcceptConnections. I did notice in patch 7 that there 
are still some variables named canAcceptConnections around, so I'll 
leave this comment up to you.



  From 98f8397b32a0b36e221475b32697c9c5bbca86a0 Mon Sep 17 00:00:00 2001
  From: Heikki Linnakangas 
  Date: Wed, 11 Oct 2023 13:38:06 +0300
  Subject: [PATCH v3 5/7] Remove ConnCreate and ConnFree, and allocate Port in
   stack


I like it separate.


 From 79aab42705a8cb0e16e61c33052fc56fdd4fca76 Mon Sep 17 00:00:00 2001
 From: Heikki Linnakangas 
 Date: Wed, 11 Oct 2023 13:38:10 +0300
 Subject: [PATCH v3 6/7] Introduce ClientSocket, rename some funcs



 +static intBackendStartup(ClientSocket *port);


s/port/client_sock


 -port->remote_hostname = strdup(remote_host);
 +port->remote_hostname = pstrdup(remote_host);
 +MemoryContextSwitchTo(oldcontext);


Something funky with the whitespace here, but my eyes might also be 
playing tricks on me. Mixing spaces in tabs like what do in this 
codebase makes it difficult to review :).



 From ce51876f87f1e4317e25baf64184749448fcd033 Mon Sep 17 00:00:00 2001
 From: Heikki Linnakangas 
 Date: Thu, 30 Nov 2023 00:07:34 +0200
 Subject: [PATCH v3 7/7] Refactor postmaster child process launching



 +entry_kinds[child_type].main_fn(startup_data, 
startup_data_len);
 +Assert(false);


Seems like you want the pg_unreachable() macro here instead of 
Assert(false). Similar comment at the end of SubPostmasterMain().



 +if (fwrite(param, paramsz, 1, fp) != 1)
 +{
 +ereport(LOG,
 +(errcode_for_file_access(),
 + errmsg("could not write to file \"%s\": %m", 
tmpfilename)));
 +FreeFile(fp);
 +return -1;
 +}
 +
 +/* Release file */
 +if (FreeFile(fp))
 +{
 +ereport(LOG,
 +(errcode_for_file_access(),
 + errmsg("could not write to file \"%s\": %m", 
tmpfilename)));
 +return -1;
 +}


Two pieces of feedback here. I generally find write(2) more useful than 
fwrite(3) because write(2) will report a useful errno, whereas fwrite(2) 
just uses ferror(3). The additional errno information might be valuable 
context in the log message. Up to you if you think it is also valuable.


The log message if FreeFile() fails doesn't seem to make sense to me. 
I didn't see any file writing in that code path, but it is possible that 
I missed something.



 +/*
 + * Set reference point for stack-depth checking.  This might seem
 + * redundant in !EXEC_BACKEND builds; but it's not because the 
postmaster
 + * launches its children from signal handlers, so we might be running 
on
 + * an alternative stack. XXX still true?
 + */
 +(void) set_stack_base();


Looks like there is still this XXX left. Can't say I completely 
understand the second sentence either.



 +/*
 + * make sure stderr is in binary mode before anything can possibly be
 + * written to it, in case it's actually the syslogger pipe, so the 
pipe
 + * chunking protocol isn't disturbed. Non-logpipe data gets 
translated on
 +   

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-10-06 Thread Michael Paquier
On Fri, Oct 06, 2023 at 10:27:22AM +0300, Heikki Linnakangas wrote:
> I did a quick test with syslogger enabled before committing, but didn't
> notice the segfault. I missed it because syslogger gets restarted and then
> it worked.

Thanks, Heikki.
--
Michael


signature.asc
Description: PGP signature


Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-10-06 Thread Heikki Linnakangas

On 06/10/2023 09:50, Michael Paquier wrote:

On Fri, Oct 06, 2023 at 02:30:16PM +0900, Michael Paquier wrote:

Okay, the backtrace is not that useful.  I'll see if I can get
something better, still it seems like this has broken the way the
syslogger closes these ports.


And here you go:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196 196 header =
*((const uint64 *) ((const char *) pointer - sizeof(uint64)));
(gdb) bt
#0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196
#1  0x557d04176d59 in pfree (pointer=0x0) at mcxt.c:1463
#2  0x557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at 
postmaster.c:2571
#3  0x557d03e93ac2 in SysLogger_Start () at syslogger.c:686
#4  0x557d03e8c5b7 in PostmasterMain (argc=3, argv=0x557d0471ed00)
at postmaster.c:1148
#5  0x557d03d48e34 in main (argc=3, argv=0x557d0471ed00) at main.c:198
(gdb) up 2
#2  0x557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at
postmaster.c:2571
2571 pfree(ListenSockets);
(gdb) p ListenSockets $1 = (pgsocket *) 0x0


Fixed, thanks!

I did a quick test with syslogger enabled before committing, but didn't 
notice the segfault. I missed it because syslogger gets restarted and 
then it worked.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-10-06 Thread Michael Paquier
On Fri, Oct 06, 2023 at 02:30:16PM +0900, Michael Paquier wrote:
> Okay, the backtrace is not that useful.  I'll see if I can get
> something better, still it seems like this has broken the way the
> syslogger closes these ports.

And here you go:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196 196 header =
*((const uint64 *) ((const char *) pointer - sizeof(uint64)));
(gdb) bt
#0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196
#1  0x557d04176d59 in pfree (pointer=0x0) at mcxt.c:1463
#2  0x557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at 
postmaster.c:2571
#3  0x557d03e93ac2 in SysLogger_Start () at syslogger.c:686
#4  0x557d03e8c5b7 in PostmasterMain (argc=3, argv=0x557d0471ed00)
at postmaster.c:1148
#5  0x557d03d48e34 in main (argc=3, argv=0x557d0471ed00) at main.c:198
(gdb) up 2
#2  0x557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at
postmaster.c:2571
2571 pfree(ListenSockets);
(gdb) p ListenSockets $1 = (pgsocket *) 0x0
--
Michael


signature.asc
Description: PGP signature


Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-10-05 Thread Michael Paquier
On Thu, Oct 05, 2023 at 03:08:37PM +0300, Heikki Linnakangas wrote:
> This seems pretty uncontroversial, and I heard no objections, so I went
> ahead and committed that.

It looks like e29c4643951 is causing issues here.  While doing
benchmarking on a cluster compiled with -O2, I got a crash:
LOG:  system logger process (PID 27924) was terminated by signal 11: 
Segmentation fault 

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x55ef3b9aed20 in pfree ()
(gdb) bt
#0  0x55ef3b9aed20 in pfree ()
#1  0x55ef3b7e0e41 in ClosePostmasterPorts ()
#2  0x55ef3b7e6649 in SysLogger_Start ()
#3  0x55ef3b7e4413 in PostmasterMain () 

Okay, the backtrace is not that useful.  I'll see if I can get
something better, still it seems like this has broken the way the
syslogger closes these ports.
--
Michael


signature.asc
Description: PGP signature


Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-10-05 Thread Heikki Linnakangas

On 29/08/2023 09:58, Heikki Linnakangas wrote:

On 29/08/2023 09:21, Heikki Linnakangas wrote:

Thinking about this some more, the ListenSockets array is a bit silly
anyway. We fill the array starting from index 0, always append to the
end, and never remove entries from it. It would seem more
straightforward to keep track of the used size of the array. Currently
we always loop through the unused parts too, and e.g.
ConfigurePostmasterWaitSet() needs to walk the array to count how many
elements are in use.


Like this.


This seems pretty uncontroversial, and I heard no objections, so I went 
ahead and committed that.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-29 Thread Heikki Linnakangas

On 29/08/2023 09:21, Heikki Linnakangas wrote:

Thinking about this some more, the ListenSockets array is a bit silly
anyway. We fill the array starting from index 0, always append to the
end, and never remove entries from it. It would seem more
straightforward to keep track of the used size of the array. Currently
we always loop through the unused parts too, and e.g.
ConfigurePostmasterWaitSet() needs to walk the array to count how many
elements are in use.


Like this.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 796280f07dd5dbf50897c9895715ab5e2dbf187b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 29 Aug 2023 09:53:08 +0300
Subject: [PATCH 1/1] Refactor ListenSocket array.

Keep track of the used size of the array. That avoids looping through
the whole array in a few places. It doesn't matter from a performance
point of view since the array is small anyway, but this feels less
surprising and is a little less code. Now that we have an explicit
NumListenSockets variable that is statically initialized to 0, we
don't need the loop to initialize the array.

Allocate the array in PostmasterContext. The array isn't needed in
child processes, so this allows reusing that memory. We could easily
make the array resizable now, but we haven't heard any complaints
about the current 64 sockets limit.
---
 src/backend/libpq/pqcomm.c  | 18 +++
 src/backend/postmaster/postmaster.c | 76 +++--
 src/include/libpq/libpq.h   |  2 +-
 3 files changed, 36 insertions(+), 60 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 8d217b0645..48ae7704fb 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -311,8 +311,9 @@ socket_close(int code, Datum arg)
  * specified.  For TCP ports, hostName is either NULL for all interfaces or
  * the interface to listen on, and unixSocketDir is ignored (can be NULL).
  *
- * Successfully opened sockets are added to the ListenSocket[] array (of
- * length MaxListen), at the first position that isn't PGINVALID_SOCKET.
+ * Successfully opened sockets are appended to the ListenSockets[] array.  The
+ * current size of the array is *NumListenSockets, it is updated to reflect
+ * the opened sockets.  MaxListen is the allocated size of the array.
  *
  * RETURNS: STATUS_OK or STATUS_ERROR
  */
@@ -320,7 +321,7 @@ socket_close(int code, Datum arg)
 int
 StreamServerPort(int family, const char *hostName, unsigned short portNumber,
  const char *unixSocketDir,
- pgsocket ListenSocket[], int MaxListen)
+ pgsocket ListenSockets[], int *NumListenSockets, int MaxListen)
 {
 	pgsocket	fd;
 	int			err;
@@ -335,7 +336,6 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 	struct addrinfo *addrs = NULL,
 			   *addr;
 	struct addrinfo hint;
-	int			listen_index = 0;
 	int			added = 0;
 	char		unixSocketPath[MAXPGPATH];
 #if !defined(WIN32) || defined(IPV6_V6ONLY)
@@ -401,12 +401,7 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 		}
 
 		/* See if there is still room to add 1 more socket. */
-		for (; listen_index < MaxListen; listen_index++)
-		{
-			if (ListenSocket[listen_index] == PGINVALID_SOCKET)
-break;
-		}
-		if (listen_index >= MaxListen)
+		if (*NumListenSockets == MaxListen)
 		{
 			ereport(LOG,
 	(errmsg("could not bind to all requested addresses: MAXLISTEN (%d) exceeded",
@@ -573,7 +568,8 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 	(errmsg("listening on %s address \"%s\", port %d",
 			familyDesc, addrDesc, (int) portNumber)));
 
-		ListenSocket[listen_index] = fd;
+		ListenSockets[*NumListenSockets] = fd;
+		(*NumListenSockets)++;
 		added++;
 	}
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d7bfb28ff3..2659329b82 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -226,7 +226,8 @@ int			ReservedConnections;
 
 /* The socket(s) we're listening to. */
 #define MAXLISTEN	64
-static pgsocket ListenSocket[MAXLISTEN];
+static int NumListenSockets = 0;
+static pgsocket *ListenSockets = NULL;
 
 /* still more option variables */
 bool		EnableSSL = false;
@@ -587,7 +588,6 @@ PostmasterMain(int argc, char *argv[])
 	int			status;
 	char	   *userDoption = NULL;
 	bool		listen_addr_saved = false;
-	int			i;
 	char	   *output_config_variable = NULL;
 
 	InitProcessGlobals();
@@ -1141,17 +1141,6 @@ PostmasterMain(int argc, char *argv[])
  errmsg("could not remove file \"%s\": %m",
 		LOG_METAINFO_DATAFILE)));
 
-	/*
-	 * Initialize input sockets.
-	 *
-	 * Mark them all closed, and set up an on_proc_exit function that's
-	 * charged with closing the sockets again at postmaster shutdown.
-	 */
-	for (i = 0; i < MAXLISTEN; i++)
-		ListenSocket[i] = PGINVALID_SOCKET;
-
-	on_proc_exit(CloseServerPorts, 0);
-
 	/*
 	 * If enabled, start up syslogger collection 

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-29 Thread Heikki Linnakangas

On 29/08/2023 01:28, Michael Paquier wrote:


In case you've not noticed:
https://www.postgresql.org/message-id/zovvuqe0rdj2s...@paquier.xyz
But it does not really matter now ;)


Ah sorry, missed that thread.


Yes, so it seems. Syslogger is started before the ListenSockets array is
initialized, so its still all zeros. When syslogger is forked, the child
process tries to close all the listen sockets, which are all zeros. So
syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the
array initialization earlier.


Yep, I've reached the same conclusion.  Wouldn't it be cleaner to move
the callback registration of CloseServerPorts() closer to the array
initialization, though?


Ok, pushed that way.

I checked the history of this: it goes back to commit 9a86f03b4e in 
version 13. The SysLogger_Start() call used to be later, after setting p 
ListenSockets, but that commit moved it. So I backpatched this to v13.



The first close(0) actually does have an effect: it closes stdin, which is
fd 0. That is surely accidental, but I wonder if we should indeed close
stdin in child processes? The postmaster process doesn't do anything with
stdin either, although I guess a library might try to read a passphrase from
stdin before starting up, for example.


We would have heard about that, wouldn't we?  I may be missing
something of course, but on HEAD, the array initialization is done
before starting any child processes, and the syslogger is the first
one forked.


Yes, syslogger is the only process that closes stdin. After moving the 
initialization, it doesn't close it either.


Thinking about this some more, the ListenSockets array is a bit silly 
anyway. We fill the array starting from index 0, always append to the 
end, and never remove entries from it. It would seem more 
straightforward to keep track of the used size of the array. Currently 
we always loop through the unused parts too, and e.g. 
ConfigurePostmasterWaitSet() needs to walk the array to count how many 
elements are in use.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-28 Thread Michael Paquier
On Mon, Aug 28, 2023 at 11:52:15PM +0300, Heikki Linnakangas wrote:
> On 28/08/2023 18:55, Jeff Janes wrote:
>> Since this commit, I'm getting a lot (63 per restart) of messages:
>> 
>>   LOG:  could not close client or listen socket: Bad file descriptor
>> All I have to do to get the message is turn logging_collector = on and
>> restart.
>> 
>> The close failure condition existed before the commit, it just wasn't
>> logged before.  So, did the extra logging added here just uncover a
>> pre-existing bug?

In case you've not noticed:
https://www.postgresql.org/message-id/zovvuqe0rdj2s...@paquier.xyz
But it does not really matter now ;)

> Yes, so it seems. Syslogger is started before the ListenSockets array is
> initialized, so its still all zeros. When syslogger is forked, the child
> process tries to close all the listen sockets, which are all zeros. So
> syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the
> array initialization earlier.

Yep, I've reached the same conclusion.  Wouldn't it be cleaner to move
the callback registration of CloseServerPorts() closer to the array
initialization, though?

> The first close(0) actually does have an effect: it closes stdin, which is
> fd 0. That is surely accidental, but I wonder if we should indeed close
> stdin in child processes? The postmaster process doesn't do anything with
> stdin either, although I guess a library might try to read a passphrase from
> stdin before starting up, for example.

We would have heard about that, wouldn't we?  I may be missing
something of course, but on HEAD, the array initialization is done
before starting any child processes, and the syslogger is the first
one forked.
--
Michael


signature.asc
Description: PGP signature


Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-28 Thread Heikki Linnakangas

On 28/08/2023 18:55, Jeff Janes wrote:

Since this commit, I'm getting a lot (63 per restart) of messages:

  LOG:  could not close client or listen socket: Bad file descriptor
All I have to do to get the message is turn logging_collector = on and 
restart.


The close failure condition existed before the commit, it just wasn't 
logged before.  So, did the extra logging added here just uncover a  
pre-existing bug?


Yes, so it seems. Syslogger is started before the ListenSockets array is 
initialized, so its still all zeros. When syslogger is forked, the child 
process tries to close all the listen sockets, which are all zeros. So 
syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the 
array initialization earlier.


The first close(0) actually does have an effect: it closes stdin, which 
is fd 0. That is surely accidental, but I wonder if we should indeed 
close stdin in child processes? The postmaster process doesn't do 
anything with stdin either, although I guess a library might try to read 
a passphrase from stdin before starting up, for example.


--
Heikki Linnakangas
Neon (https://neon.tech)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 41bccb46a87..6b9e10bffa0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -596,6 +596,9 @@ PostmasterMain(int argc, char *argv[])
 
 	IsPostmasterEnvironment = true;
 
+	for (i = 0; i < MAXLISTEN; i++)
+		ListenSocket[i] = PGINVALID_SOCKET;
+
 	/*
 	 * Start our win32 signal implementation
 	 */
@@ -1176,12 +1179,9 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Establish input sockets.
 	 *
-	 * First, mark them all closed, and set up an on_proc_exit function that's
-	 * charged with closing the sockets again at postmaster shutdown.
+	 * Set up an on_proc_exit function that's charged with closing the sockets
+	 * again at postmaster shutdown.
 	 */
-	for (i = 0; i < MAXLISTEN; i++)
-		ListenSocket[i] = PGINVALID_SOCKET;
-
 	on_proc_exit(CloseServerPorts, 0);
 
 	if (ListenAddresses)


Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-28 Thread Jeff Janes
On Thu, Aug 24, 2023 at 10:05 AM Heikki Linnakangas  wrote:

> On 24/08/2023 15:48, Thomas Munro wrote:
> > LGTM.  I vaguely recall thinking that it might be better to keep
> > EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
> > didn't try this one, but it looks fine with the comment to explain, as
> > you have it.  (It's a shame we can't use O_CLOFORK.)
>
> Yeah, O_CLOFORK would be nice..
>
> Committed, thanks!
>
>
Since this commit, I'm getting a lot (63 per restart) of messages:

 LOG:  could not close client or listen socket: Bad file descriptor

All I have to do to get the message is turn logging_collector = on and
restart.

The close failure condition existed before the commit, it just wasn't
logged before.  So, did the extra logging added here just uncover a
pre-existing bug?

The LOG message is sent to the terminal, not to the log file.

Cheers,

Jeff


Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-24 Thread Heikki Linnakangas

On 24/08/2023 15:48, Thomas Munro wrote:

LGTM.  I vaguely recall thinking that it might be better to keep
EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
didn't try this one, but it looks fine with the comment to explain, as
you have it.  (It's a shame we can't use O_CLOFORK.)


Yeah, O_CLOFORK would be nice..

Committed, thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-24 Thread Thomas Munro
On Thu, Aug 24, 2023 at 11:41 PM Heikki Linnakangas  wrote:
> On 11/07/2023 01:50, Andres Freund wrote:
> >> From: Heikki Linnakangas 
> >> Date: Mon, 12 Jun 2023 16:33:20 +0300
> >> Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets
> >>
> >> We went through some effort to close them in the child process. Better to
> >> not hand them down to the child process in the first place.
> >
> > I think Thomas has a larger version of this patch:
> > https://postgr.es/m/CA%2BhUKGKPNFcfBQduqof4-7C%3DavjcSfdkKBGvQoRuAvfocnvY0A%40mail.gmail.com
>
> Hmm, no, that's a little different. Thomas added the FD_CLOEXEC option
> to the *accepted* socket in commit 1da569ca1f. That was part of that
> thread. This patch adds the option to the *listen* sockets. That was not
> discussed in that thread, but it's certainly in the same vein.
>
> Thomas: What do you think of the attached?

LGTM.  I vaguely recall thinking that it might be better to keep
EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
didn't try this one, but it looks fine with the comment to explain, as
you have it.  (It's a shame we can't use O_CLOFORK.)

There was some question in the other thread about whether doing that
to the server socket might affect accepted sockets too on some OS, but
I can at least confirm that your patch works fine on FreeBSD in an
EXEC_BACKEND build.  I think there were some historical disagreements
about which socket properties were inherited, but not that.




Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-24 Thread Heikki Linnakangas

Focusing on this one patch in this series:

On 11/07/2023 01:50, Andres Freund wrote:

From: Heikki Linnakangas 
Date: Mon, 12 Jun 2023 16:33:20 +0300
Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets

We went through some effort to close them in the child process. Better to
not hand them down to the child process in the first place.


I think Thomas has a larger version of this patch:
https://postgr.es/m/CA%2BhUKGKPNFcfBQduqof4-7C%3DavjcSfdkKBGvQoRuAvfocnvY0A%40mail.gmail.com


Hmm, no, that's a little different. Thomas added the FD_CLOEXEC option 
to the *accepted* socket in commit 1da569ca1f. That was part of that 
thread. This patch adds the option to the *listen* sockets. That was not 
discussed in that thread, but it's certainly in the same vein.


Thomas: What do you think of the attached?

On 11/07/2023 00:07, Tristan Partin wrote:

@@ -831,7 +834,8 @@ StreamConnection(pgsocket server_fd, Port *port)
 void
 StreamClose(pgsocket sock)
 {
-   closesocket(sock);
+   if (closesocket(sock) != 0)
+   elog(LOG, "closesocket failed: %m");
 }

 /*


Do you think WARNING would be a more appropriate log level?


No, WARNING is for messages that you expect the client to receive. This 
failure is unexpected at the system level, the message is for the 
administrator. The distinction isn't always very clear, but LOG seems 
more appropriate in this case.


--
Heikki Linnakangas
Neon (https://neon.tech)
From 95bd11720c8e88a1f65b9d600f50d71662241ba9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 24 Aug 2023 13:53:33 +0300
Subject: [PATCH v2 1/1] Use FD_CLOEXEC on ListenSockets

It's good hygiene if e.g. an extension launches a subprogram when
being loaded. And we went through some effort to close them in the
child process in EXEC_BACKEND mode; better to not hand them down to
the child process in the first place. We still need to close them
after fork when !EXEC_BACKEND, but it's a little simpler.

In the passing, LOG a message if closing the client connection or
listen socket fails. Shouldn't happen, but if it does, would be nice
to know.

Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce...@iki.fi
---
 src/backend/libpq/pqcomm.c  |  6 +-
 src/backend/postmaster/postmaster.c | 14 ++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 1bdcbe4f636..8d217b06455 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -458,6 +458,9 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 		}
 
 #ifndef WIN32
+		/* Don't give the listen socket to any subprograms we execute. */
+		if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0)
+			elog(FATAL, "fcntl(F_SETFD) failed on socket: %m");
 
 		/*
 		 * Without the SO_REUSEADDR flag, a new postmaster can't be started
@@ -831,7 +834,8 @@ StreamConnection(pgsocket server_fd, Port *port)
 void
 StreamClose(pgsocket sock)
 {
-	closesocket(sock);
+	if (closesocket(sock) != 0)
+		elog(LOG, "could not close client or listen socket: %m");
 }
 
 /*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 07d376d77ec..41bccb46a87 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -496,7 +496,6 @@ typedef struct
 	Port		port;
 	InheritableSocket portsocket;
 	char		DataDir[MAXPGPATH];
-	pgsocket	ListenSocket[MAXLISTEN];
 	int32		MyCancelKey;
 	int			MyPMChildSlot;
 #ifndef WIN32
@@ -2545,8 +2544,6 @@ ConnFree(Port *port)
 void
 ClosePostmasterPorts(bool am_syslogger)
 {
-	int			i;
-
 	/* Release resources held by the postmaster's WaitEventSet. */
 	if (pm_wait_set)
 	{
@@ -2573,8 +2570,12 @@ ClosePostmasterPorts(bool am_syslogger)
 	/*
 	 * Close the postmaster's listen sockets.  These aren't tracked by fd.c,
 	 * so we don't call ReleaseExternalFD() here.
+	 *
+	 * The listen sockets are marked as FD_CLOEXEC, so this isn't needed in
+	 * EXEC_BACKEND mode.
 	 */
-	for (i = 0; i < MAXLISTEN; i++)
+#ifndef EXEC_BACKEND
+	for (int i = 0; i < MAXLISTEN; i++)
 	{
 		if (ListenSocket[i] != PGINVALID_SOCKET)
 		{
@@ -2582,6 +2583,7 @@ ClosePostmasterPorts(bool am_syslogger)
 			ListenSocket[i] = PGINVALID_SOCKET;
 		}
 	}
+#endif
 
 	/*
 	 * If using syslogger, close the read side of the pipe.  We don't bother
@@ -6038,8 +6040,6 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	strlcpy(param->DataDir, DataDir, MAXPGPATH);
 
-	memcpy(>ListenSocket, , sizeof(ListenSocket));
-
 	param->MyCancelKey = MyCancelKey;
 	param->MyPMChildSlot = MyPMChildSlot;
 
@@ -6271,8 +6271,6 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	SetDataDir(param->DataDir);
 
-	memcpy(, >ListenSocket, sizeof(ListenSocket));
-
 	MyCancelKey = param->MyCancelKey;
 	MyPMChildSlot = param->MyPMChildSlot;
 
-- 
2.39.2



Re: Refactoring backend fork+exec code

2023-07-10 Thread Andres Freund
Hi,

On 2023-06-18 14:22:33 +0300, Heikki Linnakangas wrote:
> I started to look at the code in postmaster.c related to launching child
> processes. I tried to reduce the difference between EXEC_BACKEND and
> !EXEC_BACKEND code paths, and put the code that needs to differ behind a
> better abstraction. I started doing this to help with implementing
> multi-threading, but it doesn't introduce anything thread-related yet and I
> think this improves readability anyway.

Yes please!  This code is absolutely awful.


> From 0cb6f8d665980d30a5d2a29013000744f16bf813 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Sun, 18 Jun 2023 11:00:21 +0300
> Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores.
> 
> Moves InitProcess calls a little later in EXEC_BACKEND case.

What's the reason for this part? ISTM that we'd really want to get away from
plastering duplicated InitProcess() etc everywhere.

I think this might be easier to understand if you just changed did the
CreateSharedMemoryAndSemaphores() -> AttachSharedMemoryAndSemaphores() piece
in this commit, and the rest later.


> +void
> +AttachSharedMemoryAndSemaphores(void)
> +{
> + /* InitProcess must've been called already */

Perhaps worth an assertion to make it easier to see that the order is wrong?


> From 1d89eec53c7fefa7a4a8c011c9f19e3df64dc436 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Mon, 12 Jun 2023 16:33:20 +0300
> Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets
> 
> We went through some effort to close them in the child process. Better to
> not hand them down to the child process in the first place.

I think Thomas has a larger version of this patch:
https://postgr.es/m/CA%2BhUKGKPNFcfBQduqof4-7C%3DavjcSfdkKBGvQoRuAvfocnvY0A%40mail.gmail.com



> From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Sun, 18 Jun 2023 13:56:44 +0300
> Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs
> 
> - Move more of the work on a client socket to the child process.
> 
> - Reduce the amount of data that needs to be passed from postmaster to
>   child. (Used to pass a full Port struct, although most of the fields were
>   empty. Now we pass the much slimmer ClientSocket.)

I think there might be extensions accessing Port. Not sure if it's worth
worrying about, but ...


> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -476,8 +476,8 @@ AutoVacLauncherMain(int argc, char *argv[])
>   pqsignal(SIGCHLD, SIG_DFL);
>  
>   /*
> -  * Create a per-backend PGPROC struct in shared memory. We must do
> -  * this before we can use LWLocks.
> +  * Create a per-backend PGPROC struct in shared memory. We must do this
> +  * before we can use LWLocks.
>*/
>   InitProcess();
>

Don't think this was intentional?


> From b33cfeb28a5419045acb659a01410b2b463bea3e Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Sun, 18 Jun 2023 13:59:48 +0300
> Subject: [PATCH 9/9] Refactor postmaster child process launching
> 
> - Move code related to launching backend processes to new source file,
>   process_start.c

I think you might have renamed this to launch_backend.c?


> - Introduce new postmaster_child_launch() function that deals with the
>   differences between EXEC_BACKEND and fork mode.
> 
> - Refactor the mechanism of passing informaton from the parent to
>   child process. Instead of using different command-line arguments
>   when launching the child process in EXEC_BACKEND mode, pass a
>   variable-length blob of data along with all the global
>   variables. The contents of that blob depends on the kind of child
>   process being launched. In !EXEC_BACKEND mode, we use the same blob,
>   but it's simply inherited from the parent to child process.


> +constPMChildEntry entry_kinds[] = {
> + {"backend", BackendMain, true},
> +
> + {"autovacuum launcher", AutoVacLauncherMain, true},
> + {"autovacuum worker", AutoVacWorkerMain, true},
> + {"bgworker", BackgroundWorkerMain, true},
> + {"syslogger", SysLoggerMain, false},
> +
> + {"startup", StartupProcessMain, true},
> + {"bgwriter", BackgroundWriterMain, true},
> + {"archiver", PgArchiverMain, true},
> + {"checkpointer", CheckpointerMain, true},
> + {"wal_writer", WalWriterMain, true},
> + {"wal_receiver", WalReceiverMain, true},
> +};

I'd assign them with the PostmasterChildType as index, so there's no danger of
getting out of order.

constPMChildEntry entry_kinds = {
  [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
  ...
}

or such should work.


I'd also use designated initializers for the fields, it's otherwise hard to
know what true means etc.

I think it might be good to put more into array. If we e.g. knew whether a
particular child type is a backend-like, and aux process or syslogger, we
could avoid the duplicated 

Re: Refactoring backend fork+exec code

2023-07-10 Thread Tristan Partin
> From 1d89eec53c7fefa7a4a8c011c9f19e3df64dc436 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Mon, 12 Jun 2023 16:33:20 +0300
> Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets

> @@ -831,7 +834,8 @@ StreamConnection(pgsocket server_fd, Port *port)
>  void
>  StreamClose(pgsocket sock)
>  {
> -   closesocket(sock);
> +   if (closesocket(sock) != 0)
> +   elog(LOG, "closesocket failed: %m");
>  }
> 
>  /*

Do you think WARNING would be a more appropriate log level?

> From 2f518be9e96cfed1a1a49b4af8f7cb4a837aa784 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Mon, 12 Jun 2023 18:07:54 +0300
> Subject: [PATCH 5/9] Move "too many clients already" et al. checks from
>  ProcessStartupPacket.

This seems like a change you could push already (assuming another
maintainer agrees with you), which makes reviews for this patchset even
easier.

> From c25b67c045018a2bf05e6ff53819d26e561fc83f Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Sun, 18 Jun 2023 14:11:16 +0300
> Subject: [PATCH 6/9] Pass CAC as argument to backend process.

Could you expand a bit more on why it is better to pass it as a separate
argument? Does it not fit well conceptually in struct Port?

> @@ -4498,15 +4510,19 @@ postmaster_forkexec(int argc, char *argv[])
>   * returns the pid of the fork/exec'd process, or -1 on failure
>   */
>  static pid_t
> -backend_forkexec(Port *port)
> +backend_forkexec(Port *port, CAC_state cac)
>  {
> -   char   *av[4];
> +   char   *av[5];
> int ac = 0;
> +   charcacbuf[10];
> 
> av[ac++] = "postgres";
> av[ac++] = "--forkbackend";
> av[ac++] = NULL;/* filled in by 
> internal_forkexec */
> 
> +   snprintf(cacbuf, sizeof(cacbuf), "%d", (int) cac);
> +   av[ac++] = cacbuf;

Might be worth a sanity check that there wasn't any truncation into
cacbuf, which is an impossibility as the code is written, but still
useful for catching a future developer error.

Is it worth adding a command line option at all instead of having the
naked positional argument? It would help anybody who might read the
command line what the seemingly random integer stands for.

> @@ -4910,7 +4926,10 @@ SubPostmasterMain(int argc, char *argv[])
> /* Run backend or appropriate child */
> if (strcmp(argv[1], "--forkbackend") == 0)
> {
> -   Assert(argc == 3);  /* shouldn't be any more args 
> */
> +   CAC_state   cac;
> +
> +   Assert(argc == 4);
> +   cac = (CAC_state) atoi(argv[3]);

Perhaps an assert or full error checking that atoi succeeds would be
useful for similar reasons to my previous comment.

> From 658cba5cdb2e5c45faff84566906d2fcaa8a3674 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Mon, 12 Jun 2023 18:03:03 +0300
> Subject: [PATCH 7/9] Remove ConnCreate and ConnFree, and allocate Port in
>  stack.

Again, seems like another patch that could be pushed separately assuming
others don't have any comments.

> From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Sun, 18 Jun 2023 13:56:44 +0300
> Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs

> @@ -1499,7 +1499,7 @@ CloseServerPorts(int status, Datum arg)
> {
> if (ListenSocket[i] != PGINVALID_SOCKET)
> {
> -   StreamClose(ListenSocket[i]);
> +   closesocket(ListenSocket[i]);
> ListenSocket[i] = PGINVALID_SOCKET;
> }
> }

I see you have been adding log messages in the case of closesocket()
failing. Do you think it is worth doing here as well?

One strange part about this patch is that in patch 4, you edit
StreamClose() to emit a log message in the case of closesocket()
failure, but then this patch just completely removes it.

> @@ -4407,11 +4420,11 @@ BackendInitialize(Port *port, CAC_state cac)
>   * Doesn't return at all.
>   */
>  static void
> -BackendRun(Port *port)
> +BackendRun(void)
>  {
> /*
> -* Create a per-backend PGPROC struct in shared memory. We must do
> -* this before we can use LWLocks (in 
> AttachSharedMemoryAndSemaphores).
> +* Create a per-backend PGPROC struct in shared memory. We must do 
> this
> +* before we can use LWLocks (in AttachSharedMemoryAndSemaphores).
>  */
> InitProcess();

This comment reflow probably fits better in the patch that added the
AttachSharedMemoryAndSemaphores function.

> From b33cfeb28a5419045acb659a01410b2b463bea3e Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Sun, 18 Jun 2023 13:59:48 +0300
> Subject: [PATCH 9/9] Refactor postmaster child process launching

> - Move code related to launching backend processes to new source file,
>   process_start.c

Since this seems pretty