subject:"Re\: O\(n\) tasks cause lengthy startups and checkpoints"

Re: O(n) tasks cause lengthy startups and checkpoints

2023-07-04 Thread Nathan Bossart

On Tue, Jul 04, 2023 at 09:30:43AM +0200, Daniel Gustafsson wrote:
>> On 4 Apr 2023, at 05:36, Nathan Bossart  wrote:
>> 
>> I sent this one to the next commitfest and marked it as waiting-on-author
>> and targeted for v17.  I'm aiming to have something that addresses the
>> latest feedback ready for the July commitfest.
> 
> Have you had a chance to look at this such that there is something ready?

Not yet, sorry.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2023-07-04 Thread Daniel Gustafsson

> On 4 Apr 2023, at 05:36, Nathan Bossart  wrote:
> 
> I sent this one to the next commitfest and marked it as waiting-on-author
> and targeted for v17.  I'm aiming to have something that addresses the
> latest feedback ready for the July commitfest.

Have you had a chance to look at this such that there is something ready?

--
Daniel Gustafsson

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-03 Thread Nathan Bossart

I sent this one to the next commitfest and marked it as waiting-on-author
and targeted for v17.  I'm aiming to have something that addresses the
latest feedback ready for the July commitfest.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart

On Sun, Apr 02, 2023 at 04:37:38PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> It's been a little while since I dug into this, but I do see your point
>> that the wraparound risk could be higher in some cases.  For example, if
>> you have a billion temp files to clean up, the custodian could be stuck on
>> that task for a long time.  I will give this some further thought.  I'm all
>> ears if anyone has ideas about how to reduce this risk.
> 
> I wonder if a single long-lived custodian task is the right model at all.
> At least for RemovePgTempFiles, it'd make more sense to write it as a
> background worker that spawns, does its work, and then exits,
> independently of anything else.  Of course, then you need some mechanism
> for ensuring that a bgworker slot is available when needed, but that
> doesn't seem horridly difficult --- we could have a few "reserved
> bgworker" slots, perhaps.  An idle bgworker slot doesn't cost much.

This has crossed my mind.  Even if we use the custodian for several
different tasks, perhaps it could shut down while not in use.  For many
servers, the custodian process will be used sparingly, if at all.  And if
we introduce something like custodian_max_workers, perhaps we could dodge
the wraparound issue a bit by setting the default to the number of
supported tasks.  That being said, this approach adds some complexity.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart

On Sun, Apr 02, 2023 at 04:23:05PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote:
>>> * Why does LookupCustodianFunctions think it needs to search the
>>> constant array?
> 
>> The order of the tasks in the array isn't guaranteed to match the order in
>> the CustodianTask enum.
> 
> Why not?  It's a constant array, we can surely manage to make its
> order match the enum.

Alright.  I'll change this.

>>> * The original proposal included moving RemovePgTempFiles into this
>>> mechanism, which I thought was probably the most useful bit of the
>>> whole thing.  I'm sad to see that gone, what became of it?
> 
>> I postponed that based on advice from upthread [1].  I was hoping to start
>> a dedicated thread for that immediately after the custodian infrastructure
>> was committed.  FWIW I agree that it's the most useful task of what's
>> proposed thus far.
> 
> Hmm, given Andres' objections there's little point in moving forward
> without that task.

Yeah.  I should probably tackle that one first and leave the logical tasks
for later, given there is some prerequisite work required.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Tom Lane

Nathan Bossart  writes:
> It's been a little while since I dug into this, but I do see your point
> that the wraparound risk could be higher in some cases.  For example, if
> you have a billion temp files to clean up, the custodian could be stuck on
> that task for a long time.  I will give this some further thought.  I'm all
> ears if anyone has ideas about how to reduce this risk.

I wonder if a single long-lived custodian task is the right model at all.
At least for RemovePgTempFiles, it'd make more sense to write it as a
background worker that spawns, does its work, and then exits,
independently of anything else.  Of course, then you need some mechanism
for ensuring that a bgworker slot is available when needed, but that
doesn't seem horridly difficult --- we could have a few "reserved
bgworker" slots, perhaps.  An idle bgworker slot doesn't cost much.

regards, tom lane

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Tom Lane

Nathan Bossart  writes:
> On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote:
>> * Why does LookupCustodianFunctions think it needs to search the
>> constant array?

> The order of the tasks in the array isn't guaranteed to match the order in
> the CustodianTask enum.

Why not?  It's a constant array, we can surely manage to make its
order match the enum.

>> * The original proposal included moving RemovePgTempFiles into this
>> mechanism, which I thought was probably the most useful bit of the
>> whole thing.  I'm sad to see that gone, what became of it?

> I postponed that based on advice from upthread [1].  I was hoping to start
> a dedicated thread for that immediately after the custodian infrastructure
> was committed.  FWIW I agree that it's the most useful task of what's
> proposed thus far.

Hmm, given Andres' objections there's little point in moving forward
without that task.

regards, tom lane

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart

On Sun, Apr 02, 2023 at 11:42:26AM -0700, Andres Freund wrote:
> Just want to note that I've repeatedly objected to 0002 and 0003, i.e. moving
> serialized logical decoding snapshots and mapping files, to custodian, and
> still do. Without further work it increases wraparound risks (the filenames
> contain xids), and afaict nothing has been done to ameliorate that.

>From your feedback earlier [0], I was under the (perhaps false) impression
that adding a note about this existing issue in the commit message was
sufficient, at least initially.  I did add such a note in 0003, but it's
missing from 0002 for some reason.  I suspect I left it out because the
serialized snapshot file names do not contain XIDs.  You cleared that up
earlier [1], so this is my bad.

It's been a little while since I dug into this, but I do see your point
that the wraparound risk could be higher in some cases.  For example, if
you have a billion temp files to clean up, the custodian could be stuck on
that task for a long time.  I will give this some further thought.  I'm all
ears if anyone has ideas about how to reduce this risk.

[0] https://postgr.es/m/20220702225456.zit5kjdtdfqmjujt%40alap3.anarazel.de
[1] https://postgr.es/m/20220217065938.x2esfdppzypegn5j%40alap3.anarazel.de

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart

On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote:
> I took a brief look through v20, and generally liked what I saw,
> but there are a few things troubling me:

Thanks for taking a look.

> * The comments for CustodianEnqueueTask claim that it won't enqueue an
> already-queued task, but I don't think I believe that, because it stops
> scanning as soon as it finds an empty slot.  That data structure seems
> quite oddly designed in any case.  Why isn't it simply an array of
> need-to-run-this-one booleans indexed by the CustodianTask enum?
> Fairness of dispatch could be ensured by the same state variable that
> CustodianGetNextTask already uses to track which array element to
> inspect next.  While that wouldn't guarantee that tasks A and B are
> dispatched in the same order they were requested in, I'm not sure why
> we should care.

That works.  Will update.

> * I don't much like cust_lck, mainly because you didn't bother to
> document what it protects (in general, CustodianShmemStruct deserves
> more than zero commentary).  Do we need it at all?  If the task-needed
> flags were sig_atomic_t not bool, we probably don't need it for the
> basic job of tracking which tasks remain to be run.  I see that some
> of the tasks have possibly-non-atomically-assigned parameters to be
> transmitted, but restricting cust_lck to protect those seems like a
> better idea.

Will do.

> * Not quite convinced about handle_arg_func, mainly because the Datum
> API would be pretty inconvenient for any task with more than one arg.
> Why do we need that at all, rather than saying that callers should
> set up any required parameters separately before invoking
> RequestCustodian?

I had done it this way earlier, but added the Datum argument based on
feedback upthread [0].  It presently has only one proposed use, anyway, so
I think it would be fine to switch it back for now.

> * Why does LookupCustodianFunctions think it needs to search the
> constant array?

The order of the tasks in the array isn't guaranteed to match the order in
the CustodianTask enum.

> * The original proposal included moving RemovePgTempFiles into this
> mechanism, which I thought was probably the most useful bit of the
> whole thing.  I'm sad to see that gone, what became of it?

I postponed that based on advice from upthread [1].  I was hoping to start
a dedicated thread for that immediately after the custodian infrastructure
was committed.  FWIW I agree that it's the most useful task of what's
proposed thus far.

[0] https://postgr.es/m/20220703172732.wembjsb55xl63vuw%40awork3.anarazel.de
[1] 
https://postgr.es/m/CANbhV-EagKLoUH7tLEfg__VcLu37LY78F8gvLMzHrRZyZKm6sw%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Andres Freund

Hi,

On 2023-04-02 13:40:05 -0400, Tom Lane wrote:
> Nathan Bossart  writes:
> > another rebase for cfbot
> 
> I took a brief look through v20, and generally liked what I saw,
> but there are a few things troubling me:

Just want to note that I've repeatedly objected to 0002 and 0003, i.e. moving
serialized logical decoding snapshots and mapping files, to custodian, and
still do. Without further work it increases wraparound risks (the filenames
contain xids), and afaict nothing has been done to ameliorate that.

Without those, the current patch series does not have any tasks:

> * The original proposal included moving RemovePgTempFiles into this
> mechanism, which I thought was probably the most useful bit of the
> whole thing.  I'm sad to see that gone, what became of it?

Greetings,

Andres Freund

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Tom Lane

Nathan Bossart  writes:
> another rebase for cfbot

I took a brief look through v20, and generally liked what I saw,
but there are a few things troubling me:

* The comments for CustodianEnqueueTask claim that it won't enqueue an
already-queued task, but I don't think I believe that, because it stops
scanning as soon as it finds an empty slot.  That data structure seems
quite oddly designed in any case.  Why isn't it simply an array of
need-to-run-this-one booleans indexed by the CustodianTask enum?
Fairness of dispatch could be ensured by the same state variable that
CustodianGetNextTask already uses to track which array element to
inspect next.  While that wouldn't guarantee that tasks A and B are
dispatched in the same order they were requested in, I'm not sure why
we should care.

* I don't much like cust_lck, mainly because you didn't bother to
document what it protects (in general, CustodianShmemStruct deserves
more than zero commentary).  Do we need it at all?  If the task-needed
flags were sig_atomic_t not bool, we probably don't need it for the
basic job of tracking which tasks remain to be run.  I see that some
of the tasks have possibly-non-atomically-assigned parameters to be
transmitted, but restricting cust_lck to protect those seems like a
better idea.

* Not quite convinced about handle_arg_func, mainly because the Datum
API would be pretty inconvenient for any task with more than one arg.
Why do we need that at all, rather than saying that callers should
set up any required parameters separately before invoking
RequestCustodian?

* Why does LookupCustodianFunctions think it needs to search the
constant array?

* The original proposal included moving RemovePgTempFiles into this
mechanism, which I thought was probably the most useful bit of the
whole thing.  I'm sad to see that gone, what became of it?

regards, tom lane

Re: O(n) tasks cause lengthy startups and checkpoints

2023-02-17 Thread Nathan Bossart

On Thu, Feb 02, 2023 at 09:48:08PM -0800, Nathan Bossart wrote:
> rebased for cfbot

another rebase for cfbot

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 1c9b95cae7adcc57b7544a44ff16a26e71c6c736 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v20 1/4] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 doc/src/sgml/glossary.sgml  |  11 +
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 377 
 src/backend/postmaster/meson.build  |   1 +
 src/backend/postmaster/postmaster.c |  38 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/pgstat_io.c  |   4 +-
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 15 files changed, 491 insertions(+), 6 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 7c01a541fe..ad3f53e2a3 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -144,6 +144,7 @@
  (but not the autovacuum workers),
  the background writer,
  the checkpointer,
+ the custodian,
  the logger,
  the startup process,
  the WAL archiver,
@@ -484,6 +485,16 @@

   
 
+  
+   Custodian (process)
+   
+
+ An auxiliary process
+ that is responsible for executing assorted cleanup tasks.
+
+   
+  
+
   
Data area

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 047448b34e..5f4dde85cf 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index cae6feb356..a1f042f13a 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..98bb9efcfd
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,377 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(void);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunct

Re: O(n) tasks cause lengthy startups and checkpoints

2023-02-02 Thread Nathan Bossart

rebased for cfbot

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From abbd26a3bcfcc828e196187e9f6abf6af64f3393 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v19 1/4] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 doc/src/sgml/glossary.sgml  |  11 +
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 377 
 src/backend/postmaster/meson.build  |   1 +
 src/backend/postmaster/postmaster.c |  38 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 14 files changed, 488 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 7c01a541fe..ad3f53e2a3 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -144,6 +144,7 @@
  (but not the autovacuum workers),
  the background writer,
  the checkpointer,
+ the custodian,
  the logger,
  the startup process,
  the WAL archiver,
@@ -484,6 +485,16 @@

   
 
+  
+   Custodian (process)
+   
+
+ An auxiliary process
+ that is responsible for executing assorted cleanup tasks.
+
+   
+  
+
   
Data area

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index cae6feb356..a1f042f13a 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..98bb9efcfd
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,377 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(void);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(CustodianTask task);
+
+typedef struct
+{
+	slock_t		cust_lck;
+
+	CustodianTask task_queue_elems[NUM_CUSTODIAN_TASKS];
+	int			task_queu

Re: O(n) tasks cause lengthy startups and checkpoints

2022-12-05 Thread Bharath Rupireddy

On Sat, Dec 3, 2022 at 12:45 AM Nathan Bossart  wrote:
>
> On Fri, Dec 02, 2022 at 12:11:35PM +0530, Bharath Rupireddy wrote:
> > On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart  
> > wrote:
> >> The test appears to reliably create snapshot and mapping files, so if the
> >> directories are empty at some point after the checkpoint at the end, we can
> >> be reasonably certain the custodian took action.  I didn't add explicit
> >> checks that there are files in the directories before the checkpoint
> >> because a concurrent checkpoint could make such checks unreliable.
> >
> > I think you're right. I added sqls to see if the snapshot and mapping
> > files count > 0, see [1] and the cirrus-ci members are happy too -
> > https://github.com/BRupireddy/postgres/tree/custodian_review_2. I
> > think we can consider adding these count > 0 checks to tests.
>
> My worry about adding "count > 0" checks is that a concurrent checkpoint
> could make them unreliable.  In other words, those checks might ordinarily
> work, but if an automatic checkpoint causes the files be cleaned up just
> beforehand, they will fail.

Hm. It would have been better with a TAP test module for testing the
custodian code reliably. Anyway, that mustn't stop the patch getting
in. If required, we can park the TAP test module for later - IMO.
Others may have different thoughts here.

> > Having said above, I'm okay to process ShutdownRequestPending as early
> > as possible, however, should we also add CHECK_FOR_INTERRUPTS()
> > alongside ShutdownRequestPending?
>
> I'm not seeing a need for CHECK_FOR_INTERRUPTS.  Do you see one?

Since the custodian has SignalHandlerForShutdownRequest as SIGINT and
SIGTERM handlers, unlike StatementCancelHandler and die respectively,
no need of CFI I guess. And also none of the CFI signal handler flags
applies to the custodian.

> > While thinking about this, one thing that really struck me is what
> > happens if we let the custodian exit, say after processing
> > ShutdownRequestPending immediately or after a restart, leaving other
> > queued tasks? The custodian will never get to work on those tasks
> > unless the requestors (say checkpoint or some other process) requests
> > it to do so after restart. Maybe, we don't need to worry about it.
> > Maybe we need to worry about it. Maybe it's an overkill to save the
> > custodian's task state to disk so that it can come up and do the
> > leftover tasks upon restart.
>
> Yes, tasks will need to be retried when the server starts again.  The ones
> in this patch set should be requested again during the next checkpoint.
> Temporary file cleanup would always be requested during server start, so
> that should be handled as well.  Even today, the server might abruptly shut
> down while executing these tasks, and we don't have any special handling
> for that.

Right.

The v18 patch set posted upthread
https://www.postgresql.org/message-id/20221201214026.GA1799688%40nathanxps13
looks good to me. I see the CF entry is marked RfC -
https://commitfest.postgresql.org/41/3448/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-12-02 Thread Nathan Bossart

On Fri, Dec 02, 2022 at 12:11:35PM +0530, Bharath Rupireddy wrote:
> On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart  
> wrote:
>> The test appears to reliably create snapshot and mapping files, so if the
>> directories are empty at some point after the checkpoint at the end, we can
>> be reasonably certain the custodian took action.  I didn't add explicit
>> checks that there are files in the directories before the checkpoint
>> because a concurrent checkpoint could make such checks unreliable.
> 
> I think you're right. I added sqls to see if the snapshot and mapping
> files count > 0, see [1] and the cirrus-ci members are happy too -
> https://github.com/BRupireddy/postgres/tree/custodian_review_2. I
> think we can consider adding these count > 0 checks to tests.

My worry about adding "count > 0" checks is that a concurrent checkpoint
could make them unreliable.  In other words, those checks might ordinarily
work, but if an automatic checkpoint causes the files be cleaned up just
beforehand, they will fail.

> Having said above, I'm okay to process ShutdownRequestPending as early
> as possible, however, should we also add CHECK_FOR_INTERRUPTS()
> alongside ShutdownRequestPending?

I'm not seeing a need for CHECK_FOR_INTERRUPTS.  Do you see one?

> While thinking about this, one thing that really struck me is what
> happens if we let the custodian exit, say after processing
> ShutdownRequestPending immediately or after a restart, leaving other
> queued tasks? The custodian will never get to work on those tasks
> unless the requestors (say checkpoint or some other process) requests
> it to do so after restart. Maybe, we don't need to worry about it.
> Maybe we need to worry about it. Maybe it's an overkill to save the
> custodian's task state to disk so that it can come up and do the
> leftover tasks upon restart.

Yes, tasks will need to be retried when the server starts again.  The ones
in this patch set should be requested again during the next checkpoint.
Temporary file cleanup would always be requested during server start, so
that should be handled as well.  Even today, the server might abruptly shut
down while executing these tasks, and we don't have any special handling
for that.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-12-01 Thread Bharath Rupireddy

On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart  wrote:
>
> >> 4. Is it a good idea to add log messages in the DoCustodianTasks()
> >> loop? Maybe at a debug level? The log message can say the current task
> >> the custodian is processing. And/Or setting the custodian's status on
> >> the ps display is also a good idea IMO.
>
> I'd like to pick these up in a new thread if/when this initial patch set is
> committed.  The tasks already do some logging, and the checkpointer process
> doesn't update the ps display for these tasks today.

It'll be good to have some kind of dedicated monitoring for the
custodian process as it can do a "good" amount of work at times and
users will have a way to know what it currently is doing - it can be
logs at debug level, progress reporting via
ereport_startup_progress()-sort of mechanism, ps display,
pg_stat_custodian or a special function that tells some details, or
some other. In any case, I agree to park this for later.

> >> 0002 and 0003:
> >> 1.
> >> +CHECKPOINT;
> >> +DO $$
> >> I think we need to ensure that there are some snapshot files before
> >> the checkpoint. Otherwise, it may happen that the above test case
> >> exits without the custodian process doing anything.
> >>
> >> 2. I think the best way to test the custodian process code is by
> >> adding a TAP test module to see actually the custodian process kicks
> >> in. Perhaps, add elog(DEBUGX,...) messages to various custodian
> >> process functions and see if we see the logs in server logs.
>
> The test appears to reliably create snapshot and mapping files, so if the
> directories are empty at some point after the checkpoint at the end, we can
> be reasonably certain the custodian took action.  I didn't add explicit
> checks that there are files in the directories before the checkpoint
> because a concurrent checkpoint could make such checks unreliable.

I think you're right. I added sqls to see if the snapshot and mapping
files count > 0, see [1] and the cirrus-ci members are happy too -
https://github.com/BRupireddy/postgres/tree/custodian_review_2. I
think we can consider adding these count > 0 checks to tests.

> >> 0004:
> >> I think the 0004 patch can be merged into 0001, 0002 and 0003 patches.
> >> Otherwise the patch LGTM.
>
> I'm keeping this one separate because I've received conflicting feedback
> about the idea.

If we classify custodian as a process doing non-critical tasks that
have nothing to do with regular server functioning, then processing
ShutdownRequestPending looks okay. However, delaying these
non-critical tasks such as file removals which reclaims disk space
might impact the server overall especially when it's reaching 100%
disk usage and we want the custodian to do its job fully before we
shutdown the server.

If we delay processing shutdown requests, that can impact the server
overall (might delay restarts, failovers etc.), because at times there
can be a lot of tasks with a good amount of work pending in the
custodian's task queue.

Having said above, I'm okay to process ShutdownRequestPending as early
as possible, however, should we also add CHECK_FOR_INTERRUPTS()
alongside ShutdownRequestPending?

Also, I think it's enough to just have ShutdownRequestPending check in
DoCustodianTasks(void)'s main loop and we can let
RemoveOldSerializedSnapshots() and RemoveOldLogicalRewriteMappings()
do their jobs to the fullest as they do today.

While thinking about this, one thing that really struck me is what
happens if we let the custodian exit, say after processing
ShutdownRequestPending immediately or after a restart, leaving other
queued tasks? The custodian will never get to work on those tasks
unless the requestors (say checkpoint or some other process) requests
it to do so after restart. Maybe, we don't need to worry about it.
Maybe we need to worry about it. Maybe it's an overkill to save the
custodian's task state to disk so that it can come up and do the
leftover tasks upon restart.

> > Another comment:
> > IIUC, there's no custodian_delay GUC as we want to avoid unnecessary
> > wakeups for power savings (being discussed in the other thread).
> > However, can it happen that the custodian missed to capture SetLatch
> > wakeups by other backends? In other words, can the custodian process
> > be sleeping when there's work to do?
>
> I'm not aware of any way this could happen, but if there is one, I think we
> should treat it as a bug instead of relying on the custodian process to
> periodically wake up and check for work to do.

One possible scenario is that the requestor adds its task details to
the queue and sets the latch, the custodian can miss this SetLatch()
when it's in the midst of processing a task. However, it guarantees
the requester that it'll process the added task after it completes the
current task. And, I don't know the other reasons when the custodian
can miss SetLatch().

[1]
diff --git a/contrib/test_decoding/expected/rewrite.out
b/contrib/test_decoding/expected/rewrite

Re: O(n) tasks cause lengthy startups and checkpoints

2022-12-01 Thread Nathan Bossart

On Wed, Nov 30, 2022 at 05:27:10PM +0530, Bharath Rupireddy wrote:
> On Wed, Nov 30, 2022 at 4:52 PM Bharath Rupireddy
>  wrote:
>> Thanks for the patches. I spent some time on reviewing v17 patch set
>> and here are my comments:

Thanks for reviewing!

>> 0001:
>> 1. I think the custodian process needs documentation - it needs a
>> definition in glossary.sgml and perhaps a dedicated page describing
>> what tasks it takes care of.

Good catch.  I added this in v18.  I stopped short of adding a dedicated
page to describe the tasks because 1) there are no parameters for the
custodian and 2) AFAICT none of its tasks are described in the docs today.

>> 2.
>> +LWLockReleaseAll();
>> +ConditionVariableCancelSleep();
>> +AbortBufferIO();
>> +UnlockBuffers();
>> +ReleaseAuxProcessResources(false);
>> +AtEOXact_Buffers(false);
>> +AtEOXact_SMgr();
>> +AtEOXact_Files(false);
>> +AtEOXact_HashTables(false);
>> Do we need all of these in the exit path? Isn't the stuff that
>> ShutdownAuxiliaryProcess() does enough for the custodian process?
>> AFAICS, the custodian process uses LWLocks (which the
>> ShutdownAuxiliaryProcess() takes care of) and it doesn't access shared
>> buffers and so on.
>> Having said that, I'm fine to keep them for future use and all of
>> those cleanup functions exit if nothing related occurs.

Yeah, I don't think we need a few of these.  In v18, I've kept the
following:
* LWLockReleaseAll()
* ConditionVariableCancelSleep()
* ReleaseAuxProcessResources(false)
* AtEOXact_Files(false)

>> 3.
>> + * Advertise out latch that backends can use to wake us up while we're
>> Typo - %s/out/our

fixed

>> 4. Is it a good idea to add log messages in the DoCustodianTasks()
>> loop? Maybe at a debug level? The log message can say the current task
>> the custodian is processing. And/Or setting the custodian's status on
>> the ps display is also a good idea IMO.

I'd like to pick these up in a new thread if/when this initial patch set is
committed.  The tasks already do some logging, and the checkpointer process
doesn't update the ps display for these tasks today.

>> 0002 and 0003:
>> 1.
>> +CHECKPOINT;
>> +DO $$
>> I think we need to ensure that there are some snapshot files before
>> the checkpoint. Otherwise, it may happen that the above test case
>> exits without the custodian process doing anything.
>>
>> 2. I think the best way to test the custodian process code is by
>> adding a TAP test module to see actually the custodian process kicks
>> in. Perhaps, add elog(DEBUGX,...) messages to various custodian
>> process functions and see if we see the logs in server logs.

The test appears to reliably create snapshot and mapping files, so if the
directories are empty at some point after the checkpoint at the end, we can
be reasonably certain the custodian took action.  I didn't add explicit
checks that there are files in the directories before the checkpoint
because a concurrent checkpoint could make such checks unreliable.

>> 0004:
>> I think the 0004 patch can be merged into 0001, 0002 and 0003 patches.
>> Otherwise the patch LGTM.

I'm keeping this one separate because I've received conflicting feedback
about the idea.

>> 1. I think we can trivially extend the custodian process to remove any
>> future WAL files on the old timeline, something like the attached
>> 0001-Move-removal-of-future-WAL-files-on-the-old-timeline.text file).
>> While this offloads the recovery a bit, the server may archive such
>> WAL files before the custodian removes them. We can do a bit more to
>> stop the server from archiving such WAL files, but that needs more
>> coding. I don't think we need to do all that now, perhaps, we can give
>> it a try once the basic custodian stuff gets in.
>> 2. Moving RemovePgTempFiles() to the custodian can bring up the server
>> soon. The idea is that the postmaster just renames the temp
>> directories and informs the custodian so that it can go delete such
>> temp files and directories. I have personally seen cases where the
>> server spent a good amount of time cleaning up temp files. We can park
>> it for later.
>> 3. Moving RemoveOldXlogFiles() to the custodian can make checkpoints faster.
>> 4. PreallocXlogFiles() - if we ever have plans to make pre-allocation
>> more aggressive (pre-allocate more than 1 WAL file), perhaps letting
>> custodian do that is a good idea. Again, too many tasks for a single
>> process.

I definitely want to do #2.  І have some patches for that upthread, but I
removed them for now based on Simon's feedback.  I intend to pick that up
in a new thread.  I haven't thought too much about the others yet.

> Another comment:
> IIUC, there's no custodian_delay GUC as we want to avoid unnecessary
> wakeups for power savings (being discussed in the other thread).
> However, can it happen that the custodian missed to capture SetLatch
> wakeups by other backends? In o

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-30 Thread Bharath Rupireddy

On Wed, Nov 30, 2022 at 4:52 PM Bharath Rupireddy
 wrote:
>
> On Wed, Nov 30, 2022 at 10:48 AM Nathan Bossart
>  wrote:
> >
> >
> > cfbot is not happy with v16.  AFAICT this is just due to poor placement, so
> > here is another attempt with the tests moved to a new location.  Apologies
> > for the noise.
>
> Thanks for the patches. I spent some time on reviewing v17 patch set
> and here are my comments:
>
> 0001:
> 1. I think the custodian process needs documentation - it needs a
> definition in glossary.sgml and perhaps a dedicated page describing
> what tasks it takes care of.
>
> 2.
> +LWLockReleaseAll();
> +ConditionVariableCancelSleep();
> +AbortBufferIO();
> +UnlockBuffers();
> +ReleaseAuxProcessResources(false);
> +AtEOXact_Buffers(false);
> +AtEOXact_SMgr();
> +AtEOXact_Files(false);
> +AtEOXact_HashTables(false);
> Do we need all of these in the exit path? Isn't the stuff that
> ShutdownAuxiliaryProcess() does enough for the custodian process?
> AFAICS, the custodian process uses LWLocks (which the
> ShutdownAuxiliaryProcess() takes care of) and it doesn't access shared
> buffers and so on.
> Having said that, I'm fine to keep them for future use and all of
> those cleanup functions exit if nothing related occurs.
>
> 3.
> + * Advertise out latch that backends can use to wake us up while we're
> Typo - %s/out/our
>
> 4. Is it a good idea to add log messages in the DoCustodianTasks()
> loop? Maybe at a debug level? The log message can say the current task
> the custodian is processing. And/Or setting the custodian's status on
> the ps display is also a good idea IMO.
>
> 0002 and 0003:
> 1.
> +CHECKPOINT;
> +DO $$
> I think we need to ensure that there are some snapshot files before
> the checkpoint. Otherwise, it may happen that the above test case
> exits without the custodian process doing anything.
>
> 2. I think the best way to test the custodian process code is by
> adding a TAP test module to see actually the custodian process kicks
> in. Perhaps, add elog(DEBUGX,...) messages to various custodian
> process functions and see if we see the logs in server logs.
>
> 0004:
> I think the 0004 patch can be merged into 0001, 0002 and 0003 patches.
> Otherwise the patch LGTM.
>
> Few thoughts:
> 1. I think we can trivially extend the custodian process to remove any
> future WAL files on the old timeline, something like the attached
> 0001-Move-removal-of-future-WAL-files-on-the-old-timeline.text file).
> While this offloads the recovery a bit, the server may archive such
> WAL files before the custodian removes them. We can do a bit more to
> stop the server from archiving such WAL files, but that needs more
> coding. I don't think we need to do all that now, perhaps, we can give
> it a try once the basic custodian stuff gets in.
> 2. Moving RemovePgTempFiles() to the custodian can bring up the server
> soon. The idea is that the postmaster just renames the temp
> directories and informs the custodian so that it can go delete such
> temp files and directories. I have personally seen cases where the
> server spent a good amount of time cleaning up temp files. We can park
> it for later.
> 3. Moving RemoveOldXlogFiles() to the custodian can make checkpoints faster.
> 4. PreallocXlogFiles() - if we ever have plans to make pre-allocation
> more aggressive (pre-allocate more than 1 WAL file), perhaps letting
> custodian do that is a good idea. Again, too many tasks for a single
> process.

Another comment:
IIUC, there's no custodian_delay GUC as we want to avoid unnecessary
wakeups for power savings (being discussed in the other thread).
However, can it happen that the custodian missed to capture SetLatch
wakeups by other backends? In other words, can the custodian process
be sleeping when there's work to do?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-30 Thread Bharath Rupireddy

On Wed, Nov 30, 2022 at 10:48 AM Nathan Bossart
 wrote:
>
>
> cfbot is not happy with v16.  AFAICT this is just due to poor placement, so
> here is another attempt with the tests moved to a new location.  Apologies
> for the noise.

Thanks for the patches. I spent some time on reviewing v17 patch set
and here are my comments:

0001:
1. I think the custodian process needs documentation - it needs a
definition in glossary.sgml and perhaps a dedicated page describing
what tasks it takes care of.

2.
+LWLockReleaseAll();
+ConditionVariableCancelSleep();
+AbortBufferIO();
+UnlockBuffers();
+ReleaseAuxProcessResources(false);
+AtEOXact_Buffers(false);
+AtEOXact_SMgr();
+AtEOXact_Files(false);
+AtEOXact_HashTables(false);
Do we need all of these in the exit path? Isn't the stuff that
ShutdownAuxiliaryProcess() does enough for the custodian process?
AFAICS, the custodian process uses LWLocks (which the
ShutdownAuxiliaryProcess() takes care of) and it doesn't access shared
buffers and so on.
Having said that, I'm fine to keep them for future use and all of
those cleanup functions exit if nothing related occurs.

3.
+ * Advertise out latch that backends can use to wake us up while we're
Typo - %s/out/our

4. Is it a good idea to add log messages in the DoCustodianTasks()
loop? Maybe at a debug level? The log message can say the current task
the custodian is processing. And/Or setting the custodian's status on
the ps display is also a good idea IMO.

0002 and 0003:
1.
+CHECKPOINT;
+DO $$
I think we need to ensure that there are some snapshot files before
the checkpoint. Otherwise, it may happen that the above test case
exits without the custodian process doing anything.

2. I think the best way to test the custodian process code is by
adding a TAP test module to see actually the custodian process kicks
in. Perhaps, add elog(DEBUGX,...) messages to various custodian
process functions and see if we see the logs in server logs.

0004:
I think the 0004 patch can be merged into 0001, 0002 and 0003 patches.
Otherwise the patch LGTM.

Few thoughts:
1. I think we can trivially extend the custodian process to remove any
future WAL files on the old timeline, something like the attached
0001-Move-removal-of-future-WAL-files-on-the-old-timeline.text file).
While this offloads the recovery a bit, the server may archive such
WAL files before the custodian removes them. We can do a bit more to
stop the server from archiving such WAL files, but that needs more
coding. I don't think we need to do all that now, perhaps, we can give
it a try once the basic custodian stuff gets in.
2. Moving RemovePgTempFiles() to the custodian can bring up the server
soon. The idea is that the postmaster just renames the temp
directories and informs the custodian so that it can go delete such
temp files and directories. I have personally seen cases where the
server spent a good amount of time cleaning up temp files. We can park
it for later.
3. Moving RemoveOldXlogFiles() to the custodian can make checkpoints faster.
4. PreallocXlogFiles() - if we ever have plans to make pre-allocation
more aggressive (pre-allocate more than 1 WAL file), perhaps letting
custodian do that is a good idea. Again, too many tasks for a single
process.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-29 Thread Simon Riggs

On Wed, 30 Nov 2022 at 03:56, Nathan Bossart  wrote:
>
> On Tue, Nov 29, 2022 at 12:02:44PM +, Simon Riggs wrote:
> > The last important point for me is tests, in src/test/modules
> > probably. It might be possible to reuse the final state of other
> > modules' tests to test cleanup, or at least integrate a custodian test
> > into each module.
>
> Of course.  I found some existing tests for the test_decoding plugin that
> appear to reliably generate the files we want the custodian to clean up, so
> I added them there.

Thanks for adding the tests; I can see they run clean.

The only minor thing I would personally add is a note in each piece of
code to explain where the tests are for each one and/or something in
the main custodian file that says tests exist within src/test/module.

Otherwise, ready for committer.

-- 
Simon Riggshttp://www.EnterpriseDB.com/

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-29 Thread Nathan Bossart

On Tue, Nov 29, 2022 at 07:56:53PM -0800, Nathan Bossart wrote:
> On Tue, Nov 29, 2022 at 12:02:44PM +, Simon Riggs wrote:
>> The last important point for me is tests, in src/test/modules
>> probably. It might be possible to reuse the final state of other
>> modules' tests to test cleanup, or at least integrate a custodian test
>> into each module.
> 
> Of course.  I found some existing tests for the test_decoding plugin that
> appear to reliably generate the files we want the custodian to clean up, so
> I added them there.

cfbot is not happy with v16.  AFAICT this is just due to poor placement, so
here is another attempt with the tests moved to a new location.  Apologies
for the noise.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From d8342d121d39d04e995986b4244abf369b833730 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v17 1/4] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 382 
 src/backend/postmaster/meson.build  |   1 +
 src/backend/postmaster/postmaster.c |  38 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 13 files changed, 482 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..a94381bc21
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,382 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(void);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(CustodianTask task);
+
+typedef struct
+{
+	slock_t		cust_lck;
+
+	CustodianTas

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-29 Thread Nathan Bossart

On Tue, Nov 29, 2022 at 12:02:44PM +, Simon Riggs wrote:
> The last important point for me is tests, in src/test/modules
> probably. It might be possible to reuse the final state of other
> modules' tests to test cleanup, or at least integrate a custodian test
> into each module.

Of course.  I found some existing tests for the test_decoding plugin that
appear to reliably generate the files we want the custodian to clean up, so
I added them there.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From d8342d121d39d04e995986b4244abf369b833730 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v16 1/4] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 382 
 src/backend/postmaster/meson.build  |   1 +
 src/backend/postmaster/postmaster.c |  38 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 13 files changed, 482 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..a94381bc21
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,382 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(void);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(CustodianTask task);
+
+typedef struct
+{
+	slock_t		cust_lck;
+
+	CustodianTask task_queue_elems[NUM_CUSTODIAN_TASKS];
+	int			task_queue_head;
+} CustodianShmemStruct;
+
+static CustodianShmemStruct *CustodianShmem;
+
+typedef void (*CustodianTaskFunction) (void);
+typedef void (*CustodianTaskHandleArg) (Datum arg);
+
+s

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-29 Thread Simon Riggs

On Mon, 28 Nov 2022 at 23:40, Nathan Bossart  wrote:
>
> Okay, here is a new patch set.  0004 adds logic to prevent custodian tasks
> from delaying shutdown.

That all seems good, thanks.

The last important point for me is tests, in src/test/modules
probably. It might be possible to reuse the final state of other
modules' tests to test cleanup, or at least integrate a custodian test
into each module.

-- 
Simon Riggshttp://www.EnterpriseDB.com/

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-28 Thread Nathan Bossart

Okay, here is a new patch set.  0004 adds logic to prevent custodian tasks
from delaying shutdown.

I haven't added any logging for long-running tasks yet.  Tasks might
ordinarily take a while, so such logs wouldn't necessarily indicate
something is wrong.  Perhaps we could add a GUC for the amount of time to
wait before logging.  This feature would be off by default.  Another option
could be to create a log_custodian GUC that causes tasks to be logged when
completed, similar to log_checkpoints.  Thoughts?

On Mon, Nov 28, 2022 at 01:37:01PM -0500, Robert Haas wrote:
> On Mon, Nov 28, 2022 at 1:31 PM Andres Freund  wrote:
>> On 2022-11-28 13:08:57 +, Simon Riggs wrote:
>> > On Sun, 27 Nov 2022 at 23:34, Nathan Bossart  
>> > wrote:
>> > > > Rather than explicitly use DEBUG1 everywhere I would have an
>> > > > #define CUSTODIAN_LOG_LEVEL LOG
>> > > > so we can run with it in LOG mode and then set it to DEBUG1 with a one
>> > > > line change in a later phase of Beta
>> > >
>> > > I can create a separate patch for this, but I don't think I've ever seen
>> > > this sort of thing before.
>> >
>> > Much of recovery is coded that way, for the same reason.
>>
>> I think that's not a good thing to copy without a lot more justification than
>> "some old code also does it that way". It's sometimes justified, but also
>> makes code harder to read (one doesn't know what it does without looking up
>> the #define, line length).
> 
> Yeah. If people need some of the log messages at a higher level during
> development, they can patch their own copies.
> 
> I think there might be some argument for having a facility that lets
> you pick subsystems or even individual messages that you want to trace
> and pump up the log level for just those call sites. But I don't know
> exactly what that would look like, and I don't think inventing one-off
> mechanisms for particular cases is a good idea.

Given this discussion, I haven't made any changes to the logging in the new
patch set.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 7fa5c047781dddedb1f9c5a4e96622a23c0c0835 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v15 1/4] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 382 
 src/backend/postmaster/meson.build  |   1 +
 src/backend/postmaster/postmaster.c |  38 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 13 files changed, 482 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..a94381bc21
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,382 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay s

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-28 Thread Robert Haas

On Mon, Nov 28, 2022 at 1:31 PM Andres Freund  wrote:
> On 2022-11-28 13:08:57 +, Simon Riggs wrote:
> > On Sun, 27 Nov 2022 at 23:34, Nathan Bossart  
> > wrote:
> > > > Rather than explicitly use DEBUG1 everywhere I would have an
> > > > #define CUSTODIAN_LOG_LEVEL LOG
> > > > so we can run with it in LOG mode and then set it to DEBUG1 with a one
> > > > line change in a later phase of Beta
> > >
> > > I can create a separate patch for this, but I don't think I've ever seen
> > > this sort of thing before.
> >
> > Much of recovery is coded that way, for the same reason.
>
> I think that's not a good thing to copy without a lot more justification than
> "some old code also does it that way". It's sometimes justified, but also
> makes code harder to read (one doesn't know what it does without looking up
> the #define, line length).

Yeah. If people need some of the log messages at a higher level during
development, they can patch their own copies.

I think there might be some argument for having a facility that lets
you pick subsystems or even individual messages that you want to trace
and pump up the log level for just those call sites. But I don't know
exactly what that would look like, and I don't think inventing one-off
mechanisms for particular cases is a good idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-28 Thread Andres Freund

On 2022-11-28 13:08:57 +, Simon Riggs wrote:
> On Sun, 27 Nov 2022 at 23:34, Nathan Bossart  wrote:
> > > Rather than explicitly use DEBUG1 everywhere I would have an
> > > #define CUSTODIAN_LOG_LEVEL LOG
> > > so we can run with it in LOG mode and then set it to DEBUG1 with a one
> > > line change in a later phase of Beta
> >
> > I can create a separate patch for this, but I don't think I've ever seen
> > this sort of thing before.
> 
> Much of recovery is coded that way, for the same reason.

I think that's not a good thing to copy without a lot more justification than
"some old code also does it that way". It's sometimes justified, but also
makes code harder to read (one doesn't know what it does without looking up
the #define, line length).

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-28 Thread Simon Riggs

On Sun, 27 Nov 2022 at 23:34, Nathan Bossart  wrote:
>
> Thanks for taking a look!
>
> On Thu, Nov 24, 2022 at 05:31:02PM +, Simon Riggs wrote:
> > * not sure I believe that everything it does can always be aborted out
> > of and shutdown - to achieve that you will need a
> > CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least
>
> I did something like this earlier, but was advised to simply let the
> functions finish as usual during shutdown [0].  I think this is what the
> checkpointer process does today, anyway.

If we say "The custodian is not an essential process and can shutdown
quickly when requested.", and yet we know its not true in all cases,
then that will lead to misunderstandings and bugs.

If we perform a restart and the custodian is performing extra work
that delays shutdown, then it also delays restart. Given the title of
the thread, we should be looking to improve that, or at least know it
occurred.

> > * not sure why you want immediate execution of custodian tasks - I
> > feel supporting two modes will be a lot harder. For me, I would run
> > locally when !IsUnderPostmaster and also in an Assert build, so we can
> > test it works right - i.e. running in its own process is just a
> > production optimization for performance (which is the stated reason
> > for having this)
>
> I added this because 0004 involves requesting a task from the postmaster,
> so checking for IsUnderPostmaster doesn't work.  Those tasks would always
> run immediately.  However, we could use IsPostmasterEnvironment instead,
> which would allow us to remove the "immediate" argument.  I did it this way
> in v14.

Thanks

> > 0005 seems good from what I know
> > * There is no check to see if it worked in any sane time
>
> What did you have in mind?  Should the custodian begin emitting WARNINGs
> after a while?

I think it might be useful if it logged anything that took an
"extended period", TBD.

Maybe that is already covered by startup process logging. Please tell
me that still works?

> > Rather than explicitly use DEBUG1 everywhere I would have an
> > #define CUSTODIAN_LOG_LEVEL LOG
> > so we can run with it in LOG mode and then set it to DEBUG1 with a one
> > line change in a later phase of Beta
>
> I can create a separate patch for this, but I don't think I've ever seen
> this sort of thing before.

Much of recovery is coded that way, for the same reason.

> Is the idea just to help with debugging during
> the development phase?

"Just", yes. Tests would be desirable also, under src/test/modules.

-- 
Simon Riggshttp://www.EnterpriseDB.com/

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-27 Thread Nathan Bossart

Thanks for taking a look!

On Thu, Nov 24, 2022 at 05:31:02PM +, Simon Riggs wrote:
> * not sure I believe that everything it does can always be aborted out
> of and shutdown - to achieve that you will need a
> CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least

I did something like this earlier, but was advised to simply let the
functions finish as usual during shutdown [0].  I think this is what the
checkpointer process does today, anyway.

> * not sure why you want immediate execution of custodian tasks - I
> feel supporting two modes will be a lot harder. For me, I would run
> locally when !IsUnderPostmaster and also in an Assert build, so we can
> test it works right - i.e. running in its own process is just a
> production optimization for performance (which is the stated reason
> for having this)

I added this because 0004 involves requesting a task from the postmaster,
so checking for IsUnderPostmaster doesn't work.  Those tasks would always
run immediately.  However, we could use IsPostmasterEnvironment instead,
which would allow us to remove the "immediate" argument.  I did it this way
in v14.

I'm not sure about running locally in Assert builds.  It's true that would
help ensure there's test coverage for the task logic, but it would also
reduce coverage for the custodian logic.  And in general, I'm worried about
having Assert builds use a different code path than production builds.

> 0005 seems good from what I know
> * There is no check to see if it worked in any sane time

What did you have in mind?  Should the custodian begin emitting WARNINGs
after a while?

> * It seems possible that "Old" might change meaning - will that make
> it break/fail?

I don't believe so.

> Rather than explicitly use DEBUG1 everywhere I would have an
> #define CUSTODIAN_LOG_LEVEL LOG
> so we can run with it in LOG mode and then set it to DEBUG1 with a one
> line change in a later phase of Beta

I can create a separate patch for this, but I don't think I've ever seen
this sort of thing before.  Is the idea just to help with debugging during
the development phase?

> I can't really comment with knowledge on sub-patches 0002 to 0004.
> 
> Perhaps you should aim to get 1, 5, 6 committed first and then return
> to the others in a later CF/separate thread?

That seems like a good idea since those are all relatively self-contained.
I removed 0002-0004 in v14.

[0] https://postgr.es/m/20220217065938.x2esfdppzypegn5j%40alap3.anarazel.de

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 443c3f842785554476b1a353bcb1af13f426116b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v14 1/3] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 382 
 src/backend/postmaster/meson.build  |   1 +
 src/backend/postmaster/postmaster.c |  38 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 13 files changed, 482 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-24 Thread Simon Riggs

On Thu, 24 Nov 2022 at 00:19, Nathan Bossart  wrote:
>
> On Sun, Nov 06, 2022 at 02:38:42PM -0800, Nathan Bossart wrote:
> > rebased
>
> another rebase for cfbot

0001 seems good to me
* I like that it sleeps forever until requested
* not sure I believe that everything it does can always be aborted out
of and shutdown - to achieve that you will need a
CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least
* not sure why you want immediate execution of custodian tasks - I
feel supporting two modes will be a lot harder. For me, I would run
locally when !IsUnderPostmaster and also in an Assert build, so we can
test it works right - i.e. running in its own process is just a
production optimization for performance (which is the stated reason
for having this)

0005 seems good from what I know
* There is no check to see if it worked in any sane time
* It seems possible that "Old" might change meaning - will that make
it break/fail?

0006 seems good also
* same comments for 5

Rather than explicitly use DEBUG1 everywhere I would have an
#define CUSTODIAN_LOG_LEVEL LOG
so we can run with it in LOG mode and then set it to DEBUG1 with a one
line change in a later phase of Beta

I can't really comment with knowledge on sub-patches 0002 to 0004.

Perhaps you should aim to get 1, 5, 6 committed first and then return
to the others in a later CF/separate thread?

-- 
Simon Riggshttp://www.EnterpriseDB.com/

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-23 Thread Nathan Bossart

On Sun, Nov 06, 2022 at 02:38:42PM -0800, Nathan Bossart wrote:
> rebased

another rebase for cfbot

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From b2c36a6d0d8ca5cde374b1c8b34aafaabbd7f6c2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v13 1/6] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 383 
 src/backend/postmaster/meson.build  |   1 +
 src/backend/postmaster/postmaster.c |  38 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 13 files changed, 483 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..e90f5d0d1f
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,383 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(bool retry);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(CustodianTask task);
+
+typedef struct
+{
+	slock_t		cust_lck;
+
+	CustodianTask task_queue_elems[NUM_CUSTODIAN_TASKS];
+	int			task_queue_head;
+} CustodianShmemStruct;
+
+static CustodianShmemStruct *CustodianShmem;
+
+typedef void (*CustodianTaskFunction) (void);
+typedef void (*CustodianTaskHandleArg) (Datum arg);
+
+struct cust_task_funcs_entry
+{
+	CustodianTask task;
+	CustodianTaskFunction task_func;		/* performs task */
+	CustodianTaskHandleArg handle_arg_func;	/* handles additional info in request */
+};
+
+/*
+ * Add new tasks here.
+ *
+ * task_func is the logic that will be executed via DoCustodianTasks() when the
+ * matching task is requested via Requ

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-06 Thread Nathan Bossart

On Fri, Sep 23, 2022 at 10:41:54AM -0700, Nathan Bossart wrote:
> v11 adds support for building with meson.

rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 367c5f3863457cfbd0fe8add0e8df3e630aaaea9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v12 1/6] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 383 
 src/backend/postmaster/meson.build  |   1 +
 src/backend/postmaster/postmaster.c |  44 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 13 files changed, 489 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..e90f5d0d1f
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,383 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(bool retry);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(CustodianTask task);
+
+typedef struct
+{
+	slock_t		cust_lck;
+
+	CustodianTask task_queue_elems[NUM_CUSTODIAN_TASKS];
+	int			task_queue_head;
+} CustodianShmemStruct;
+
+static CustodianShmemStruct *CustodianShmem;
+
+typedef void (*CustodianTaskFunction) (void);
+typedef void (*CustodianTaskHandleArg) (Datum arg);
+
+struct cust_task_funcs_entry
+{
+	CustodianTask task;
+	CustodianTaskFunction task_func;		/* performs task */
+	CustodianTaskHandleArg handle_arg_func;	/* handles additional info in request */
+};
+
+/*
+ * Add new tasks here.
+ *
+ * task_func is the logic that will be executed via DoCustodianTasks() when the
+ * matching task is r

Re: O(n) tasks cause lengthy startups and checkpoints

2022-09-23 Thread Nathan Bossart

On Fri, Sep 02, 2022 at 03:07:44PM -0700, Nathan Bossart wrote:
> And another.

v11 adds support for building with meson.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 56c9ff2bf1a6524518b62193c0da02372f9674a1 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v11 1/6] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 383 
 src/backend/postmaster/meson.build  |   1 +
 src/backend/postmaster/postmaster.c |  44 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 13 files changed, 489 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..e90f5d0d1f
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,383 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(bool retry);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(CustodianTask task);
+
+typedef struct
+{
+	slock_t		cust_lck;
+
+	CustodianTask task_queue_elems[NUM_CUSTODIAN_TASKS];
+	int			task_queue_head;
+} CustodianShmemStruct;
+
+static CustodianShmemStruct *CustodianShmem;
+
+typedef void (*CustodianTaskFunction) (void);
+typedef void (*CustodianTaskHandleArg) (Datum arg);
+
+struct cust_task_funcs_entry
+{
+	CustodianTask task;
+	CustodianTaskFunction task_func;		/* performs task */
+	CustodianTaskHandleArg handle_arg_func;	/* handles additional info in request */
+};
+
+/*
+ * Add new tasks here.
+ *
+ * task_func is the logic that will be executed via DoCustodianTasks() when the
+ * matching task

Re: O(n) tasks cause lengthy startups and checkpoints

2022-09-02 Thread Nathan Bossart

On Wed, Aug 24, 2022 at 09:46:24AM -0700, Nathan Bossart wrote:
> Another rebase for cfbot.

And another.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 63a470be1ac8af3b12684f136f70b2d7b6f87b81 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v10 1/6] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 383 
 src/backend/postmaster/postmaster.c |  44 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 12 files changed, 488 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..e90f5d0d1f
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,383 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(bool retry);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(CustodianTask task);
+
+typedef struct
+{
+	slock_t		cust_lck;
+
+	CustodianTask task_queue_elems[NUM_CUSTODIAN_TASKS];
+	int			task_queue_head;
+} CustodianShmemStruct;
+
+static CustodianShmemStruct *CustodianShmem;
+
+typedef void (*CustodianTaskFunction) (void);
+typedef void (*CustodianTaskHandleArg) (Datum arg);
+
+struct cust_task_funcs_entry
+{
+	CustodianTask task;
+	CustodianTaskFunction task_func;		/* performs task */
+	CustodianTaskHandleArg handle_arg_func;	/* handles additional info in request */
+};
+
+/*
+ * Add new tasks here.
+ *
+ * task_func is the logic that will be executed via DoCustodianTasks() when the
+ * matching task is requested via RequestCustodian().  handle_arg_func is an
+ *

Re: O(n) tasks cause lengthy startups and checkpoints

2022-08-24 Thread Nathan Bossart

On Thu, Aug 11, 2022 at 04:09:21PM -0700, Nathan Bossart wrote:
> Here is a rebased patch set for cfbot.  There are no other differences
> between v7 and v8.

Another rebase for cfbot.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 6810355cb3d1a03326b152aebe3c907f7544be4f Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v9 1/6] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 383 
 src/backend/postmaster/postmaster.c |  44 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 12 files changed, 488 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..e90f5d0d1f
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,383 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(bool retry);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(CustodianTask task);
+
+typedef struct
+{
+	slock_t		cust_lck;
+
+	CustodianTask task_queue_elems[NUM_CUSTODIAN_TASKS];
+	int			task_queue_head;
+} CustodianShmemStruct;
+
+static CustodianShmemStruct *CustodianShmem;
+
+typedef void (*CustodianTaskFunction) (void);
+typedef void (*CustodianTaskHandleArg) (Datum arg);
+
+struct cust_task_funcs_entry
+{
+	CustodianTask task;
+	CustodianTaskFunction task_func;		/* performs task */
+	CustodianTaskHandleArg handle_arg_func;	/* handles additional info in request */
+};
+
+/*
+ * Add new tasks here.
+ *
+ * task_func is the logic that will be executed via DoCustodianTasks() when the
+ *

Re: O(n) tasks cause lengthy startups and checkpoints

2022-08-11 Thread Nathan Bossart

On Wed, Jul 06, 2022 at 09:51:10AM -0700, Nathan Bossart wrote:
> Here's a new revision where I've attempted to address all the feedback I've
> received thus far.  Notably, the custodian now uses a queue for registering
> tasks and determining which tasks to execute.  Other changes include
> splitting the temporary file functions apart to avoid consecutive boolean
> flags, using a timestamp instead of an integer for the staging name for
> temporary directories, moving temporary directories to a dedicated
> directory so that the custodian doesn't need to scan relation files,
> ERROR-ing when something goes wrong when cleaning up temporary files,
> executing requested tasks immediately in single-user mode, and more.

Here is a rebased patch set for cfbot.  There are no other differences
between v7 and v8.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From a0b28421a7a170598f6e60b2c17a8d49fb0ffd55 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v8 1/6] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 383 
 src/backend/postmaster/postmaster.c |  44 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 12 files changed, 488 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..e90f5d0d1f
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,383 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(bool retry);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(Custodian

Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-06 Thread Nathan Bossart

Here's a new revision where I've attempted to address all the feedback I've
received thus far.  Notably, the custodian now uses a queue for registering
tasks and determining which tasks to execute.  Other changes include
splitting the temporary file functions apart to avoid consecutive boolean
flags, using a timestamp instead of an integer for the staging name for
temporary directories, moving temporary directories to a dedicated
directory so that the custodian doesn't need to scan relation files,
ERROR-ing when something goes wrong when cleaning up temporary files,
executing requested tasks immediately in single-user mode, and more.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From f2d5205b7fab3c1dccbe25829c9b46ba26b3cd9f Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v7 1/6] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 383 
 src/backend/postmaster/postmaster.c |  44 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 12 files changed, 488 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 39ac4490db..620a0b1bae 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..e90f5d0d1f
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,383 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shutdown quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(bool retry);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(CustodianTask task);
+
+typedef struct
+{
+	slock_t		cust_lck;
+
+	CustodianTask task_queue_elems[NUM_CUSTODIAN_TASKS];
+	int			task_queue_head;
+} CustodianShmemStruct;
+
+static Cu

Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-03 Thread Andres Freund

Hi,

On 2022-07-03 10:07:54 -0700, Nathan Bossart wrote:
> Thanks for the prompt review.
> 
> On Sat, Jul 02, 2022 at 03:54:56PM -0700, Andres Freund wrote:
> > On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote:
> >> +  /* Obtain requested tasks */
> >> +  SpinLockAcquire(&CustodianShmem->cust_lck);
> >> +  flags = CustodianShmem->cust_flags;
> >> +  CustodianShmem->cust_flags = 0;
> >> +  SpinLockRelease(&CustodianShmem->cust_lck);
> > 
> > Just resetting the flags to 0 is problematic. Consider what happens if 
> > there's
> > two tasks and and the one processed first errors out. You'll loose 
> > information
> > about needing to run the second task.
> 
> I think we also want to retry any failed tasks.

I don't think so, at least not if it's just going to retry that task straight
away - then we'll get stuck on that one task forever. If we had the ability to
"queue" it the end, to be processed after other already dequeued tasks, it'd
be a different story.


> The way v6 handles this is by requesting all tasks after an exception.

Ick. That strikes me as a bad idea.


> >> +/*
> >> + * RequestCustodian
> >> + *Called to request a custodian task.
> >> + */
> >> +void
> >> +RequestCustodian(int flags)
> >> +{
> >> +  SpinLockAcquire(&CustodianShmem->cust_lck);
> >> +  CustodianShmem->cust_flags |= flags;
> >> +  SpinLockRelease(&CustodianShmem->cust_lck);
> >> +
> >> +  if (ProcGlobal->custodianLatch)
> >> +  SetLatch(ProcGlobal->custodianLatch);
> >> +}
> > 
> > With this representation we can't really implement waiting for a task or
> > such. And it doesn't seem like a great API for the caller to just specify a
> > mix of flags.
> 
> At the moment, the idea is that nothing should need to wait for a task
> because the custodian only handles things that are relatively non-critical.

Which is just plainly not true as the patchset stands...

I think we're going to have to block if some cleanup as part of a checkpoint
hasn't been completed by the next checkpoint - otherwise it'll just end up
being way too confusing and there's absolutely no backpressure anymore.


> If that changes, this could probably be expanded to look more like
> RequestCheckpoint().
> 
> What would you suggest using instead of a mix of flags?

I suspect an array of tasks with requested and completed counters or such?
With a condition variable to wait on?


> >> +  /* let the custodian know what it can remove */
> >> +  CustodianSetLogicalRewriteCutoff(cutoff);
> > 
> > Setting this variable in a custodian datastructure and then fetching it from
> > there seems architecturally wrong to me.
> 
> Where do you think it should go?  I previously had it in the checkpointer's
> shared memory, but you didn't like that the functions were declared in
> bgwriter.h (along with the other checkpoint stuff).  If the checkpointer
> shared memory is the right place, should we create checkpointer.h and use
> that instead?

Well, so far I have not understood what the whole point of the shared state
is, so i have a bit of a hard time answering this ;)


> >> +/*
> >> + * Remove all mappings not needed anymore based on the logical restart 
> >> LSN saved
> >> + * by the checkpointer.  We use this saved value instead of calling
> >> + * ReplicationSlotsComputeLogicalRestartLSN() so that we don't interfere 
> >> with an
> >> + * ongoing call to CheckPointLogicalRewriteHeap() that is flushing 
> >> mappings to
> >> + * disk.
> >> + */
> > 
> > What interference could there be?
> 
> My concern is that the custodian could obtain a later cutoff than what the
> checkpointer does, which might cause files to be concurrently unlinked and
> fsync'd.  If we always use the checkpointer's cutoff, that shouldn't be a
> problem.  This could probably be better explained in this comment.

How about having a Datum argument to RequestCustodian() that is forwarded to
the task?


> >> +void
> >> +RemoveOldLogicalRewriteMappings(void)
> >> +{
> >> +  XLogRecPtr  cutoff;
> >> +  DIR*mappings_dir;
> >> +  struct dirent *mapping_de;
> >> +  charpath[MAXPGPATH + 20];
> >> +  boolvalue_set = false;
> >> +
> >> +  cutoff = CustodianGetLogicalRewriteCutoff(&value_set);
> >> +  if (!value_set)
> >> +  return;
> > 
> > Afaics nothing clears values_set - is that a good idea?
> 
> I'm using value_set to differentiate the case where InvalidXLogRecPtr means
> the checkpointer hasn't determined a value yet versus the case where it
> has.  In the former, we don't want to take any action.  In the latter, we
> want to unlink all the files.  Since we're moving to a request model for
> the custodian, I might be able to remove this value_set stuff completely.
> If that's not possible, it probably deserves a better comment.

It would.

Greetings,

Andres Freund

Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-03 Thread Nathan Bossart

Hi Andres,

Thanks for the prompt review.

On Sat, Jul 02, 2022 at 03:54:56PM -0700, Andres Freund wrote:
> On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote:
>> +/* Obtain requested tasks */
>> +SpinLockAcquire(&CustodianShmem->cust_lck);
>> +flags = CustodianShmem->cust_flags;
>> +CustodianShmem->cust_flags = 0;
>> +SpinLockRelease(&CustodianShmem->cust_lck);
> 
> Just resetting the flags to 0 is problematic. Consider what happens if there's
> two tasks and and the one processed first errors out. You'll loose information
> about needing to run the second task.

I think we also want to retry any failed tasks.  The way v6 handles this is
by requesting all tasks after an exception.  Another way to handle this
could be to reset each individual flag before the task is executed, and
then we could surround each one with a PG_CATCH block that resets the flag.
I'll do it this way in the next revision.

>> +/*
>> + * RequestCustodian
>> + *  Called to request a custodian task.
>> + */
>> +void
>> +RequestCustodian(int flags)
>> +{
>> +SpinLockAcquire(&CustodianShmem->cust_lck);
>> +CustodianShmem->cust_flags |= flags;
>> +SpinLockRelease(&CustodianShmem->cust_lck);
>> +
>> +if (ProcGlobal->custodianLatch)
>> +SetLatch(ProcGlobal->custodianLatch);
>> +}
> 
> With this representation we can't really implement waiting for a task or
> such. And it doesn't seem like a great API for the caller to just specify a
> mix of flags.

At the moment, the idea is that nothing should need to wait for a task
because the custodian only handles things that are relatively non-critical.
If that changes, this could probably be expanded to look more like
RequestCheckpoint().

What would you suggest using instead of a mix of flags?

>> +/* Calculate how long to sleep */
>> +end_time = (pg_time_t) time(NULL);
>> +elapsed_secs = end_time - start_time;
>> +if (elapsed_secs >= CUSTODIAN_TIMEOUT_S)
>> +continue;   /* no sleep for us */
>> +cur_timeout = CUSTODIAN_TIMEOUT_S - elapsed_secs;
>> +
>> +(void) WaitLatch(MyLatch,
>> + WL_LATCH_SET | WL_TIMEOUT | 
>> WL_EXIT_ON_PM_DEATH,
>> + cur_timeout * 1000L /* convert 
>> to ms */ ,
>> + WAIT_EVENT_CUSTODIAN_MAIN);
>> +}
> 
> I don't think we should have this thing wake up on a regular basis. We're
> doing way too much of that already, and I don't think we should add
> more. Either we need a list of times when tasks need to be processed and wake
> up at that time, or just wake up if somebody requests a task.

I agree.  I will remove the timeout in the next revision.

>> diff --git a/src/backend/postmaster/postmaster.c 
>> b/src/backend/postmaster/postmaster.c
>> index e67370012f..82aa0c6307 100644
>> --- a/src/backend/postmaster/postmaster.c
>> +++ b/src/backend/postmaster/postmaster.c
>> @@ -1402,7 +1402,8 @@ PostmasterMain(int argc, char *argv[])
>>   * Remove old temporary files.  At this point there can be no other
>>   * Postgres processes running in this directory, so this should be safe.
>>   */
>> -RemovePgTempFiles();
>> +RemovePgTempFiles(true, true);
>> +RemovePgTempFiles(false, false);
> 
> This is imo hard to read and easy to get wrong. Make it multiple functions or
> pass named flags in.

Will do.

>> + * StagePgTempDirForRemoval
>> + *
>> + * This function renames the given directory with a special prefix that
>> + * RemoveStagedPgTempDirs() will know to look for.  An integer is appended 
>> to
>> + * the end of the new directory name in case previously staged pgsql_tmp
>> + * directories have not yet been removed.
>> + */
> 
> It doesn't seem great to need to iterate through a directory that contains
> other files, potentially a significant number. How about having a
> staged_for_removal/ directory, and then only scanning that?

Yeah, that seems like a good idea.  Will do.

>> +/*
>> + * Find a name for the stage directory.  We just increment an integer 
>> at the
>> + * end of the name until we find one that doesn't exist.
>> + */
>> +for (int n = 0; n <= INT_MAX; n++)
>> +{
>> +snprintf(stage_path, sizeof(stage_path), "%s/%s%d", parent_path,
>> + PG_TEMP_DIR_TO_REMOVE_PREFIX, n);
>> +
>> +if (stat(stage_path, &statbuf) != 0)
>> +{
>> +if (errno == ENOENT)
>> +break;
>> +
>> +ereport(LOG,
>> +(errcode_for_file_access(),
>> + errmsg("could not stat file \"%s\": 
>> %m", stage_path)));
>> +return;
>> +}
>> +
>> +stage_path[0] = '\0';
> 
> I still disli

Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-02 Thread Andres Freund

Hi,

On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote:
> + /* Obtain requested tasks */
> + SpinLockAcquire(&CustodianShmem->cust_lck);
> + flags = CustodianShmem->cust_flags;
> + CustodianShmem->cust_flags = 0;
> + SpinLockRelease(&CustodianShmem->cust_lck);

Just resetting the flags to 0 is problematic. Consider what happens if there's
two tasks and and the one processed first errors out. You'll loose information
about needing to run the second task.


> + /* TODO: offloaded tasks go here */

Seems we're going to need some sorting of which tasks are most "urgent" / need
to be processed next if we plan to make this into some generic facility.


> +/*
> + * RequestCustodian
> + *   Called to request a custodian task.
> + */
> +void
> +RequestCustodian(int flags)
> +{
> + SpinLockAcquire(&CustodianShmem->cust_lck);
> + CustodianShmem->cust_flags |= flags;
> + SpinLockRelease(&CustodianShmem->cust_lck);
> +
> + if (ProcGlobal->custodianLatch)
> + SetLatch(ProcGlobal->custodianLatch);
> +}

With this representation we can't really implement waiting for a task or
such. And it doesn't seem like a great API for the caller to just specify a
mix of flags.


> + /* Calculate how long to sleep */
> + end_time = (pg_time_t) time(NULL);
> + elapsed_secs = end_time - start_time;
> + if (elapsed_secs >= CUSTODIAN_TIMEOUT_S)
> + continue;   /* no sleep for us */
> + cur_timeout = CUSTODIAN_TIMEOUT_S - elapsed_secs;
> +
> + (void) WaitLatch(MyLatch,
> +  WL_LATCH_SET | WL_TIMEOUT | 
> WL_EXIT_ON_PM_DEATH,
> +  cur_timeout * 1000L /* convert 
> to ms */ ,
> +  WAIT_EVENT_CUSTODIAN_MAIN);
> + }

I don't think we should have this thing wake up on a regular basis. We're
doing way too much of that already, and I don't think we should add
more. Either we need a list of times when tasks need to be processed and wake
up at that time, or just wake up if somebody requests a task.


> From 5e95666efa31d6c8aa351e430c37ead6e27acb72 Mon Sep 17 00:00:00 2001
> From: Nathan Bossart 
> Date: Sun, 5 Dec 2021 21:16:44 -0800
> Subject: [PATCH v6 3/6] Split pgsql_tmp cleanup into two stages.
> 
> First, pgsql_tmp directories will be renamed to stage them for
> removal.  Then, all files in pgsql_tmp are removed before removing
> the staged directories themselves.  This change is being made in
> preparation for a follow-up change to offload most temporary file
> cleanup to the new custodian process.

> Note that temporary relation files cannot be cleaned up via the
> aforementioned strategy and will not be offloaded to the custodian.

> ---
>  src/backend/postmaster/postmaster.c |   8 +-
>  src/backend/storage/file/fd.c   | 174 
>  src/include/storage/fd.h|   2 +-
>  3 files changed, 160 insertions(+), 24 deletions(-)
> 
> diff --git a/src/backend/postmaster/postmaster.c 
> b/src/backend/postmaster/postmaster.c
> index e67370012f..82aa0c6307 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -1402,7 +1402,8 @@ PostmasterMain(int argc, char *argv[])
>* Remove old temporary files.  At this point there can be no other
>* Postgres processes running in this directory, so this should be safe.
>*/
> - RemovePgTempFiles();
> + RemovePgTempFiles(true, true);
> + RemovePgTempFiles(false, false);

This is imo hard to read and easy to get wrong. Make it multiple functions or
pass named flags in.


> + * StagePgTempDirForRemoval
> + *
> + * This function renames the given directory with a special prefix that
> + * RemoveStagedPgTempDirs() will know to look for.  An integer is appended to
> + * the end of the new directory name in case previously staged pgsql_tmp
> + * directories have not yet been removed.
> + */

It doesn't seem great to need to iterate through a directory that contains
other files, potentially a significant number. How about having a
staged_for_removal/ directory, and then only scanning that?


> +static void
> +StagePgTempDirForRemoval(const char *tmp_dir)
> +{
> + DIR*dir;
> + charstage_path[MAXPGPATH * 2];
> + charparent_path[MAXPGPATH * 2];
> + struct stat statbuf;
> +
> + /*
> +  * If tmp_dir doesn't exist, there is nothing to stage.
> +  */
> + dir = AllocateDir(tmp_dir);
> + if (dir == NULL)
> + {
> + if (errno != ENOENT)
> + ereport(LOG,
> + (errcode_for_file_access(),
> +  errmsg("could not open directory 
> \"%s\": %m", tmp_dir)));
> + return;
> + }
> +

Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-02 Thread Nathan Bossart

On Fri, Jun 24, 2022 at 11:45:22AM +0100, Simon Riggs wrote:
> On Thu, 23 Jun 2022 at 18:15, Nathan Bossart  wrote:
>> I'm grateful for the discussion in this thread so far, but I'm not seeing a
>> clear path forward.
> 
> +1 to add the new auxiliary process.

I went ahead and put together a new patch set for this in which I've
attempted to address most of the feedback from upthread.  Notably, I've
abandoned 0007 and 0008, added a way for processes to request specific
tasks for the custodian, and removed all the checks for
ShutdownRequestPending.

I haven't addressed the existing transaction ID wraparound risk with the
logical replication files.  My instinct is that this deserveѕ its own
thread, and it might need to be considered a prerequisite to this change
based on the prior discussion here.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 68e3005c14ba116e372a1724dad079914108ab2d Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v6 1/6] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 252 
 src/backend/postmaster/postmaster.c |  44 -
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  20 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 12 files changed, 345 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 39ac4490db..620a0b1bae 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..db00282658
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,252 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  Also, ensure that any offloaded
+ * tasks are either not required during single-user mode or are performed
+ * separately during single-user mode.
+ *
+ * The custodian is not an essential process and can shutdown quickly when
+ * requested.  The custodian will wake up approximately once every 5
+ * minutes to perform its tasks, but backends can (and should) set its
+ * latch to wake it up sooner.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+#define CUSTODIAN_TIMEOUT_S (300)		/* 5 minutes */
+
+typedef struct
+{
+

Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-24 Thread Simon Riggs

On Thu, 23 Jun 2022 at 18:15, Nathan Bossart  wrote:

> I'm grateful for the discussion in this thread so far, but I'm not seeing a
> clear path forward.

+1 to add the new auxiliary process.

-- 
Simon Riggshttp://www.EnterpriseDB.com/

Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Nathan Bossart

On Thu, Jun 23, 2022 at 09:46:28AM -0400, Robert Haas wrote:
> I do agree that a general mechanism for getting cleanup tasks done in
> the background could be a useful thing to have, but I feel like it's
> hard to see exactly how to make it work well. We can't just allow it
> to spin up a million new processes, but at the same time, if it can't
> guarantee that time-critical tasks get performed relatively quickly,
> it's pretty worthless.

My intent with this new auxiliary process is to offload tasks that aren't
particularly time-critical.  They are only time-critical in the sense that
1) you might eventually run out of space and 2) you might encounter
wraparound with the logical replication files.  But AFAICT these same risks
exist today in the checkpointer approach, although maybe not to the same
extent.  In any case, 2 seems solvable to me outside of this patch set.

I'm grateful for the discussion in this thread so far, but I'm not seeing a
clear path forward.  I'm glad to see threads like the one to stop doing
end-of-recovery checkpoints [0], but I don't know if it will be possible to
solve all of these availability concerns in a piecemeal fashion.  I remain
open to exploring other suggested approaches beyond creating a new
auxiliary process.

[0] 
https://postgr.es/m/CA%2BTgmobrM2jvkiccCS9NgFcdjNSgAvk1qcAPx5S6F%2BoJT3D2mQ%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Simon Riggs

On Thu, 23 Jun 2022 at 14:46, Robert Haas  wrote:
>
> On Thu, Jun 23, 2022 at 7:58 AM Simon Riggs
>  wrote:
> > Having a central cleanup process makes a lot of sense. There is a long
> > list of potential tasks for such a process. My understanding is that
> > autovacuum already has an interface for handling additional workload
> > types, which is how BRIN indexes are handled. Do we really need a new
> > process?
>
> It seems to me that if there's a long list of possible tasks for such
> a process, that's actually a trickier situation than if there were
> only one or two, because it may happen that when task X is really
> urgent, the process is already busy with task Y.
>
> I don't think that piggybacking more stuff onto autovacuum is a very
> good idea for this exact reason. We already know that autovacuum
> workers can get so busy that they can't keep up with the need to
> vacuum and analyze tables. If we give them more things to do, that
> figures to make it worse, at least on busy systems.
>
> I do agree that a general mechanism for getting cleanup tasks done in
> the background could be a useful thing to have, but I feel like it's
> hard to see exactly how to make it work well. We can't just allow it
> to spin up a million new processes, but at the same time, if it can't
> guarantee that time-critical tasks get performed relatively quickly,
> it's pretty worthless.

Most of the tasks mentioned aren't time critical.

I have no objection to a new auxiliary process to execute those tasks,
which can be spawned when needed.

-- 
Simon Riggshttp://www.EnterpriseDB.com/

Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Robert Haas

On Thu, Jun 23, 2022 at 7:58 AM Simon Riggs
 wrote:
> Having a central cleanup process makes a lot of sense. There is a long
> list of potential tasks for such a process. My understanding is that
> autovacuum already has an interface for handling additional workload
> types, which is how BRIN indexes are handled. Do we really need a new
> process?

It seems to me that if there's a long list of possible tasks for such
a process, that's actually a trickier situation than if there were
only one or two, because it may happen that when task X is really
urgent, the process is already busy with task Y.

I don't think that piggybacking more stuff onto autovacuum is a very
good idea for this exact reason. We already know that autovacuum
workers can get so busy that they can't keep up with the need to
vacuum and analyze tables. If we give them more things to do, that
figures to make it worse, at least on busy systems.

I do agree that a general mechanism for getting cleanup tasks done in
the background could be a useful thing to have, but I feel like it's
hard to see exactly how to make it work well. We can't just allow it
to spin up a million new processes, but at the same time, if it can't
guarantee that time-critical tasks get performed relatively quickly,
it's pretty worthless.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Simon Riggs

On Fri, 18 Feb 2022 at 20:51, Nathan Bossart  wrote:
>
> > On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote:
> >>> > The improvements around deleting temporary files and serialized 
> >>> > snapshots
> >>> > afaict don't require a dedicated process - they're only relevant during
> >>> > startup. We could use the approach of renaming the directory out of the 
> >>> > way as
> >>> > done in this patchset but perform the cleanup in the startup process 
> >>> > after
> >>> > we're up.
>
> BTW I know you don't like the dedicated process approach, but one
> improvement to that approach could be to shut down the custodian process
> when it has nothing to do.

Having a central cleanup process makes a lot of sense. There is a long
list of potential tasks for such a process. My understanding is that
autovacuum already has an interface for handling additional workload
types, which is how BRIN indexes are handled. Do we really need a new
process? If so, lets do this now.

Nathan's point that certain tasks are blocking fast startup is a good
one and higher availability is a critical end goal. The thought that
we should complete these tasks during checkpoint is a good one, but
checkpoints should NOT be delayed by long running tasks that are
secondary to availability.

Andres' point that it would be better to avoid long running tasks is
good, if that is possible. That can be done better over time. This
point does not block the higher level goal of better availability
asap, so I support Nathan's overall proposals.

-- 
Simon Riggshttp://www.EnterpriseDB.com/

Re: O(n) tasks cause lengthy startups and checkpoints

2022-03-17 Thread Nathan Bossart

It seems unlikely that anything discussed in this thread will be committed
for v15, so I've adjusted the commitfest entry to v16 and moved it to the
next commitfest.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-18 Thread Nathan Bossart

> On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote:
>>> > The improvements around deleting temporary files and serialized snapshots
>>> > afaict don't require a dedicated process - they're only relevant during
>>> > startup. We could use the approach of renaming the directory out of the 
>>> > way as
>>> > done in this patchset but perform the cleanup in the startup process after
>>> > we're up.

BTW I know you don't like the dedicated process approach, but one
improvement to that approach could be to shut down the custodian process
when it has nothing to do.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-18 Thread Nathan Bossart

On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote:
>> > The improvements around deleting temporary files and serialized snapshots
>> > afaict don't require a dedicated process - they're only relevant during
>> > startup. We could use the approach of renaming the directory out of the 
>> > way as
>> > done in this patchset but perform the cleanup in the startup process after
>> > we're up.
>> 
>> Perhaps this is a good place to start.  As I mentioned above, IME the
>> temporary file cleanup is the most common problem, so I think even getting
>> that one fixed would be a huge improvement.
> 
> Cool.

Hm.  How should this work for standbys?  I can think of the following
options:
1. Do temporary file cleanup in the postmaster (as it does today).
2. Pause after allowing connections to clean up temporary files.
3. Do small amounts of temporary file cleanup whenever there is an
   opportunity during recovery.
4. Wait until recovery completes before cleaning up temporary files.

I'm not too thrilled about any of these options.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Andres Freund

Hi,

On 2022-02-17 14:58:38 -0800, Nathan Bossart wrote:
> On Thu, Feb 17, 2022 at 02:28:29PM -0800, Andres Freund wrote:
> > As far as I understand, the primary concern are logical decoding serialized
> > snapshots, because a lot of them can accumulate if there e.g. is an old 
> > unused
> > / far behind slot. It should be easy to reduce the number of those snapshots
> > by e.g. eliding some redundant ones. Perhaps we could also make backends in
> > logical decoding occasionally do a bit of cleanup themselves.
> > 
> > I've not seen reports of the number of mapping files to be an real issue?
> 
> I routinely see all four of these tasks impacting customers, but I'd say
> the most common one is the temporary file cleanup.

I took temp file cleanup and StartupReorderBuffer() "out of consideration" for
custodian, because they're not needed during normal running.


> Besides eliminating some redundant files and having backends perform some
> cleanup, what do you think about skipping the logical decoding cleanup
> during end-of-recovery/shutdown checkpoints?

I strongly disagree with it. Then you might never get the cleanup done, but
keep on operating until you hit corruption issues.


> > The improvements around deleting temporary files and serialized snapshots
> > afaict don't require a dedicated process - they're only relevant during
> > startup. We could use the approach of renaming the directory out of the way 
> > as
> > done in this patchset but perform the cleanup in the startup process after
> > we're up.
> 
> Perhaps this is a good place to start.  As I mentioned above, IME the
> temporary file cleanup is the most common problem, so I think even getting
> that one fixed would be a huge improvement.

Cool.

Greetings,

Andres Freund

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Nathan Bossart

On Thu, Feb 17, 2022 at 02:28:29PM -0800, Andres Freund wrote:
> As far as I understand, the primary concern are logical decoding serialized
> snapshots, because a lot of them can accumulate if there e.g. is an old unused
> / far behind slot. It should be easy to reduce the number of those snapshots
> by e.g. eliding some redundant ones. Perhaps we could also make backends in
> logical decoding occasionally do a bit of cleanup themselves.
> 
> I've not seen reports of the number of mapping files to be an real issue?

I routinely see all four of these tasks impacting customers, but I'd say
the most common one is the temporary file cleanup.  Besides eliminating
some redundant files and having backends perform some cleanup, what do you
think about skipping the logical decoding cleanup during
end-of-recovery/shutdown checkpoints?  This was something that Bharath
brought up a while back [0].  As I noted in that thread, startup and
shutdown could still take a while if checkpoints are regularly delayed due
to logical decoding cleanup, but that might still help avoid a bit of
downtime.

> The improvements around deleting temporary files and serialized snapshots
> afaict don't require a dedicated process - they're only relevant during
> startup. We could use the approach of renaming the directory out of the way as
> done in this patchset but perform the cleanup in the startup process after
> we're up.

Perhaps this is a good place to start.  As I mentioned above, IME the
temporary file cleanup is the most common problem, so I think even getting
that one fixed would be a huge improvement.

[0] 
https://postgr.es/m/CALj2ACXkkSL8EBpR7m%3DMt%3DyRGBhevcCs3x4fsp3Bc-D13yyHOg%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Andres Freund

Hi,

On 2022-02-17 13:00:22 -0800, Nathan Bossart wrote:
> Okay.  So IIUC the problem might already exist today, but offloading these
> tasks to a separate process could make it more likely.

Vastly more, yes. Before checkpoints not happening would be a (but not a
great) form of backpressure. You can't cancel them without triggering a
crash-restart. Whereas custodian can be cancelled etc.

As I said before, I think this is tackling things from the wrong end. Instead
of moving the sometimes expensive task out of the way, but still expensive,
the focus should be to make the expensive task cheaper.

As far as I understand, the primary concern are logical decoding serialized
snapshots, because a lot of them can accumulate if there e.g. is an old unused
/ far behind slot. It should be easy to reduce the number of those snapshots
by e.g. eliding some redundant ones. Perhaps we could also make backends in
logical decoding occasionally do a bit of cleanup themselves.

I've not seen reports of the number of mapping files to be an real issue?

The improvements around deleting temporary files and serialized snapshots
afaict don't require a dedicated process - they're only relevant during
startup. We could use the approach of renaming the directory out of the way as
done in this patchset but perform the cleanup in the startup process after
we're up.

Greetings,

Andres Freund

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Nathan Bossart

On Thu, Feb 17, 2022 at 11:27:09AM -0800, Andres Freund wrote:
> On 2022-02-17 10:23:37 -0800, Nathan Bossart wrote:
>> On Wed, Feb 16, 2022 at 10:59:38PM -0800, Andres Freund wrote:
>> > They're accessed by xid. The LSN is just for cleanup. Accessing files
>> > left over from a previous transaction with the same xid wouldn't be
>> > good - we'd read wrong catalog state for decoding...
>> 
>> Okay, that part makes sense to me.  However, I'm still confused about how
>> this is handled today and why moving cleanup to a separate auxiliary
>> process makes matters worse.
> 
> Right now cleanup happens every checkpoint. So cleanup can't be deferred all
> that far. We currently include a bunch of 32bit xids inside checkspoints, so
> if they're rarer than 2^31-1, we're in trouble independent of logical
> decoding.
> 
> But with this patch cleanup of logical decoding mapping files (and other
> pieces) can be *indefinitely* deferred, without being noticeable.

I see.  The custodian should ordinarily remove the files as quickly as
possible.  In fact, I bet it will typically line up with checkpoints for
most users, as the checkpointer will set the latch.  However, if there are
many temporary files to clean up, removing the logical decoding files could
be delayed for some time, as you said.

> One possible way to improve this would be to switch the on-disk filenames to
> be based on 64bit xids. But that might also present some problems (file name
> length, cost of converting 32bit xids to 64bit xids).

Okay.

>> I've done quite a bit of reading, and I haven't found anything that seems
>> intended to prevent this problem.  Do you have any pointers?
> 
> I don't know if we have an iron-clad enforcement of checkpoints happening
> every 2*31-1 xids. It's very unlikely to happen - you'd run out of space
> etc. But it'd be good to have something better than that.

Okay.  So IIUC the problem might already exist today, but offloading these
tasks to a separate process could make it more likely.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Andres Freund

Hi,

On 2022-02-17 10:23:37 -0800, Nathan Bossart wrote:
> On Wed, Feb 16, 2022 at 10:59:38PM -0800, Andres Freund wrote:
> > They're accessed by xid. The LSN is just for cleanup. Accessing files
> > left over from a previous transaction with the same xid wouldn't be
> > good - we'd read wrong catalog state for decoding...
> 
> Okay, that part makes sense to me.  However, I'm still confused about how
> this is handled today and why moving cleanup to a separate auxiliary
> process makes matters worse.

Right now cleanup happens every checkpoint. So cleanup can't be deferred all
that far. We currently include a bunch of 32bit xids inside checkspoints, so
if they're rarer than 2^31-1, we're in trouble independent of logical
decoding.

But with this patch cleanup of logical decoding mapping files (and other
pieces) can be *indefinitely* deferred, without being noticeable.

One possible way to improve this would be to switch the on-disk filenames to
be based on 64bit xids. But that might also present some problems (file name
length, cost of converting 32bit xids to 64bit xids).

> I've done quite a bit of reading, and I haven't found anything that seems
> intended to prevent this problem.  Do you have any pointers?

I don't know if we have an iron-clad enforcement of checkpoints happening
every 2*31-1 xids. It's very unlikely to happen - you'd run out of space
etc. But it'd be good to have something better than that.

Greetings,

Andres Freund

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Nathan Bossart

On Wed, Feb 16, 2022 at 10:59:38PM -0800, Andres Freund wrote:
> On 2022-02-16 20:14:04 -0800, Nathan Bossart wrote:
>> >> - while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL)
>> >> + while (!ShutdownRequestPending &&
>> >> +(spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != 
>> >> NULL)
>> >
>> > Uh, huh? It strikes me as a supremely bad idea to have functions *silently*
>> > not do their jobs when ShutdownRequestPending is set, particularly without 
>> > a
>> > huge fat comment.
>>
>> The idea was to avoid delaying shutdown because we're waiting for the
>> custodian to finish relatively nonessential tasks.  Another option might be
>> to just exit immediately when the custodian receives a shutdown request.
> 
> I think we should just not do either of these and let the functions
> finish. For the cases where shutdown really needs to be immediate
> there's, uhm, immediate mode shutdowns.

Alright.

>> > Why does this not open us up to new xid wraparound issues? Before there 
>> > was a
>> > hard bound on how long these files could linger around. Now there's not
>> > anymore.
>>
>> Sorry, I'm probably missing something obvious, but I'm not sure how this
>> adds transaction ID wraparound risk.  These files are tied to LSNs, and
>> AFAIK they won't impact slots' xmins.
> 
> They're accessed by xid. The LSN is just for cleanup. Accessing files
> left over from a previous transaction with the same xid wouldn't be
> good - we'd read wrong catalog state for decoding...

Okay, that part makes sense to me.  However, I'm still confused about how
this is handled today and why moving cleanup to a separate auxiliary
process makes matters worse.  I've done quite a bit of reading, and I
haven't found anything that seems intended to prevent this problem.  Do you
have any pointers?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-16 Thread Andres Freund

Hi,

On 2022-02-16 20:14:04 -0800, Nathan Bossart wrote:
> >> -  while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL)
> >> +  while (!ShutdownRequestPending &&
> >> + (spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != 
> >> NULL)
> >
> > Uh, huh? It strikes me as a supremely bad idea to have functions *silently*
> > not do their jobs when ShutdownRequestPending is set, particularly without a
> > huge fat comment.
>
> The idea was to avoid delaying shutdown because we're waiting for the
> custodian to finish relatively nonessential tasks.  Another option might be
> to just exit immediately when the custodian receives a shutdown request.

I think we should just not do either of these and let the functions
finish. For the cases where shutdown really needs to be immediate
there's, uhm, immediate mode shutdowns.


> > Why does this not open us up to new xid wraparound issues? Before there was 
> > a
> > hard bound on how long these files could linger around. Now there's not
> > anymore.
>
> Sorry, I'm probably missing something obvious, but I'm not sure how this
> adds transaction ID wraparound risk.  These files are tied to LSNs, and
> AFAIK they won't impact slots' xmins.

They're accessed by xid. The LSN is just for cleanup. Accessing files
left over from a previous transaction with the same xid wouldn't be
good - we'd read wrong catalog state for decoding...

Andres

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-16 Thread Nathan Bossart

Hi Andres,

I appreciate the feedback.

On Wed, Feb 16, 2022 at 05:50:52PM -0800, Andres Freund wrote:
>> +/* Since not using PG_TRY, must reset error stack by hand */
>> +if (sigsetjmp(local_sigjmp_buf, 1) != 0)
>> +{
> 
> I also think it's a bad idea to introduce even more copies of the error
> handling body.  I think we need to unify this. And yes, it's unfair to stick
> you with it, but it's been a while since a new aux process has been added.

+1, I think this is useful refactoring.  I might spin this off to its own
thread.

>> +/*
>> + * These operations are really just a minimal subset of
>> + * AbortTransaction().  We don't have very many resources to 
>> worry
>> + * about.
>> + */
> 
> Given what you're proposing this for, are you actually confident that we don't
> need more than this?

I will give this a closer look.

>> +extern void RemovePgTempDir(const char *tmpdirname, bool missing_ok,
>> +bool unlink_all);
> 
> I don't like functions with multiple consecutive booleans, they tend to get
> swapped around. Why not just split unlink_all=true/false into different
> functions?

Will do.

>> Subject: [PATCH v5 3/8] Split pgsql_tmp cleanup into two stages.
>>
>> First, pgsql_tmp directories will be renamed to stage them for
>> removal.
> 
> What if the target name already exists?

The integer at the end of the target name is incremented until we find a
unique name.

>> Note that temporary relation files cannot be cleaned up via the
>> aforementioned strategy and will not be offloaded to the custodian.
> 
> This should be in the prior commit message, otherwise people will ask the same
> question as I did.

Will do.

>> +/*
>> + * Find a name for the stage directory.  We just increment an integer 
>> at the
>> + * end of the name until we find one that doesn't exist.
>> + */
>> +for (int n = 0; n <= INT_MAX; n++)
>> +{
>> +snprintf(stage_path, sizeof(stage_path), "%s/%s%d", parent_path,
>> + PG_TEMP_DIR_TO_REMOVE_PREFIX, n);
> 
> Uninterruptible loops up to INT_MAX do not seem like a good idea.

I modeled this after ChooseRelationName() in indexcmds.c.  Looking again, I
see that it loops forever until a unique name is found.  I suspect this is
unlikely to be a problem in practice.  What strategy would you recommend
for choosing a unique name?  Should we just append a couple of random
characters?

>> +dir = AllocateDir(stage_path);
>> +if (dir == NULL)
>> +{
> 
> Why not just use stat()? That's cheaper, and there's no
> time-to-check-time-to-use issue here, we're the only one writing.

I'm not sure why I didn't use stat().  I will update this.

>> -while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL)
>> +while (!ShutdownRequestPending &&
>> +   (spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != 
>> NULL)
> 
> Uh, huh? It strikes me as a supremely bad idea to have functions *silently*
> not do their jobs when ShutdownRequestPending is set, particularly without a
> huge fat comment.

The idea was to avoid delaying shutdown because we're waiting for the
custodian to finish relatively nonessential tasks.  Another option might be
to just exit immediately when the custodian receives a shutdown request.

>> +/*
>> + * If we just staged some pgsql_tmp directories for removal, wake up the
>> + * custodian process so that it deletes all the files in the staged
>> + * directories as well as the directories themselves.
>> + */
>> +if (stage && ProcGlobal->custodianLatch)
>> +SetLatch(ProcGlobal->custodianLatch);
> 
> Just signalling without letting the custodian know what it's expected to do
> strikes me as a bad idea.

Good point.  I will work on that.

>> From 9c2013d53cc5c857ef8aca3df044613e66215aee Mon Sep 17 00:00:00 2001
>> From: Nathan Bossart 
>> Date: Sun, 5 Dec 2021 22:02:40 -0800
>> Subject: [PATCH v5 5/8] Move removal of old serialized snapshots to 
>> custodian.
>>
>> This was only done during checkpoints because it was a convenient
>> place to put it.  However, if there are many snapshots to remove,
>> it can significantly extend checkpoint time.  To avoid this, move
>> this work to the newly-introduced custodian process.
>> ---
>>  src/backend/access/transam/xlog.c   |  2 --
>>  src/backend/postmaster/custodian.c  | 11 +++
>>  src/backend/replication/logical/snapbuild.c | 13 +++--
>>  src/include/replication/snapbuild.h |  2 +-
>>  4 files changed, 19 insertions(+), 9 deletions(-)
> 
> Why does this not open us up to new xid wraparound issues? Before there was a
> hard bound on how long these files could linger around. Now there's not
> anymore.

Sorry, I'm probably missing something obvious, but I'm not sure how this
adds transaction ID wraparound risk.

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-16 Thread Andres Freund

Hi,

On 2022-02-16 16:50:57 -0800, Nathan Bossart wrote:
> + * The custodian process is new as of Postgres 15.

I think this kind of comment tends to age badly and not be very useful.


> It's main purpose is to
> + * offload tasks that could otherwise delay startup and checkpointing, but
> + * it needn't be restricted to just those things.  Offloaded tasks should
> + * not be synchronous (e.g., checkpointing shouldn't need to wait for the
> + * custodian to complete a task before proceeding).  Also, ensure that any
> + * offloaded tasks are either not required during single-user mode or are
> + * performed separately during single-user mode.
> + *
> + * The custodian is not an essential process and can shutdown quickly when
> + * requested.  The custodian will wake up approximately once every 5
> + * minutes to perform its tasks, but backends can (and should) set its
> + * latch to wake it up sooner.

Hm. This kind policy makes it easy to introduce bugs where the occasional runs
mask forgotten notifications etc.


> + * Normal termination is by SIGTERM, which instructs the bgwriter to
> + * exit(0).

s/bgwriter/.../

> Emergency termination is by SIGQUIT; like any backend, the
> + * custodian will simply abort and exit on SIGQUIT.
> + *
> + * If the custodian exits unexpectedly, the postmaster treats that the same
> + * as a backend crash: shared memory may be corrupted, so remaining
> + * backends should be killed by SIGQUIT and then a recovery cycle started.

This doesn't really seem useful stuff to me.



> + /*
> +  * If an exception is encountered, processing resumes here.
> +  *
> +  * You might wonder why this isn't coded as an infinite loop around a
> +  * PG_TRY construct.  The reason is that this is the bottom of the
> +  * exception stack, and so with PG_TRY there would be no exception 
> handler
> +  * in force at all during the CATCH part.  By leaving the outermost 
> setjmp
> +  * always active, we have at least some chance of recovering from an 
> error
> +  * during error recovery.  (If we get into an infinite loop thereby, it
> +  * will soon be stopped by overflow of elog.c's internal state stack.)
> +  *
> +  * Note that we use sigsetjmp(..., 1), so that the prevailing signal 
> mask
> +  * (to wit, BlockSig) will be restored when longjmp'ing to here.  Thus,
> +  * signals other than SIGQUIT will be blocked until we complete error
> +  * recovery.  It might seem that this policy makes the HOLD_INTERRUPS()
> +  * call redundant, but it is not since InterruptPending might be set
> +  * already.
> +  */

I think it's bad to copy this comment into even more places.


> + /* Since not using PG_TRY, must reset error stack by hand */
> + if (sigsetjmp(local_sigjmp_buf, 1) != 0)
> + {

I also think it's a bad idea to introduce even more copies of the error
handling body.  I think we need to unify this. And yes, it's unfair to stick
you with it, but it's been a while since a new aux process has been added.


> + /*
> +  * These operations are really just a minimal subset of
> +  * AbortTransaction().  We don't have very many resources to 
> worry
> +  * about.
> +  */

Given what you're proposing this for, are you actually confident that we don't
need more than this?


> From d9826f75ad2259984d55fc04622f0b91ebbba65a Mon Sep 17 00:00:00 2001
> From: Nathan Bossart 
> Date: Sun, 5 Dec 2021 19:38:20 -0800
> Subject: [PATCH v5 2/8] Also remove pgsql_tmp directories during startup.
>
> Presently, the server only removes the contents of the temporary
> directories during startup, not the directory itself.  This changes
> that to prepare for future commits that will move temporary file
> cleanup to a separate auxiliary process.

Is this actually safe? Is there a guarantee no process can access a temp table
stored in one of these? Because without WAL guaranteeing consistency, we can't
just access e.g. temp tables written before a crash.


> +extern void RemovePgTempDir(const char *tmpdirname, bool missing_ok,
> + bool unlink_all);

I don't like functions with multiple consecutive booleans, they tend to get
swapped around. Why not just split unlink_all=true/false into different
functions?


> Subject: [PATCH v5 3/8] Split pgsql_tmp cleanup into two stages.
>
> First, pgsql_tmp directories will be renamed to stage them for
> removal.

What if the target name already exists?


> Then, all files in pgsql_tmp are removed before removing
> the staged directories themselves.  This change is being made in
> preparation for a follow-up change to offload most temporary file
> cleanup to the new custodian process.
>
> Note that temporary relation files cannot be cleaned up via the
> aforementioned strategy and will not be offloaded to the custodian.

This should be in the prior commit message, otherwise people wil

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-16 Thread Nathan Bossart

Here is another rebased patch set.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From c11a6893d2d509df1389a1c03081b6cc563d5683 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v5 1/8] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 214 
 src/backend/postmaster/postmaster.c |  44 -
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  17 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 11 files changed, 301 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index dbbeac5a82..1b7aae60f5 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 0587e45920..7eae34884d 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..5f2b647544
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,214 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process is new as of Postgres 15.  It's main purpose is to
+ * offload tasks that could otherwise delay startup and checkpointing, but
+ * it needn't be restricted to just those things.  Offloaded tasks should
+ * not be synchronous (e.g., checkpointing shouldn't need to wait for the
+ * custodian to complete a task before proceeding).  Also, ensure that any
+ * offloaded tasks are either not required during single-user mode or are
+ * performed separately during single-user mode.
+ *
+ * The custodian is not an essential process and can shutdown quickly when
+ * requested.  The custodian will wake up approximately once every 5
+ * minutes to perform its tasks, but backends can (and should) set its
+ * latch to wake it up sooner.
+ *
+ * Normal termination is by SIGTERM, which instructs the bgwriter to
+ * exit(0).  Emergency termination is by SIGQUIT; like any backend, the
+ * custodian will simply abort and exit on SIGQUIT.
+ *
+ * If the custodian exits unexpectedly, the postmaster treats that the same
+ * as a backend crash: shared memory may be corrupted, so remaining
+ * backends should be killed by SIGQUIT and then a recovery cycle started.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "utils/memutils.h"
+
+#define CUSTODIAN_TIMEOUT_S (300)		/* 5 minutes */
+
+/*
+ * Main entry point for custodian process
+ *
+ * This is invoked from AuxiliaryProcessMain, which has already created the
+ * basic execution environment, but not enabled signals yet.
+ */
+void
+CustodianMain(void)
+{
+	sigjmp_buf	local_sigjmp_buf;
+	MemoryContext custodian_context;
+
+	/*
+	 * Properly accept or ignore signals that might be sent

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-11 Thread Nathan Bossart

Here is a rebased patch set.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 506aa95dd77f16dc64d7fe9c52ca67dd3c10212e Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v4 1/8] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 213 
 src/backend/postmaster/postmaster.c |  44 -
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  17 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 11 files changed, 300 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index dbbeac5a82..1b7aae60f5 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 0587e45920..7eae34884d 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..dd86f0f5ce
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,213 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process is new as of Postgres 15.  It's main purpose is to
+ * offload tasks that could otherwise delay startup and checkpointing, but
+ * it needn't be restricted to just those things.  Offloaded tasks should
+ * not be synchronous (e.g., checkpointing shouldn't need to wait for the
+ * custodian to complete a task before proceeding).  Also, ensure that any
+ * offloaded tasks are either not required during single-user mode or are
+ * performed separately during single-user mode.
+ *
+ * The custodian is not an essential process and can shutdown quickly when
+ * requested.  The custodian will wake up approximately once every 5
+ * minutes to perform its tasks, but backends can (and should) set its
+ * latch to wake it up sooner.
+ *
+ * Normal termination is by SIGTERM, which instructs the bgwriter to
+ * exit(0).  Emergency termination is by SIGQUIT; like any backend, the
+ * custodian will simply abort and exit on SIGQUIT.
+ *
+ * If the custodian exits unexpectedly, the postmaster treats that the same
+ * as a backend crash: shared memory may be corrupted, so remaining
+ * backends should be killed by SIGQUIT and then a recovery cycle started.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "utils/memutils.h"
+
+#define CUSTODIAN_TIMEOUT_S (300)		/* 5 minutes */
+
+/*
+ * Main entry point for custodian process
+ *
+ * This is invoked from AuxiliaryProcessMain, which has already created the
+ * basic execution environment, but not enabled signals yet.
+ */
+void
+CustodianMain(void)
+{
+	sigjmp_buf	local_sigjmp_buf;
+	MemoryContext custodian_context;
+
+	/*
+	 * Properly accept or ignore signals that might be sent to us.
+	 */
+	pqsignal(SIGHUP

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-18 Thread Bossart, Nathan

On 1/14/22, 11:26 PM, "Bharath Rupireddy" 
 wrote:
> On Sat, Jan 15, 2022 at 12:46 AM Bossart, Nathan  wrote:
>> I'd personally like to avoid creating two code paths for the same
>> thing.  Are there really cases when this one extra auxiliary process
>> would be too many?  And if so, how would a user know when to adjust
>> this GUC?  I understand the point that we should introduce new
>> processes sparingly to avoid burdening low-end systems, but I don't
>> think we should be afraid to add new ones when it is needed.
>
> IMO, having a GUC for enabling/disabling this new worker and it's
> related code would be a better idea. The reason is that if the
> postgres has no replication slots at all(which is quite possible in
> real stand-alone production environments) or if the file enumeration
> (directory traversal and file removal) is fast enough on the servers,
> there's no point having this new worker, the checkpointer itself can
> take care of the work as it is doing today.

IMO introducing a GUC wouldn't be doing users many favors.  Their
cluster might work just fine for a long time before they begin
encountering problems during startups/checkpoints.  Once the user
discovers the underlying reason, they have to then find a GUC for
enabling a special background worker that makes this problem go away.
Why not just fix the problem for everybody by default?

I've been thinking about what other approaches we could take besides
creating more processes.  The root of the problem seems to be that
there are a number of tasks that are performed synchronously that can
take a long time.  The process approach essentially makes these tasks
asynchronous so that they do not block startup and checkpointing.  But
perhaps this can be done in an existing process, possibly even the
checkpointer.  Like the current WAL pre-allocation patch, we could do
this work when the checkpointer isn't checkpointing, and we could also
do small amounts of work in CheckpointWriteDelay() (or a new function
called in a similar way).  In theory, this would help avoid delaying
checkpoints too long while doing cleanup at every opportunity to lower
the chances it falls far behind.

Nathan

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-14 Thread Bharath Rupireddy

On Sat, Jan 15, 2022 at 12:46 AM Bossart, Nathan  wrote:
>
> On 1/14/22, 3:43 AM, "Maxim Orlov"  wrote:
> > The code seems to be in good condition. All the tests are running ok with 
> > no errors.
>
> Thanks for your review.
>
> > I like the whole idea of shifting additional checkpointer jobs as much as 
> > possible to another worker. In my view, it is more appropriate to call this 
> > worker "bg cleaner" or "bg file cleaner" or smth.

I personally prefer "background cleaner" as the new process name in
line with "background writer" and "background worker".

> > It could be useful for systems with high load, which may deal with deleting 
> > many files at once, but I'm not sure about "small" installations. Extra bg 
> > worker need more resources to do occasional deletion of small amounts of 
> > files. I really do not know how to do it better, maybe to have two 
> > different code paths switched by GUC?
>
> I'd personally like to avoid creating two code paths for the same
> thing.  Are there really cases when this one extra auxiliary process
> would be too many?  And if so, how would a user know when to adjust
> this GUC?  I understand the point that we should introduce new
> processes sparingly to avoid burdening low-end systems, but I don't
> think we should be afraid to add new ones when it is needed.

IMO, having a GUC for enabling/disabling this new worker and it's
related code would be a better idea. The reason is that if the
postgres has no replication slots at all(which is quite possible in
real stand-alone production environments) or if the file enumeration
(directory traversal and file removal) is fast enough on the servers,
there's no point having this new worker, the checkpointer itself can
take care of the work as it is doing today.

> That being said, if making the extra worker optional addresses the
> concerns about resource usage, maybe we should consider it.  Justin
> suggested using something like max_parallel_maintenance_workers
> upthread [0].

I don't think having this new process is built as part of
max_parallel_maintenance_workers, instead I prefer to have it as an
auxiliary process much like "background writer", "wal writer" and so
on.

I think now it's the time for us to run some use cases and get the
perf reports to see how beneficial this new process is going to be, in
terms of improving the checkpoint timings.

> > Should we also think about adding WAL preallocation into custodian worker 
> > from the patch "Pre-alocationg WAL files" [1] ?
>
> This was brought up in the pre-allocation thread [1].  I don't think
> the custodian process would be the right place for it, and I'm also
> not as concerned about it because it will generally be a small, fixed,
> and configurable amount of work.  In any case, I don't sense a ton of
> support for a new auxiliary process in this thread, so I'm hesitant to
> go down the same path for pre-allocation.
>
> [0] https://postgr.es/m/20211213171935.GX17618%40telsasoft.com
> [1] https://postgr.es/m/B2ACCC5A-F9F2-41D9-AC3B-251362A0A254%40amazon.com

I think the idea of weaving every non-critical task to a common
background process is a good idea but let's not mix up with the new
background cleaner process here for now, at least until we get some
numbers and prove that the idea proposed here will be beneficial.

Regards,
Bharath Rupireddy.

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-14 Thread Bossart, Nathan

On 1/14/22, 3:43 AM, "Maxim Orlov"  wrote:
> The code seems to be in good condition. All the tests are running ok with no 
> errors.

Thanks for your review.

> I like the whole idea of shifting additional checkpointer jobs as much as 
> possible to another worker. In my view, it is more appropriate to call this 
> worker "bg cleaner" or "bg file cleaner" or smth.
>
> It could be useful for systems with high load, which may deal with deleting 
> many files at once, but I'm not sure about "small" installations. Extra bg 
> worker need more resources to do occasional deletion of small amounts of 
> files. I really do not know how to do it better, maybe to have two different 
> code paths switched by GUC?

I'd personally like to avoid creating two code paths for the same
thing.  Are there really cases when this one extra auxiliary process
would be too many?  And if so, how would a user know when to adjust
this GUC?  I understand the point that we should introduce new
processes sparingly to avoid burdening low-end systems, but I don't
think we should be afraid to add new ones when it is needed.

That being said, if making the extra worker optional addresses the
concerns about resource usage, maybe we should consider it.  Justin
suggested using something like max_parallel_maintenance_workers
upthread [0].

> Should we also think about adding WAL preallocation into custodian worker 
> from the patch "Pre-alocationg WAL files" [1] ?

This was brought up in the pre-allocation thread [1].  I don't think
the custodian process would be the right place for it, and I'm also
not as concerned about it because it will generally be a small, fixed,
and configurable amount of work.  In any case, I don't sense a ton of
support for a new auxiliary process in this thread, so I'm hesitant to
go down the same path for pre-allocation.

Nathan

[0] https://postgr.es/m/20211213171935.GX17618%40telsasoft.com
[1] https://postgr.es/m/B2ACCC5A-F9F2-41D9-AC3B-251362A0A254%40amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-14 Thread Maxim Orlov

The code seems to be in good condition. All the tests are running ok with
no errors.

I like the whole idea of shifting additional checkpointer jobs as much
as possible to another worker. In my view, it is more appropriate to call
this worker "bg cleaner" or "bg file cleaner" or smth.

It could be useful for systems with high load, which may deal with deleting
many files at once, but I'm not sure about "small" installations. Extra bg
worker need more resources to do occasional deletion of small amounts of
files. I really do not know how to do it better, maybe to have two
different code paths switched by GUC?

Should we also think about adding WAL preallocation into custodian worker
from the patch "Pre-alocationg WAL files" [1] ?

[1]
https://www.postgresql.org/message-id/flat/20201225200953.jjkrytlrzojbn...@alap3.anarazel.de
-- 
Best regards,
Maxim Orlov.

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-02 Thread Amul Sul

On Mon, Jan 3, 2022 at 2:56 AM Andres Freund  wrote:
>
> Hi,
>
> On 2021-12-14 20:23:57 +, Bossart, Nathan wrote:
> > As promised, here is v2.  This patch set includes handling for all
> > four tasks noted upthread.  I'd still consider this a work-in-
> > progress, as I've done minimal testing.  At the very least, it should
> > demonstrate what an auxiliary process approach might look like.
>
> This generates a compiler warning:
> https://cirrus-ci.com/task/5740581082103808?logs=mingw_cross_warning#L378
>

Somehow, I am not getting these compiler warnings on the latest master
head (69872d0bbe6).

Here are the few minor comments for the v2 version, I thought would help:

+ * Copyright (c) 2021, PostgreSQL Global Development Group

Time to change the year :)
--

+
+   /* These operations are really just a minimal subset of
+* AbortTransaction().  We don't have very many resources to worry
+* about.
+*/

Incorrect formatting, the first line should be empty in the multiline
code comment.
--

+   XLogRecPtr  logical_rewrite_mappings_cutoff;/* can remove
older mappings */
+   XLogRecPtr  logical_rewrite_mappings_cutoff_set;

Look like logical_rewrite_mappings_cutoff gets to set only once and
never get reset, if it is true then I think that variable can be
skipped completely and set the initial logical_rewrite_mappings_cutoff
to InvalidXLogRecPtr, that will do the needful.
--

Regards,
Amul

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-02 Thread Andres Freund

Hi,

On 2021-12-14 20:23:57 +, Bossart, Nathan wrote:
> As promised, here is v2.  This patch set includes handling for all
> four tasks noted upthread.  I'd still consider this a work-in-
> progress, as I've done minimal testing.  At the very least, it should
> demonstrate what an auxiliary process approach might look like.

This generates a compiler warning:
https://cirrus-ci.com/task/5740581082103808?logs=mingw_cross_warning#L378

Greetings,

Andres Freund

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-14 Thread Bossart, Nathan

On 12/14/21, 12:09 PM, "Bossart, Nathan"  wrote:
> On 12/14/21, 9:00 AM, "Bruce Momjian"  wrote:
>> Have we changed temporary file handling in any recent major releases,
>> meaning is this a current problem or one already improved in PG 14.
>
> I haven't noticed any recent improvements while working in this area.

On second thought, the addition of the remove_temp_files_after_crash
GUC is arguably an improvement since it could prevent files from
accumulating after repeated crashes.

Nathan

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-14 Thread Bossart, Nathan

On 12/14/21, 9:00 AM, "Bruce Momjian"  wrote:
> Have we changed temporary file handling in any recent major releases,
> meaning is this a current problem or one already improved in PG 14.

I haven't noticed any recent improvements while working in this area.

Nathan

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-14 Thread Bruce Momjian

On Mon, Dec 13, 2021 at 11:05:46PM +, Bossart, Nathan wrote:
> On 12/13/21, 12:37 PM, "Robert Haas"  wrote:
> > On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan  wrote:
> >> > But against all that, if these tasks are slowing down checkpoints and
> >> > that's avoidable, that seems pretty important too. Interestingly, I
> >> > can't say that I've ever seen any of these things be a problem for
> >> > checkpoint or startup speed. I wonder why you've had a different
> >> > experience.
> >>
> >> Yeah, it's difficult for me to justify why users should suffer long
> >> periods of downtime because startup or checkpointing is taking a very
> >> long time doing things that are arguably unrelated to startup and
> >> checkpointing.
> >
> > Well sure. But I've never actually seen that happen.
> 
> I'll admit that surprises me.  As noted elsewhere [0], we were seeing
> this enough with pgsql_tmp that we started moving the directory aside
> before starting the server.  Discussions about handling this usually
> prompt questions about why there are so many temporary files in the
> first place (which is fair).  FWIW all four functions noted in my
> original message [1] are things I've seen firsthand affecting users.

Have we changed temporary file handling in any recent major releases,
meaning is this a current problem or one already improved in PG 14.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Bossart, Nathan

On 12/13/21, 12:37 PM, "Robert Haas"  wrote:
> On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan  wrote:
>> > But against all that, if these tasks are slowing down checkpoints and
>> > that's avoidable, that seems pretty important too. Interestingly, I
>> > can't say that I've ever seen any of these things be a problem for
>> > checkpoint or startup speed. I wonder why you've had a different
>> > experience.
>>
>> Yeah, it's difficult for me to justify why users should suffer long
>> periods of downtime because startup or checkpointing is taking a very
>> long time doing things that are arguably unrelated to startup and
>> checkpointing.
>
> Well sure. But I've never actually seen that happen.

I'll admit that surprises me.  As noted elsewhere [0], we were seeing
this enough with pgsql_tmp that we started moving the directory aside
before starting the server.  Discussions about handling this usually
prompt questions about why there are so many temporary files in the
first place (which is fair).  FWIW all four functions noted in my
original message [1] are things I've seen firsthand affecting users.

Nathan

[0] https://postgr.es/m/E7573D54-A8C9-40A8-89D7-0596A36ED124%40amazon.com
[1] https://postgr.es/m/C1EE64B0-D4DB-40F3-98C8-0CED324D34CB%40amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Robert Haas

On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan  wrote:
> > But against all that, if these tasks are slowing down checkpoints and
> > that's avoidable, that seems pretty important too. Interestingly, I
> > can't say that I've ever seen any of these things be a problem for
> > checkpoint or startup speed. I wonder why you've had a different
> > experience.
>
> Yeah, it's difficult for me to justify why users should suffer long
> periods of downtime because startup or checkpointing is taking a very
> long time doing things that are arguably unrelated to startup and
> checkpointing.

Well sure. But I've never actually seen that happen.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Bossart, Nathan

On 12/13/21, 9:20 AM, "Justin Pryzby"  wrote:
> On Mon, Dec 13, 2021 at 08:53:37AM -0500, Robert Haas wrote:
>> Another issue is that we don't want to increase the number of
>> processes without bound. Processes use memory and CPU resources and if
>> we run too many of them it becomes a burden on the system. Low-end
>> systems may not have too many resources in total, and high-end systems
>> can struggle to fit demanding workloads within the resources that they
>> have. Maybe it would be cheaper to do more things at once if we were
>> using threads rather than processes, but that day still seems fairly
>> far off.
>
> Maybe that's an argument that this should be a dynamic background worker
> instead of an auxilliary process.  Then maybe it would be controlled by
> max_parallel_maintenance_workers (or something similar).  The checkpointer
> would need to do these tasks itself if parallel workers were disabled or
> couldn't be launched.

I think this is an interesting idea.  I dislike the prospect of having
two code paths for all this stuff, but if it addresses the concerns
about resource usage, maybe it's worth it.

Nathan

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Bossart, Nathan

On 12/13/21, 5:54 AM, "Robert Haas"  wrote:
> I don't know whether this kind of idea is good or not.

Thanks for chiming in.  I have an almost-complete patch set that I'm
hoping to post to the lists in the next couple of days.

> One thing we've seen a number of times now is that entrusting the same
> process with multiple responsibilities often ends poorly. Sometimes
> it's busy with one thing when another thing really needs to be done
> RIGHT NOW. Perhaps that won't be an issue here since all of these
> things are related to checkpointing, but then the process name should
> reflect that rather than making it sound like we can just keep piling
> more responsibilities onto this process indefinitely. At some point
> that seems bound to become an issue.

Two of the tasks are cleanup tasks that checkpointing handles at the
moment, and two are cleanup tasks that are done at startup.  For now,
all of these tasks are somewhat nonessential.  There's no requirement
that any of these tasks complete in order to finish startup or
checkpointing.  In fact, outside of preventing the server from running
out of disk space, I don't think there's any requirement that these
tasks run at all.  IMO this would have to be a core tenet of a new
auxiliary process like this.

That being said, I totally understand your point.  If there were a
dozen such tasks handled by a single auxiliary process, that could
cause a new set of problems.  Your checkpointing and startup might be
fast, but you might run out of disk space because our cleanup process
can't handle it all.  So a new worker could end up becoming an
availability risk as well.

> Another issue is that we don't want to increase the number of
> processes without bound. Processes use memory and CPU resources and if
> we run too many of them it becomes a burden on the system. Low-end
> systems may not have too many resources in total, and high-end systems
> can struggle to fit demanding workloads within the resources that they
> have. Maybe it would be cheaper to do more things at once if we were
> using threads rather than processes, but that day still seems fairly
> far off.

I do agree that it is important to be very careful about adding new
processes, and if a better idea for how to handle these tasks emerges,
I will readily abandon my current approach.  Upthread, Andres
mentioned optimizing unnecessary snapshot files, and I mentioned
possibly limiting how much time startup and checkpoints spend on these
tasks.  I don't have too many details for the former, and for the
latter, I'm worried about not being able to keep up.  But if the
prospect of adding a new auxiliary process for this stuff is a non-
starter, perhaps I should explore that approach some more.

> But against all that, if these tasks are slowing down checkpoints and
> that's avoidable, that seems pretty important too. Interestingly, I
> can't say that I've ever seen any of these things be a problem for
> checkpoint or startup speed. I wonder why you've had a different
> experience.

Yeah, it's difficult for me to justify why users should suffer long
periods of downtime because startup or checkpointing is taking a very
long time doing things that are arguably unrelated to startup and
checkpointing.

Nathan

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Justin Pryzby

On Mon, Dec 13, 2021 at 08:53:37AM -0500, Robert Haas wrote:
> On Fri, Dec 10, 2021 at 2:03 PM Bossart, Nathan  wrote:
> > Well, I haven't had a chance to look at your patch, and my patch set
> > still only has handling for CheckPointSnapBuild() and
> > RemovePgTempFiles(), but I thought I'd share what I have anyway.  I
> > split it into 5 patches:
> >
> > 0001 - Adds a new "custodian" auxiliary process that does nothing.
...
> 
> I don't know whether this kind of idea is good or not.
...
> 
> Another issue is that we don't want to increase the number of
> processes without bound. Processes use memory and CPU resources and if
> we run too many of them it becomes a burden on the system. Low-end
> systems may not have too many resources in total, and high-end systems
> can struggle to fit demanding workloads within the resources that they
> have. Maybe it would be cheaper to do more things at once if we were
> using threads rather than processes, but that day still seems fairly
> far off.

Maybe that's an argument that this should be a dynamic background worker
instead of an auxilliary process.  Then maybe it would be controlled by
max_parallel_maintenance_workers (or something similar).  The checkpointer
would need to do these tasks itself if parallel workers were disabled or
couldn't be launched.

-- 
Justin

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Robert Haas

On Fri, Dec 10, 2021 at 2:03 PM Bossart, Nathan  wrote:
> Well, I haven't had a chance to look at your patch, and my patch set
> still only has handling for CheckPointSnapBuild() and
> RemovePgTempFiles(), but I thought I'd share what I have anyway.  I
> split it into 5 patches:
>
> 0001 - Adds a new "custodian" auxiliary process that does nothing.
> 0002 - During startup, remove the pgsql_tmp directories instead of
>only clearing the contents.
> 0003 - Split temporary file cleanup during startup into two stages.
>The first renames the directories, and the second clears them.
> 0004 - Moves the second stage from 0003 to the custodian process.
> 0005 - Moves CheckPointSnapBuild() to the custodian process.

I don't know whether this kind of idea is good or not.

One thing we've seen a number of times now is that entrusting the same
process with multiple responsibilities often ends poorly. Sometimes
it's busy with one thing when another thing really needs to be done
RIGHT NOW. Perhaps that won't be an issue here since all of these
things are related to checkpointing, but then the process name should
reflect that rather than making it sound like we can just keep piling
more responsibilities onto this process indefinitely. At some point
that seems bound to become an issue.

Another issue is that we don't want to increase the number of
processes without bound. Processes use memory and CPU resources and if
we run too many of them it becomes a burden on the system. Low-end
systems may not have too many resources in total, and high-end systems
can struggle to fit demanding workloads within the resources that they
have. Maybe it would be cheaper to do more things at once if we were
using threads rather than processes, but that day still seems fairly
far off.

But against all that, if these tasks are slowing down checkpoints and
that's avoidable, that seems pretty important too. Interestingly, I
can't say that I've ever seen any of these things be a problem for
checkpoint or startup speed. I wonder why you've had a different
experience.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-06 Thread Bossart, Nathan

On 12/6/21, 3:44 AM, "Bharath Rupireddy" 
 wrote:
> On Fri, Dec 3, 2021 at 11:50 PM Bossart, Nathan  wrote:
>> I might hack something together for the separate worker approach, if
>> for no other reason than to make sure I really understand how these
>> functions work.  If/when a better idea emerges, we can alter course.
>
> Thanks. As I said upthread we've been discussing the approach of
> offloading some of the checkpoint tasks like (deleting snapshot files)
> internally for quite some time and I would like to share a patch that
> adds a new background cleaner process (currently able to delete the
> logical replication snapshot files, if required can be extended to do
> other tasks as well). I don't mind if it gets rejected. Please have a
> look.

Thanks for sharing!  I've also spent some time on a patch set, which I
intend to share once I have handling for all four tasks (so far I have
handling for CheckPointSnapBuild() and RemovePgTempFiles()).  I'll
take a look at your patch as well.

Nathan

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-06 Thread Bharath Rupireddy

On Fri, Dec 3, 2021 at 11:50 PM Bossart, Nathan  wrote:
>
> On 12/3/21, 5:57 AM, "Bharath Rupireddy" 
>  wrote:
> > On Fri, Dec 3, 2021 at 3:01 AM Bossart, Nathan  wrote:
> >>
> >> On 12/1/21, 6:48 PM, "Bharath Rupireddy" 
> >>  wrote:
> >> > +1 for the overall idea of making the checkpoint faster. In fact, we
> >> > here at our team have been thinking about this problem for a while. If
> >> > there are a lot of files that checkpoint has to loop over and remove,
> >> > IMO, that task can be delegated to someone else (maybe a background
> >> > worker called background cleaner or bg cleaner, of course, we can have
> >> > a GUC to enable or disable it). The checkpoint can just write some
> >>
> >> Right.  IMO it isn't optimal to have critical things like startup and
> >> checkpointing depend on somewhat-unrelated tasks.  I understand the
> >> desire to avoid adding additional processes, and maybe it is a bigger
> >> hammer than what is necessary to reduce the impact, but it seemed like
> >> a natural solution for this problem.  That being said, I'm all for
> >> exploring other ways to handle this.
> >
> > Having a generic background cleaner process (controllable via a few
> > GUCs), which can delete a bunch of files (snapshot, mapping, old WAL,
> > temp files etc.) or some other task on behalf of the checkpointer,
> > seems to be the easiest solution.
> >
> > I'm too open for other ideas.
>
> I might hack something together for the separate worker approach, if
> for no other reason than to make sure I really understand how these
> functions work.  If/when a better idea emerges, we can alter course.

Thanks. As I said upthread we've been discussing the approach of
offloading some of the checkpoint tasks like (deleting snapshot files)
internally for quite some time and I would like to share a patch that
adds a new background cleaner process (currently able to delete the
logical replication snapshot files, if required can be extended to do
other tasks as well). I don't mind if it gets rejected. Please have a
look.

Regards,
Bharath Rupireddy.


v1-0001-background-cleaner-to-offload-checkpoint-tasks.patch
Description: Binary data

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-03 Thread Bossart, Nathan

On 12/3/21, 5:57 AM, "Bharath Rupireddy" 
 wrote:
> On Fri, Dec 3, 2021 at 3:01 AM Bossart, Nathan  wrote:
>>
>> On 12/1/21, 6:48 PM, "Bharath Rupireddy" 
>>  wrote:
>> > +1 for the overall idea of making the checkpoint faster. In fact, we
>> > here at our team have been thinking about this problem for a while. If
>> > there are a lot of files that checkpoint has to loop over and remove,
>> > IMO, that task can be delegated to someone else (maybe a background
>> > worker called background cleaner or bg cleaner, of course, we can have
>> > a GUC to enable or disable it). The checkpoint can just write some
>>
>> Right.  IMO it isn't optimal to have critical things like startup and
>> checkpointing depend on somewhat-unrelated tasks.  I understand the
>> desire to avoid adding additional processes, and maybe it is a bigger
>> hammer than what is necessary to reduce the impact, but it seemed like
>> a natural solution for this problem.  That being said, I'm all for
>> exploring other ways to handle this.
>
> Having a generic background cleaner process (controllable via a few
> GUCs), which can delete a bunch of files (snapshot, mapping, old WAL,
> temp files etc.) or some other task on behalf of the checkpointer,
> seems to be the easiest solution.
>
> I'm too open for other ideas.

I might hack something together for the separate worker approach, if
for no other reason than to make sure I really understand how these
functions work.  If/when a better idea emerges, we can alter course.

Nathan

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-03 Thread Bharath Rupireddy

On Fri, Dec 3, 2021 at 3:01 AM Bossart, Nathan  wrote:
>
> On 12/1/21, 6:48 PM, "Bharath Rupireddy" 
>  wrote:
> > +1 for the overall idea of making the checkpoint faster. In fact, we
> > here at our team have been thinking about this problem for a while. If
> > there are a lot of files that checkpoint has to loop over and remove,
> > IMO, that task can be delegated to someone else (maybe a background
> > worker called background cleaner or bg cleaner, of course, we can have
> > a GUC to enable or disable it). The checkpoint can just write some
>
> Right.  IMO it isn't optimal to have critical things like startup and
> checkpointing depend on somewhat-unrelated tasks.  I understand the
> desire to avoid adding additional processes, and maybe it is a bigger
> hammer than what is necessary to reduce the impact, but it seemed like
> a natural solution for this problem.  That being said, I'm all for
> exploring other ways to handle this.

Having a generic background cleaner process (controllable via a few
GUCs), which can delete a bunch of files (snapshot, mapping, old WAL,
temp files etc.) or some other task on behalf of the checkpointer,
seems to be the easiest solution.

I'm too open for other ideas.

> > Another idea could be to parallelize the checkpoint i.e. IIUC, the
> > tasks that checkpoint do in CheckPointGuts are independent and if we
> > have some counters like (how many snapshot/mapping files that the
> > server generated)
>
> Could you elaborate on this?  Is your idea that the checkpointer would
> create worker processes like autovacuum does?

Yes, I was thinking that the checkpointer creates one or more dynamic
background workers (we can assume one background worker for now) to
delete the files. If a threshold of files crosses (snapshot files
count is more than this threshold), the new worker gets spawned which
would then enumerate the files and delete the unneeded ones, the
checkpointer can proceed with the other tasks and finish the
checkpointing. Having said this, I prefer the background cleaner
approach over the dynamic background worker. The advantage with the
background cleaner being that it can do other tasks (like other kinds
of file deletion).

Another idea could be that, use the existing background writer to do
the file deletion while the checkpoint is happening. But again, this
might cause problems because the bg writer flushing dirty buffers will
get delayed.

Regards,
Bharath Rupireddy.

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-02 Thread Bossart, Nathan

On 12/1/21, 6:48 PM, "Bharath Rupireddy" 
 wrote:
> +1 for the overall idea of making the checkpoint faster. In fact, we
> here at our team have been thinking about this problem for a while. If
> there are a lot of files that checkpoint has to loop over and remove,
> IMO, that task can be delegated to someone else (maybe a background
> worker called background cleaner or bg cleaner, of course, we can have
> a GUC to enable or disable it). The checkpoint can just write some

Right.  IMO it isn't optimal to have critical things like startup and
checkpointing depend on somewhat-unrelated tasks.  I understand the
desire to avoid adding additional processes, and maybe it is a bigger
hammer than what is necessary to reduce the impact, but it seemed like
a natural solution for this problem.  That being said, I'm all for
exploring other ways to handle this.

> Another idea could be to parallelize the checkpoint i.e. IIUC, the
> tasks that checkpoint do in CheckPointGuts are independent and if we
> have some counters like (how many snapshot/mapping files that the
> server generated)

Could you elaborate on this?  Is your idea that the checkpointer would
create worker processes like autovacuum does?

Nathan

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-02 Thread Bossart, Nathan

On 12/1/21, 6:06 PM, "Euler Taveira"  wrote:
> Saying that a certain task is O(n) doesn't mean it needs a separate process to
> handle it. Did you have a use case or even better numbers (% of checkpoint /
> startup time) that makes your proposal worthwhile?

I don't have specific numbers on hand, but each of the four functions
I listed is something I routinely see impacting customers.

> For (3), there is already a GUC that would avoid the slowdown during startup.
> Use it if you think the startup time is more important that disk space 
> occupied
> by useless files.

Setting remove_temp_files_after_crash to false only prevents temp file
cleanup during restart after a backend crash.  It is always called for
other startups.

> For (4), you are forgetting that the on-disk state of replication slots is
> stored in the pg_replslot/SLOTNAME/state. It seems you cannot just rename the
> replication slot directory and copy the state file. What happen if there is a
> crash before copying the state file?

Good point.  I think it's possible to deal with this, though.  Perhaps
the files that should be deleted on startup should go in a separate
directory, or maybe we could devise a way to ensure the state file is
copied even if there is a crash at an inconvenient time.

Nathan

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-01 Thread Bharath Rupireddy

On Thu, Dec 2, 2021 at 1:54 AM Bossart, Nathan  wrote:
>
> Hi hackers,
>
> Thanks to 61752af, SyncDataDirectory() can make use of syncfs() to
> avoid individually syncing all database files after a crash.  However,
> as noted earlier this year [0], there are still a number of O(n) tasks
> that affect startup and checkpointing that I'd like to improve.
> Below, I've attempted to summarize each task and to offer ideas for
> improving matters.  I'll likely split each of these into its own
> thread, given there is community interest for such changes.
>
> 1) CheckPointSnapBuild(): This function loops through
>pg_logical/snapshots to remove all snapshots that are no longer
>needed.  If there are many entries in this directory, this can take
>a long time.  The note above this function indicates that this is
>done during checkpoints simply because it is convenient.  IIUC
>there is no requirement that this function actually completes for a
>given checkpoint.  My current idea is to move this to a new
>maintenance worker.
> 2) CheckPointLogicalRewriteHeap(): This function loops through
>pg_logical/mappings to remove old mappings and flush all remaining
>ones.  IIUC there is no requirement that the "remove old mappings"
>part must complete for a given checkpoint, but the "flush all
>remaining" portion allows replay after a checkpoint to only "deal
>with the parts of a mapping that have been written out after the
>checkpoint started."  Therefore, I think we should move the "remove
>old mappings" part to a new maintenance worker (probably the same
>one as for 1), and we should consider using syncfs() for the "flush
>all remaining" part.  (I suspect the main argument against the
>latter will be that it could cause IO spikes.)
> 3) RemovePgTempFiles(): This step can delay startup if there are many
>temporary files to individually remove.  This step is already
>optionally done after a crash via the remove_temp_files_after_crash
>GUC.  I propose that we have startup move the temporary file
>directories aside and create new ones, and then a separate worker
>(probably the same one from 1 and 2) could clean up the old files.
> 4) StartupReorderBuffer(): This step deletes logical slot data that
>has been spilled to disk.  This code appears to be written to avoid
>deleting different types of files in these directories, but AFAICT
>there shouldn't be any other files.  Therefore, I think we could do
>something similar to 3 (i.e., move the directories aside during
>startup and clean them up via a new maintenance worker).
>
> I realize adding a new maintenance worker might be a bit heavy-handed,
> but I think it would be nice to have somewhere to offload tasks that
> really shouldn't impact startup and checkpointing.  I imagine such a
> process would come in handy down the road, too.  WDYT?

+1 for the overall idea of making the checkpoint faster. In fact, we
here at our team have been thinking about this problem for a while. If
there are a lot of files that checkpoint has to loop over and remove,
IMO, that task can be delegated to someone else (maybe a background
worker called background cleaner or bg cleaner, of course, we can have
a GUC to enable or disable it). The checkpoint can just write some
marker files (for instance, it can write snapshot_ files
with file name itself representing the cutoff lsn so that the new bg
cleaner can remove the snapshot files, similarly it can write marker
files for other file removals). Having said that, a new bg cleaner
deleting the files asynchronously on behalf of checkpoint can look an
overkill until we have some numbers that we could save with this
approach. For this purpose, I did a small experiment to figure out how
much usually file deletion takes [1] on a SSD, for 1million files
8seconds, I'm sure it will be much more on HDD.

The bg cleaner can also be used for RemovePgTempFiles, probably the
postmaster just renaming the pgsql_temp to something
pgsql_temp_delete, then proceeding with the server startup, the bg
cleaner can then delete the files.
Also, we could do something similar for removing/recycling old xlog
files and StartupReorderBuffer.

Another idea could be to parallelize the checkpoint i.e. IIUC, the
tasks that checkpoint do in CheckPointGuts are independent and if we
have some counters like (how many snapshot/mapping files that the
server generated)

[1] on SSD:
deletion of 100 files took 7.930380 seconds
deletion of 50 files took 3.921676 seconds
deletion of 10 files took 0.768772 seconds
deletion of 5 files took 0.400623 seconds
deletion of 1 files took 0.077565 seconds
deletion of 1000 files took 0.006232 seconds

Regards,
Bharath Rupireddy.

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-01 Thread Euler Taveira

On Wed, Dec 1, 2021, at 9:19 PM, Bossart, Nathan wrote:
> On 12/1/21, 2:56 PM, "Andres Freund"  wrote:
> > On 2021-12-01 20:24:25 +, Bossart, Nathan wrote:
> >> I realize adding a new maintenance worker might be a bit heavy-handed,
> >> but I think it would be nice to have somewhere to offload tasks that
> >> really shouldn't impact startup and checkpointing.  I imagine such a
> >> process would come in handy down the road, too.  WDYT?
> >
> > -1. I think the overhead of an additional worker is disproportional here. 
> > And
> > there's simplicity benefits in having a predictable cleanup interlock as 
> > well.
> 
> Another idea I had was to put some upper limit on how much time is
> spent on such tasks.  For example, a checkpoint would only spend X
> minutes on CheckPointSnapBuild() before giving up until the next one.
> I think the main downside of that approach is that it could lead to
> unbounded growth, so perhaps we would limit (or even skip) such tasks
> only for end-of-recovery and shutdown checkpoints.  Perhaps the
> startup tasks could be limited in a similar fashion.
Saying that a certain task is O(n) doesn't mean it needs a separate process to
handle it. Did you have a use case or even better numbers (% of checkpoint /
startup time) that makes your proposal worthwhile?

I would try to optimize (1) and (2). However, delayed removal can be a
long-term issue if the new routine cannot keep up with the pace of file
creation (specially if the checkpoints are far apart).

For (3), there is already a GUC that would avoid the slowdown during startup.
Use it if you think the startup time is more important that disk space occupied
by useless files.

For (4), you are forgetting that the on-disk state of replication slots is
stored in the pg_replslot/SLOTNAME/state. It seems you cannot just rename the
replication slot directory and copy the state file. What happen if there is a
crash before copying the state file?

While we are talking about items (1), (2) and (4), we could probably have an
option to create some ephemeral logical decoding files into ramdisk (similar to
statistics directory). I wouldn't like to hijack this thread but this proposal
could alleviate the possible issues that you pointed out. If people are
interested in this proposal, I can start a new thread about it.

--
Euler Taveira
EDB   https://www.enterprisedb.com/

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-01 Thread Bossart, Nathan

On 12/1/21, 2:56 PM, "Andres Freund"  wrote:
> On 2021-12-01 20:24:25 +, Bossart, Nathan wrote:
>> I realize adding a new maintenance worker might be a bit heavy-handed,
>> but I think it would be nice to have somewhere to offload tasks that
>> really shouldn't impact startup and checkpointing.  I imagine such a
>> process would come in handy down the road, too.  WDYT?
>
> -1. I think the overhead of an additional worker is disproportional here. And
> there's simplicity benefits in having a predictable cleanup interlock as well.

Another idea I had was to put some upper limit on how much time is
spent on such tasks.  For example, a checkpoint would only spend X
minutes on CheckPointSnapBuild() before giving up until the next one.
I think the main downside of that approach is that it could lead to
unbounded growth, so perhaps we would limit (or even skip) such tasks
only for end-of-recovery and shutdown checkpoints.  Perhaps the
startup tasks could be limited in a similar fashion.

> I think particularly for the snapshot stuff it'd be better to optimize away
> unnecessary snapshot files, rather than making the cleanup more asynchronous.

I can look into this.  Any pointers would be much appreciated.

Nathan

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-01 Thread Andres Freund

Hi,

On 2021-12-01 20:24:25 +, Bossart, Nathan wrote:
> I realize adding a new maintenance worker might be a bit heavy-handed,
> but I think it would be nice to have somewhere to offload tasks that
> really shouldn't impact startup and checkpointing.  I imagine such a
> process would come in handy down the road, too.  WDYT?

-1. I think the overhead of an additional worker is disproportional here. And
there's simplicity benefits in having a predictable cleanup interlock as well.

I think particularly for the snapshot stuff it'd be better to optimize away
unnecessary snapshot files, rather than making the cleanup more asynchronous.

Greetings,

Andres Freund

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-01 Thread SATYANARAYANA NARLAPURAM

+1 to the idea. I don't see a reason why checkpointer has to do all of
that. Keeping checkpoint to minimal essential work helps servers recover
faster in the event of a crash.

RemoveOldXlogFiles is also an O(N) operation that can at least be avoided
during the end of recovery (CHECKPOINT_END_OF_RECOVERY) checkpoint. When a
sufficient number of WAL files accumulated and the previous checkpoint did
not get a chance to cleanup, this can increase the unavailability of the
server.

RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);



On Wed, Dec 1, 2021 at 12:24 PM Bossart, Nathan  wrote:

> Hi hackers,
>
> Thanks to 61752af, SyncDataDirectory() can make use of syncfs() to
> avoid individually syncing all database files after a crash.  However,
> as noted earlier this year [0], there are still a number of O(n) tasks
> that affect startup and checkpointing that I'd like to improve.
> Below, I've attempted to summarize each task and to offer ideas for
> improving matters.  I'll likely split each of these into its own
> thread, given there is community interest for such changes.
>
> 1) CheckPointSnapBuild(): This function loops through
>pg_logical/snapshots to remove all snapshots that are no longer
>needed.  If there are many entries in this directory, this can take
>a long time.  The note above this function indicates that this is
>done during checkpoints simply because it is convenient.  IIUC
>there is no requirement that this function actually completes for a
>given checkpoint.  My current idea is to move this to a new
>maintenance worker.
> 2) CheckPointLogicalRewriteHeap(): This function loops through
>pg_logical/mappings to remove old mappings and flush all remaining
>ones.  IIUC there is no requirement that the "remove old mappings"
>part must complete for a given checkpoint, but the "flush all
>remaining" portion allows replay after a checkpoint to only "deal
>with the parts of a mapping that have been written out after the
>checkpoint started."  Therefore, I think we should move the "remove
>old mappings" part to a new maintenance worker (probably the same
>one as for 1), and we should consider using syncfs() for the "flush
>all remaining" part.  (I suspect the main argument against the
>latter will be that it could cause IO spikes.)
> 3) RemovePgTempFiles(): This step can delay startup if there are many
>temporary files to individually remove.  This step is already
>optionally done after a crash via the remove_temp_files_after_crash
>GUC.  I propose that we have startup move the temporary file
>directories aside and create new ones, and then a separate worker
>(probably the same one from 1 and 2) could clean up the old files.
> 4) StartupReorderBuffer(): This step deletes logical slot data that
>has been spilled to disk.  This code appears to be written to avoid
>deleting different types of files in these directories, but AFAICT
>there shouldn't be any other files.  Therefore, I think we could do
>something similar to 3 (i.e., move the directories aside during
>startup and clean them up via a new maintenance worker).
>
> I realize adding a new maintenance worker might be a bit heavy-handed,
> but I think it would be nice to have somewhere to offload tasks that
> really shouldn't impact startup and checkpointing.  I imagine such a
> process would come in handy down the road, too.  WDYT?
>
> Nathan
>
> [0] https://postgr.es/m/32B59582-AA6C-4609-B08F-2256A271F7A5%40amazon.com
>
>

85 matches

Mail list logo