Hi hackers,

As of 15251c0, when a standby encounters an incompatible parameter change,
it pauses replay so that read traffic can continue while the administrator
fixes the parameters.  Once the server is restarted, replay can continue.
Before this change, such incompatible parameter changes caused the standby
to immediately shut down.

I noticed a suggestion in the thread associated with 15251c0 [0] to make
this behavior configurable, but there didn't seem to be much interest at
the time.  I would like to let administrators opt back into the behavior
from before 15251c0 (i.e., immediately shut down the standby when an
incompatible parameter change is detected).  The use-case I
have in mind is when an administrator has automation in place for adjusting
these parameters and would like to avoid stopping replay any longer than
necessary.  FWIW this is what we do in RDS.

I've attached a patch that adds a new GUC where users can specify the
action to take when an incompatible parameter change is detected on a
standby.  For now, there are just two options: 'pause' and 'shutdown'.
This new GUC is largely modeled after recovery_target_action.
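
To make the intended semantics concrete, here is a minimal standalone
sketch of the decision the new GUC controls.  It is not code from the
patch: the type and function names (StandbySettingAction, StandbyOutcome,
parse_action, decide_outcome) are made up for illustration, and the
string matching is simplified to an exact comparison.  The real check is
the one changed in RecoveryRequiresIntParameter() in the attached patch.

```c
#include <stdbool.h>
#include <string.h>

/* The two values accepted by the proposed GUC (illustrative names). */
typedef enum
{
	ACTION_PAUSE,
	ACTION_SHUTDOWN
} StandbySettingAction;

/* What the standby ends up doing (hypothetical, for this sketch only). */
typedef enum
{
	OUTCOME_CONTINUE,			/* setting is sufficient, replay goes on */
	OUTCOME_PAUSE_REPLAY,		/* keep serving reads until restarted */
	OUTCOME_SHUTDOWN			/* stop the server immediately */
} StandbyOutcome;

/*
 * Map a setting string to an action.  Returns false for unknown values.
 * Assumption: exact, case-sensitive match, which is a simplification.
 */
bool
parse_action(const char *value, StandbySettingAction *action)
{
	if (strcmp(value, "pause") == 0)
	{
		*action = ACTION_PAUSE;
		return true;
	}
	if (strcmp(value, "shutdown") == 0)
	{
		*action = ACTION_SHUTDOWN;
		return true;
	}
	return false;
}

/*
 * Decide what happens when WAL replay sees the primary's value for a
 * parameter (min_value) and compares it with the standby's own setting
 * (curr_value).  Pausing only happens when hot standby is active AND
 * the action is 'pause'; every other insufficient case shuts down.
 */
StandbyOutcome
decide_outcome(int curr_value, int min_value,
			   bool hot_standby_active, StandbySettingAction action)
{
	if (curr_value >= min_value)
		return OUTCOME_CONTINUE;
	if (hot_standby_active && action == ACTION_PAUSE)
		return OUTCOME_PAUSE_REPLAY;
	return OUTCOME_SHUTDOWN;
}
```

For example, with max_connections = 80 on the standby and 100 on the
primary, decide_outcome(80, 100, true, ACTION_PAUSE) yields
OUTCOME_PAUSE_REPLAY, while the same call with ACTION_SHUTDOWN (or with
hot standby inactive) yields OUTCOME_SHUTDOWN.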

I initially set out to see if it was possible to automatically adjust these
parameters on a standby, but that is considerably more difficult.  It isn't
enough to just hook into the restart_after_crash functionality since it
doesn't go back far enough in the postmaster logic.  IIUC we'd need to
reload preloaded libraries (which there is presently no support for),
recalculate MaxBackends, etc.  Another option I considered was to
automatically adjust the parameters during startup so that you just need to
restart the server.  However, we need to know for sure that the server is
going to be a hot standby, and I don't believe we have that information
where such GUC changes would need to occur (I could be wrong about this).
Anyway, for now I'm just proposing the modest change described above, but
I'd welcome any discussion about improving matters further in this area.

[0] https://postgr.es/m/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b%402ndquadrant.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 754429b5ad4c9c8b40b66c9c0ede0a7572f0e071 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Mon, 4 Apr 2022 11:56:21 -0700
Subject: [PATCH v1 1/1] Introduce insufficient_standby_setting_action.

As of 15251c0, when a standby encounters an incompatible parameter
change while replaying WAL, it will pause replay to allow read
traffic to continue while the administrator figures out the next
steps.  After fixing the parameters, the server must be restarted.

This change introduces a new GUC to allow users to indicate that
the server should immediately shut down when it encounters such an
incompatible parameter change (i.e., the behavior before 15251c0).
This may be desirable when the administrator has automation for
adjusting incompatible parameter settings and wants to avoid
stopping replay any longer than necessary.
---
 doc/src/sgml/config.sgml                      | 21 +++++++++
 doc/src/sgml/high-availability.sgml           | 14 +++---
 src/backend/access/transam/xlogrecovery.c     | 12 ++++-
 src/backend/utils/misc/guc.c                  | 12 +++++
 src/backend/utils/misc/postgresql.conf.sample | 47 ++++++++++---------
 src/include/access/xlog_internal.h            |  9 ++++
 src/include/access/xlogrecovery.h             |  1 +
 7 files changed, 86 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a1682f6d4d..54095e56e6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4901,6 +4901,27 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-insufficient-standby-setting-action" xreflabel="insufficient_standby_setting_action">
+      <term><varname>insufficient_standby_setting_action</varname> (<type>enum</type>)
+      <indexterm>
+       <primary><varname>insufficient_standby_setting_action</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies what action a hot standby server should take when it
+        encounters an incompatible parameter change (see
+        <xref linkend="hot-standby-admin"/>).  The default is
+        <literal>'pause'</literal>, which means recovery will be paused.  After
+        recovery is paused due to an incompatible parameter change, unpausing
+        will cause the server to shut down.  <literal>'shutdown'</literal> means
+        that the server should immediately shut down without pausing.  This
+        parameter can only be set in the <filename>postgresql.conf</filename>
+        file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
 
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b0a653373d..39baf07adf 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -2043,9 +2043,11 @@ LOG:  database system is ready to accept read-only connections
 
    <para>
     The WAL tracks changes to these parameters on the
-    primary.  If a hot standby processes WAL that indicates that the current
-    value on the primary is higher than its own value, it will log a warning
-    and pause recovery, for example:
+    primary.  When a hot standby processes WAL that indicates that the current
+    value on the primary is higher than its own value, it will take the action
+    specified by <xref linkend="guc-insufficient-standby-setting-action"/>.  If
+    this parameter is set to <literal>'pause'</literal> (the default), the
+    standby will log a warning and pause recovery.  For example:
 <screen>
 WARNING:  hot standby is not possible because of insufficient parameter settings
 DETAIL:  max_connections = 80 is a lower setting than on the primary server, where its value was 100.
@@ -2055,9 +2057,9 @@ HINT:  You can then restart the server after making the necessary configuration
 </screen>
     At that point, the settings on the standby need to be updated and the
     instance restarted before recovery can continue.  If the standby is not a
-    hot standby, then when it encounters the incompatible parameter change, it
-    will shut down immediately without pausing, since there is then no value
-    in keeping it up.
+    hot standby or <varname>insufficient_standby_setting_action</varname> is set
+    to <literal>'shutdown'</literal>, then when it encounters the incompatible
+    parameter change, it will shut down immediately without pausing.
    </para>
 
    <para>
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 39ef865ed9..9a2c0eefe9 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -74,6 +74,12 @@ const struct config_enum_entry recovery_target_action_options[] = {
 	{NULL, 0, false}
 };
 
+const struct config_enum_entry insufficient_standby_setting_action_options[] = {
+	{"pause", INSUFFICIENT_STANDBY_SETTING_ACTION_PAUSE, false},
+	{"shutdown", INSUFFICIENT_STANDBY_SETTING_ACTION_SHUTDOWN, false},
+	{NULL, 0, false}
+};
+
 /* options formerly taken from recovery.conf for archive recovery */
 char	   *recoveryRestoreCommand = NULL;
 char	   *recoveryEndCommand = NULL;
@@ -94,6 +100,9 @@ char	   *PrimarySlotName = NULL;
 char	   *PromoteTriggerFile = NULL;
 bool		wal_receiver_create_temp_slot = false;
 
+/* other GUC options */
+int			insufficient_standby_setting_action = INSUFFICIENT_STANDBY_SETTING_ACTION_PAUSE;
+
 /*
  * recoveryTargetTimeLineGoal: what the user requested, if any
  *
@@ -4532,7 +4541,8 @@ RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue
 {
 	if (currValue < minValue)
 	{
-		if (HotStandbyActiveInReplay())
+		if (HotStandbyActiveInReplay() &&
+			insufficient_standby_setting_action == INSUFFICIENT_STANDBY_SETTING_ACTION_PAUSE)
 		{
 			bool		warned_for_promote = false;
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8e9b71375c..cb5367fa9f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -602,6 +602,7 @@ static const struct config_enum_entry wal_compression_options[] = {
 extern const struct config_enum_entry wal_level_options[];
 extern const struct config_enum_entry archive_mode_options[];
 extern const struct config_enum_entry recovery_target_action_options[];
+extern const struct config_enum_entry insufficient_standby_setting_action_options[];
 extern const struct config_enum_entry sync_method_options[];
 extern const struct config_enum_entry dynamic_shared_memory_options[];
 
@@ -4923,6 +4924,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"insufficient_standby_setting_action", PGC_POSTMASTER, REPLICATION_STANDBY,
+			gettext_noop("Sets the action to perform when hot standby cannot "
+						 "continue due to insufficient parameter settings."),
+			NULL
+		},
+		&insufficient_standby_setting_action,
+		INSUFFICIENT_STANDBY_SETTING_ACTION_PAUSE, insufficient_standby_setting_action_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"trace_recovery_messages", PGC_SIGHUP, DEVELOPER_OPTIONS,
 			gettext_noop("Enables logging of recovery-related debugging information."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 94270eb0ec..1fb0c1ae9a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -329,29 +329,30 @@
 
 # These settings are ignored on a primary server.
 
-#primary_conninfo = ''			# connection string to sending server
-#primary_slot_name = ''			# replication slot on sending server
-#promote_trigger_file = ''		# file name whose presence ends recovery
-#hot_standby = on			# "off" disallows queries during recovery
-					# (change requires restart)
-#max_standby_archive_delay = 30s	# max delay before canceling queries
-					# when reading WAL from archive;
-					# -1 allows indefinite delay
-#max_standby_streaming_delay = 30s	# max delay before canceling queries
-					# when reading streaming WAL;
-					# -1 allows indefinite delay
-#wal_receiver_create_temp_slot = off	# create temp slot if primary_slot_name
-					# is not set
-#wal_receiver_status_interval = 10s	# send replies at least this often
-					# 0 disables
-#hot_standby_feedback = off		# send info from standby to prevent
-					# query conflicts
-#wal_receiver_timeout = 60s		# time that receiver waits for
-					# communication from primary
-					# in milliseconds; 0 disables
-#wal_retrieve_retry_interval = 5s	# time to wait before retrying to
-					# retrieve WAL after a failed attempt
-#recovery_min_apply_delay = 0		# minimum delay for applying changes during recovery
+#primary_conninfo = ''				# connection string to sending server
+#primary_slot_name = ''				# replication slot on sending server
+#promote_trigger_file = ''			# file name whose presence ends recovery
+#hot_standby = on				# "off" disallows queries during recovery
+						# (change requires restart)
+#max_standby_archive_delay = 30s		# max delay before canceling queries
+						# when reading WAL from archive;
+						# -1 allows indefinite delay
+#max_standby_streaming_delay = 30s		# max delay before canceling queries
+						# when reading streaming WAL;
+						# -1 allows indefinite delay
+#wal_receiver_create_temp_slot = off		# create temp slot if primary_slot_name
+						# is not set
+#wal_receiver_status_interval = 10s		# send replies at least this often
+						# 0 disables
+#hot_standby_feedback = off			# send info from standby to prevent
+						# query conflicts
+#wal_receiver_timeout = 60s			# time that receiver waits for
+						# communication from primary
+						# in milliseconds; 0 disables
+#wal_retrieve_retry_interval = 5s		# time to wait before retrying to
+						# retrieve WAL after a failed attempt
+#recovery_min_apply_delay = 0			# minimum delay for applying changes during recovery
+#insufficient_standby_setting_action = 'pause'	# 'pause', 'shutdown'
 
 # - Subscribers -
 
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index fae0bef8f5..f955549347 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -287,6 +287,15 @@ typedef enum
 	RECOVERY_TARGET_ACTION_SHUTDOWN
 }			RecoveryTargetAction;
 
+/*
+ * Insufficient hot standby parameter setting action.
+ */
+typedef enum
+{
+	INSUFFICIENT_STANDBY_SETTING_ACTION_PAUSE,
+	INSUFFICIENT_STANDBY_SETTING_ACTION_SHUTDOWN
+} InsufficientStandbySettingAction;
+
 struct LogicalDecodingContext;
 struct XLogRecordBuffer;
 
diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h
index 0aa85d90e8..aa40ebfaa7 100644
--- a/src/include/access/xlogrecovery.h
+++ b/src/include/access/xlogrecovery.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT char *PrimarySlotName;
 extern PGDLLIMPORT char *recoveryRestoreCommand;
 extern PGDLLIMPORT char *recoveryEndCommand;
 extern PGDLLIMPORT char *archiveCleanupCommand;
+extern PGDLLIMPORT int insufficient_standby_setting_action;
 
 /* indirectly set via GUC system */
 extern PGDLLIMPORT TransactionId recoveryTargetXid;
-- 
2.25.1
