At Mon, 5 Sep 2022 11:56:33 +0200, "Drouvot, Bertrand" <bdrou...@amazon.com> 
wrote in 
> Hi,
> 
> On 3/2/22 7:37 AM, Kyotaro Horiguchi wrote:
> > At Tue, 04 Jan 2022 10:29:31 +0900 (JST), Kyotaro
> > Horiguchi<horikyota....@gmail.com> wrote in
> >> So what do you say if I propose the following?
> >>
> >> LOG:  terminating process %d to release replication slot \"%s\"
> >> because its restart_lsn %X/%X exceeds the limit %X/%X
> >> HINT: You might need to increase max_slot_wal_keep_size.
> > This version emits the following message.
> >
> > [35785:checkpointer] LOG: terminating process 36368 to release
> > replication slot "s1" because its restart_lsn 0/1F000148 exceeds the
> > limit 0/21000000
> > [35785:checkpointer] HINT: You might need to increase
> > max_slot_wal_keep_size.
> 
> As the hint is to increase max_slot_wal_keep_size, what about
> reporting the difference in size (rather than the limit lsn)?
> Something along those lines?
> 
> [35785:checkpointer] LOG: terminating process 36368 to release
> replication slot "s1" because its restart_lsn 0/1F000148 exceeds the
> limit by <NNN MB>.

Thanks! That might be more sensible exactly for the reason you
mentioned.  One issue doing that is size_pretty is dbsize.c local
function. Since the size is less than kB in many cases, we cannot use
fixed unit for that.

0001 and 0002 are the same with v5.

0003 exposes byte_size_pretty() to other modules.
0004 does the change by using byte_size_pretty()

After 0004 applied, they look like this.

> LOG:  terminating process 108413 to release replication slot "rep3" because 
> its restart_lsn 0/7000D8 exceeds the limit by 1024 kB
> HINT:  You might need to increase max_slot_wal_keep_size.

The reason for "1024 kB" instead of "1 MB" is the precise value is a
bit less than 1024 * 1024.


regards.

- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From d6432aa13d3f4446a5cee4c4c33dcf5841314546 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Fri, 24 Dec 2021 12:52:07 +0900
Subject: [PATCH v6 1/4] Make a message on process termination more dscriptive

The message at process termination due to slot limit doesn't provide
the reason. In the major scenario the message is followed by another
message about slot invalidatation, which shows the detail for the
termination.  However the second message is missing if the slot is
temporary one.

Augment the first message with the reason same as the second message.

Backpatch through to 13 where the message was introduced.

Reported-by: Alex Enachioaie <a...@altmetric.com>
Author: Kyotaro Horiguchi <horikyota....@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat....@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
Backpatch-through: 13
---
 src/backend/replication/slot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8fec1cb4a5..8326c019cf 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1293,8 +1293,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			if (last_signaled_pid != active_pid)
 			{
 				ereport(LOG,
-						(errmsg("terminating process %d to release replication slot \"%s\"",
-								active_pid, NameStr(slotname))));
+						(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+								active_pid, NameStr(slotname),
+								LSN_FORMAT_ARGS(restart_lsn))));
 
 				(void) kill(active_pid, SIGTERM);
 				last_signaled_pid = active_pid;
-- 
2.31.1

>From da3df65d4a24d6235b401aa4f944a0f4d3652207 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Fri, 24 Dec 2021 12:58:25 +0900
Subject: [PATCH v6 2/4] Add detailed information to slot-invalidation messages

The messages says just "exceeds max_slot_wal_keep_size" but doesn't
tell the concrete LSN of the limit. That information helps operators'
understanding on the issue.

Author: Kyotaro Horiguchi <horikyota....@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat....@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.m...@gmail.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
---
 src/backend/replication/slot.c            | 12 ++++++++----
 src/test/recovery/t/019_replslot_limit.pl |  2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8326c019cf..a30d4e93cd 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1293,9 +1293,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			if (last_signaled_pid != active_pid)
 			{
 				ereport(LOG,
-						(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+						(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds the limit %X/%X",
 								active_pid, NameStr(slotname),
-								LSN_FORMAT_ARGS(restart_lsn))));
+								LSN_FORMAT_ARGS(restart_lsn),
+								LSN_FORMAT_ARGS(oldestLSN)),
+						 errhint("You might need to increase max_slot_wal_keep_size.")));
 
 				(void) kill(active_pid, SIGTERM);
 				last_signaled_pid = active_pid;
@@ -1332,9 +1334,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			ReplicationSlotRelease();
 
 			ereport(LOG,
-					(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+					(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds the limit %X/%X",
 							NameStr(slotname),
-							LSN_FORMAT_ARGS(restart_lsn))));
+							LSN_FORMAT_ARGS(restart_lsn),
+							LSN_FORMAT_ARGS(oldestLSN)),
+					 errhint("You might need to increase max_slot_wal_keep_size.")));
 
 			/* done with this slot for now */
 			break;
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index ce8d36b4c6..f6c34c6887 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -186,7 +186,7 @@ for (my $i = 0; $i < 10000; $i++)
 {
 	if (find_in_log(
 			$node_primary,
-			"invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds max_slot_wal_keep_size",
+			"invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds the limit [0-9A-F/]+",
 			$logstart))
 	{
 		$invalidated = 1;
-- 
2.31.1

>From 23daf4cd8538fa47fe7ebba8ea644d63c6248f30 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Tue, 6 Sep 2022 11:42:10 +0900
Subject: [PATCH v6 3/4] Expose byte_size_pretty in dbsize.c

---
 src/backend/utils/adt/dbsize.c | 25 ++++++++++++++++---------
 src/include/utils/dbsize.h     | 16 ++++++++++++++++
 2 files changed, 32 insertions(+), 9 deletions(-)
 create mode 100644 src/include/utils/dbsize.h

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 34efa121b4..abe2c41db4 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -25,6 +25,7 @@
 #include "storage/fd.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/dbsize.h"
 #include "utils/numeric.h"
 #include "utils/rel.h"
 #include "utils/relfilenumbermap.h"
@@ -549,14 +550,9 @@ pg_total_relation_size(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(size);
 }
 
-/*
- * formatting with size units
- */
-Datum
-pg_size_pretty(PG_FUNCTION_ARGS)
+void
+byte_size_pretty(char *dest, size_t destlen, int64 size)
 {
-	int64		size = PG_GETARG_INT64(0);
-	char		buf[64];
 	const struct size_pretty_unit *unit;
 
 	for (unit = size_pretty_units; unit->name != NULL; unit++)
@@ -569,8 +565,8 @@ pg_size_pretty(PG_FUNCTION_ARGS)
 			if (unit->round)
 				size = half_rounded(size);
 
-			snprintf(buf, sizeof(buf), INT64_FORMAT " %s", size, unit->name);
-			break;
+			snprintf(dest, destlen, INT64_FORMAT " %s", size, unit->name);
+			return;
 		}
 
 		/*
@@ -586,7 +582,18 @@ pg_size_pretty(PG_FUNCTION_ARGS)
 				+ (unit->round == true));
 		size /= ((int64) 1) << bits;
 	}
+}
 
+/*
+ * formatting with size units
+ */
+Datum
+pg_size_pretty(PG_FUNCTION_ARGS)
+{
+	int64		size = PG_GETARG_INT64(0);
+	char		buf[64];
+
+	byte_size_pretty(buf, sizeof(buf), size);
 	PG_RETURN_TEXT_P(cstring_to_text(buf));
 }
 
diff --git a/src/include/utils/dbsize.h b/src/include/utils/dbsize.h
new file mode 100644
index 0000000000..6bff9d4d6a
--- /dev/null
+++ b/src/include/utils/dbsize.h
@@ -0,0 +1,16 @@
+/*-----------------------------------------------------------------------
+ * dbsize.h
+ *
+ *	 Portions Copyright (c) 1999-2022, PostgreSQL Global Development Group
+ *
+ * src/include/utils/dbsize.h
+ *
+ *-----------------------------------------------------------------------
+ */
+
+#ifndef _DBSIZE_H_
+#define _DBSIZE_H_
+
+extern void byte_size_pretty(char *dest, size_t destlen, int64 size);
+
+#endif							/* _DBSIZE_H_ */
-- 
2.31.1

>From b900819822609b60388b47277fb4f738561968cd Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Tue, 6 Sep 2022 13:18:42 +0900
Subject: [PATCH v6 4/4] Change error messages to show the difference to the
 limit

Change the error message to show the difference to the limit instead
of the absolute value of the limit LSN.

"exceeds the limit by xxx MB"
---
 src/backend/replication/slot.c            | 18 ++++++++++++------
 src/test/recovery/t/019_replslot_limit.pl |  2 +-
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a30d4e93cd..f06b34ab1f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -51,6 +51,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/dbsize.h"
 
 /*
  * Replication slot on-disk data structure.
@@ -1292,11 +1293,14 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			 */
 			if (last_signaled_pid != active_pid)
 			{
+				char buf[64];
+
+				byte_size_pretty(buf, sizeof(buf),
+								 oldestLSN - restart_lsn);
 				ereport(LOG,
-						(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds the limit %X/%X",
+						(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds the limit by %s",
 								active_pid, NameStr(slotname),
-								LSN_FORMAT_ARGS(restart_lsn),
-								LSN_FORMAT_ARGS(oldestLSN)),
+								LSN_FORMAT_ARGS(restart_lsn), buf),
 						 errhint("You might need to increase max_slot_wal_keep_size.")));
 
 				(void) kill(active_pid, SIGTERM);
@@ -1317,6 +1321,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 		}
 		else
 		{
+			char buf[64];
+
 			/*
 			 * We hold the slot now and have already invalidated it; flush it
 			 * to ensure that state persists.
@@ -1333,11 +1339,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			ReplicationSlotSave();
 			ReplicationSlotRelease();
 
+			byte_size_pretty(buf, sizeof(buf), oldestLSN - restart_lsn);
 			ereport(LOG,
-					(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds the limit %X/%X",
+					(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds the limit by %s",
 							NameStr(slotname),
-							LSN_FORMAT_ARGS(restart_lsn),
-							LSN_FORMAT_ARGS(oldestLSN)),
+							LSN_FORMAT_ARGS(restart_lsn), buf),
 					 errhint("You might need to increase max_slot_wal_keep_size.")));
 
 			/* done with this slot for now */
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index f6c34c6887..7c975f74c2 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -186,7 +186,7 @@ for (my $i = 0; $i < 10000; $i++)
 {
 	if (find_in_log(
 			$node_primary,
-			"invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds the limit [0-9A-F/]+",
+			"invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds the limit by ",
 			$logstart))
 	{
 		$invalidated = 1;
-- 
2.31.1

Reply via email to