Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-05-13 Thread Michael Paquier
On Thu, Apr 25, 2024 at 02:53:33PM +0200, Majid Garoosi wrote:
> Unfortunately, I quit that company a month ago (I wish we could
> discuss this earlier) and don't have access to the environment
> anymore.
> I'll try to ask my teammates and see if they can find anything about
> the exact values of latency, bw, etc.
> 
> Anyway, here is some description of the environment. Sadly, there
> are no numbers in this description, but I'll try to provide as much details
> as possible.
> There is a k8s cluster running over some VMs. Each postgres
> instance runs as a pod inside the k8s cluster. So, both the
> primary and standby servers are in the same DC, but might be on
> different baremetal nodes. There is an overlay network for the pods to
> see each other, and there's also another overlay network for the VMs
> to see each other.
> The storage servers are in the same DC, but we're sure they're on some
> racks other than the postgres pods. They run Ceph [1] project and provide
> Rados Block Devices (RBD) [2] interface. In order for k8s to use ceph, a
> Ceph-CSI [3] controller is running inside the k8s cluster.
> BTW, the FS type is ext4.

Okay, seeing the feedback for this patch and Andres disagreeing with
it as being a good idea, I have marked the patch as rejected as it was
still in the CF app.
--
Michael


signature.asc
Description: PGP signature


Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-04-25 Thread Majid Garoosi
Hi,

> Now Majid stated that he uses "RBD" - Majid, any chance to specify
> what that RBD really is ? What's the tech? What fs? Any ioping or fio
> results? What's the blockdev --report /dev/XXX output ? (you stated
> "high" latency and "high" bandwidth , but it is relative, for me 15ms+
> is high latency and >1000MB/s sequential, but it would help others in
> future if you could specify it by the exact numbers please). Maybe
> it's just a matter of enabling readahead (line in [1]) there and/or
> using a higher WAL segment during initdb.

Unfortunately, I quit that company a month ago (I wish we could
discuss this earlier) and don't have access to the environment
anymore.
I'll try to ask my teammates and see if they can find anything about
the exact values of latency, bw, etc.

Anyway, here is some description of the environment. Sadly, there
are no numbers in this description, but I'll try to provide as much details
as possible.
There is a k8s cluster running over some VMs. Each postgres
instance runs as a pod inside the k8s cluster. So, both the
primary and standby servers are in the same DC, but might be on
different baremetal nodes. There is an overlay network for the pods to
see each other, and there's also another overlay network for the VMs
to see each other.
The storage servers are in the same DC, but we're sure they're on some
racks other than the postgres pods. They run Ceph [1] project and provide
Rados Block Devices (RBD) [2] interface. In order for k8s to use ceph, a
Ceph-CSI [3] controller is running inside the k8s cluster.
BTW, the FS type is ext4.

[1] - https://ceph.io/en/
[2] - https://docs.ceph.com/en/latest/rbd/
[3] - https://github.com/ceph/ceph-csi


Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-04-24 Thread Jakub Wartak
Hi,

> My understanding of Majid's use-case for tuning MAX_SEND_SIZE is that the
> bottleneck is storage, not network. The reason MAX_SEND_SIZE affects that is
> that it determines the max size passed to WALRead(), which in turn determines
> how much we read from the OS at once.  If the storage has high latency but
> also high throughput, and readahead is disabled or just not aggressive enough
> after crossing segment boundaries, larger reads reduce the number of times
> you're likely to be blocked waiting for read IO.
>
> Which is also why I think that making MAX_SEND_SIZE configurable is a really
> poor proxy for improving the situation.
>
> We're imo much better off working on read_stream.[ch] support for reading WAL.

Well then that would be a consistent message at least, because earlier
in [1] it was rejected to have prefetch the WAL segment but on the
standby side, where the patch was only helping in configurations
having readahead *disabled* for some reason.

Now Majid stated that he uses "RBD" - Majid, any chance to specify
what that RBD really is ? What's the tech? What fs? Any ioping or fio
results? What's the blockdev --report /dev/XXX output ? (you stated
"high" latency and "high" bandwidth , but it is relative, for me 15ms+
is high latency and >1000MB/s sequential, but it would help others in
future if you could specify it by the exact numbers please). Maybe
it's just a matter of enabling readahead (line in [1]) there and/or
using a higher WAL segment during initdb.

[1] - 
https://www.postgresql.org/message-id/flat/CADVKa1WsQMBYK_02_Ji%3DpbOFnms%2BCT7TV6qJxDdHsFCiC9V_hw%40mail.gmail.com




Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-04-23 Thread Andres Freund
Hi,

On 2024-04-23 14:47:31 +0200, Jakub Wartak wrote:
> On Tue, Apr 23, 2024 at 2:24 AM Michael Paquier  wrote:
> >
> > > Any news, comments, etc. about this thread?
> >
> > FWIW, I'd still be in favor of doing a GUC-ification of this part, but
> > at this stage I'd need more time to do a proper study of a case where
> > this shows benefits to prove your point, or somebody else could come
> > in and show it.
> >
> > Andres has objected to this change, on the ground that this was not
> > worth it, though you are telling the contrary.  I would be curious to
> > hear from others, first, so as we gather more opinions to reach a
> > consensus.

I think it's a bad idea to make it configurable. It's just one more guc that
nobody has a chance of realistically tuning.  I'm not saying we shouldn't
improve the code - just that making MAX_SEND_SIZE configurable doesn't really
seem like a good answer.

FWIW, I have a hard time believing that MAX_SEND_SIZE is going to be the the
only or even primary issue with high latency, high bandwidth storage devices.



> First: it's very hard to get *reliable* replication setup for
> benchmark, where one could demonstrate correlation between e.g.
> increasing MAX_SEND_SIZE and observing benefits (in sync rep it is
> easier, as you are simply stalled in pgbench). Part of the problem are
> the following things:

Depending on the workload, it's possible to measure streaming-out performance
without actually regenerating WAL. E.g. by using pg_receivewal to stream the
data out multiple times.


Another way to get fairly reproducible WAL workloads is to drive
pg_logical_emit_message() from pgbench, that tends to havea lot less
variability than running tpcb-like or such.


> Second: once you perform above and ensure that there are no network or
> I/O stalls back then I *think* I couldn't see any impact of playing
> with MAX_SEND_SIZE from what I remember as probably something else is
> saturated first.

My understanding of Majid's use-case for tuning MAX_SEND_SIZE is that the
bottleneck is storage, not network. The reason MAX_SEND_SIZE affects that is
that it determines the max size passed to WALRead(), which in turn determines
how much we read from the OS at once.  If the storage has high latency but
also high throughput, and readahead is disabled or just not aggressive enough
after crossing segment boundaries, larger reads reduce the number of times
you're likely to be blocked waiting for read IO.

Which is also why I think that making MAX_SEND_SIZE configurable is a really
poor proxy for improving the situation.

We're imo much better off working on read_stream.[ch] support for reading WAL.

Greetings,

Andres Freund




Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-04-23 Thread Jakub Wartak
On Tue, Apr 23, 2024 at 2:24 AM Michael Paquier  wrote:
>
> On Mon, Apr 22, 2024 at 03:40:01PM +0200, Majid Garoosi wrote:
> > Any news, comments, etc. about this thread?
>
> FWIW, I'd still be in favor of doing a GUC-ification of this part, but
> at this stage I'd need more time to do a proper study of a case where
> this shows benefits to prove your point, or somebody else could come
> in and show it.
>
> Andres has objected to this change, on the ground that this was not
> worth it, though you are telling the contrary.  I would be curious to
> hear from others, first, so as we gather more opinions to reach a
> consensus.

I'm more with Andres on this one.I vaguely remember researching impact
of MAX_SEND_SIZE on independent two occasions (earlier async and more
recent sync case where I've investigated variety of ways to keep
latency down) and my summary would be:

First: it's very hard to get *reliable* replication setup for
benchmark, where one could demonstrate correlation between e.g.
increasing MAX_SEND_SIZE and observing benefits (in sync rep it is
easier, as you are simply stalled in pgbench). Part of the problem are
the following things:

a) workload can be tricky, for this purpose it needs to be trivial but bulky
b) it needs to be on isolated network and with guaranteed bandwidth
c) wal_init_zero impact should be ruled out
d) OS should be properly tuned autotuning TCP max(3rd value) + have
setup rmem_max/wmem_max properly
e) more serious TCP congestion should be used that the default one in OS
f) one should prevent any I/O stalls on walreceiver writeback during
huge WAL activity and restart points on standby (dirty_bytes and so
on, isolated pg_xlog, BDI max_ratio)

Second: once you perform above and ensure that there are no network or
I/O stalls back then I *think* I couldn't see any impact of playing
with MAX_SEND_SIZE from what I remember as probably something else is
saturated first.

I can offer help with trying to test this with artificial tests and
even injecting proper latency (WAN-like), but OP (Majid) I think needs
first describe his env much better (exact latency, bandwidth,
workload, TCP sysctl values, duration of the tests, no# of attempts
tried, exact commands used, iperf3 TCP results demonstrating hw used
and so on). So in short the patch is easy, but demonstrating the
effect and persuading others here would be hard.

-J.




Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-04-22 Thread Michael Paquier
On Mon, Apr 22, 2024 at 03:40:01PM +0200, Majid Garoosi wrote:
> Any news, comments, etc. about this thread?

FWIW, I'd still be in favor of doing a GUC-ification of this part, but
at this stage I'd need more time to do a proper study of a case where
this shows benefits to prove your point, or somebody else could come
in and show it.

Andres has objected to this change, on the ground that this was not
worth it, though you are telling the contrary.  I would be curious to
hear from others, first, so as we gather more opinions to reach a
consensus.
--
Michael


signature.asc
Description: PGP signature


Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-04-22 Thread Majid Garoosi
Hey folks,

Any news, comments, etc. about this thread?

Best regards
Majid Garoosi

On Mon, 12 Feb 2024 at 01:10, Michael Paquier  wrote:

> On Sun, Feb 11, 2024 at 04:32:20PM +0330, Majid Garoosi wrote:
> > On Fri, 9 Feb 2024 at 22:33, Andres Freund  wrote:
> >> The way we read the WAL data is perfectly prefetchable by the the OS,
> so I
> >> wouldn't really expect gains here.  Have you actually been able to see a
> >> performance benefit by increasing MAX_SEND_SIZE?
> >
> > Yes, I have seen a huge performance jump. Take a look at here
> > <
> https://www.postgresql.org/message-id/CAFWczPvi_5FWH%2BJTqkWbi%2Bw83hy%3DMYg%3D2hKK0%3DJZBe9%3DhTpE4w%40mail.gmail.com
> >
> > for
> > more info.
>
> Yes, I can get the idea that grouping more replication messages in
> one shot can be beneficial in some cases while being
> environment-dependent, though I also get the point that we cannot
> simply GUC-ify everything either.  I'm OK with this one at the end,
> because it is not performance critical.
>
> Note that it got lowered to the current value in ea5516081dcb to make
> it more responsive, while being half a WAL segment in 40f908bdcdc7
> when WAL senders have been introduced in 2010.  I cannot pinpoint the
> exact thread that led to this change, but I'm adding Fujii-san and
> Heikki in CC for comments.
> --
> Michael
>


Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-02-11 Thread Michael Paquier
On Sun, Feb 11, 2024 at 04:32:20PM +0330, Majid Garoosi wrote:
> On Fri, 9 Feb 2024 at 22:33, Andres Freund  wrote:
>> The way we read the WAL data is perfectly prefetchable by the the OS, so I
>> wouldn't really expect gains here.  Have you actually been able to see a
>> performance benefit by increasing MAX_SEND_SIZE?
> 
> Yes, I have seen a huge performance jump. Take a look at here
> 
> for
> more info.

Yes, I can get the idea that grouping more replication messages in
one shot can be beneficial in some cases while being
environment-dependent, though I also get the point that we cannot
simply GUC-ify everything either.  I'm OK with this one at the end,
because it is not performance critical.

Note that it got lowered to the current value in ea5516081dcb to make
it more responsive, while being half a WAL segment in 40f908bdcdc7
when WAL senders have been introduced in 2010.  I cannot pinpoint the
exact thread that led to this change, but I'm adding Fujii-san and
Heikki in CC for comments.
--
Michael


signature.asc
Description: PGP signature


Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-02-11 Thread Majid Garoosi
Hi Andres,

On Fri, 9 Feb 2024 at 22:33, Andres Freund  wrote:

> On 2024-01-19 23:04:50 +0330, Majid Garoosi wrote:
> > Following is the description which is also written in the commit message:
> > MAX_SEND_SIZE parameter was used in WALSender to limit maximum size of
> > a WAL data packet sent to a WALReceiver during replication. Although
> > its default value (128kB) was a reasonable value, it was written in
> > 2010. Since the hardwares have changed a lot since then, a PostgreSQL
> > user might need to customize this value.
> > For example, if a database's disk has a high bandwidth and a high
> > latency at the same time, it makes more sense to read bigger chunks of
> > data from disk in each iteration. One example of such disks is a remote
> > disk. (e.g. an RBD volume)
>
> The way we read the WAL data is perfectly prefetchable by the the OS, so I
> wouldn't really expect gains here.  Have you actually been able to see a
> performance benefit by increasing MAX_SEND_SIZE?
>

Yes, I have seen a huge performance jump. Take a look at here

for
more info.

Best
Majid


Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-02-09 Thread Andres Freund
Hi,

On 2024-01-19 23:04:50 +0330, Majid Garoosi wrote:
> Following is the description which is also written in the commit message:
> MAX_SEND_SIZE parameter was used in WALSender to limit maximum size of
> a WAL data packet sent to a WALReceiver during replication. Although
> its default value (128kB) was a reasonable value, it was written in
> 2010. Since the hardwares have changed a lot since then, a PostgreSQL
> user might need to customize this value.
> For example, if a database's disk has a high bandwidth and a high
> latency at the same time, it makes more sense to read bigger chunks of
> data from disk in each iteration. One example of such disks is a remote
> disk. (e.g. an RBD volume)

The way we read the WAL data is perfectly prefetchable by the the OS, so I
wouldn't really expect gains here.  Have you actually been able to see a
performance benefit by increasing MAX_SEND_SIZE?

I don't think we should add configuration parameters without at least some
demonstration of practical gains, otherwise we'll end up with hundreds of
never-useful config options.

Greetings,

Andres Freund




Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-02-09 Thread Majid Garoosi
On Fri, 9 Feb 2024 at 08:50, Michael Paquier  wrote:

> Something to be aware of, but the community lists use bottom-posting
> for replies because it is easier to follow the logic of a thread this
> way.  See here:
> https://en.wikipedia.org/wiki/Posting_style#Bottom-posting
>
Oh, sorry for not using the convention here. I just noticed that after
pressing the send button. =)

Thanks for the patch, that looks rather fine.  I have spent some time
> polishing the docs, adding a mention that increasing the value can
> show benefits depending on what you do.  How does the attached look to
> you?
>
I took a look at it and it seems good to me.
Maybe I should work on my writing skills more. :D

Best
Majid


Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-02-08 Thread Michael Paquier
On Thu, Feb 08, 2024 at 02:42:00PM +0330, Majid Garoosi wrote:
> Thank you very much for your review.

Something to be aware of, but the community lists use bottom-posting
for replies because it is easier to follow the logic of a thread this
way.  See here:
https://en.wikipedia.org/wiki/Posting_style#Bottom-posting

> I generally agree with your suggestions, so just applied them.
> You can find the new patch in the attached file.

Thanks for the patch, that looks rather fine.  I have spent some time
polishing the docs, adding a mention that increasing the value can
show benefits depending on what you do.  How does the attached look to
you?
--
Michael
From 2bacd4e4aaff53d26d8017e905e1dd02a67e93ff Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 9 Feb 2024 14:18:31 +0900
Subject: [PATCH v3] Introduce wal_sender_max_send_size as a GUC

Blah, blah.

Author: Majid Garoosi
Discussion: https://postgr.es/m/
---
 src/include/replication/walsender.h   |  1 +
 src/include/utils/guc_hooks.h |  1 +
 src/backend/replication/walsender.c   | 26 +++
 src/backend/utils/init/postinit.c | 11 
 src/backend/utils/misc/guc_tables.c   | 11 
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 doc/src/sgml/config.sgml  | 23 
 7 files changed, 57 insertions(+), 17 deletions(-)

diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 1b58d50b3b..f4d3c73f4d 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -31,6 +31,7 @@ extern PGDLLIMPORT bool wake_wal_senders;
 /* user-settable parameters */
 extern PGDLLIMPORT int max_wal_senders;
 extern PGDLLIMPORT int wal_sender_timeout;
+extern PGDLLIMPORT int wal_sender_max_send_size;
 extern PGDLLIMPORT bool log_replication_commands;
 
 extern void InitWalSender(void);
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 5300c44f3b..b0ac07171c 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -84,6 +84,7 @@ extern bool check_maintenance_io_concurrency(int *newval, void **extra,
 extern void assign_maintenance_io_concurrency(int newval, void *extra);
 extern bool check_max_connections(int *newval, void **extra, GucSource source);
 extern bool check_max_wal_senders(int *newval, void **extra, GucSource source);
+extern bool check_wal_sender_max_send_size(int *newval, void **extra, GucSource source);
 extern bool check_max_slot_wal_keep_size(int *newval, void **extra,
 		 GucSource source);
 extern void assign_max_wal_size(int newval, void *extra);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 77c8baa32a..a55516f589 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -96,17 +96,6 @@
 #include "utils/timeout.h"
 #include "utils/timestamp.h"
 
-/*
- * Maximum data payload in a WAL data message.  Must be >= XLOG_BLCKSZ.
- *
- * We don't have a good idea of what a good value would be; there's some
- * overhead per message in both walsender and walreceiver, but on the other
- * hand sending large batches makes walsender less responsive to signals
- * because signals are checked only between messages.  128kB (with
- * default 8k blocks) seems like a reasonable guess for now.
- */
-#define MAX_SEND_SIZE (XLOG_BLCKSZ * 16)
-
 /* Array of WalSnds in shared memory */
 WalSndCtlData *WalSndCtl = NULL;
 
@@ -124,6 +113,8 @@ int			max_wal_senders = 10;	/* the maximum number of concurrent
 	 * walsenders */
 int			wal_sender_timeout = 60 * 1000; /* maximum time to send one WAL
 			 * data message */
+int			wal_sender_max_send_size = XLOG_BLCKSZ * 16;	/* Maximum data payload
+			 * in a WAL data message */
 bool		log_replication_commands = false;
 
 /*
@@ -2950,8 +2941,8 @@ WalSndSegmentOpen(XLogReaderState *state, XLogSegNo nextSegNo,
 /*
  * Send out the WAL in its normal physical/stored form.
  *
- * Read up to MAX_SEND_SIZE bytes of WAL that's been flushed to disk,
- * but not yet sent to the client, and buffer it in the libpq output
+ * Read up to wal_sender_max_send_size bytes of WAL that's been flushed to
+ * disk, but not yet sent to the client, and buffer it in the libpq output
  * buffer.
  *
  * If there is no unsent WAL remaining, WalSndCaughtUp is set to true,
@@ -3132,8 +3123,9 @@ XLogSendPhysical(void)
 
 	/*
 	 * Figure out how much to send in one message. If there's no more than
-	 * MAX_SEND_SIZE bytes to send, send everything. Otherwise send
-	 * MAX_SEND_SIZE bytes, but round back to logfile or page boundary.
+	 * wal_sender_max_send_size bytes to send, send everything. Otherwise send
+	 * wal_sender_max_send_size bytes, but round back to logfile or page
+	 * boundary.
 	 *
 	 * The rounding is not only for performance reasons. Walreceiver relies on
 	 * the fact that we never split a WAL record 

Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-02-08 Thread Majid Garoosi
Thank you very much for your review.

I generally agree with your suggestions, so just applied them.
You can find the new patch in the attached file.

Best
Majid

On Tue, 6 Feb 2024 at 09:26, Michael Paquier  wrote:

> On Fri, Jan 19, 2024 at 11:04:50PM +0330, Majid Garoosi wrote:
> > However, this value does not need to be larger than wal_segment_size,
> > thus its checker function returns false if a larger value is set for
> > this.
> >
> > This is my first patch. So, I hope I haven't done something wrong. :'D
>
> You've done nothing wrong.  Thanks for the patch!
>
> +   if (*newval > wal_segment_size)
> +   return false;
> +   return true;
>
> I was not sure first that we need a dynamic check, but I could get why
> somebody may want to make it higher than 1MB these days.
>
> The patch is missing a couple of things:
> - Documentation in doc/src/sgml/config.sgml, that has a section for
> "Sending Servers".
> - It is not listed in postgresql.conf.sample.  I would suggest to put
> it in REPLICATION -> Sending Servers.
> The default value of 128kB should be mentioned in both cases.
>
> - * We don't have a good idea of what a good value would be; there's some
> - * overhead per message in both walsender and walreceiver, but on the
> other
> - * hand sending large batches makes walsender less responsive to signals
> - * because signals are checked only between messages.  128kB (with
> - * default 8k blocks) seems like a reasonable guess for now.
> [...]
> +   gettext_noop("Walsender procedure consists of a loop, reading
> wal_sender_max_send_size "
> + "bytes of WALs from disk and sending them to the receiver.
> Sending large "
> + "batches makes walsender less responsive to signals."),
>
> This is removing some information about why it may be a bad idea to
> use a too low value (message overhead) and why it may be a bad idea to
> use a too large value (responsiveness).  I would suggest to remove the
> second gettext_noop() in guc_tables.c and move all this information to
> config.sgml with the description of the new GUC.  Perhaps just put it
> after wal_sender_timeout in the sample file and the docs?
>
> Three comments in walsender.c still mention MAX_SEND_SIZE.  These
> should be switched to mention the GUC instead.
>
> I am switching the patch as waiting on author for now.
> --
> Michael
>
From fdd833b8b5cfdc1645f8288b27e9a92fc46f7951 Mon Sep 17 00:00:00 2001
From: Majid Garoosi 
Date: Fri, 19 Jan 2024 22:48:23 +0330
Subject: [PATCH v2] Add documentation for wal_sender_max_send_size

Detailed changes:
- Add documentation to postgresql's config docs
- Remove code comments mentioning the setting's old constant name
- Add the parameter to postgresql.conf.sample
---
 doc/src/sgml/config.sgml  | 26 ++
 src/backend/replication/walsender.c   | 27 +++
 src/backend/utils/init/postinit.c | 11 
 src/backend/utils/misc/guc_tables.c   | 11 
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/replication/walsender.h   |  1 +
 src/include/utils/guc_hooks.h |  1 +
 7 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 61038472c5..86b9488cd1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4405,6 +4405,32 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
   
  
 
+ 
+  wal_sender_max_send_size (integer)
+  
+   wal_sender_max_send_size configuration parameter
+  
+  
+  
+   
+Maximum size of chunks that wal sender reads from disk and sends to the
+standby servers in each iteration. Its default value is 16
+times XLOG_BLCKSZ, i.e. it is 128kB if
+the project is built with the default XLOG_BLCKSZ
+configuration parameter. This parameter can be changed without restarting the
+server.
+   
+   
+The minimum allowed value is XLOG_BLCKSZ bytes, typically
+8kB. Its maximum allowed value is
+wal_segment_size, typically 16MB.
+While a large value for this parameter reduces the responsiveness of WAL
+sender processes to the signals, setting small values increases the overhead
+of the messages both in WAL senders and WAL receivers.
+   
+  
+ 
+
  
   track_commit_timestamp (boolean)
   
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 087031e9dc..c429facded 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -96,17 +96,6 @@
 #include "utils/timeout.h"
 #include "utils/timestamp.h"
 
-/*
- * Maximum data payload in a WAL data message.  Must be >= XLOG_BLCKSZ.
- *
- * We don't have a good idea of what a good value would be; there's some
- * overhead per message in both walsender and walreceiver, but on the other
- 

Re: GUC-ify walsender MAX_SEND_SIZE constant

2024-02-05 Thread Michael Paquier
On Fri, Jan 19, 2024 at 11:04:50PM +0330, Majid Garoosi wrote:
> However, this value does not need to be larger than wal_segment_size,
> thus its checker function returns false if a larger value is set for
> this.
> 
> This is my first patch. So, I hope I haven't done something wrong. :'D

You've done nothing wrong.  Thanks for the patch!

+   if (*newval > wal_segment_size)
+   return false;
+   return true;

I was not sure first that we need a dynamic check, but I could get why
somebody may want to make it higher than 1MB these days.

The patch is missing a couple of things:
- Documentation in doc/src/sgml/config.sgml, that has a section for
"Sending Servers".
- It is not listed in postgresql.conf.sample.  I would suggest to put
it in REPLICATION -> Sending Servers.
The default value of 128kB should be mentioned in both cases.

- * We don't have a good idea of what a good value would be; there's some
- * overhead per message in both walsender and walreceiver, but on the other
- * hand sending large batches makes walsender less responsive to signals
- * because signals are checked only between messages.  128kB (with
- * default 8k blocks) seems like a reasonable guess for now.
[...]
+   gettext_noop("Walsender procedure consists of a loop, reading 
wal_sender_max_send_size "
+ "bytes of WALs from disk and sending them to the receiver. Sending 
large "
+ "batches makes walsender less responsive to signals."),

This is removing some information about why it may be a bad idea to
use a too low value (message overhead) and why it may be a bad idea to
use a too large value (responsiveness).  I would suggest to remove the
second gettext_noop() in guc_tables.c and move all this information to
config.sgml with the description of the new GUC.  Perhaps just put it
after wal_sender_timeout in the sample file and the docs?

Three comments in walsender.c still mention MAX_SEND_SIZE.  These
should be switched to mention the GUC instead.

I am switching the patch as waiting on author for now.
--
Michael


signature.asc
Description: PGP signature


GUC-ify walsender MAX_SEND_SIZE constant

2024-01-19 Thread Majid Garoosi
Hello all,

Following the threads here

and there , I decided to submit
this patch.

Following is the description which is also written in the commit message:
MAX_SEND_SIZE parameter was used in WALSender to limit maximum size of
a WAL data packet sent to a WALReceiver during replication. Although
its default value (128kB) was a reasonable value, it was written in
2010. Since the hardwares have changed a lot since then, a PostgreSQL
user might need to customize this value.
For example, if a database's disk has a high bandwidth and a high
latency at the same time, it makes more sense to read bigger chunks of
data from disk in each iteration. One example of such disks is a remote
disk. (e.g. an RBD volume)
However, this value does not need to be larger than wal_segment_size,
thus its checker function returns false if a larger value is set for
this.

This is my first patch. So, I hope I haven't done something wrong. :'D

Best regards
Majid
From 218a925d1161878c888bef8adb3f246ec2b44d12 Mon Sep 17 00:00:00 2001
From: Majid Garoosi 
Date: Fri, 19 Jan 2024 22:48:23 +0330
Subject: [PATCH v1] GUC-ify MAX_SEND_SIZE constant in WALSender

MAX_SEND_SIZE parameter was used in WALSender to limit maximum size of
a WAL data packet sent to a WALReceiver during replication. Although
its default value (128kB) was a reasonable value, it was written in
2010. Since the hardwares have changed a lot since then, a PostgreSQL
user might need to customize this value.
For example, if a database's disk has a high bandwidth and a high
latency at the same time, it makes more sense to read bigger chunks of
data from disk in each iteration. One example of such disks is a remote
disk. (e.g. an RBD volume)
However, this value does not need to be larger than wal_segment_size,
thus its checker function returns false if a larger value is set for
this.
---
 src/backend/replication/walsender.c | 18 +-
 src/backend/utils/init/postinit.c   | 11 +++
 src/backend/utils/misc/guc_tables.c | 13 +
 src/include/replication/walsender.h |  1 +
 src/include/utils/guc_hooks.h   |  1 +
 5 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 087031e9dc..0299859d97 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -96,17 +96,6 @@
 #include "utils/timeout.h"
 #include "utils/timestamp.h"
 
-/*
- * Maximum data payload in a WAL data message.  Must be >= XLOG_BLCKSZ.
- *
- * We don't have a good idea of what a good value would be; there's some
- * overhead per message in both walsender and walreceiver, but on the other
- * hand sending large batches makes walsender less responsive to signals
- * because signals are checked only between messages.  128kB (with
- * default 8k blocks) seems like a reasonable guess for now.
- */
-#define MAX_SEND_SIZE (XLOG_BLCKSZ * 16)
-
 /* Array of WalSnds in shared memory */
 WalSndCtlData *WalSndCtl = NULL;
 
@@ -124,6 +113,9 @@ int			max_wal_senders = 10;	/* the maximum number of concurrent
 	 * walsenders */
 int			wal_sender_timeout = 60 * 1000; /* maximum time to send one WAL
 			 * data message */
+int wal_sender_max_send_size = XLOG_BLCKSZ * 16; /* Maximum data
+  * payload in a WAL
+  * data message */
 bool		log_replication_commands = false;
 
 /*
@@ -3087,7 +3079,7 @@ XLogSendPhysical(void)
 	 */
 	startptr = sentPtr;
 	endptr = startptr;
-	endptr += MAX_SEND_SIZE;
+	endptr += wal_sender_max_send_size;
 
 	/* if we went beyond SendRqstPtr, back off */
 	if (SendRqstPtr <= endptr)
@@ -3106,7 +3098,7 @@ XLogSendPhysical(void)
 	}
 
 	nbytes = endptr - startptr;
-	Assert(nbytes <= MAX_SEND_SIZE);
+	Assert(nbytes <= wal_sender_max_send_size);
 
 	/*
 	 * OK to read and send the slice.
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 1ad3367159..8d61b006c4 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -617,6 +617,17 @@ check_max_wal_senders(int *newval, void **extra, GucSource source)
 	return true;
 }
 
+/*
+ * GUC check_hook for wal_sender_max_send_size
+ */
+bool
+check_wal_sender_max_send_size(int *newval, void **extra, GucSource source)
+{
+	if (*newval > wal_segment_size)
+		return false;
+	return true;
+}
+
 /*
  * Early initialization of a backend (either standalone or under postmaster).
  * This happens even before InitPostgres.
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index e53ebc6dc2..76f755cbd3 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2892,6 +2892,19 @@ struct