Re: Flushing large data immediately in pqcomm
On Mon, 8 Apr 2024 at 09:27, Jelte Fennema-Nio wrote:
> On Sun, 7 Apr 2024 at 11:34, David Rowley wrote:
> > That seems to require modifying the following function signatures:
> > secure_write(), be_tls_write(), be_gssapi_write(). That's not an area
> > I'm familiar with, however.
>
> Attached is a new patchset where 0003 does exactly that. The only
> place where we need to cast to non-const is for GSS, but that seems
> fine (commit message has more details).

+1. Looks OK to me. The GSS pointer *ptr is already cast to char *
where it is needed, so the code is already correct.

best regards,
Ranier Vilela
Re: Flushing large data immediately in pqcomm
On Sun, 7 Apr 2024 at 11:34, David Rowley wrote:
> That seems to require modifying the following function signatures:
> secure_write(), be_tls_write(), be_gssapi_write(). That's not an area
> I'm familiar with, however.

Attached is a new patchset where 0003 does exactly that. The only
place where we need to cast to non-const is for GSS, but that seems
fine (commit message has more details).

I also added patch 0002, which is a small addition to the function
comment of internal_flush_buffer that seemed useful to me to
differentiate it from internal_flush (feel free to ignore/rewrite).

v9-0001-Make-a-few-variables-size_t-in-pqcomm.c.patch
Description: Binary data

v9-0002-Expand-comment-of-internal_flush_buffer.patch
Description: Binary data

v9-0003-Make-backend-libpq-write-functions-take-const-poi.patch
Description: Binary data
Re: Flushing large data immediately in pqcomm
On Mon, 8 Apr 2024 at 07:42, Jelte Fennema-Nio wrote:
> On Sun, 7 Apr 2024 at 14:41, David Rowley wrote:
> > Looking at the code in socket_putmessage_noblock(), I don't understand
> > why it's ok for PqSendBufferSize to be int but "required" must be
> > size_t. There's a line that does "PqSendBufferSize = required;". It
> > kinda looks like they both should be size_t. Am I missing something
> > that you've thought about?
>
> You and Ranier are totally right (I missed this assignment). Attached
> is v8.

+1. LGTM.

best regards,
Ranier Vilela
Re: Flushing large data immediately in pqcomm
On Sun, 7 Apr 2024 at 14:41, David Rowley wrote:
> Looking at the code in socket_putmessage_noblock(), I don't understand
> why it's ok for PqSendBufferSize to be int but "required" must be
> size_t. There's a line that does "PqSendBufferSize = required;". It
> kinda looks like they both should be size_t. Am I missing something
> that you've thought about?

You and Ranier are totally right (I missed this assignment). Attached
is v8.

v8-0001-Make-a-few-variables-size_t-in-pqcomm.c.patch
Description: Binary data
Re: Flushing large data immediately in pqcomm
On Sat, 6 Apr 2024 at 04:34, David Rowley wrote:
> Does anyone else want to try the attached script on the v5 patch to
> see if their numbers are better?

I'm seeing the below results with your script on my machine (). I ran
it several times, results were almost similar each time.

master:
Run 100 100 500: 1.627905512
Run 1024 10240 20: 1.603231684
Run 1024 1048576 2000: 2.962812352
Run 1048576 1048576 1000: 2.940766748

v5:
Run 100 100 500: 1.611508155
Run 1024 10240 20: 1.603505596
Run 1024 1048576 2000: 2.727241937
Run 1048576 1048576 1000: 2.721268988
Re: Flushing large data immediately in pqcomm
On Sat, 6 Apr 2024 at 22:39, Andres Freund wrote:
> Hi,
>
> On 2024-04-07 00:45:31 +0200, Jelte Fennema-Nio wrote:
> > On Sat, 6 Apr 2024 at 22:21, Andres Freund wrote:
> > > The small regression for small results is still kinda visible, I
> > > haven't yet tested the patch downthread.
> >
> > Thanks a lot for the faster test script, I'm also impatient. I still
> > saw the small regression with David's patch. Here's a v6 where I
> > think it is now gone. I added inline to internal_put_bytes too. I
> > think that helped especially because for two calls to
> > internal_put_bytes len is a constant (1 and 4) that is smaller than
> > PqSendBufferSize. So for those calls the compiler can now statically
> > eliminate the new codepath because "len >= PqSendBufferSize" is known
> > to be false at compile time.
>
> Nice.
>
> > Also I incorporated all of Ranier's comments.
>
> Changing the global vars to size_t seems mildly bogus to me. All it's
> achieving is to use slightly more memory. It also just seems unrelated
> to the change.

I don't agree with this thought. Yes, size_t uses 4 more bytes of
memory than int on 64-bit platforms. But mixing up int and size_t is a
sure way to write non-portable code, and the compilers will start
showing messages such as "signed/unsigned mismatch". The global vars
PqSendPointer and PqSendStart were changed in the v5 patch, so for the
sake of style and consistency, I understand that it is better not to
mix the types. The compiler will promote PqSendBufferSize to size_t in
all comparisons. And finally, the correct type to deal with char *
variables is size_t.

Best regards,
Ranier Vilela

> Greetings,
>
> Andres Freund
Re: Flushing large data immediately in pqcomm
On Sun, 7 Apr 2024 at 22:05, Jelte Fennema-Nio wrote:
> On Sun, 7 Apr 2024 at 03:39, Andres Freund wrote:
> > Changing the global vars to size_t seems mildly bogus to me. All it's
> > achieving is to use slightly more memory. It also just seems unrelated
> > to the change.
>
> I took a closer look at this. I agree that changing PqSendBufferSize
> to size_t is unnecessary: given the locations where it is used I see no
> risk of overflow anywhere. Changing the type of PqSendPointer and
> PqSendStart is needed though, because (as described by Heikki and David
> upthread) the argument type of internal_flush_buffer is size_t*. So if
> you actually pass int* there, and the sizes are not the same, then you
> will start writing out of bounds. And because internal_flush_buffer is
> introduced in this patch, it is related to this change.
>
> This is what David just committed too.
>
> However, the "required" var actually should be of size_t to avoid
> overflow if len is larger than int even without this change. So
> attached is a tiny patch that does that.

Looking at the code in socket_putmessage_noblock(), I don't understand
why it's ok for PqSendBufferSize to be int but "required" must be
size_t. There's a line that does "PqSendBufferSize = required;". It
kinda looks like they both should be size_t. Am I missing something
that you've thought about?

David
Re: Flushing large data immediately in pqcomm
On Sun, 7 Apr 2024 at 03:39, Andres Freund wrote:
> Changing the global vars to size_t seems mildly bogus to me. All it's
> achieving is to use slightly more memory. It also just seems unrelated
> to the change.

I took a closer look at this. I agree that changing PqSendBufferSize
to size_t is unnecessary: given the locations where it is used I see no
risk of overflow anywhere. Changing the type of PqSendPointer and
PqSendStart is needed though, because (as described by Heikki and David
upthread) the argument type of internal_flush_buffer is size_t*. So if
you actually pass int* there, and the sizes are not the same, then you
will start writing out of bounds. And because internal_flush_buffer is
introduced in this patch, it is related to this change.

This is what David just committed too.

However, the "required" var actually should be of size_t to avoid
overflow if len is larger than int even without this change. So
attached is a tiny patch that does that.

v7-0001-Avoid-possible-overflow-in-socket_putmessage_nonb.patch
Description: Binary data
Re: Flushing large data immediately in pqcomm
On Sun, 7 Apr 2024 at 08:21, Andres Freund wrote:
> I added WITH BINARY, SET STORAGE EXTERNAL and tested both unix socket
> and localhost. I also reduced row counts and iteration counts, because
> I am impatient, and I don't think it matters much here. Attached the
> modified version.

Thanks for the script. I'm able to reproduce the speedup with your
script.

I looked over the patch again and ended up making internal_flush an
inline function rather than a macro. I compared the assembly produced
from each and it's the same with the exception of the label names
being different. I've now pushed the patch.

One thing that does not seem ideal is having to cast away the
const-ness of the buffer in internal_flush_buffer(). Previously this
wasn't an issue as we always copied the buffer and passed that to
secure_write(). I wonder if it's worth seeing if we can keep this
buffer constant all the way to the socket write. That seems to require
modifying the following function signatures: secure_write(),
be_tls_write(), be_gssapi_write(). That's not an area I'm familiar
with, however.

David
Re: Flushing large data immediately in pqcomm
Hi,

On 2024-04-07 00:45:31 +0200, Jelte Fennema-Nio wrote:
> On Sat, 6 Apr 2024 at 22:21, Andres Freund wrote:
> > The small regression for small results is still kinda visible, I
> > haven't yet tested the patch downthread.
>
> Thanks a lot for the faster test script, I'm also impatient. I still
> saw the small regression with David's patch. Here's a v6 where I
> think it is now gone. I added inline to internal_put_bytes too. I
> think that helped especially because for two calls to
> internal_put_bytes len is a constant (1 and 4) that is smaller than
> PqSendBufferSize. So for those calls the compiler can now statically
> eliminate the new codepath because "len >= PqSendBufferSize" is known
> to be false at compile time.

Nice.

> Also I incorporated all of Ranier's comments.

Changing the global vars to size_t seems mildly bogus to me. All it's
achieving is to use slightly more memory. It also just seems unrelated
to the change.

Greetings,

Andres Freund
Re: Flushing large data immediately in pqcomm
On Sat, 6 Apr 2024 at 22:21, Andres Freund wrote:
> The small regression for small results is still kinda visible, I
> haven't yet tested the patch downthread.

Thanks a lot for the faster test script, I'm also impatient. I still
saw the small regression with David's patch. Here's a v6 where I think
it is now gone. I added inline to internal_put_bytes too. I think that
helped especially because for two calls to internal_put_bytes len is a
constant (1 and 4) that is smaller than PqSendBufferSize. So for those
calls the compiler can now statically eliminate the new codepath
because "len >= PqSendBufferSize" is known to be false at compile time.

Also I incorporated all of Ranier's comments.

v6-0001-Faster-internal_putbytes.patch
Description: Binary data
Re: Flushing large data immediately in pqcomm
Hi,

On Fri, 5 Apr 2024 at 03:28, Melih Mutlu wrote:
> Right. It was a mistake, forgot to remove that. Fixed it in v5.

If you don't mind, I have some suggestions for patch v5.

1. Shouldn't PqSendBufferSize be of type size_t? There are several
comparisons with other size_t variables.

static size_t PqSendBufferSize;	/* Size send buffer */

I think this would prevent possible overflows.

2. If PqSendBufferSize is changed to size_t, in the function
socket_putmessage_noblock, the variable named *required* should be
changed to size_t as well.

static void
socket_putmessage_noblock(char msgtype, const char *s, size_t len)
{
	int			res PG_USED_FOR_ASSERTS_ONLY;
	size_t		required;

3. In the internal_putbytes function, the *amount* variable could have
its scope safely reduced.

		else
		{
			size_t		amount;

			amount = PqSendBufferSize - PqSendPointer;

4. In the function internal_flush_buffer, the variables named *bufptr*
and *bufend* could be of const char * type, like:

static int
internal_flush_buffer(const char *s, size_t *start, size_t *end)
{
	static int	last_reported_send_errno = 0;

	const char *bufptr = s + *start;
	const char *bufend = s + *end;

best regards,
Ranier Vilela
Re: Flushing large data immediately in pqcomm
Hi,

On 2024-04-06 14:34:17 +1300, David Rowley wrote:
> I don't see any issues with v5, so based on the performance numbers
> shown on this thread for the latest patch, it would make sense to push
> it. The problem is, I just can't recreate the performance numbers.
>
> I've tried both on my AMD 3990x machine and an Apple M2 with a script
> similar to the test.sh from above. I mostly just stripped out the
> buffer size stuff and adjusted the timing code to something that would
> work with mac.

I think there are a few issues with the test script leading to not
seeing a gain:

1) I think using the textual protocol, with the text datatype, will
make it harder to spot differences. That's a lot of overhead.

2) Afaict the test is connecting over the unix socket, I think we
expect bigger wins for tcp.

3) Particularly the larger string is bottlenecked due to pglz
compression in toast.

Where I had noticed the overhead of the current approach badly, was
streaming out basebackups. Which is all binary, of course.

I added WITH BINARY, SET STORAGE EXTERNAL and tested both unix socket
and localhost. I also reduced row counts and iteration counts, because
I am impatient, and I don't think it matters much here. Attached the
modified version.

On a dual Xeon Gold 5215, turbo boost disabled, server pinned to one
core, script pinned to another:

unix:

master:
Run 100 100 100: 0.058482377
Run 1024 10240 10: 0.120909810
Run 1024 1048576 2000: 0.153027916
Run 1048576 1048576 1000: 0.154953512

v5:
Run 100 100 100: 0.058760126
Run 1024 10240 10: 0.118831396
Run 1024 1048576 2000: 0.124282503
Run 1048576 1048576 1000: 0.123894962

localhost:

master:
Run 100 100 100: 0.067088000
Run 1024 10240 10: 0.170894273
Run 1024 1048576 2000: 0.230346632
Run 1048576 1048576 1000: 0.230336078

v5:
Run 100 100 100: 0.067144036
Run 1024 10240 10: 0.167950948
Run 1024 1048576 2000: 0.135167027
Run 1048576 1048576 1000: 0.135347867

The perf difference for 1MB via TCP is really impressive.
The small regression for small results is still kinda visible, I
haven't yet tested the patch downthread.

Greetings,

Andres Freund

#!/bin/bash

set -e

dbname=postgres
port=5440
host=/tmp
host=localhost

test_cases=(
    "100 100 100"          # only 100 bytes
    "1024 10240 10"        # 1Kb and 10Kb
    "1024 1048576 2000"    # 1Kb and 1Mb
    "1048576 1048576 1000" # all 1Mb
)

insert_rows(){
    psql -d $dbname -p $port -h $host -c "
    DO \$\$
    DECLARE
        counter INT;
    BEGIN
        FOR counter IN 1..$3 LOOP
            IF counter % 2 = 1 THEN
                INSERT INTO test_table VALUES (repeat('a', $1)::text);
            ELSE
                INSERT INTO test_table VALUES (repeat('b', $2)::text);
            END IF;
        END LOOP;
    END \$\$;
    " > /dev/null
}

psql -d $dbname -p $port -c "CREATE EXTENSION IF NOT EXISTS pg_prewarm;" > /dev/null

for case in "${test_cases[@]}"
do
    psql -d $dbname -p $port -h $host -c "DROP TABLE IF EXISTS test_table;" > /dev/null
    psql -d $dbname -p $port -h $host -c "CREATE UNLOGGED TABLE test_table(data text not null);" > /dev/null
    psql -d $dbname -p $port -h $host -c "ALTER TABLE test_table ALTER data SET STORAGE EXTERNAL;" > /dev/null
    insert_rows $case
    psql -d $dbname -p $port -h $host -c "select pg_prewarm('test_table');" > /dev/null
    echo -n "Run $case: "
    elapsed_time=0
    for a in {1..5}
    do
        start_time=$(perl -MTime::HiRes=time -e 'printf "%.9f\n", time')
        psql -d $dbname -p $port -h $host -c "COPY test_table TO STDOUT WITH BINARY;" > /dev/null
        end_time=$(perl -MTime::HiRes=time -e 'printf "%.9f\n", time')
        elapsed_time=$(perl -e "printf('%.9f', ($end_time - $start_time) + $elapsed_time)")
    done
    avg_elapsed_time_in_ms=$(perl -e "printf('%.9f', ($elapsed_time / 30))")
    echo $avg_elapsed_time_in_ms
done
Re: Flushing large data immediately in pqcomm
On Sat, 6 Apr 2024 at 23:17, Jelte Fennema-Nio wrote:
> Weird that on your machines you don't see a difference. Are you sure
> you didn't make a silly mistake, like not restarting postgres or
> something?

I'm sure. I spent quite a long time between the AMD and an Apple M2
trying. I did see the same regression as you on the smaller numbers.

I experimented with the attached which macro'ifies internal_flush()
and pg_noinlines internal_flush_buffer. Can you try that to see if it
gets rid of the regression on the first two tests?

David

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 6497100a1a..824b2f11a3 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -120,8 +120,8 @@ static List *sock_paths = NIL;
 static char *PqSendBuffer;
 static int	PqSendBufferSize;	/* Size send buffer */
-static int	PqSendPointer;		/* Next index to store a byte in PqSendBuffer */
-static int	PqSendStart;		/* Next index to send a byte in PqSendBuffer */
+static size_t PqSendPointer;	/* Next index to store a byte in PqSendBuffer */
+static size_t PqSendStart;		/* Next index to send a byte in PqSendBuffer */
 
 static char PqRecvBuffer[PQ_RECV_BUFFER_SIZE];
 static int	PqRecvPointer;		/* Next index to read a byte from PqRecvBuffer */
@@ -133,6 +133,7 @@ static int	PqRecvLength;	/* End of data available in PqRecvBuffer */
 static bool PqCommBusy;			/* busy sending data to the client */
 static bool PqCommReadingMsg;	/* in the middle of reading a message */
 
+#define internal_flush() internal_flush_buffer(PqSendBuffer, &PqSendStart, &PqSendPointer)
 
 /* Internal functions */
 static void socket_comm_reset(void);
@@ -144,7 +145,8 @@ static bool socket_is_send_pending(void);
 static int	socket_putmessage(char msgtype, const char *s, size_t len);
 static void socket_putmessage_noblock(char msgtype, const char *s, size_t len);
 static int	internal_putbytes(const char *s, size_t len);
-static int	internal_flush(void);
+static pg_noinline int internal_flush_buffer(const char *s, size_t *start,
+											 size_t *end);
 
 static int	Lock_AF_UNIX(const char *unixSocketDir, const char *unixSocketPath);
 static int	Setup_AF_UNIX(const char *sock_path);
@@ -1282,14 +1284,32 @@ internal_putbytes(const char *s, size_t len)
 			if (internal_flush())
 				return EOF;
 		}
-		amount = PqSendBufferSize - PqSendPointer;
-		if (amount > len)
-			amount = len;
-		memcpy(PqSendBuffer + PqSendPointer, s, amount);
-		PqSendPointer += amount;
-		s += amount;
-		len -= amount;
+
+		/*
+		 * If the buffer is empty and data length is larger than the buffer
+		 * size, send it without buffering. Otherwise, put as much data as
+		 * possible into the buffer.
+		 */
+		if (len >= PqSendBufferSize && PqSendStart == PqSendPointer)
+		{
+			size_t		start = 0;
+
+			socket_set_nonblocking(false);
+			if (internal_flush_buffer(s, &start, &len))
+				return EOF;
+		}
+		else
+		{
+			amount = PqSendBufferSize - PqSendPointer;
+			if (amount > len)
+				amount = len;
+			memcpy(PqSendBuffer + PqSendPointer, s, amount);
+			PqSendPointer += amount;
+			s += amount;
+			len -= amount;
+		}
 	}
+
 	return 0;
 }
 
@@ -1315,19 +1335,19 @@ socket_flush(void)
 }
 
 /*
- * internal_flush - flush pending output
+ * internal_flush_buffer - flush the given buffer content
 *
 * Returns 0 if OK (meaning everything was sent, or operation would block
 * and the socket is in non-blocking mode), or EOF if trouble.
 *
 */
-static int
-internal_flush(void)
+static pg_noinline int
+internal_flush_buffer(const char *s, size_t *start, size_t *end)
 {
 	static int	last_reported_send_errno = 0;
 
-	char	   *bufptr = PqSendBuffer + PqSendStart;
-	char	   *bufend = PqSendBuffer + PqSendPointer;
+	char	   *bufptr = (char*) s + *start;
+	char	   *bufend = (char*) s + *end;
 
 	while (bufptr < bufend)
 	{
@@ -1373,7 +1393,7 @@ internal_flush(void)
 		 * flag that'll cause the next CHECK_FOR_INTERRUPTS to terminate
 		 * the connection.
 		 */
-		PqSendStart =
Re: Flushing large data immediately in pqcomm
On Sat, 6 Apr 2024 at 03:34, David Rowley wrote:
> Does anyone else want to try the attached script on the v5 patch to
> see if their numbers are better?

On my machine (i9-10900X, in Ubuntu 22.04 on WSL on Windows) v5
consistently beats master by ~0.25 seconds:

master:
Run 100 100 500: 1.948975205
Run 1024 10240 20: 3.039986587
Run 1024 1048576 2000: 2.444176276
Run 1048576 1048576 1000: 2.475328596

v5:
Run 100 100 500: 1.997170909
Run 1024 10240 20: 3.057802598
Run 1024 1048576 2000: 2.199449857
Run 1048576 1048576 1000: 2.210328762

The first two runs are pretty much equal, and I ran your script a few
more times and this seems like just random variance (sometimes v5 wins
those, sometimes master does, always quite close to each other). But
the last two runs v5 consistently wins.

Weird that on your machines you don't see a difference. Are you sure
you didn't make a silly mistake, like not restarting postgres or
something?
Re: Flushing large data immediately in pqcomm
On Fri, 5 Apr 2024 at 03:28, Melih Mutlu wrote:
> On Thu, 4 Apr 2024 at 16:34, Jelte Fennema-Nio wrote:
> > On Thu, 4 Apr 2024 at 13:08, Melih Mutlu wrote:
> > > I changed internal_flush() to an inline function, results look
> > > better this way.
> >
> > It seems you also change internal_flush_buffer to be inline (but only
> > in the actual function definition, not declaration at the top). I
> > don't think inlining internal_flush_buffer should be necessary to
> > avoid the perf regressions, i.e. internal_flush is adding extra
> > indirection compared to master and is only a single line, so that one
> > makes sense to inline.
>
> Right. It was a mistake, forgot to remove that. Fixed it in v5.

I don't see any issues with v5, so based on the performance numbers
shown on this thread for the latest patch, it would make sense to push
it. The problem is, I just can't recreate the performance numbers.

I've tried both on my AMD 3990x machine and an Apple M2 with a script
similar to the test.sh from above. I mostly just stripped out the
buffer size stuff and adjusted the timing code to something that would
work with mac. The script runs each copy 30 times and takes the
average time, reported here in seconds.

With AMD 3990x:

master
Run 100 100 500: 1.032264113 sec
Run 1024 10240 20: 1.016229105 sec
Run 1024 1048576 2000: 1.242267116 sec
Run 1048576 1048576 1000: 1.245425089 sec

v5
Run 100 100 500: 1.068543053 sec
Run 1024 10240 20: 1.026298571 sec
Run 1024 1048576 2000: 1.231169669 sec
Run 1048576 1048576 1000: 1.236355567 sec

With the M2 mini:

master
Run 100 100 500: 1.167851249 sec
Run 1024 10240 20: 1.962466987 sec
Run 1024 1048576 2000: 2.052836275 sec
Run 1048576 1048576 1000: 2.057908066 sec

v5
Run 100 100 500: 1.149636571 sec
Run 1024 10240 20: 2.158487741 sec
Run 1024 1048576 2000: 2.046627068 sec
Run 1048576 1048576 1000: 2.039329068 sec

From looking at the perf reports, the top function is:

57.62% postgres [.] CopyAttributeOutText

I messed around with trying to speed up the string escaping in that
function with the attached hacky patch and got the following on the
AMD 3990x machine:

CopyAttributeOutText_speedup.patch.txt
Run 100 100 500: 0.821673910
Run 1024 10240 20: 0.546632147
Run 1024 1048576 2000: 0.848492694
Run 1048576 1048576 1000: 0.840870293

I don't think we could actually do this unless we modified the output
function API to have it somehow output the number of bytes. The patch
may look beyond the NUL byte with pg_lfind8, which I don't think is
safe.

Does anyone else want to try the attached script on the v5 patch to
see if their numbers are better?

David

#!/bin/bash

dbname=postgres
port=5432

test_cases=(
    "100 100 500"          # only 100 bytes
    "1024 10240 20"        # 1Kb and 10Kb
    "1024 1048576 2000"    # 1Kb and 1Mb
    "1048576 1048576 1000" # all 1Mb
)

insert_rows(){
    psql -d $dbname -p $port -c "
    DO \$\$
    DECLARE
        counter INT;
    BEGIN
        FOR counter IN 1..$3 LOOP
            IF counter % 2 = 1 THEN
                INSERT INTO test_table VALUES (repeat('a', $1)::text);
            ELSE
                INSERT INTO test_table VALUES (repeat('b', $2)::text);
            END IF;
        END LOOP;
    END \$\$;
    " > /dev/null
}

psql -d $dbname -p $port -c "CREATE EXTENSION IF NOT EXISTS pg_prewarm;" > /dev/null

for case in "${test_cases[@]}"
do
    psql -d $dbname -p $port -c "DROP TABLE IF EXISTS test_table;" > /dev/null
    psql -d $dbname -p $port -c "CREATE TABLE test_table(data text not null);" > /dev/null
    insert_rows $case
    psql -d $dbname -p $port -c "select pg_prewarm('test_table');" > /dev/null
    echo -n "Run $case: "
    elapsed_time=0
    for a in {1..30}
    do
        start_time=$(perl -MTime::HiRes=time -e 'printf "%.9f\n", time')
        psql -d $dbname -p $port -c "COPY test_table TO STDOUT;" > /dev/null
        end_time=$(perl -MTime::HiRes=time -e 'printf "%.9f\n", time')
        elapsed_time=$(perl -e "printf('%.9f', ($end_time - $start_time) + $elapsed_time)")
    done
    avg_elapsed_time_in_ms=$(perl -e "printf('%.9f', ($elapsed_time / 30))")
    echo $avg_elapsed_time_in_ms
done

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ae8b2e36d7..fcb200503f 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -29,6 +29,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_lfind.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/lsyscache.h"
@@ -1068,9 +1069,18 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	else
 	{
 		start = ptr;
-		while ((c = *ptr) != '\0')
+
+		while (true)
 		{
-
Re: Flushing large data immediately in pqcomm
On Thu, 4 Apr 2024 at 16:34, Jelte Fennema-Nio wrote:
> On Thu, 4 Apr 2024 at 13:08, Melih Mutlu wrote:
> > I changed internal_flush() to an inline function, results look better
> > this way.
>
> It seems you also change internal_flush_buffer to be inline (but only
> in the actual function definition, not declaration at the top). I
> don't think inlining internal_flush_buffer should be necessary to
> avoid the perf regressions, i.e. internal_flush is adding extra
> indirection compared to master and is only a single line, so that one
> makes sense to inline.

Right. It was a mistake, forgot to remove that. Fixed it in v5.

> Other than that the code looks good to me.
>
> The new results look great.
>
> One thing that is quite interesting about these results is that
> increasing the buffer size results in even better performance (by
> quite a bit). I don't think we can easily choose a perfect number, as
> it seems to be a trade-off between memory usage and perf. But allowing
> people to configure it through a GUC like in your second patchset
> would be quite useful I think, especially because larger buffers could
> be configured for connections that would benefit most from it (e.g.
> replication connections or big COPYs).
>
> I think your add-pq_send_buffer_size-GUC.patch is essentially what we
> would need there but it would need some extra changes to actually be
> merge-able:
> 1. needs docs
> 2. rename PQ_SEND_BUFFER_SIZE (at least make it not UPPER_CASE, but
> maybe also remove the PQ_ prefix)
> 3. It's marked as PGC_USERSET, but there's no logic to grow/shrink it
> after initial allocation

I agree that the GUC patch requires more work to be in good shape. I
created that for testing purposes. But if we decide to make the buffer
size customizable, then I'll start polishing up that patch and address
your suggestions.

One case that could benefit from increased COPY performance is table
sync of logical replication.
It might make sense to let users configure the buffer size to speed up
table sync. I'm not sure what kind of problems this GUC would bring,
though.

Thanks,
-- 
Melih Mutlu
Microsoft

v5-0001-Flush-large-data-immediately-in-pqcomm.patch
Description: Binary data
Re: Flushing large data immediately in pqcomm
On Thu, 4 Apr 2024 at 13:08, Melih Mutlu wrote:
> I changed internal_flush() to an inline function, results look better
> this way.

It seems you also changed internal_flush_buffer to be inline (but only
in the actual function definition, not the declaration at the top). I
don't think inlining internal_flush_buffer should be necessary to
avoid the perf regressions, i.e. internal_flush is adding extra
indirection compared to master and is only a single line, so that one
makes sense to inline.

Other than that the code looks good to me.

The new results look great.

One thing that is quite interesting about these results is that
increasing the buffer size results in even better performance (by
quite a bit). I don't think we can easily choose a perfect number, as
it seems to be a trade-off between memory usage and perf. But allowing
people to configure it through a GUC like in your second patchset
would be quite useful I think, especially because larger buffers could
be configured for connections that would benefit most from it (e.g.
replication connections or big COPYs).

I think your add-pq_send_buffer_size-GUC.patch is essentially what we
would need there, but it would need some extra changes to actually be
merge-able:
1. needs docs
2. rename PQ_SEND_BUFFER_SIZE (at least make it not UPPER_CASE, but
maybe also remove the PQ_ prefix)
3. It's marked as PGC_USERSET, but there's no logic to grow/shrink it
after initial allocation
Re: Flushing large data immediately in pqcomm
Hi,

On Thu, 28 Mar 2024 at 22:44, Melih Mutlu wrote:
> On Wed, Mar 27, 2024 at 14:39, David Rowley wrote:
> > On Fri, 22 Mar 2024 at 12:46, Melih Mutlu wrote:
> > can you confirm if the test was done in debug with casserts on? If
> > so, it would be much better to have asserts off and have
> > -Dbuildtype=release.
>
> Yes, previous numbers were with --buildtype debug -Dcassert=true. I
> can share new numbers with release build and asserts off soon.

While testing the patch without --buildtype debug -Dcassert=true, I
felt like there was still a slight regression. I changed
internal_flush() to an inline function, results look better this way.

1- row size = 100 bytes, # of rows = 100

           | 1400 bytes |   2KB |   4KB |   8KB |  16KB |  32KB
-----------+------------+-------+-------+-------+-------+-------
 HEAD      |        861 |   765 |   612 |   521 |   477 |   480
 patch     |        869 |   766 |   612 |   519 |   482 |   467
 no buffer |      13978 | 13746 | 13909 | 13956 | 13920 | 13895

2- row size = half of the rows are 1KB and the rest is 10KB, # of rows = 100

           | 1400 bytes |   2KB |   4KB |   8KB |  16KB |  32KB
-----------+------------+-------+-------+-------+-------+-------
 HEAD      |      30195 | 26455 | 17338 | 14562 | 12844 | 11652
 patch     |      14744 | 15830 | 15697 | 14273 | 12794 | 11652
 no buffer |      24054 | 23992 | 24162 | 23951 | 23901 | 23925

3- row size = half of the rows are 1KB and the rest is 1MB, # of rows = 1000

           | 1400 bytes |  2KB |  4KB |  8KB | 16KB | 32KB
-----------+------------+------+------+------+------+------
 HEAD      |       3546 | 3029 | 2373 | 2032 | 1873 | 1806
 patch     |       1715 | 1723 | 1724 | 1731 | 1729 | 1709
 no buffer |       1749 | 1748 | 1742 | 1744 | 1757 | 1744

4- row size = all rows are 1MB, # of rows = 1000

           | 1400 bytes |  2KB |  4KB |  8KB | 16KB | 32KB
-----------+------------+------+------+------+------+------
 HEAD      |       7089 | 5987 | 4697 | 4048 | 3737 | 3523
 patch     |       3438 | 3411 | 3400 | 3416 | 3399 | 3429
 no buffer |       3432 | 3432 | 3416 | 3424 | 3378 | 3429

Thanks,
-- 
Melih Mutlu
Microsoft

v4-0001-Flush-large-data-immediately-in-pqcomm.patch
Description: Binary data
Re: Flushing large data immediately in pqcomm
On Wed, Mar 27, 2024 at 18:54, Robert Haas wrote:
> On Wed, Mar 27, 2024 at 7:39 AM David Rowley wrote:
> > Robert, I understand you'd like a bit more from this patch. I'm
> > wondering if you are planning on blocking another committer from
> > going ahead with this? Or if you have a reason why the current state
> > of the patch is not a meaningful enough improvement that would
> > justify possibly not getting any improvements in this area for PG17?
>
> So, I think that the first version of the patch, when it got a big
> chunk of data, would just flush whatever was already in the buffer and
> then send the rest without copying.

Correct.

> The current version, as I understand it, only does that if the buffer
> is empty; otherwise, it copies as much data as it can into the
> partially-filled buffer.

Yes, currently it should fill and flush the buffer first, if it's not
already empty. Only then does it send the rest without copying.

Thanks,
Melih
Re: Flushing large data immediately in pqcomm
On Wed, Mar 27, 2024 at 14:39, David Rowley wrote:
> On Fri, 22 Mar 2024 at 12:46, Melih Mutlu wrote:
> > I did all of the above changes and it seems like those resolved the
> > regression issue.
>
> Thanks for adjusting the patch. The numbers do look better, but on
> looking at your test.sh script from [1], I see:
>
> meson setup --buildtype debug -Dcassert=true
> --prefix="$DESTDIR/usr/local/pgsql" $DESTDIR && \
>
> can you confirm if the test was done in debug with casserts on? If
> so, it would be much better to have asserts off and have
> -Dbuildtype=release.

Yes, previous numbers were with --buildtype debug -Dcassert=true. I
can share new numbers with release build and asserts off soon.

Thanks,
Melih
Re: Flushing large data immediately in pqcomm
On Wed, Mar 27, 2024 at 7:39 AM David Rowley wrote:
> Robert, I understand you'd like a bit more from this patch. I'm
> wondering if you're planning on blocking another committer from going
> ahead with this? Or if you have a reason why the current state of the
> patch is not a meaningful enough improvement that would justify
> possibly not getting any improvements in this area for PG17?

So, I think that the first version of the patch, when it got a big chunk
of data, would just flush whatever was already in the buffer and then
send the rest without copying. The current version, as I understand it,
only does that if the buffer is empty; otherwise, it copies as much data
as it can into the partially-filled buffer. I think that change
addresses most of my concern about the approach; the old way could, I
believe, lead to an increased total number of flushes with the right
usage pattern, but I don't believe that's possible with the revised
approach. I do kind of wonder whether there is some more fine-tuning of
the approach that would improve things further, but I realize that we
have very limited time to figure this out, and there's no sense letting
the perfect be the enemy of the good.

So in short... no, I don't have big concerns at this point. Melih's
latest benchmarks look fairly promising to me, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: Flushing large data immediately in pqcomm
On Fri, 22 Mar 2024 at 12:46, Melih Mutlu wrote:
> I did all of the above changes and it seems like those resolved the
> regression issue.

Thanks for adjusting the patch. The numbers do look better, but on
looking at your test.sh script from [1], I see:

meson setup --buildtype debug -Dcassert=true
--prefix="$DESTDIR/usr/local/pgsql" $DESTDIR && \

can you confirm if the test was done in debug with casserts on? If so,
it would be much better to have asserts off and have
-Dbuildtype=release.

I'm planning to run some benchmarks tomorrow. My thoughts are that the
patch allows the memcpy() to be skipped without adding any additional
buffer flushes and demonstrates a good performance increase in various
scenarios from doing so. I think that is a satisfactory goal. If I
don't see any issues from reviewing and benchmarking tomorrow, I'd like
to commit this.

Robert, I understand you'd like a bit more from this patch. I'm
wondering if you're planning on blocking another committer from going
ahead with this? Or if you have a reason why the current state of the
patch is not a meaningful enough improvement that would justify
possibly not getting any improvements in this area for PG17?

David

[1] https://www.postgresql.org/message-id/CAGPVpCSX8bTF61ZL9jOgh1AaY3bgsWnQ6J7WmJK4TV0f2LPnJQ%40mail.gmail.com
Re: Flushing large data immediately in pqcomm
Hi,

PSA v3.

Jelte Fennema-Nio wrote on Thu, 21 Mar 2024 at 12:58:
> On Thu, 21 Mar 2024 at 01:24, Melih Mutlu wrote:
> > What if I do a simple comparison like PqSendStart == PqSendPointer
> > instead of calling pq_is_send_pending()
>
> Yeah, that sounds worth trying out. So the new suggestions to fix the
> perf issues on small message sizes would be:
>
> 1. add "inline" to internal_flush function
> 2. replace pq_is_send_pending() with PqSendStart == PqSendPointer
> 3. (optional) swap the order of PqSendStart == PqSendPointer and len
> >= PqSendBufferSize

I did all of the above changes and it seems like those resolved the
regression issue. Since the previous results were with unix sockets, I
share here the results of v3 when using unix sockets for comparison.
Sharing only the case where all messages are 100 bytes, since this was
when the regression was most visible.

row size = 100 bytes, # of rows = 100

┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│           │ 1400 bytes │  2KB │  4KB │  8KB │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD      │       1106 │ 1006 │  947 │  920 │  899 │  888 │
│ patch     │       1094 │  997 │  943 │  913 │  894 │  881 │
│ no buffer │       6389 │ 6195 │ 6214 │ 6271 │ 6325 │ 6211 │
└───────────┴────────────┴──────┴──────┴──────┴──────┴──────┘

David Rowley wrote on Thu, 21 Mar 2024 at 00:57:
> On Fri, 15 Mar 2024 at 01:46, Heikki Linnakangas wrote:
> > - the "(int *) &len)" cast is not ok, and will break visibly on
> > big-endian systems where sizeof(int) != sizeof(size_t).
>
> I think fixing this requires adjusting the signature of
> internal_flush_buffer() to use size_t instead of int. That also
> means that PqSendStart and PqSendPointer must also become size_t, or
> internal_flush() must add local size_t variables to pass to
> internal_flush_buffer and assign these back again to the global after
> the call. Upgrading the globals might be the cleaner option.
>
> David

This is done too.

I actually tried to test it over a real network for a while.
However, I couldn't get reliable-enough numbers with both HEAD and the
patch due to network related issues. I've decided to go with Jelte's
suggestion [1] which is decreasing the MTU of the loopback interface to
1500 and using localhost. Here are the results:

1- row size = 100 bytes, # of rows = 100

┌───────────┬────────────┬───────┬───────┬───────┬───────┬───────┐
│           │ 1400 bytes │   2KB │   4KB │   8KB │  16KB │  32KB │
├───────────┼────────────┼───────┼───────┼───────┼───────┼───────┤
│ HEAD      │       1351 │  1233 │  1074 │   988 │   944 │   916 │
│ patch     │       1369 │  1232 │  1073 │   981 │   928 │   907 │
│ no buffer │      14949 │ 14533 │ 14791 │ 14864 │ 14612 │ 14751 │
└───────────┴────────────┴───────┴───────┴───────┴───────┴───────┘

2- row size = half of the rows are 1KB and rest is 10KB, # of rows = 100

┌───────────┬────────────┬───────┬───────┬───────┬───────┬───────┐
│           │ 1400 bytes │   2KB │   4KB │   8KB │  16KB │  32KB │
├───────────┼────────────┼───────┼───────┼───────┼───────┼───────┤
│ HEAD      │      37212 │ 31372 │ 25520 │ 21980 │ 20311 │ 18864 │
│ patch     │      23006 │ 23127 │ 23147 │     9 │ 20367 │ 19155 │
│ no buffer │      30725 │ 31090 │ 30917 │ 30796 │ 30984 │ 30813 │
└───────────┴────────────┴───────┴───────┴───────┴───────┴───────┘

3- row size = half of the rows are 1KB and rest is 1MB, # of rows = 1000

┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│           │ 1400 bytes │  2KB │  4KB │  8KB │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD      │       4296 │ 3713 │ 3040 │ 2711 │ 2528 │ 2449 │
│ patch     │       2401 │ 2411 │ 2404 │ 2374 │ 2395 │ 2408 │
│ no buffer │       2399 │ 2403 │ 2408 │ 2389 │ 2402 │ 2403 │
└───────────┴────────────┴──────┴──────┴──────┴──────┴──────┘

4- row size = all rows are 1MB, # of rows = 1000

┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│           │ 1400 bytes │  2KB │  4KB │  8KB │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD      │       8335 │ 7370 │ 6017 │ 5368 │ 5009 │ 4843 │
│ patch     │       4711 │ 4722 │ 4708 │ 4693 │ 4724 │ 4717 │
│ no buffer │       4704 │ 4712 │ 4746 │ 4728 │ 4709 │ 4
Re: Flushing large data immediately in pqcomm
Heikki Linnakangas wrote on Thu, 14 Mar 2024 at 15:46:
> On 14/03/2024 13:22, Melih Mutlu wrote:
> > @@ -1282,14 +1283,32 @@ internal_putbytes(const char *s, size_t len)
> >             if (internal_flush())
> >                 return EOF;
> >         }
> > -       amount = PqSendBufferSize - PqSendPointer;
> > -       if (amount > len)
> > -           amount = len;
> > -       memcpy(PqSendBuffer + PqSendPointer, s, amount);
> > -       PqSendPointer += amount;
> > -       s += amount;
> > -       len -= amount;
> > +
> > +       /*
> > +        * If the buffer is empty and data length is larger than the buffer
> > +        * size, send it without buffering. Otherwise, put as much data as
> > +        * possible into the buffer.
> > +        */
> > +       if (!pq_is_send_pending() && len >= PqSendBufferSize)
> > +       {
> > +           int     start = 0;
> > +
> > +           socket_set_nonblocking(false);
> > +           if (internal_flush_buffer(s, &start, (int *) &len))
> > +               return EOF;
> > +       }
> > +       else
> > +       {
> > +           amount = PqSendBufferSize - PqSendPointer;
> > +           if (amount > len)
> > +               amount = len;
> > +           memcpy(PqSendBuffer + PqSendPointer, s, amount);
> > +           PqSendPointer += amount;
> > +           s += amount;
> > +           len -= amount;
> > +       }
> >     }
> > +
> >     return 0;
> > }
>
> Two small bugs:
>
> - the "(int *) &len)" cast is not ok, and will break visibly on
> big-endian systems where sizeof(int) != sizeof(size_t).
>
> - If internal_flush_buffer() cannot write all the data in one call, it
> updates 'start' for how much it wrote, and leaves 'end' unchanged. You
> throw the updated 'start' value away, and will send the same data again
> on next iteration.

There are two possible options for internal_flush_buffer() in the
internal_putbytes() case:

1- Write all the data and return 0. We don't need start or end of the
data in this case.
2- Cannot write all and return EOF. In this case internal_putbytes()
also returns EOF immediately and does not really retry. There will be no
next iteration. If it was non-blocking, then we may need to keep the new
value.
But I think we do not need the updated start value in both cases here.
What do you think?

Thanks,
-- 
Melih Mutlu
Microsoft
Re: Flushing large data immediately in pqcomm
On Thu, 21 Mar 2024 at 22:44, Jelte Fennema-Nio wrote:
> On Thu, 21 Mar 2024 at 01:45, David Rowley wrote:
> > As I understand the code, there's no problem calling
> > internal_flush_buffer() when the buffer is empty and I suspect that if
> > we're sending a few buffers with "len > PqSendBufferSize" that it's
> > just so unlikely that the buffer is empty that we should just do the
> > function call and let internal_flush_buffer() handle doing nothing if
> > the buffer really is empty. I think the chances of
> > internal_flush_buffer() having to do exactly nothing here is less than
> > 1 in 8192, so I just don't think the check is worthwhile.
>
> I think you're missing the exact case that we're trying to improve
> here: Calls to internal_putbytes with a very large len, e.g. 1MB.
> With the new code the buffer will be empty ~50% of the time (not less
> than 1 in 8192) with such large buffers, because the flow that will
> happen:

It was the code I misread. I understand what the aim is. I failed to
notice the while loop in internal_putbytes(). So what I mentioned about
trying to fill the buffer before flushing already happens. I now agree
that the PqSendStart == PqSendPointer test is worthwhile.

I'd say since the reported regression was with 100 byte rows that
testing "len >= PqSendBufferSize" before PqSendStart == PqSendPointer
makes sense.

David
Re: Flushing large data immediately in pqcomm
On Thu, 21 Mar 2024 at 01:24, Melih Mutlu wrote:
> What if I do a simple comparison like PqSendStart == PqSendPointer
> instead of calling pq_is_send_pending()

Yeah, that sounds worth trying out. So the new suggestions to fix the
perf issues on small message sizes would be:

1. add "inline" to internal_flush function
2. replace pq_is_send_pending() with PqSendStart == PqSendPointer
3. (optional) swap the order of PqSendStart == PqSendPointer and len
>= PqSendBufferSize
Re: Flushing large data immediately in pqcomm
On Thu, 21 Mar 2024 at 01:45, David Rowley wrote:
> As I understand the code, there's no problem calling
> internal_flush_buffer() when the buffer is empty and I suspect that if
> we're sending a few buffers with "len > PqSendBufferSize" that it's
> just so unlikely that the buffer is empty that we should just do the
> function call and let internal_flush_buffer() handle doing nothing if
> the buffer really is empty. I think the chances of
> internal_flush_buffer() having to do exactly nothing here is less than
> 1 in 8192, so I just don't think the check is worthwhile.

I think you're missing the exact case that we're trying to improve here:
calls to internal_putbytes with a very large len, e.g. 1MB. With the new
code the buffer will be empty ~50% of the time (not less than 1 in 8192)
with such large buffers, because this is the flow that will happen:

1. We check len > PqSendBufferSize. There are some bytes in the buffer,
   e.g. the 5 bytes of the msgtype. So we fill up the buffer, but have
   many bytes left in len.
2. We loop again, because len is not 0.
3. We flush the buffer (at the top of the loop) because the buffer is
   full.
4. We check len > PqSendBufferSize. Now the buffer is empty, so we call
   internal_flush_buffer directly.

As you can see we check len > PqSendBufferSize twice (in step 1 and step
4), and 1 out of 2 times it returns 0.

To be clear, the code is done this way so our behaviour would only ever
be better than the status-quo, and cause no regressions. For instance,
flushing the 5 byte header separately and then flushing the full input
buffer might result in more IP packets being sent in total in some cases
due to our TCP_NODELAY.
Re: Flushing large data immediately in pqcomm
On Thu, 21 Mar 2024 at 13:24, Melih Mutlu wrote: > What if I do a simple comparison like PqSendStart == PqSendPointer instead of > calling pq_is_send_pending() as Heikki suggested, then this check should not > hurt that much. Right? Does that make sense? As I understand the code, there's no problem calling internal_flush_buffer() when the buffer is empty and I suspect that if we're sending a few buffers with "len > PqSendBufferSize" that it's just so unlikely that the buffer is empty that we should just do the function call and let internal_flush_buffer() handle doing nothing if the buffer really is empty. I think the chances of internal_flush_buffer() having to do exactly nothing here is less than 1 in 8192, so I just don't think the check is worthwhile. The reason I don't think the odds are exactly 1 in 8192 is because if we're sending a large number of bytes then it will be common that the buffer will contain exactly 5 bytes due to the previous flush and command prefix just having been added. It's worth testing both, however. I might be wrong. Performance is hard to predict. It would be good to see your test.sh script run with and without the PqSendStart == PqSendPointer condition. David
Re: Flushing large data immediately in pqcomm
David Rowley , 21 Mar 2024 Per, 00:54 tarihinde şunu yazdı: > On Fri, 15 Mar 2024 at 02:03, Jelte Fennema-Nio > wrote: > > > > On Thu, 14 Mar 2024 at 13:12, Robert Haas wrote: > > > > > > On Thu, Mar 14, 2024 at 7:22 AM Melih Mutlu > wrote: > > > > 1- Even though I expect both the patch and HEAD behave similarly in > case of small data (case 1: 100 bytes), the patch runs slightly slower than > HEAD. > > > > > > I wonder why this happens. It seems like maybe something that could be > fixed. > > > > some wild guesses: > > 1. maybe it's the extra call overhead of the new internal_flush > > implementation. What happens if you make that an inline function? > > 2. maybe swap these conditions around (the call seems heavier than a > > simple comparison): !pq_is_send_pending() && len >= PqSendBufferSize > > I agree these are both worth trying. For #2, I wonder if the > pq_is_send_pending() call is even worth checking at all. It seems to > me that the internal_flush_buffer() code will just do nothing if > nothing is pending. Also, isn't there almost always going to be > something pending when the "len >= PqSendBufferSize" condition is met? > We've just added the msgtype and number of bytes to the buffer which > is 5 bytes. If the previous message was also more than > PqSendBufferSize, then the buffer is likely to have 5 bytes due to the > previous flush, otherwise isn't it a 1 in 8192 chance that the buffer > is empty? > > If that fails to resolve the regression, maybe it's worth memcpy()ing > enough bytes out of the message to fill the buffer then flush it and > check if we still have > PqSendBufferSize remaining and skip the > memcpy() for the rest. That way there are no small flushes of just 5 > bytes and only ever the possibility of reducing the flushes as no > pattern should cause the number of flushes to increase. > In len > PqSendBufferSize cases, the buffer should be filled as much as possible if we're sure that it will be flushed at some point. 
Otherwise we might end up with small flushes. The cases where we're sure that the buffer will be flushed is when the buffer is not empty. If it's empty, there is no need to fill it unnecessarily as it might cause an additional flush. AFAIU from what you said, we shouldn't be worried about such a case since it's unlikely to have the buffer empty due to the first 5 bytes. I guess the only case where the buffer can be empty is when the buffer has PqSendBufferSize-5 bytes from previous messages and adding 5 bytes of the current message will flush the buffer. I'm not sure if removing the check may cause any regression in any case, but it's just there to be safe. What if I do a simple comparison like PqSendStart == PqSendPointer instead of calling pq_is_send_pending() as Heikki suggested, then this check should not hurt that much. Right? Does that make sense? -- Melih Mutlu Microsoft
Re: Flushing large data immediately in pqcomm
On Fri, 15 Mar 2024 at 01:46, Heikki Linnakangas wrote: > - the "(int *) &len)" cast is not ok, and will break visibly on > big-endian systems where sizeof(int) != sizeof(size_t). I think fixing this requires adjusting the signature of internal_flush_buffer() to use size_t instead of int. That also means that PqSendStart and PqSendPointer must also become size_t, or internal_flush() must add local size_t variables to pass to internal_flush_buffer and assign these back again to the global after the call. Upgrading the globals might be the cleaner option. David
Re: Flushing large data immediately in pqcomm
On Fri, 15 Mar 2024 at 02:03, Jelte Fennema-Nio wrote:
> On Thu, 14 Mar 2024 at 13:12, Robert Haas wrote:
> > On Thu, Mar 14, 2024 at 7:22 AM Melih Mutlu wrote:
> > > 1- Even though I expect both the patch and HEAD behave similarly in
> > > case of small data (case 1: 100 bytes), the patch runs slightly
> > > slower than HEAD.
> >
> > I wonder why this happens. It seems like maybe something that could be
> > fixed.
>
> some wild guesses:
> 1. maybe it's the extra call overhead of the new internal_flush
> implementation. What happens if you make that an inline function?
> 2. maybe swap these conditions around (the call seems heavier than a
> simple comparison): !pq_is_send_pending() && len >= PqSendBufferSize

I agree these are both worth trying. For #2, I wonder if the
pq_is_send_pending() call is even worth checking at all. It seems to me
that the internal_flush_buffer() code will just do nothing if nothing is
pending. Also, isn't there almost always going to be something pending
when the "len >= PqSendBufferSize" condition is met? We've just added
the msgtype and number of bytes to the buffer which is 5 bytes. If the
previous message was also more than PqSendBufferSize, then the buffer is
likely to have 5 bytes due to the previous flush, otherwise isn't it a 1
in 8192 chance that the buffer is empty?

If that fails to resolve the regression, maybe it's worth memcpy()ing
enough bytes out of the message to fill the buffer then flush it and
check if we still have > PqSendBufferSize remaining and skip the
memcpy() for the rest. That way there are no small flushes of just 5
bytes and only ever the possibility of reducing the flushes as no
pattern should cause the number of flushes to increase.

David
Re: Flushing large data immediately in pqcomm
On Thu, 14 Mar 2024 at 13:12, Robert Haas wrote: > > On Thu, Mar 14, 2024 at 7:22 AM Melih Mutlu wrote: > > 1- Even though I expect both the patch and HEAD behave similarly in case of > > small data (case 1: 100 bytes), the patch runs slightly slower than HEAD. > > I wonder why this happens. It seems like maybe something that could be fixed. some wild guesses: 1. maybe it's the extra call overhead of the new internal_flush implementation. What happens if you make that an inline function? 2. maybe swap these conditions around (the call seems heavier than a simple comparison): !pq_is_send_pending() && len >= PqSendBufferSize BTW, the improvements for the larger rows are awesome!
Re: Flushing large data immediately in pqcomm
On 14/03/2024 13:22, Melih Mutlu wrote:
> @@ -1282,14 +1283,32 @@ internal_putbytes(const char *s, size_t len)
>             if (internal_flush())
>                 return EOF;
>         }
> -       amount = PqSendBufferSize - PqSendPointer;
> -       if (amount > len)
> -           amount = len;
> -       memcpy(PqSendBuffer + PqSendPointer, s, amount);
> -       PqSendPointer += amount;
> -       s += amount;
> -       len -= amount;
> +
> +       /*
> +        * If the buffer is empty and data length is larger than the buffer
> +        * size, send it without buffering. Otherwise, put as much data as
> +        * possible into the buffer.
> +        */
> +       if (!pq_is_send_pending() && len >= PqSendBufferSize)
> +       {
> +           int     start = 0;
> +
> +           socket_set_nonblocking(false);
> +           if (internal_flush_buffer(s, &start, (int *) &len))
> +               return EOF;
> +       }
> +       else
> +       {
> +           amount = PqSendBufferSize - PqSendPointer;
> +           if (amount > len)
> +               amount = len;
> +           memcpy(PqSendBuffer + PqSendPointer, s, amount);
> +           PqSendPointer += amount;
> +           s += amount;
> +           len -= amount;
> +       }
>     }
> +
>     return 0;
> }

Two small bugs:

- the "(int *) &len)" cast is not ok, and will break visibly on
big-endian systems where sizeof(int) != sizeof(size_t).

- If internal_flush_buffer() cannot write all the data in one call, it
updates 'start' for how much it wrote, and leaves 'end' unchanged. You
throw the updated 'start' value away, and will send the same data again
on next iteration.

Not a correctness issue, but instead of pq_is_send_pending(), I think it
would be better to check "PqSendStart == PqSendPointer" directly, or
call socket_is_send_pending() directly here. pq_is_send_pending() does
the same, but it's at a higher level of abstraction.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
Re: Flushing large data immediately in pqcomm
On Thu, 14 Mar 2024 at 12:22, Melih Mutlu wrote:
> I did some experiments with this patch, after previous discussions

One thing I noticed is that the buffer sizes don't seem to matter much
in your experiments, even though Andres's expectation was that 1400
would be better. I think I know the reason for that: afaict from your
test.sh script you connect psql over localhost or maybe even unix socket
to postgres. Neither of those would have an MTU of 1500. You'd probably
want to do those tests over an actual network or at least change the MTU
of the loopback interface. e.g. my "lo" interface mtu is 65536 by
default:

❯ ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
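For reference, on Linux the loopback MTU can be lowered with iproute2 so that localhost benchmarks see Ethernet-sized packets. This is a sketch of the setup: it requires root, and 65536 is the usual Linux loopback default to restore afterwards.

```shell
# Drop the loopback MTU to a typical Ethernet frame size for the benchmark
sudo ip link set dev lo mtu 1500

# ... run the COPY benchmark against localhost here ...

# Restore the default loopback MTU afterwards
sudo ip link set dev lo mtu 65536
```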
Re: Flushing large data immediately in pqcomm
On Thu, Mar 14, 2024 at 7:22 AM Melih Mutlu wrote: > 1- Even though I expect both the patch and HEAD behave similarly in case of > small data (case 1: 100 bytes), the patch runs slightly slower than HEAD. I wonder why this happens. It seems like maybe something that could be fixed. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Flushing large data immediately in pqcomm
Hi hackers,

I did some experiments with this patch, after previous discussions. This
probably does not answer all questions, but I would be happy to do more
if needed.

First, I updated the patch according to what was suggested here [1]. PSA
v2. I tweaked the master branch a bit to not allow any buffering. I
compared HEAD, this patch and no buffering at all. I also added a simple
GUC to control PqSendBufferSize; this change only allows modifying the
buffer size and should not have any impact on performance.

I again ran the COPY TO STDOUT command and timed it. AFAIU COPY sends
data row by row, and I tried running the command under different
scenarios with different # of rows and row sizes. You can find the test
script attached (see test.sh). All timings are in ms.

1- row size = 100 bytes, # of rows = 100

┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│           │ 1400 bytes │  2KB │  4KB │  8KB │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD      │       1036 │  998 │  940 │  910 │  894 │  874 │
│ patch     │       1107 │ 1032 │  980 │  957 │  917 │  909 │
│ no buffer │       6230 │ 6125 │ 6282 │ 6279 │ 6255 │ 6221 │
└───────────┴────────────┴──────┴──────┴──────┴──────┴──────┘

2- row size = half of the rows are 1KB and rest is 10KB, # of rows = 100

┌───────────┬────────────┬───────┬───────┬───────┬───────┬───────┐
│           │ 1400 bytes │   2KB │   4KB │   8KB │  16KB │  32KB │
├───────────┼────────────┼───────┼───────┼───────┼───────┼───────┤
│ HEAD      │      25197 │ 23414 │ 20612 │ 19206 │ 18334 │ 18033 │
│ patch     │      19843 │ 19889 │ 19798 │ 19129 │ 18578 │ 18260 │
│ no buffer │      23752 │ 23565 │ 23602 │ 23622 │ 23541 │ 23599 │
└───────────┴────────────┴───────┴───────┴───────┴───────┴───────┘

3- row size = half of the rows are 1KB and rest is 1MB, # of rows = 1000

┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│           │ 1400 bytes │  2KB │  4KB │  8KB │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD      │       3137 │ 2937 │ 2687 │ 2551 │ 2456 │ 2465 │
│ patch     │       2399 │ 2390 │ 2402 │ 2415 │ 2417 │ 2422 │
│ no buffer │       2417 │ 2414 │ 2429 │ 2418 │ 2435 │ 2404 │
└───────────┴────────────┴──────┴──────┴──────┴──────┴──────┘

4- row size = all rows are 1MB, # of rows = 1000
┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│           │ 1400 bytes │  2KB │  4KB │  8KB │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD      │       6113 │ 5764 │ 5281 │ 5009 │ 4885 │ 4872 │
│ patch     │       4759 │ 4754 │ 4754 │ 4758 │ 4782 │ 4805 │
│ no buffer │       4756 │ 4774 │ 4793 │ 4766 │ 4770 │ 4774 │
└───────────┴────────────┴──────┴──────┴──────┴──────┴──────┘

Some quick observations:

1- Even though I expect both the patch and HEAD behave similarly in case
of small data (case 1: 100 bytes), the patch runs slightly slower than
HEAD.
2- In cases where the data does not fit into the buffer, the patch
starts performing better than HEAD. For example, in case 2, the patch
seems faster until the buffer size exceeds the data length. When the
buffer size is set to something larger than 10KB (16KB/32KB in this
case), there is again a slight performance loss with the patch as in
case 1.
3- With large row sizes (i.e. sizes that do not fit into the buffer) not
buffering at all starts performing better than HEAD. Similarly the patch
performs better too as it stops buffering if data does not fit into the
buffer.

[1] https://www.postgresql.org/message-id/CAGECzQTYUhnC1bO%3DzNiSpUgCs%3DhCYxVHvLD2doXNx3My6ZAC2w%40mail.gmail.com

Thanks,
-- 
Melih Mutlu
Microsoft

From 7f1b72a0948156f8e35ce3b07b5e763a5578d641 Mon Sep 17 00:00:00 2001
From: Melih Mutlu
Date: Mon, 26 Feb 2024 14:41:35 +0300
Subject: [PATCH] Added pq_send_buffer_size GUC

---
 src/backend/libpq/pqcomm.c          |  2 +-
 src/backend/utils/misc/guc_tables.c | 11 +++
 src/include/libpq/libpq.h           |  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index c606bf3447..92708e46e6 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -116,7 +116,7 @@ static List *sock_paths = NIL;
  * enlarged by pq_putmessage_noblock() if the message doesn't fit otherwise.
  */
-#define PQ_SEND_BUFFER_SIZE 8192
+int PQ_SEND_BUFFER_SIZE = 8192;
 #define PQ_RECV_BUFFER_SIZE 8192
 static ch
Re: Flushing large data immediately in pqcomm
Hi,

On 2024-02-01 15:02:57 -0500, Robert Haas wrote:
> On Thu, Feb 1, 2024 at 10:52 AM Robert Haas wrote:
> There was probably a better way to phrase this email ... the sentiment
> is sincere, but there was almost certainly a way of writing it that
> didn't sound like I'm super-annoyed.

NP - I could have phrased mine better as well...

> > On Wed, Jan 31, 2024 at 10:24 PM Andres Freund wrote:
> > > While not perfect - e.g. because networks might use jumbo packets /
> > > large MTUs and we don't know how many outstanding bytes there are
> > > locally, I think a decent heuristic could be to always try to send
> > > at least one packet worth of data at once (something like ~1400
> > > bytes), even if that requires copying some of the input data. It
> > > might not be sent on its own, but it should make it reasonably
> > > unlikely to end up with tiny tiny packets.
> >
> > I think that COULD be a decent heuristic but I think it should be
> > TESTED, including against the ~3 or so other heuristics proposed on
> > this thread, before we make a decision.
> >
> > I literally mentioned the Ethernet frame size as one of the things
> > that we should test whether it's relevant in the exact email to which
> > you're replying, and you replied by proposing that as a heuristic, but
> > also criticizing me for wanting more research before we settle on
> > something.

I mentioned the frame size thing because afaict nobody in the thread had
mentioned our use of TCP_NODELAY (which basically forces the kernel to
send out data immediately instead of waiting for further data to be
sent). Without that it'd be a lot less problematic to occasionally send
data in small increments in between larger sends. Nor would packet sizes
be as relevant.

> > Are we just supposed to assume that your heuristic is better than the
> > others proposed here without testing anything, or, like, what?
I don't > > think this needs to be a completely exhaustive or exhausting process, but > > I think trying a few different things out and seeing what happens is > > smart. I wasn't trying to say that my heuristic necessarily is better. What I was trying to get at - and expressed badly - was that I doubt that testing can get us all that far here. It's not too hard to test the effects of our buffering with regards to syscall overhead, but once you actually take network effects into account it gets quite hard. Bandwidth, latency, the specific network hardware and operating systems involved all play a significant role. Given how, uh, naive our current approach is, I think analyzing the situation from first principles and then doing some basic validation of the results makes more sense. Separately, I think we shouldn't aim for perfect here. It's obviously extremely inefficient to send a larger amount of data by memcpy()ing and send()ing it in 8kB chunks. As mentioned by several folks upthread, we can improve upon that without having worse behaviour than today. Medium-long term I suspect we're going to want to use asynchronous network interfaces, in combination with zero-copy sending, which requires larger changes. Not that relevant for things like query results, quite relevant for base backups etc. It's perhaps also worth mentioning that the small send buffer isn't great for SSL performance, the encryption overhead increases when sending in small chunks. I hacked up Melih's patch to send the pending data together with the first bit of the large "to be sent" data and also added a patch to increased SINK_BUFFER_LENGTH by 16x. With a 12GB database I tested the time for pg_basebackup -c fast -Ft --compress=none -Xnone -D - -d "$conn" > /dev/null time via test unix tcp tcp+ssl master 6.305s 9.436s 15.596s master-larger-buffer 6.535s 9.453s 15.208s patch 5.900s 7.465s 13.634s patch-larger-buffer5.233s 5.439s 11.730s The increase when using tcp is pretty darn impressive. 
If I had remembered in time to disable manifest checksums, the win would
have been even bigger. The bottleneck for SSL is that it still ends up
with ~16kB sends, not sure why.

Greetings,

Andres Freund
Re: Flushing large data immediately in pqcomm
On Thu, Feb 1, 2024 at 10:52 AM Robert Haas wrote: > On Wed, Jan 31, 2024 at 10:24 PM Andres Freund wrote: > > While not perfect - e.g. because networks might use jumbo packets / large > > MTUs > > and we don't know how many outstanding bytes there are locally, I think a > > decent heuristic could be to always try to send at least one packet worth of > > data at once (something like ~1400 bytes), even if that requires copying > > some > > of the input data. It might not be sent on its own, but it should make it > > reasonably unlikely to end up with tiny tiny packets. > > I think that COULD be a decent heuristic but I think it should be > TESTED, including against the ~3 or so other heuristics proposed on > this thread, before we make a decision. > > I literally mentioned the Ethernet frame size as one of the things > that we should test whether it's relevant in the exact email to which > you're replying, and you replied by proposing that as a heuristic, but > also criticizing me for wanting more research before we settle on > something. Are we just supposed to assume that your heuristic is > better than the others proposed here without testing anything, or, > like, what? I don't think this needs to be a completely exhaustive or > exhausting process, but I think trying a few different things out and > seeing what happens is smart. There was probably a better way to phrase this email ... the sentiment is sincere, but there was almost certainly a way of writing it that didn't sound like I'm super-annoyed. Apologies for that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Flushing large data immediately in pqcomm
On Wed, Jan 31, 2024 at 10:24 PM Andres Freund wrote: > While not perfect - e.g. because networks might use jumbo packets / large MTUs > and we don't know how many outstanding bytes there are locally, I think a > decent heuristic could be to always try to send at least one packet worth of > data at once (something like ~1400 bytes), even if that requires copying some > of the input data. It might not be sent on its own, but it should make it > reasonably unlikely to end up with tiny tiny packets. I think that COULD be a decent heuristic but I think it should be TESTED, including against the ~3 or so other heuristics proposed on this thread, before we make a decision. I literally mentioned the Ethernet frame size as one of the things that we should test whether it's relevant in the exact email to which you're replying, and you replied by proposing that as a heuristic, but also criticizing me for wanting more research before we settle on something. Are we just supposed to assume that your heuristic is better than the others proposed here without testing anything, or, like, what? I don't think this needs to be a completely exhaustive or exhausting process, but I think trying a few different things out and seeing what happens is smart. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Flushing large data immediately in pqcomm
Hi, On 2024-01-31 14:57:35 -0500, Robert Haas wrote: > > You're right and I'm open to doing more legwork. I'd also appreciate any > > suggestion about how to test this properly and/or useful scenarios to > > test. That would be really helpful. > > I think experimenting to see whether the long-short-long-short > behavior that Heikki postulated emerges in practice would be a really > good start. > > Another experiment that I think would be interesting is: suppose you > create a patch that sends EVERY message without buffering and compare > that to master. My naive expectation would be that this will lose if > you pump short messages through that connection and win if you pump > long messages through that connection. Is that true? If yes, at what > point do we break even on performance? Does it depend on whether the > connection is local or over a network? Does it depend on whether it's > with or without SSL? Does it depend on Linux vs. Windows vs. > whateverBSD? What happens if you twiddle the 8kB buffer size up or, > say, down to just below the Ethernet frame size? I feel like you're putting up a too high bar for something that can be a pretty clear improvement on its own, without a downside. The current behaviour is pretty absurd, doing all this research across all platforms isn't going to disprove that - and it's a lot of work. ISTM we can analyze this without taking concrete hardware into account easily enough. One thing that I haven't seen mentioned here that's relevant around using small buffers: Postgres uses TCP_NODELAY and has to do so. That means doing tiny sends can hurt substantially > I think that what we really want to understand here is under what > circumstances the extra layer of buffering is a win vs. being a loss. It's quite easy to see that doing no buffering isn't viable - we end up with tiny tiny TCP packets, one for each send(). And then there's the syscall overhead. Here's a quickly thrown together benchmark using netperf. 
First with -D, which instructs it to use TCP_NODELAY, as we do.

10gbit network, remote host:

$ (fields="request_size,throughput"; echo "$fields";for i in $(seq 0 16); do s=$((2**$i));netperf -P0 -t TCP_STREAM -l1 -H alap5-10gbe -- -r $s,$s -D 1 -o "$fields";done)|column -t -s,

request_size  throughput
1             22.73
2             45.77
4             108.64
8             225.78
16            560.32
32            1035.61
64            2177.91
128           3604.71
256           5878.93
512           9334.70
1024          9031.13
2048          9405.35
4096          9334.60
8192          9275.33
16384         9406.29
32768         9385.52
65536         9399.40

localhost:

request_size  throughput
1             2.76
2             5.10
4             9.89
8             20.51
16            43.42
32            87.13
64            173.72
128           343.70
256           647.89
512           1328.79
1024          2550.14
2048          4998.06
4096          9482.06
8192          17130.76
16384         29048.02
32768         42106.33
65536         48579.95

I'm slightly baffled by the poor performance of localhost with tiny packet sizes. Ah, I see - it's the NODELAY. Without that:

localhost:

request_size  throughput
1             32.02
2             60.58
4             114.32
8             262.71
16            558.42
32            1053.66
64            2099.39
128           3815.60
256           6566.19
512           11751.79
1024          18976.11
2048          27222.99
4096          33838.07
8192          38219.60
16384         39146.37
32768         44784.98
65536         44214.70

NODELAY triggers many more context switches, because there's immediately data available for the receiving side. Whereas with real network the interrupts get coalesced. I think that's pretty clear evidence that we need buffering. But I think we can probably be smarter than we are right now, and than what's been proposed in the patch. Because of TCP_NODELAY we shouldn't send a tiny buffer on its own, it may trigger sending a small TCP packet, which is quite inefficient. While not perfect - e.g. because networks might use jumbo packets / large MTUs and we don't know how many outstanding bytes there are locally, I think a decent heuristic could be to always try to send at least one packet worth of data at once (something like ~1400 bytes), even if that requires copying some of the input data. It might not be sent on its own, but it should make it reasonably unlikely to end up with tiny tiny packets. Greetings, Andres Freund
Re: Flushing large data immediately in pqcomm
On Wed, Jan 31, 2024 at 2:23 PM Melih Mutlu wrote: >> That seems like it might be a useful refinement of Melih Mutlu's >> original proposal, but consider a message stream that consists of >> messages exactly 8kB in size. If that message stream begins when the >> buffer is empty, all messages are sent directly. If it begins when >> there are any number of bytes in the buffer, we buffer every message >> forever. That's kind of an odd artifact, but maybe it's fine in >> practice. I say again that it's good to test out a bunch of scenarios >> and see what shakes out. > > Isn't this already the case? Imagine sending exactly 8kB messages, the first > pq_putmessage() call will buffer 8kB. Any call after this point simply sends > a 8kB message already buffered from the previous call and buffers a new 8kB > message. Only difference here is we keep the message in the buffer for a > while instead of sending it directly. In theory, the proposed idea should not > bring any difference in the number of flushes and the size of data we send in > each time, but can remove unnecessary copies to the buffer in this case. I > guess the behaviour is also the same with or without the patch in case the > buffer has already some bytes. Yes, it's never worse than today in terms of number of buffer flushes, but it doesn't feel like great behavior, either. Users tend not to like it when the behavior of an algorithm depends heavily on incidental factors that shouldn't really be relevant, like whether the buffer starts with 1 byte in it or 0 at the beginning of a long sequence of messages. They see the performance varying "for no reason" and they dislike it. They don't say "even the bad performance is no worse than earlier versions so it's fine." > You're right and I'm open to doing more legwork. I'd also appreciate any > suggestion about how to test this properly and/or useful scenarios to test. > That would be really helpful. 
I think experimenting to see whether the long-short-long-short behavior that Heikki postulated emerges in practice would be a really good start. Another experiment that I think would be interesting is: suppose you create a patch that sends EVERY message without buffering and compare that to master. My naive expectation would be that this will lose if you pump short messages through that connection and win if you pump long messages through that connection. Is that true? If yes, at what point do we break even on performance? Does it depend on whether the connection is local or over a network? Does it depend on whether it's with or without SSL? Does it depend on Linux vs. Windows vs. whateverBSD? What happens if you twiddle the 8kB buffer size up or, say, down to just below the Ethernet frame size? I think that what we really want to understand here is under what circumstances the extra layer of buffering is a win vs. being a loss. If all the stuff I just mentioned doesn't really matter and the answer is, say, that an 8kB buffer is great and the breakpoint where extra buffering makes sense is also 8kB, and that's consistent regardless of other variables, then your algorithm or Jelte's variant or something of that nature is probably just right. But if it turns out, say, that the extra buffering is only a win for sub-1kB messages, that would be rather nice to know before we finalize the approach. Also, if it turns out that the answer differs dramatically based on whether you're using a UNIX socket or TCP, that would also be nice to know before finalizing an algorithm. > I understand that I should provide more/better analysis around this change to > prove that it doesn't hurt (hopefully) but improves some cases even though > not all the cases. That may even help us to find a better approach than > what's already proposed. Just to clarify, I don't think anyone here suggests > that the bar should be at "if it can't lose relative to today, it's good > enough". 
IMHO "a change that improves some cases, but regresses nowhere" does > not translate to that. Well, I thought those were fairly similar sentiments, so maybe I'm not quite understanding the statement in the way it was meant. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Flushing large data immediately in pqcomm
Robert Haas , 31 Oca 2024 Çar, 20:23 tarihinde şunu yazdı: > On Tue, Jan 30, 2024 at 6:39 PM Jelte Fennema-Nio > wrote: > > I agree that it's hard to prove that such heuristics will always be > > better in practice than the status quo. But I feel like we shouldn't > > let perfect be the enemy of good here. > > Sure, I agree. > > > I one approach that is a clear > > improvement over the status quo is: > > 1. If the buffer is empty AND the data we are trying to send is larger > > than the buffer size, then don't use the buffer. > > 2. If not, fill up the buffer first (just like we do now) then send > > that. And if the left over data is then still larger than the buffer, > > then now the buffer is empty so 1. applies. > > That seems like it might be a useful refinement of Melih Mutlu's > original proposal, but consider a message stream that consists of > messages exactly 8kB in size. If that message stream begins when the > buffer is empty, all messages are sent directly. If it begins when > there are any number of bytes in the buffer, we buffer every message > forever. That's kind of an odd artifact, but maybe it's fine in > practice. I say again that it's good to test out a bunch of scenarios > and see what shakes out. > Isn't this already the case? Imagine sending exactly 8kB messages, the first pq_putmessage() call will buffer 8kB. Any call after this point simply sends a 8kB message already buffered from the previous call and buffers a new 8kB message. Only difference here is we keep the message in the buffer for a while instead of sending it directly. In theory, the proposed idea should not bring any difference in the number of flushes and the size of data we send in each time, but can remove unnecessary copies to the buffer in this case. I guess the behaviour is also the same with or without the patch in case the buffer has already some bytes. 
Robert Haas , 31 Oca 2024 Çar, 21:28 tarihinde şunu yazdı: > Personally, I don't think it's likely that anything will get committed > here without someone doing more legwork than I've seen on the thread > so far. I don't have any plan to pick up this patch anyway, but if I > were thinking about it, I would abandon the idea unless I were > prepared to go test a bunch of stuff myself. I agree with the core > idea of this work, but not with the idea that the bar is as low as "if > it can't lose relative to today, it's good enough." > You're right and I'm open to doing more legwork. I'd also appreciate any suggestion about how to test this properly and/or useful scenarios to test. That would be really helpful. I understand that I should provide more/better analysis around this change to prove that it doesn't hurt (hopefully) but improves some cases even though not all the cases. That may even help us to find a better approach than what's already proposed. Just to clarify, I don't think anyone here suggests that the bar should be at "if it can't lose relative to today, it's good enough". IMHO "a change that improves some cases, but regresses nowhere" does not translate to that. Thanks, -- Melih Mutlu Microsoft
Re: Flushing large data immediately in pqcomm
On Wed, Jan 31, 2024 at 12:49 PM Jelte Fennema-Nio wrote: > Testing a bunch of scenarios to find a good one sounds like a good > idea, which can probably give us a more optimal heuristic. But it also > sounds like a lot of work, and probably results in a lot of > discussion. That extra effort might mean that we're not going to > commit any change for PG17 (or even at all). If so, then I'd rather > have a modest improvement from my refinement of Melih's proposal, than > none at all. Personally, I don't think it's likely that anything will get committed here without someone doing more legwork than I've seen on the thread so far. I don't have any plan to pick up this patch anyway, but if I were thinking about it, I would abandon the idea unless I were prepared to go test a bunch of stuff myself. I agree with the core idea of this work, but not with the idea that the bar is as low as "if it can't lose relative to today, it's good enough." Of course, another committer may see it differently. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Flushing large data immediately in pqcomm
On Wed, 31 Jan 2024 at 18:23, Robert Haas wrote: > That's kind of an odd artifact, but maybe it's fine in > practice. I agree it's an odd artifact, but it's not a regression over the status quo. Achieving that was the intent of my suggestion: A change that improves some cases, but regresses nowhere. > I say again that it's good to test out a bunch of scenarios > and see what shakes out. Testing a bunch of scenarios to find a good one sounds like a good idea, which can probably give us a more optimal heuristic. But it also sounds like a lot of work, and probably results in a lot of discussion. That extra effort might mean that we're not going to commit any change for PG17 (or even at all). If so, then I'd rather have a modest improvement from my refinement of Melih's proposal, than none at all.
Re: Flushing large data immediately in pqcomm
On Tue, Jan 30, 2024 at 6:39 PM Jelte Fennema-Nio wrote: > I agree that it's hard to prove that such heuristics will always be > better in practice than the status quo. But I feel like we shouldn't > let perfect be the enemy of good here. Sure, I agree. > I think one approach that is a clear > improvement over the status quo is: > 1. If the buffer is empty AND the data we are trying to send is larger > than the buffer size, then don't use the buffer. > 2. If not, fill up the buffer first (just like we do now) then send > that. And if the left over data is then still larger than the buffer, > then now the buffer is empty so 1. applies. That seems like it might be a useful refinement of Melih Mutlu's original proposal, but consider a message stream that consists of messages exactly 8kB in size. If that message stream begins when the buffer is empty, all messages are sent directly. If it begins when there are any number of bytes in the buffer, we buffer every message forever. That's kind of an odd artifact, but maybe it's fine in practice. I say again that it's good to test out a bunch of scenarios and see what shakes out. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Flushing large data immediately in pqcomm
On Tue, 30 Jan 2024 at 19:48, Robert Haas wrote: > > On Tue, Jan 30, 2024 at 12:58 PM Melih Mutlu wrote: > > Sounds like it's difficult to come up with a heuristic that would work well > > enough for most cases. > > One thing with sending data instead of copying it if the buffer is empty is > > that initially the buffer is empty. I believe it will stay empty forever if > > we do not copy anything when the buffer is empty. We can maybe simply set > > the threshold to the buffer size/2 (4kB) and hope that will work better. Or > > copy the data only if it fits into the remaining space in the buffer. What > > do you think? > > > > An additional note while I mentioned pq_putmessage_noblock(), I've been > > testing sending input data immediately in pq_putmessage_noblock() without > > blocking and copy the data into PqSendBuffer only if the socket would block > > and cannot send it. Unfortunately, I don't have strong numbers to > > demonstrate any improvement in perf or timing yet. But I still like to know > > what would you think about it? > > I think this is an area where it's very difficult to foresee on > theoretical grounds what will be right in practice I agree that it's hard to prove that such heuristics will always be better in practice than the status quo. But I feel like we shouldn't let perfect be the enemy of good here. I think one approach that is a clear improvement over the status quo is: 1. If the buffer is empty AND the data we are trying to send is larger than the buffer size, then don't use the buffer. 2. If not, fill up the buffer first (just like we do now) then send that. And if the left over data is then still larger than the buffer, then now the buffer is empty so 1. applies.
Re: Flushing large data immediately in pqcomm
On Tue, Jan 30, 2024 at 12:58 PM Melih Mutlu wrote: > Sounds like it's difficult to come up with a heuristic that would work well > enough for most cases. > One thing with sending data instead of copying it if the buffer is empty is > that initially the buffer is empty. I believe it will stay empty forever if > we do not copy anything when the buffer is empty. We can maybe simply set the > threshold to the buffer size/2 (4kB) and hope that will work better. Or copy > the data only if it fits into the remaining space in the buffer. What do you > think? > > An additional note while I mentioned pq_putmessage_noblock(), I've been > testing sending input data immediately in pq_putmessage_noblock() without > blocking and copy the data into PqSendBuffer only if the socket would block > and cannot send it. Unfortunately, I don't have strong numbers to demonstrate > any improvement in perf or timing yet. But I still like to know what would > you think about it? I think this is an area where it's very difficult to foresee on theoretical grounds what will be right in practice. The problem is that the best algorithm probably depends on what usage patterns are common in practice. I think one common usage pattern will be a bunch of roughly equal-sized messages in a row, like CopyData or DataRow messages -- but those messages won't have a consistent width. It would probably be worth testing what behavior you see in such cases -- start with say a stream of 100 byte messages and then gradually increase and see how the behavior evolves. But you can also have other patterns, with messages of different sizes interleaved. In the case of FE-to-BE traffic, the extended query protocol might be a good example of that: the Parse message could be quite long, or not, but the Bind Describe Execute Sync messages that follow are probably all short. That case doesn't arise in this direction, but I can't think exactly of what cases that do. 
It seems like someone would need to play around and try some different cases and maybe log the sizes of the secure_write() calls with various algorithms, and then try to figure out what's best. For example, if the alternating short-write, long-write behavior that Heikki mentioned is happening, and I do think that particular thing is a very real risk, then you haven't got it figured out yet... -- Robert Haas EDB: http://www.enterprisedb.com
Re: Flushing large data immediately in pqcomm
Hi Robert, Robert Haas , 29 Oca 2024 Pzt, 20:48 tarihinde şunu yazdı: > > If there's already some data in PqSendBuffer, I wonder if it would be > > better to fill it up with data, flush it, and then send the rest of the > > data directly. Instead of flushing the partial data first. I'm afraid > > that you'll make a tiny call to secure_write(), followed by a large one, > > then a tiny one again, and so forth. Especially when socket_putmessage > > itself writes the msgtype and len, which are tiny, before the payload. > > > > Perhaps we should invent a new pq_putmessage() function that would take > > an input buffer with 5 bytes of space reserved before the payload. > > pq_putmessage() could then fill in the msgtype and len bytes in the > > input buffer and send that directly. (Not wedded to that particular API, > > but something that would have the same effect) > > I share the concern; I'm not sure about the best solution. I wonder if > it would be useful to have pq_putmessagev() in the style of writev() > et al. Or maybe what we need is secure_writev(). > I thought about using writev() for not only pq_putmessage() but pq_putmessage_noblock() too. Currently, pq_putmessage_noblock() repallocs PqSendBuffer and copies input buffer, which can easily be larger than 8kB, into PqSendBuffer. I also discussed it with Thomas off-list. The thing is that I believe we would need secure_writev() with SSL/GSS cases handled properly. I'm just not sure if the effort would be worthwhile considering what we gain from it. > I also wonder if the threshold for sending data directly should be > smaller than the buffer size, and/or whether it should depend on the > buffer being empty. You might be right. I'm not sure what the ideal threshold would be. > If we have an 8kB buffer that currently has > nothing in it, and somebody writes 2kB, I suspect it might be wrong to > copy that into the buffer. If the same buffer had 5kB used and 3kB > free, copying sounds a lot more likely to work out. 
The goal here is > probably to condense sequences of short messages into a single > transmission while sending long messages individually. I'm just not > quite sure what heuristic would do that most effectively. > Sounds like it's difficult to come up with a heuristic that would work well enough for most cases. One thing with sending data instead of copying it if the buffer is empty is that initially the buffer is empty. I believe it will stay empty forever if we do not copy anything when the buffer is empty. We can maybe simply set the threshold to the buffer size/2 (4kB) and hope that will work better. Or copy the data only if it fits into the remaining space in the buffer. What do you think? An additional note while I mentioned pq_putmessage_noblock(), I've been testing sending input data immediately in pq_putmessage_noblock() without blocking and copy the data into PqSendBuffer only if the socket would block and cannot send it. Unfortunately, I don't have strong numbers to demonstrate any improvement in perf or timing yet. But I still like to know what would you think about it? Thanks, -- Melih Mutlu Microsoft
Re: Flushing large data immediately in pqcomm
Hi Heikki, Heikki Linnakangas , 29 Oca 2024 Pzt, 19:12 tarihinde şunu yazdı: > > Proposed change modifies socket_putmessage to send any data larger than > > 8K immediately without copying it into the send buffer. Assuming that > > the send buffer would be flushed anyway due to reaching its limit, the > > patch just gets rid of the copy part which seems unnecessary and sends > > data without waiting. > > If there's already some data in PqSendBuffer, I wonder if it would be > better to fill it up with data, flush it, and then send the rest of the > data directly. Instead of flushing the partial data first. I'm afraid > that you'll make a tiny call to secure_write(), followed by a large one, > then a tiny one again, and so forth. Especially when socket_putmessage > itself writes the msgtype and len, which are tiny, before the payload. > I agree that I could do better there without flushing twice for both PqSendBuffer and input data. PqSendBuffer always has some data, even if it's tiny, since msgtype and len are added. > Perhaps we should invent a new pq_putmessage() function that would take > an input buffer with 5 bytes of space reserved before the payload. > pq_putmessage() could then fill in the msgtype and len bytes in the > input buffer and send that directly. (Not wedded to that particular API, > but something that would have the same effect) > I thought about doing this. The reason why I didn't was because I think that such a change would require adjusting all input buffers wherever pq_putmessage is called, and I did not want to touch that many different places. These places where we need pq_putmessage might not be that many though, I'm not sure. > > > This change affects places where pq_putmessage is used such as > > pg_basebackup, COPY TO, walsender etc. > > > > I did some experiments to see how the patch performs. > > Firstly, I loaded ~5GB data into a table [1], then ran "COPY test TO > > STDOUT". Here are perf results of both the patch and HEAD > ... 
> > The patch brings a ~5% gain in socket_putmessage. > > > > [1] > > CREATE TABLE test(id int, name text, time TIMESTAMP); > > INSERT INTO test (id, name, time) SELECT i AS id, repeat('dummy', 100) > > AS name, NOW() AS time FROM generate_series(1, 1) AS i; > > I'm surprised by these results, because each row in that table is < 600 > bytes. PqSendBufferSize is 8kB, so the optimization shouldn't kick in in > that test. Am I missing something? > You're absolutely right. I made a silly mistake there. I also think that the way I did perf analysis does not make much sense, even if one row of the table is greater than 8kB. Here are some quick timing results after being sure that it triggers this patch's optimization. I need to think more on how to profile this with perf. I hope to share proper results soon. I just added a bit more zeros [1] and ran [2] (hopefully measured the correct thing)

HEAD:
real	2m48,938s
user	0m9,226s
sys	1m35,342s

Patch:
real	2m40,690s
user	0m8,492s
sys	1m31,001s

[1] INSERT INTO test (id, name, time) SELECT i AS id, repeat('dummy', 1) AS name, NOW() AS time FROM generate_series(1, 100) AS i; [2] rm /tmp/dummy && echo 3 | sudo tee /proc/sys/vm/drop_caches && time psql -d postgres -c "COPY test TO STDOUT;" > /tmp/dummy Thanks, -- Melih Mutlu Microsoft
Re: Flushing large data immediately in pqcomm
On Mon, Jan 29, 2024 at 11:12 AM Heikki Linnakangas wrote: > Agreed, that's silly. +1. > If there's already some data in PqSendBuffer, I wonder if it would be > better to fill it up with data, flush it, and then send the rest of the > data directly. Instead of flushing the partial data first. I'm afraid > that you'll make a tiny call to secure_write(), followed by a large one, > then a tiny one again, and so forth. Especially when socket_putmessage > itself writes the msgtype and len, which are tiny, before the payload. > > Perhaps we should invent a new pq_putmessage() function that would take > an input buffer with 5 bytes of space reserved before the payload. > pq_putmessage() could then fill in the msgtype and len bytes in the > input buffer and send that directly. (Not wedded to that particular API, > but something that would have the same effect) I share the concern; I'm not sure about the best solution. I wonder if it would be useful to have pq_putmessagev() in the style of writev() et al. Or maybe what we need is secure_writev(). I also wonder if the threshold for sending data directly should be smaller than the buffer size, and/or whether it should depend on the buffer being empty. If we have an 8kB buffer that currently has nothing in it, and somebody writes 2kB, I suspect it might be wrong to copy that into the buffer. If the same buffer had 5kB used and 3kB free, copying sounds a lot more likely to work out. The goal here is probably to condense sequences of short messages into a single transmission while sending long messages individually. I'm just not quite sure what heuristic would do that most effectively. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Flushing large data immediately in pqcomm
On 20/11/2023 14:21, Melih Mutlu wrote: Hi hackers I've been looking into ways to reduce the overhead we're having in pqcomm and I'd like to propose a small patch to modify how socket_putmessage works. Currently socket_putmessage copies any input data into the pqcomm send buffer (PqSendBuffer) and the size of this buffer is 8K. When the send buffer gets full, it's flushed and we continue to copy more data into the send buffer until we have no data left to be sent. Since the send buffer is flushed whenever it's full, I think we are safe to say that if the size of input data is larger than the buffer size, which is 8K, then the buffer will be flushed at least once (or more, depends on the input size) to store all the input data. Agreed, that's silly. Proposed change modifies socket_putmessage to send any data larger than 8K immediately without copying it into the send buffer. Assuming that the send buffer would be flushed anyway due to reaching its limit, the patch just gets rid of the copy part which seems unnecessary and sends data without waiting. If there's already some data in PqSendBuffer, I wonder if it would be better to fill it up with data, flush it, and then send the rest of the data directly. Instead of flushing the partial data first. I'm afraid that you'll make a tiny call to secure_write(), followed by a large one, then a tiny one again, and so forth. Especially when socket_putmessage itself writes the msgtype and len, which are tiny, before the payload. Perhaps we should invent a new pq_putmessage() function that would take an input buffer with 5 bytes of space reserved before the payload. pq_putmessage() could then fill in the msgtype and len bytes in the input buffer and send that directly. (Not wedded to that particular API, but something that would have the same effect) This change affects places where pq_putmessage is used such as pg_basebackup, COPY TO, walsender etc. I did some experiments to see how the patch performs. 
Firstly, I loaded ~5GB data into a table [1], then ran "COPY test TO STDOUT". Here are perf results of both the patch and HEAD > ... The patch brings a ~5% gain in socket_putmessage. [1] CREATE TABLE test(id int, name text, time TIMESTAMP); INSERT INTO test (id, name, time) SELECT i AS id, repeat('dummy', 100) AS name, NOW() AS time FROM generate_series(1, 1) AS i; I'm surprised by these results, because each row in that table is < 600 bytes. PqSendBufferSize is 8kB, so the optimization shouldn't kick in in that test. Am I missing something? -- Heikki Linnakangas Neon (https://neon.tech)