Re: benchmark results comparing versions 15.2 and 16

2023-07-07 Thread Alexander Lakhin

Hello David,

11.05.2023 16:00, Alexander Lakhin wrote:

Yeah, I see. It's also interesting to me which tests perform better after
that commit. It takes several hours to run all tests, so I can't present
results quickly, but I'll try to collect this information next week.


To my regret, I could not find such tests that week, so I decided to try
later, after reviewing my benchmark running infrastructure.

But for now, as the Postgres Pro company has graciously shared its
benchmarking infrastructure project [1], anyone can confirm or refute my
results. (You can also see which tests were performed and how.)

Having done one more round of comprehensive testing, I still couldn't find
tests that win from commit 3c6fc5820 (but I did confirm that
s64da_tpcds.query87 loses a little, and so does s64da_tpcds.query38).
So it seems to me that the tests I performed, or their parameters, are not
representative enough to show that improvement, unfortunately.

[1] https://github.com/postgrespro/pg-mark

Best regards,
Alexander




Re: benchmark results comparing versions 15.2 and 16

2023-05-31 Thread Peter Geoghegan
Hi Mark,

On Tue, May 30, 2023 at 1:03 PM MARK CALLAGHAN  wrote:
> Do you want me to try PG 16 without ICU or PG 15 with ICU?
> I can do that, but it will take a few days before the server is available.

Sorry for the late reply. Most of the Postgres developers (myself
included) are attending pgCon right now.

It would be nice to ascertain just how much of a boost we're getting
from our use of ICU as far as sysbench goes. I'd appreciate having
that information. We discussed the choice of ICU as default collation
provider at yesterday's developer meeting:

https://wiki.postgresql.org/wiki/PgCon_2023_Developer_Meeting#High_level_thoughts_and_feedback_on_moving_toward_ICU_as_the_preferred_collation_provider
https://wiki.postgresql.org/wiki/StateOfICU

Just confirming my theory about abbreviated keys (without rerunning
the benchmark) should be simple - perhaps that would be a useful place
to start. You could just rerun the two individual queries of interest
from an interactive psql session. There are even low-level debug
messages available through the trace_sort GUC. From a psql session
you'd run something along the lines of "set client_min_messages=log;
set trace_sort=on; $QUERY". You'll see lots of LOG messages with
specific information about the use of abbreviated keys and the
progress of each sort.
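
To make that concrete, a minimal sketch of such a psql session (reusing the
ORDER BY query from your earlier report) would be:

SET client_min_messages = log;
SET trace_sort = on;
-- rerun the query of interest interactively, e.g.:
SELECT c FROM sbtest1 WHERE id BETWEEN 1000 AND 1001 ORDER BY c;
-- the server then emits LOG lines for each sort, including whether
-- abbreviated keys were used and how the sort progressed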

Thanks
-- 
Peter Geoghegan




Re: benchmark results comparing versions 15.2 and 16

2023-05-30 Thread MARK CALLAGHAN
Do you want me to try PG 16 without ICU or PG 15 with ICU?
I can do that, but it will take a few days before the server is available.

On Mon, May 29, 2023 at 9:55 AM Peter Geoghegan  wrote:

> On Sun, May 28, 2023 at 2:42 PM David Rowley  wrote:
> > c6e0fe1f2 might have helped improve some of that performance, but I
> > suspect there must be something else as ~3x seems much more than I'd
> > expect from reducing the memory overheads.  Testing versions before
> > and after that commit might give a better indication.
>
> I'm virtually certain that this is due to the change in default
> collation provider, from libc to ICU. Mostly due to the fact that ICU
> is capable of using abbreviated keys, and  the system libc isn't
> (unless you go out of your way to define TRUST_STRXFRM when building
> Postgres).
>
> Many individual test cases involving larger non-C collation text sorts
> showed similar improvements back when I worked on this. Offhand, I
> believe that 3x - 3.5x improvements in execution times were common
> with high entropy abbreviated keys on high cardinality input columns
> at that time (this was with glibc). Low cardinality inputs were more
> like 2.5x.
>
> I believe that ICU is faster than glibc in general -- even with
> TRUST_STRXFRM enabled. But the TRUST_STRXFRM thing is bound to be the
> most important factor here, by far.
>
> --
> Peter Geoghegan
>


-- 
Mark Callaghan
mdcal...@gmail.com


Re: benchmark results comparing versions 15.2 and 16

2023-05-29 Thread Peter Geoghegan
On Sun, May 28, 2023 at 2:42 PM David Rowley  wrote:
> c6e0fe1f2 might have helped improve some of that performance, but I
> suspect there must be something else as ~3x seems much more than I'd
> expect from reducing the memory overheads.  Testing versions before
> and after that commit might give a better indication.

I'm virtually certain that this is due to the change in default
collation provider, from libc to ICU. Mostly due to the fact that ICU
is capable of using abbreviated keys, and  the system libc isn't
(unless you go out of your way to define TRUST_STRXFRM when building
Postgres).

Many individual test cases involving larger non-C collation text sorts
showed similar improvements back when I worked on this. Offhand, I
believe that 3x - 3.5x improvements in execution times were common
with high entropy abbreviated keys on high cardinality input columns
at that time (this was with glibc). Low cardinality inputs were more
like 2.5x.

I believe that ICU is faster than glibc in general -- even with
TRUST_STRXFRM enabled. But the TRUST_STRXFRM thing is bound to be the
most important factor here, by far.
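
(As an aside -- not something Mark has run, just a sketch -- a quick way to
check which collation provider a benchmark database actually uses is:

SELECT datname, datcollate, datlocprovider
FROM pg_database
WHERE datname = current_database();
-- datlocprovider is 'i' for ICU and 'c' for libc on v15/v16

which should show 'i' on the PG 16 instance if the new ICU default was
picked up at initdb time.)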

-- 
Peter Geoghegan




Re: benchmark results comparing versions 15.2 and 16

2023-05-28 Thread David Rowley
On Tue, 23 May 2023 at 07:40, MARK CALLAGHAN  wrote:

(pg15)

> --- Q2.10k : explain analyze SELECT c FROM sbtest1 WHERE id BETWEEN 1000 AND 1001 order by c;
>                                  QUERY PLAN
> ------------------------------------------------------------------------------------------------
>  Sort  (cost=1114.85..1137.28 rows=8971 width=121) (actual time=36.561..37.429 rows=10001 loops=1)
>    Sort Key: c
>    Sort Method: quicksort  Memory: 2025kB
>    ->  Index Scan using sbtest1_pkey on sbtest1  (cost=0.44..525.86 rows=8971 width=121) (actual time=0.022..3.776 rows=10001 loops=1)
>          Index Cond: ((id >= 1000) AND (id <= 1001))
>  Planning Time: 0.059 ms
>  Execution Time: 38.224 ms

(pg16 b1)

>                                  QUERY PLAN
> ------------------------------------------------------------------------------------------------
>  Sort  (cost=1259.19..1284.36 rows=10068 width=121) (actual time=14.419..15.042 rows=10001 loops=1)
>    Sort Key: c
>    Sort Method: quicksort  Memory: 1713kB
>    ->  Index Scan using sbtest1_pkey on sbtest1  (cost=0.44..589.80 rows=10068 width=121) (actual time=0.023..3.473 rows=10001 loops=1)
>          Index Cond: ((id >= 1000) AND (id <= 1001))
>  Planning Time: 0.049 ms
>  Execution Time: 15.700 ms

It looks like the improvements here are due to qsort being faster on
v16.  To get an idea of the time taken to perform the actual qsort,
you can't use the first row time vs last row time in the Sort node, as
we must (obviously) have performed the sort before outputting the
first row.  I think you could get a decent idea of the time taken to
perform the qsort by subtracting the actual time for the final Index
scan row from the actual time for the Sort's first row.  That's 36.561
- 3.776 = 32.785 ms for pg15's plan, and 14.419 - 3.473 = 10.946 ms for
pg16 b1's.
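That puts the sort itself at roughly 3x faster (32.785 / 10.946 ~= 3.0),
while the overall execution time difference is 38.224 / 15.700 ~= 2.4x.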

c6e0fe1f2 might have helped improve some of that performance, but I
suspect there must be something else as ~3x seems much more than I'd
expect from reducing the memory overheads.  Testing versions before
and after that commit might give a better indication.

David




Re: benchmark results comparing versions 15.2 and 16

2023-05-27 Thread MARK CALLAGHAN
Results for 16 beta1 are similar to what I shared above:
* no regressions
* a few queries are >= 1.5 times faster, which makes the read-only
transaction >= 1.5 times faster

http://smalldatum.blogspot.com/2023/05/postgres-16beta1-looks-good-vs-sysbench.html

-- 
Mark Callaghan
mdcal...@gmail.com


Re: benchmark results comparing versions 15.2 and 16

2023-05-22 Thread MARK CALLAGHAN
I ran sysbench on Postgres 15.2, 15.3 and 16 prebeta at git sha 1c006c0
(built on May 19).
The workload was in-memory on a small server (8 cores, 16G RAM) and used
1 connection (no concurrency).
For some details on past benchmarks like this see:
http://smalldatum.blogspot.com/2023/03/searching-for-performance-regressions.html

My focus is on changes >= 10%, so a value <= 0.90 or >= 1.10.
I used 3 builds of Postgres that I call def, o2_nofp and o3_nofp, and ran the
benchmark once per build. The results for each build are similar,
so I only share the o2_nofp results here.

Good news that I have not fully explained ...

One of the microbenchmarks gets ~1.5X more transactions/second (TPS) in PG
16 prebeta vs 15.2 and 15.3 for a read-only transaction that does:
* 2 select statements that scan 10k rows from an index (these are Q1 and
Q3 below and are slower in PG 16)
* 2 select statements that scan 10k rows from an index and do
aggregation (these are Q2 and Q4 below and are a lot faster in PG 16)

The speedup for Q2 and Q4 is larger than the slowdown for Q1/Q3, so TPS is
~1.5X more for PG 16.
Query plans don't appear to have changed. I assume some code got slower and
some got faster for the same plan.

The microbenchmarks in question are read-only_range=1 and
read-only.pre_range=1, shown in the table below.
Each of these microbenchmarks runs a read-only transaction with 4 SQL
statements. The statements are here:
https://github.com/mdcallag/mytools/blob/master/bench/sysbench.lua/lua/oltp_common.lua#LL301C1-L312C21
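
For reference, the Q1/Q3 and Q2/Q4 shapes match the queries quoted later in
this message, roughly (the exact parameters and the Q3/Q4 variants come from
the lua script linked above):

-- Q1/Q3: index range scan, no sort
SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ?;
-- Q2/Q4: the same range scan, but sorted/aggregated
SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? ORDER BY c;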

read-only.pre_range runs before a large number of writes, so the b-tree
will be more read-friendly.
read-only_range runs after a large number of writes.

The trailing =N gives the number of rows each SQL statement processes. Note
that the microbenchmarks are also run for =100 and =10,
and for those the perf with PG 16 is similar to 15.x rather than ~1.5X
faster.

---

This table shows throughput relative to the base case. The base case is PG
15.2 with the o2_nofp build.
Relative throughput < 1.0 means perf regressed; > 1.0 means perf improved.

col-1 : PG 15.3 with the o2_nofp build
col-2 : PG 16 prebeta (built on May 19 at git sha 1c006c0) with the o2_nofp
build

col-1   col-2
0.99    1.03    hot-points_range=100
1.02    1.05    point-query.pre_range=100
1.06    1.10    point-query_range=100
0.97    1.01    points-covered-pk.pre_range=100
0.98    1.02    points-covered-pk_range=100
0.98    1.01    points-covered-si.pre_range=100
1.00    1.00    points-covered-si_range=100
1.00    1.01    points-notcovered-pk.pre_range=100
1.00    1.01    points-notcovered-pk_range=100
1.01    1.03    points-notcovered-si.pre_range=100
1.01    1.01    points-notcovered-si_range=100
1.00    0.99    random-points.pre_range=1000
1.00    1.02    random-points.pre_range=100
1.01    1.01    random-points.pre_range=10
1.01    1.00    random-points_range=1000
1.01    1.01    random-points_range=100
1.02    1.01    random-points_range=10
1.00    1.00    range-covered-pk.pre_range=100
1.00    1.00    range-covered-pk_range=100
1.00    0.99    range-covered-si.pre_range=100
1.00    0.99    range-covered-si_range=100
1.03    1.01    range-notcovered-pk.pre_range=100
1.02    1.00    range-notcovered-pk_range=100
1.01    1.01    range-notcovered-si.pre_range=100
1.01    1.01    range-notcovered-si_range=100
1.04    1.54    read-only.pre_range=1   <<
1.00    1.00    read-only.pre_range=100
1.01    1.01    read-only.pre_range=10
1.03    1.45    read-only_range=1   <<
1.01    1.01    read-only_range=100
1.04    1.00    read-only_range=10
1.00    0.99    scan_range=100
1.00    1.02    delete_range=100
1.01    1.02    insert_range=100
1.01    1.00    read-write_range=100
1.01    0.98    read-write_range=10
1.01    1.01    update-index_range=100
1.00    1.00    update-inlist_range=100
1.02    1.02    update-nonindex_range=100
1.03    1.03    update-one_range=100
1.02    1.02    update-zipf_range=100
1.03    1.03    write-only_range=1

---

The read-only transaction has 4 SQL statements. I ran explain analyze for
each of them, first with the range scan fetching 10k rows and then 100k rows.
The 10k result is similar to what was done above; I added the 100k
result to see if the perf difference changes with more rows.

In each case there are two "Execution Time" entries. The top one is from PG
15.2 and the bottom one is from PG 16 prebeta.

Summary:
* Queries that do a sort show the largest improvement in PG 16 (Q2, Q4)
* Queries that don't do a sort are slower in PG 16 (Q1, Q3)

Q1.10k: explain analyze SELECT c FROM sbtest1 WHERE id BETWEEN 1000 AND
1001;

 Execution Time: 4.222 ms

 Execution Time: 6.243 ms


Q1.100k: explain analyze SELECT c FROM sbtest1 WHERE id BETWEEN 1000
AND 1010;

 Execution Time: 36.508 ms

 Execution Time: 49.344 ms


Q2.10k: explain analyze SELECT c FROM sbtest1 WHERE id BETWEEN 1000 AND
1001 order by c;

 Execution Time: 38.224 ms

 Execution Time: 

Re: benchmark results comparing versions 15.2 and 16

2023-05-20 Thread MARK CALLAGHAN
On Fri, May 19, 2023 at 4:04 PM Andres Freund  wrote:

> With "yet to see any significant changes" do you mean that the runs are
> comparable with earlier runs, showing the same regression? Or that the
> regression vanished? Or ...?
>

I mean that I might be chasing noise and the mean+stddev for throughput in
version 16 pre-beta so far appears to be similar to 15.2. When I ran the
insert benchmark a few times, I focused on the cases where 16 pre-beta was
worse than 15.2 while ignoring the cases where it was better. Big
regressions are easy to document, small ones not so much.

Regardless, I am repeating tests from both the insert benchmark and
sysbench for version 16 (pre-beta, and soon beta1).


Re: benchmark results comparing versions 15.2 and 16

2023-05-19 Thread Andres Freund
Hi,

On 2023-05-19 15:44:09 -0700, MARK CALLAGHAN wrote:
> On Tue, May 9, 2023 at 10:36 AM MARK CALLAGHAN  wrote:
> 
> >
> >
> > On Fri, May 5, 2023 at 10:01 PM MARK CALLAGHAN  wrote:
> >
> >> I have two more runs of the benchmark in progress so we will have 3
> >> results for each of the test cases to confirm that the small regressions
> >> are repeatable.
> >>
> >
> I repeated the benchmark a few times using a more recent PG16 build (git
> sha 08c45ae2) and have yet to see any significant changes. So that is good
> news. My testing scripts have been improved so I should be able to finish
> the next round of tests in less time.

With "yet to see any significant changes" do you mean that the runs are
comparable with earlier runs, showing the same regression? Or that the
regression vanished? Or ...?

Greetings,

Andres Freund




Re: benchmark results comparing versions 15.2 and 16

2023-05-19 Thread MARK CALLAGHAN
On Tue, May 9, 2023 at 10:36 AM MARK CALLAGHAN  wrote:

>
>
> On Fri, May 5, 2023 at 10:01 PM MARK CALLAGHAN  wrote:
>
>> I have two more runs of the benchmark in progress so we will have 3
>> results for each of the test cases to confirm that the small regressions
>> are repeatable.
>>
>
I repeated the benchmark a few times using a more recent PG16 build (git
sha 08c45ae2) and have yet to see any significant changes. So that is good
news. My testing scripts have been improved so I should be able to finish
the next round of tests in less time.

--
Mark Callaghan
mdcal...@gmail.com


Re: benchmark results comparing versions 15.2 and 16

2023-05-17 Thread Alexander Lakhin

17.05.2023 04:25, Michael Paquier wrote:

The numbers between f193883fc~1 and HEAD+patch are close to each
other.  It does not seem to make the whole difference with 15.3, but
most of it.  The difference can also be explained with some noise,
based on the number patterns of the third runs?

I have now applied the revert, ready for beta1.  Thanks for the
feedback!



Thank you for paying attention to it!

Yes, I ran the benchmark on my workstation, so numbers could vary due to
parallel activity. Now I've compared 15.3 (8382864eb) with d8c3106bb and
1d369c9e9, this time with the CPU boost mode disabled:
1d369c9e9:
10007.130326, 10047.722122, 9920.612426, 10016.053846, 10060.606408
d8c3106bb:
10492.100485, 10505.326827, 10535.918137, 10625.904871, 10573.608859
15.3:
10458.752330, 10308.677192, 10366.496526, 10489.395275, 10319.458041

Best "1d369c9e9" is worse than "15.3" by 4.1 percent (10060.61 < 10489.40)
Average "1d369c9e9" is worse than "15.3" by 3.6 percent (10010.43 < 10388.56)

Best "d8c3106bb" is better than "15.3" by 1.3 percent (10625.90 > 10489.40)
Average "d8c3106bb" is better than "15.3" by 1.5 percent (10546.57 > 10388.56)
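
(The averages here are plain means over the five runs, e.g. for 1d369c9e9:
(10007.13 + 10047.72 + 9920.61 + 10016.05 + 10060.61) / 5 ~= 10010.43.)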

So it seems that there is nothing left on this plate.

Best regards,
Alexander

Re: benchmark results comparing versions 15.2 and 16

2023-05-16 Thread Michael Paquier
On Tue, May 16, 2023 at 06:00:00PM +0300, Alexander Lakhin wrote:
> I can confirm that the patches improve (restore) performance for my test:
> pgbench -i benchdb
> pgbench -c 10 -T 300 benchdb

Thanks for running these!

> tps (over three runs):
> HEAD (08c45ae23):
> 10238.441580, 10697.202119, 10706.764703
> 
> HEAD with the patches:
> 11134.510118, 11176.554996, 11150.338488
> 
> f193883fc~1 (240e0dbac)
> 11082.561388, 11233.604446, 11087.071768
> 
> 15.3 (8382864eb)
> 11328.699555, 11128.057883, 11057.934392

The numbers between f193883fc~1 and HEAD+patch are close to each
other.  It does not seem to make the whole difference with 15.3, but
most of it.  The difference can also be explained with some noise,
based on the number patterns of the third runs?

I have now applied the revert, ready for beta1.  Thanks for the
feedback!
--
Michael




Re: benchmark results comparing versions 15.2 and 16

2023-05-16 Thread Alexander Lakhin

15.05.2023 08:20, Michael Paquier wrote:

I am not going to let that hang in the air with beta1 getting released
next week, though, so attached are two patches to revert the change
(these would be combined, just posted this way for clarity).


I can confirm that the patches improve (restore) performance for my test:
pgbench -i benchdb
pgbench -c 10 -T 300 benchdb

tps (over three runs):
HEAD (08c45ae23):
10238.441580, 10697.202119, 10706.764703

HEAD with the patches:
11134.510118, 11176.554996, 11150.338488

f193883fc~1 (240e0dbac)
11082.561388, 11233.604446, 11087.071768

15.3 (8382864eb)
11328.699555, 11128.057883, 11057.934392

Best regards,
Alexander




Re: benchmark results comparing versions 15.2 and 16

2023-05-15 Thread Michael Paquier
On Mon, May 15, 2023 at 05:54:53PM -0700, Andres Freund wrote:
> Yes. numactl --physcpubind ... in my case.  Linux has an optimization where it
> does not need to send an IPI when the client and server are scheduled on the
> same core. For single threaded ping-pong tasks like pgbench -c1, that can make
> a huge difference, particularly on larger CPUs.  So you get a lot better
> performance when forcing things to be colocated.

Yes, that's not bringing the numbers higher with the simple cases I
reported previously, either.

Anyway, even if I cannot see such a high difference, I don't see how
to bring back the original numbers you are reporting without doing
more inlining and tying COERCE_SQL_SYNTAX more tightly within the
executor's portions for the FuncExprs, and there are the collation
assumptions as well.  Perhaps that's not the correct thing to do with
SQLValueFunction remaining around, but nothing can be done for v16, so
I am planning to just revert the change before beta1, and look at it
again later, from scratch.
--
Michael




Re: benchmark results comparing versions 15.2 and 16

2023-05-15 Thread Andres Freund
Hi,

On 2023-05-16 09:42:31 +0900, Michael Paquier wrote:
> > I get quite variable performance if I don't pin client / server to the same
> > core, but even the slow performance is faster than 45k.
> 
> Okay.  You mean with something like taskset or similar, I guess?

Yes. numactl --physcpubind ... in my case.  Linux has an optimization where it
does not need to send an IPI when the client and server are scheduled on the
same core. For single threaded ping-pong tasks like pgbench -c1, that can make
a huge difference, particularly on larger CPUs.  So you get a lot better
performance when forcing things to be colocated.

Greetings,

Andres Freund




Re: benchmark results comparing versions 15.2 and 16

2023-05-15 Thread Michael Paquier
On Mon, May 15, 2023 at 05:14:47PM -0700, Andres Freund wrote:
> On 2023-05-15 14:20:24 +0900, Michael Paquier wrote:
>> On Thu, May 11, 2023 at 01:28:40PM +0900, Michael Paquier wrote:
>> Extreme is adapted for a worst-case scenario.  Looking at my notes
>> from a few months back, that's kind of what I did on my laptop, which
>> was the only machine I had at hand back then:
>> - Compilation of code with -O2.
> 
> I assume without assertions as well?

Yup, no assertions.

> 45k seems too low for a modern machine, given that I get > 80k in such a
> workload, on a workstation with server CPUs (i.e. many cores, but not that
> fast individually).  Hence wondering about assertions being enabled...

Nope, disabled.

> I get quite variable performance if I don't pin client / server to the same
> core, but even the slow performance is faster than 45k.

Okay.  You mean with something like taskset or similar, I guess?
--
Michael




Re: benchmark results comparing versions 15.2 and 16

2023-05-15 Thread Andres Freund
Hi,

On 2023-05-15 14:20:24 +0900, Michael Paquier wrote:
> On Thu, May 11, 2023 at 01:28:40PM +0900, Michael Paquier wrote:
> > On Tue, May 09, 2023 at 09:48:24AM -0700, Andres Freund wrote:
> >> On 2023-05-08 12:11:17 -0700, Andres Freund wrote:
> >>> I can reproduce a significant regression due to f193883fc of a workload 
> >>> just
> >>> running
> >>> SELECT CURRENT_TIMESTAMP;
> >>> 
> >>> A single session running it on my workstation via pgbench -Mprepared gets
> >>> before:
> >>> tps = 89359.128359 (without initial connection time)
> >>> after:
> >>> tps = 83843.585152 (without initial connection time)
> >>> 
> >>> Obviously this is an extreme workload, but that nevertheless seems too 
> >>> large
> >>> to just accept...
> 
> Extreme is adapted for a worst-case scenario.  Looking at my notes
> from a few months back, that's kind of what I did on my laptop, which
> was the only machine I had at hand back then:
> - Compilation of code with -O2.

I assume without assertions as well?


> I have re-run a bit more pgbench (1 client, prepared query with a
> single SELECT on a SQL keyword, etc.).  And, TBH, I am not seeing as
> much difference as you do (nothing with default pgbench setup, FWIW),
> still that's able to show a bit more difference than the other two
> cases.

> HEAD shows me an average output close to 43900 TPS (3 run of
> 60s each, for instance), while relying on SQLValueFunction shows an
> average of 45000TPS.  That counts for ~2.4% output regression here
> on bigbox based on these numbers.  Not a regression as high as
> mentioned above, still that's visible.

45k seems too low for a modern machine, given that I get > 80k in such a
workload, on a workstation with server CPUs (i.e. many cores, but not that
fast individually).  Hence wondering about assertions being enabled...

I get quite variable performance if I don't pin client / server to the same
core, but even the slow performance is faster than 45k.

Greetings,

Andres Freund




Re: benchmark results comparing versions 15.2 and 16

2023-05-14 Thread Michael Paquier
On Thu, May 11, 2023 at 01:28:40PM +0900, Michael Paquier wrote:
> On Tue, May 09, 2023 at 09:48:24AM -0700, Andres Freund wrote:
>> On 2023-05-08 12:11:17 -0700, Andres Freund wrote:
>>> I can reproduce a significant regression due to f193883fc of a workload just
>>> running
>>> SELECT CURRENT_TIMESTAMP;
>>> 
>>> A single session running it on my workstation via pgbench -Mprepared gets
>>> before:
>>> tps = 89359.128359 (without initial connection time)
>>> after:
>>> tps = 83843.585152 (without initial connection time)
>>> 
>>> Obviously this is an extreme workload, but that nevertheless seems too large
>>> to just accept...

Extreme is adapted for a worst-case scenario.  Looking at my notes
from a few months back, that's kind of what I did on my laptop, which
was the only machine I had at hand back then:
- Compilation of code with -O2.
- Prepared statement of a simple SELECT, combined with a DO block that
executed the query in a loop a few million times on a single
backend:
PREPARE test AS SELECT CURRENT_TIMESTAMP;
DO $$
BEGIN
  FOR i IN 1..1000
  LOOP
EXECUTE 'EXECUTE test';
  END LOOP;
END $$;
- The second test is mentioned at [1], with a generate_series() on a
keyword (a rough sketch of its shape is below).
- And actually I recall some pgbench runs similar to that.  But I
don't have any traces of that in my notes.
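
Roughly, the generate_series() test has a shape like this (see [1] for the
exact query; this is only a sketch of the idea -- evaluate the keyword once
per generated row):

SELECT CURRENT_TIMESTAMP FROM generate_series(1, 1000000);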

This was not showing much difference before, and it does not now, either.
The funny thing is that the pre-patch state was showing signs of being a
bit slower in this environment.  Anyway, I have just tested the DO
case in a second "bigbox" environment that I set up for
benchmarking a few days ago, and there the DO test is showing me a 1%~1.5%
regression in runtime.  That's repeatable, so that's not noise.

I have re-run a bit more pgbench (1 client, prepared query with a
single SELECT on a SQL keyword, etc.).  And, TBH, I am not seeing as
much difference as you do (nothing with the default pgbench setup, FWIW),
still that's able to show a bit more difference than the other two
cases.  HEAD shows me an average output close to 43900 TPS (3 runs of
60s each, for instance), while relying on SQLValueFunction shows an
average of 45000 TPS.  That amounts to a ~2.4% regression here
on bigbox based on these numbers.  Not a regression as high as
mentioned above, but still visible.

>> Added an open item for this.
> 
> Thanks for the report, I'll come back to it and look at it at the
> beginning of next week.  In the worst case, that would mean a revert
> of this refactoring, I assume.

So, this involves commits 7aa81c6, f193883 and fb32748.  7aa81c6 has
added some tests, so I would keep the tests it added on HEAD, as the
precision was not previously tested for the SQL keywords it adds coverage
for.

fb32748 has removed the dependency of SQLValueFunction on collations
by making the name functions use COERCE_SQL_SYNTAX.  One case where
these could be heavily used is RLS, for example, so that could be
visible.  f193883 has removed the rest and the timestamp keywords.

I am not going to let that hang in the air with beta1 getting released
next week, though, so attached are two patches to revert the change
(these would be combined, just posted this way for clarity).  The only
conflict I could see is caused by the query jumbling where a
SQLValueFunction needs "typmod" and "op" in the computation, ignoring
the rest, so the node definition primnodes.h gains the required
node_attr.  Making the execution path cheaper while avoiding the 
collation tweaks for SQLValueFunction would be nice, but not at this
time of the year on this branch.

[1]: https://www.postgresql.org/message-id/Y0+dHDYA46UnnLs/@paquier.xyz
--
Michael
From 29d4b8b47ee521b573cf091b3c5794616e5a6963 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 11 May 2023 14:34:17 +0900
Subject: [PATCH v1 1/2] Revert "Replace SQLValueFunction by COERCE_SQL_SYNTAX"

This reverts commit f193883fc9cebe8fa20359b0797832837a788112.
---
 src/include/catalog/catversion.h  |  2 +-
 src/include/catalog/pg_proc.dat   | 17 -
 src/include/executor/execExpr.h   |  8 ++
 src/include/nodes/primnodes.h | 37 +
 src/include/utils/date.h  |  4 +
 src/include/utils/timestamp.h |  4 +
 src/backend/catalog/system_functions.sql  | 26 ---
 src/backend/executor/execExpr.c   | 11 +++
 src/backend/executor/execExprInterp.c | 46 
 src/backend/jit/llvm/llvmjit_expr.c   |  6 ++
 src/backend/jit/llvm/llvmjit_types.c  |  1 +
 src/backend/nodes/nodeFuncs.c | 27 ++-
 src/backend/optimizer/path/costsize.c |  1 +
 src/backend/optimizer/util/clauses.c  | 39 --
 src/backend/parser/gram.y | 59 ++-
 src/backend/parser/parse_expr.c   | 52 +
 src/backend/parser/parse_target.c | 25 ++
 src/backend/utils/adt/date.c  | 75 --
 src/backend/utils/adt/ruleutils.c | 92 +++
 

Re: benchmark results comparing versions 15.2 and 16

2023-05-11 Thread Alexander Lakhin

11.05.2023 01:27, David Rowley wrote:

On Thu, 11 May 2023 at 01:00, Alexander Lakhin  wrote:

This time `git bisect` pointed at 3c6fc5820. Having compared execution plans
(both attached), I see the following differences (3c6fc5820~1 vs 3c6fc5820):

Based on what you've sent, I'm uninspired to want to try to do
anything about it.  The patched version finds a plan that's cheaper.
The row estimates are miles off with both plans.


I've made sure that s64da-benchmark performs analyze before running the
queries (the pg_class.reltuples fields for the tables in question contain
actual counts), so it seems that nothing can be done on the benchmark side to
improve those estimates.


... It's pretty hard to make changes to the
planner's path generation without risking that a bad plan is chosen
when it wasn't beforehand with bad row estimates.


Yeah, I see. It's also interesting to me which tests perform better after
that commit. It takes several hours to run all tests, so I can't present
results quickly, but I'll try to collect this information next week.


Is the new plan still slower if you increase work_mem so that the sort
no longer goes to disk?  Maybe the planner would have picked Hash
Aggregate if the row estimates had been such that cost_tuplesort()
knew that the sort would have gone to disk.


Yes, increasing work_mem to 50MB doesn't affect the plans (new plans
attached), though the sort method changed to quicksort. The former plan
still executes slightly faster.

Best regards,
Alexander






                                      QUERY PLAN
--------------------------------------------------------------------------------------------------
 Aggregate  (cost=254862.89..254862.90 rows=1 width=8) (actual time=2567.064..2581.619 rows=1 loops=1)
   Output: count(*)
   ->  Subquery Scan on cool_cust  (cost=147645.17..254820.81 rows=16834 width=0) (actual time=2551.091..2578.005 rows=93140 loops=1)
         Output: cool_cust.c_last_name, cool_cust.c_first_name, cool_cust.d_date
         ->  HashSetOp Except  (cost=147645.17..254652.47 rows=16834 width=144) (actual time=2551.090..2569.217 rows=93140 loops=1)
               Output: "*SELECT* 1".c_last_name, "*SELECT* 1".c_first_name, "*SELECT* 1".d_date, (0)
               ->  Append  (cost=147645.17..254481.44 rows=22804 width=144) (actual time=2162.417..2542.403 rows=117004 loops=1)
                     ->  Result  (cost=147645.17..202935.12 rows=16834 width=144) (actual time=2162.416..2200.221 rows=93267 loops=1)
                           Output: "*SELECT* 1".c_last_name, "*SELECT* 1".c_first_name, "*SELECT* 1".d_date, 0
                           ->  HashSetOp Except  (cost=147645.17..202766.78 rows=16834 width=144) (actual time=2162.414..2190.284 rows=93267 loops=1)

Re: benchmark results comparing versions 15.2 and 16

2023-05-10 Thread Michael Paquier
On Tue, May 09, 2023 at 09:48:24AM -0700, Andres Freund wrote:
> On 2023-05-08 12:11:17 -0700, Andres Freund wrote:
>> I can reproduce a significant regression due to f193883fc of a workload just
>> running
>> SELECT CURRENT_TIMESTAMP;
>> 
>> A single session running it on my workstation via pgbench -Mprepared gets
>> before:
>> tps = 89359.128359 (without initial connection time)
>> after:
>> tps = 83843.585152 (without initial connection time)
>> 
>> Obviously this is an extreme workload, but that nevertheless seems too large
>> to just accept...
> 
> Added an open item for this.

Thanks for the report, I'll come back to it and look at it at the
beginning of next week.  In the worst case, that would mean a revert
of this refactoring, I assume.
--
Michael




Re: benchmark results comparing versions 15.2 and 16

2023-05-10 Thread David Rowley
On Thu, 11 May 2023 at 01:00, Alexander Lakhin  wrote:
> This time `git bisect` pointed at 3c6fc5820. Having compared execution plans
> (both attached), I see the following differences (3c6fc5820~1 vs 3c6fc5820):

Based on what you've sent, I'm uninspired to want to try to do
anything about it.  The patched version finds a plan that's cheaper.
The row estimates are miles off with both plans. I'm not sure what
we're supposed to do here. It's pretty hard to make changes to the
planner's path generation without risking that a bad plan is chosen
when it wasn't beforehand with bad row estimates.

Is the new plan still slower if you increase work_mem so that the sort
no longer goes to disk?  Maybe the planner would have picked Hash
Aggregate if the row estimates had been such that cost_tuplesort()
knew that the sort would have gone to disk.
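
Concretely, something along these lines would answer that (a sketch; 50MB is
just an arbitrary value comfortably above the sort's reported memory use):

BEGIN;
SET LOCAL work_mem = '50MB';
-- then re-run the query under EXPLAIN (ANALYZE, BUFFERS), e.g.:
-- EXPLAIN (ANALYZE, BUFFERS) <query 87 from the s64da-benchmark>;
ROLLBACK;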

David




Re: benchmark results comparing versions 15.2 and 16

2023-05-10 Thread Alexander Lakhin

08.05.2023 16:00, Alexander Lakhin wrote:

... Having compared 15.3 (56e869a09) with master
(58f5edf84) I haven't seen significant regressions except a few minor ones.
First regression observed with a simple pgbench test:


Another noticeable, but not critical performance degradation is revealed by
query 87 from TPC-DS (I use s64da-benchmark):
https://github.com/swarm64/s64da-benchmark-toolkit/blob/master/benchmarks/tpcds/queries/queries_10/87.sql

With `prepare_benchmark --scale-factor=2`, `run_benchmark --scale-factor=10`
I get on master:
2023-05-10 09:27:52,888 INFO    : finished 80/103: query 87 of stream  0: 2.26s 
OK
but on REL_15_STABLE:
2023-05-10 08:13:40,648 INFO    : finished 80/103: query 87 of stream  0: 1.94s 
OK

This time `git bisect` pointed at 3c6fc5820. Having compared execution plans
(both attached), I see the following differences (3c6fc5820~1 vs 3c6fc5820):
->  Subquery Scan on "*SELECT* 1"  (cost=149622.00..149958.68 rows=16834 width=21) (actual time=1018.606..1074.468 rows=93891 loops=1)
 ->  Unique  (cost=149622.00..149790.34 rows=16834 width=17) (actual time=1018.604..1064.790 rows=93891 loops=1)
  ->  Sort  (cost=149622.00..149664.09 rows=16834 width=17) (actual time=1018.603..1052.591 rows=94199 loops=1)
   ->  Gather  (cost=146588.59..148440.33 rows=16834 width=17) (actual time=880.899..913.978 rows=94199 loops=1)
vs
->  Subquery Scan on "*SELECT* 1"  (cost=147576.79..149829.53 rows=16091 width=21) (actual time=1126.489..1366.751 rows=93891 loops=1)
 ->  Unique  (cost=147576.79..149668.62 rows=16091 width=17) (actual time=1126.487..1356.938 rows=93891 loops=1)
  ->  Gather Merge  (cost=147576.79..149547.94 rows=16091 width=17) (actual time=1126.487..1345.253 rows=94204 loops=1)
   ->  Unique  (cost=146576.78..146737.69 rows=16091 width=17) (actual time=1124.426..1306.532 rows=47102 loops=2)
    ->  Sort  (cost=146576.78..146617.01 rows=16091 width=17) (actual time=1124.424..1245.110 rows=533434 loops=2)

->  Subquery Scan on "*SELECT* 2"  (cost=52259.82..52428.16 rows=8417 width=21) (actual time=653.640..676.879 rows=62744 loops=1)
 ->  Unique  (cost=52259.82..52343.99 rows=8417 width=17) (actual time=653.639..670.405 rows=62744 loops=1)
  ->  Sort  (cost=52259.82..52280.86 rows=8417 width=17) (actual time=653.637..662.428 rows=62744 loops=1)
   ->  Gather  (cost=50785.20..51711.07 rows=8417 width=17) (actual time=562.158..571.737 rows=62744 loops=1)
    ->  HashAggregate  (cost=49785.20..49869.37 rows=8417 width=17) (actual time=538.263..544.336 rows=31372 loops=2)
     ->  Nested Loop  (cost=0.85..49722.07 rows=8417 width=17) (actual time=2.049..469.747 rows=284349 loops=2)
vs
->  Subquery Scan on "*SELECT* 2"  (cost=48503.68..49630.12 rows=8046 width=21) (actual time=700.050..828.388 rows=62744 loops=1)
 ->  Unique  (cost=48503.68..49549.66 rows=8046 width=17) (actual time=700.047..821.836 rows=62744 loops=1)
  ->  Gather Merge  (cost=48503.68..49489.31 rows=8046 width=17) (actual time=700.047..814.136 rows=62744 loops=1)
   ->  Unique  (cost=47503.67..47584.13 rows=8046 width=17) (actual time=666.348..763.403 rows=31372 loops=2)
    ->  Sort  (cost=47503.67..47523.78 rows=8046 width=17) (actual time=666.347..730.336 rows=284349 loops=2)
     ->  Nested Loop  (cost=0.85..46981.72 rows=8046 width=17) (actual time=1.852..454.111 rows=284349 loops=2)

->  Subquery Scan on "*SELECT* 3"  (cost=50608.83..51568.70 rows=7165 width=21) (actual time=302.571..405.305 rows=23737 loops=1)
 ->  Unique  (cost=50608.83..51497.05 rows=7165 width=17) (actual time=302.568..402.818 rows=23737 loops=1)
  ->  Gather Merge  (cost=50608.83..51443.31 rows=7165 width=17) (actual time=302.567..372.246 rows=287761 loops=1)
   ->  Sort  (cost=49608.81..49616.27 rows=2985 width=17) (actual time=298.204..310.075 rows=95920 loops=3)
    ->  Nested Loop  (cost=2570.65..49436.52 rows=2985 width=17) (actual time=3.205..229.192 rows=95920 loops=3)
vs
->  Subquery Scan on "*SELECT* 3"  (cost=50541.84..51329.11 rows=5708 width=21) (actual time=302.042..336.820 rows=23737 loops=1)
 ->  Unique  (cost=50541.84..51272.03 rows=5708 width=17) (actual time=302.039..334.329 rows=23737 loops=1)
  ->  Gather Merge  (cost=50541.84..51229.22 rows=5708 width=17) (actual time=302.039..331.296 rows=24128 loops=1)
   ->  Unique  (cost=49541.81..49570.35 rows=2854 width=17) (actual time=298.771..320.560 rows=8043 loops=3)
    ->  Sort  (cost=49541.81..49548.95 rows=2854 width=17) (actual time=298.770..309.603 rows=95920 loops=3)
     ->  Nested Loop  (cost=2570.52..49378.01 rows=2854 width=17) (actual time=3.209..230.291 rows=95920 loops=3)

From the commit message and the discussion [1], it's not clear to me whether
this plan change is expected. Perhaps it's too minor an issue to bring attention
to, but maybe this information could be useful for v16 performance analysis.

[1] 
https://postgr.es/m/CAApHDvo8Lz2H=42urbbfp65ltceuoh288mt7dsg2_ewtw1a...@mail.gmail.com

Best regards,
Alexander 

Re: benchmark results comparing versions 15.2 and 16

2023-05-09 Thread MARK CALLAGHAN
On Fri, May 5, 2023 at 10:01 PM MARK CALLAGHAN  wrote:

> I have two more runs of the benchmark in progress so we will have 3
> results for each of the test cases to confirm that the small regressions
> are repeatable.
>

They get similar results. Then I tried Linux perf, but the hierarchical call
stacks, to be used for Flamegraph, have too many "[unknown]" entries.
I was using: ./configure --prefix=$pfx --enable-debug CFLAGS="-O3
-march=native -mtune=native -flto" LDFLAGS="-flto" > o.cf.$x 2> e.cf.$x
Adding -fno-omit-frame-pointer fixes the problem, so I am repeating the
benchmark with that to confirm there are still regressions, and then I will
get flamegraphs.

-- 
Mark Callaghan
mdcal...@gmail.com


Re: benchmark results comparing versions 15.2 and 16

2023-05-09 Thread Andres Freund
Hi,

On 2023-05-08 12:11:17 -0700, Andres Freund wrote:
> Hi,
> 
> On 2023-05-08 16:00:01 +0300, Alexander Lakhin wrote:
> > This difference is confirmed by multiple test runs. `git bisect` for this
> > regression pointed at f193883fc.
> 
> I can reproduce a significant regression due to f193883fc of a workload just
> running
> SELECT CURRENT_TIMESTAMP;
> 
> A single session running it on my workstation via pgbench -Mprepared gets
> before:
> tps = 89359.128359 (without initial connection time)
> after:
> tps = 83843.585152 (without initial connection time)
> 
> Obviously this is an extreme workload, but that nevertheless seems too large
> to just accept...

Added an open item for this.

Greetings,

Andres Freund




Re: benchmark results comparing versions 15.2 and 16

2023-05-08 Thread Andres Freund
Hi,

On 2023-05-08 16:00:01 +0300, Alexander Lakhin wrote:
> This difference is confirmed by multiple test runs. `git bisect` for this
> regression pointed at f193883fc.

I can reproduce a significant regression due to f193883fc of a workload just
running
SELECT CURRENT_TIMESTAMP;

A single session running it on my workstation via pgbench -Mprepared gets
before:
tps = 89359.128359 (without initial connection time)
after:
tps = 83843.585152 (without initial connection time)

Obviously this is an extreme workload, but that nevertheless seems too large
to just accept...


Michael, the commit message notes that there were no measured performance
regression - yet I see one in a trivial test. What were you measuring?


I'm a bit surprised by the magnitude of the regression, but it's not
surprising that there is a performance effect. You're replacing something that
doesn't go through the whole generic function rigamarole with
something that does...

Looking at two perf profiles, the biggest noticable difference is

Before:

-5.51% 0.13%  postgres  postgres  [.] ExecInitResult
   - 5.38% ExecInitResult
  + 2.29% ExecInitResultTupleSlotTL
  - 2.22% ExecAssignProjectionInfo
 - 2.19% ExecBuildProjectionInfo
  0.47% ExecReadyInterpretedExpr
- 0.43% ExecInitExprRec
   - 0.10% palloc
AllocSetAlloc.localalias (inlined)
+ 0.32% expression_tree_walker_impl.localalias (inlined)
+ 0.28% get_typlen
  0.09% ExecPushExprSlots
+ 0.06% MemoryContextAllocZeroAligned
+ 0.04% MemoryContextAllocZeroAligned
  0.02% exprType.localalias (inlined)
  + 0.41% ExecAssignExprContext
  + 0.35% MemoryContextAllocZeroAligned
0.11% ExecInitQual.localalias (inlined)
   + 0.11% _start
   + 0.02% 0x55b89c764d7f

After:

-6.57% 0.17%  postgres  postgres  [.] ExecInitResult
   - 6.40% ExecInitResult
  - 3.00% ExecAssignProjectionInfo
 - ExecBuildProjectionInfo
- 0.91% ExecInitExprRec
   - 0.65% ExecInitFunc
0.23% fmgr_info_cxt_security
  + 0.18% palloc0
  + 0.07% object_aclcheck
0.04% fmgr_info
 0.05% check_stack_depth
   + 0.05% palloc
+ 0.58% expression_tree_walker_impl.localalias (inlined)
+ 0.55% get_typlen
  0.37% ExecReadyInterpretedExpr
+ 0.11% MemoryContextAllocZeroAligned
  0.09% ExecPushExprSlots
  0.04% exprType.localalias (inlined)
  + 2.77% ExecInitResultTupleSlotTL
  + 0.50% ExecAssignExprContext
  + 0.09% MemoryContextAllocZeroAligned
0.05% ExecInitQual.localalias (inlined)
   + 0.10% _start

I.e. we spend more time building the expression state for expression
evaluation, because we now go through the generic ExecInitFunc(), instead of
something dedicated. We also now need to do permission checking etc.


I don't think that's the entirety of the regression...

Greetings,

Andres Freund




Re: benchmark results comparing versions 15.2 and 16

2023-05-08 Thread Alexander Lakhin

Hello Mark,

05.05.2023 20:45, MARK CALLAGHAN wrote:
This is mostly a hobby project for me - my other hobby is removing invasive weeds. I am happy to answer questions and 
run more tests, but turnaround for answers won't be instant. Getting results from Linux perf for these tests is on my 
TODO list. For now I am just re-running a subset of these to get more certainty that the regressions are real and not 
noise.




It's a very interesting topic to me, too. I have developed some scripts to
measure and compare postgres's performance using miscellaneous public
benchmarks (ycsb, tpcds, benchmarksql_tpcc, htapbench, benchbase, gdprbench,
s64da-benchmark, ...). Having compared 15.3 (56e869a09) with master
(58f5edf84), I haven't seen significant regressions except a few minor ones.
The first regression was observed with a simple pgbench test:
pgbench -i benchdb
pgbench -c 10 -T 300 benchdb
(with default compilation options and fsync = off)

On master I get:
tps = 10349.826645 (without initial connection time)
On 15.3:
tps = 11296.064992 (without initial connection time)

This difference is confirmed by multiple test runs. `git bisect` for this
regression pointed at f193883fc.

Best regards,
Alexander




Re: benchmark results comparing versions 15.2 and 16

2023-05-05 Thread MARK CALLAGHAN
I have two more runs of the benchmark in progress so we will have 3 results
for each of the test cases to confirm that the small regressions are
repeatable.

On Fri, May 5, 2023 at 1:34 PM Andres Freund  wrote:


> One thing that's somewhat odd is that there's very marked changes in l.i0's
> p99 latency for the four clients cases - but whether 15 or 16 are better
> differs between the runs.
>

From the response time sections for l.i0, the [1ms, 4ms) bucket has 99% or
more for all 6 cases.
For example,
https://mdcallag.github.io/reports/23_05_04_ibench.beelink.pg16b.1u.1tno.cached/all.html#l.i0.rt
But yes, the p99 is as you state. I will wade through my test scripts
tomorrow to see how the p99 is computed.

> I do wonder if there's something getting scheduled in some of these runs
> increasing latency?
>

Do you mean interference from other processes? Both the big and small
servers run Ubuntu 22.04 server (no X) and there shouldn't be many extra
things, although Google Cloud adds a few extra things that run in the
background.


> Or what we're seeing depends on the time between the start of the server
> and
> the start of the benchmark? It is interesting that the per-second
> throughput
> graph shows a lot of up/down at the end:
>
> https://mdcallag.github.io/reports/23_05_04_ibench.beelink.pg16b.4u.1tyes.1g/tput.l.i0.html
>
> Both 15 and 16 have one very high result at 70s, 15 then has one low, but
> 16
> has two low results.
>

Immediately prior to l.i0 the database directory is wiped and then Postgres
is initialized and started.

The IPS vs time graphs are more interesting for benchmark steps that run
longer. Alas, this can't run too long if the resulting database is to fit
in <= 16G. But that is a problem for another day. The IPS vs time graphs
are not a flat line, but I am not ready to pursue that as a problem unless
they show multi-second write-stalls (fortunately they do not).

-- 
Mark Callaghan
mdcal...@gmail.com


Re: benchmark results comparing versions 15.2 and 16

2023-05-05 Thread Andres Freund
Hi,

On 2023-05-05 10:45:12 -0700, MARK CALLAGHAN wrote:
> This is mostly a hobby project for me - my other hobby is removing
> invasive weeds.

Hah :)


> Summary of the results:
> * from r1 - insert-heavy (l.i0, l.i1) and create indexes (l.x) steps are
> ~2% slower in PG 16
> * from r2 - create index (l.x) step is ~4% slower in PG 16
> * from r3 - regressions are similar to r1
> * from r4, r5 and r6 - regressions are mostly worse than r1, r2, r3. Note
> r4, r5, r6 are the same workload as r1, r2, r3 except the database is
> cached by PG for r1, r2, r3 so the r4, r5 and r6 benchmarks will do much
> more copying from the OS page cache into the Postgres buffer pool.

One thing that's somewhat odd is that there are very marked changes in l.i0's
p99 latency for the four-client cases - but whether 15 or 16 is better
differs between the runs.

r2)
p99
20m.pg152_o3_native_lto.cx7 300
20m.pg16prebeta.cx7 23683

r3)
p99
20m.pg152_o3_native_lto.cx7 70245
20m.pg16prebeta.cx7 8191

r5)
p99
20m.pg152_o3_native_lto.cx7 11188
20m.pg16prebeta.cx7 72720

r6)
p99
20m.pg152_o3_native_lto.cx7 1898
20m.pg16prebeta.cx7 31666


I do wonder if there's something getting scheduled in some of these runs
increasing latency?

Or what we're seeing depends on the time between the start of the server and
the start of the benchmark? It is interesting that the per-second throughput
graph shows a lot of up/down at the end:
https://mdcallag.github.io/reports/23_05_04_ibench.beelink.pg16b.4u.1tyes.1g/tput.l.i0.html

Both 15 and 16 have one very high result at 70s, 15 then has one low, but 16
has two low results.

Greetings,

Andres Freund




benchmark results comparing versions 15.2 and 16

2023-05-05 Thread MARK CALLAGHAN
Let me know if results like this shouldn't be posted here.

This is mostly a hobby project for me - my other hobby is removing
invasive weeds. I am happy to answer questions and run more tests, but
turnaround for answers won't be instant. Getting results from Linux perf
for these tests is on my TODO list. For now I am just re-running a subset
of these to get more certainty that the regressions are real and not noise.

These are results for the insert benchmark on a small server comparing
performance for versions 15.2 and 16. For version 16 I built from source at
git sha 1ab763fc.
https://github.com/postgres/postgres/commit/1ab763fc22adc88e5d779817e7b42b25a9dd7c9e

Late in the version 15 beta release cycle this benchmark found a
significant regression. I don't see anything significant this time, but
there are potential small regressions.

More detail on how I run the benchmark and the HW is here, the server is
small -- Beelink with 8 cores, 16G RAM and 1T NVMe SSD.
http://smalldatum.blogspot.com/2023/05/the-insert-benchmark-postgres-versions.html

Performance reports are linked below. But first, disclaimers:
* the goal is to determine whether there are CPU improvements or
regressions. To make that easier to spot I disable fsync on commit.
* my scripts compute CPU/operation where operation is a SQL statement.
However, CPU in this case is measured by vmstat and includes CPU from the
benchmark client and Postgres server
* the regressions here are small, usually less than 5%, which means it can
be hard to distinguish between normal variance and a regression, but the
results are repeatable
* the links below are to the Summary section which includes throughput
(absolute and relative). The relative throughput is (throughput for PG
16 / throughput for PG 15.2)
* I used the same compiler options for the builds of 15.2 and 16

Summary of the results:
* from r1 - insert-heavy (l.i0, l.i1) and create indexes (l.x) steps are
~2% slower in PG 16
* from r2 - create index (l.x) step is ~4% slower in PG 16
* from r3 - regressions are similar to r1
* from r4, r5 and r6 - regressions are mostly worse than r1, r2, r3. Note
r4, r5, r6 are the same workload as r1, r2, r3 except the database is
cached by PG for r1, r2, r3 so the r4, r5 and r6 benchmarks will do much
more copying from the OS page cache into the Postgres buffer pool.

I will repeat r1, r2, r4 and r5 but with the tests run in a different order
to confirm this isn't just noise.

Database cached by Postgres:
r1) 1 table, 1 client -
https://mdcallag.github.io/reports/23_05_04_ibench.beelink.pg16b.1u.1tno.cached/all.html#summary
r2) 4 tables, 4 clients -
https://mdcallag.github.io/reports/23_05_04_ibench.beelink.pg16b.4u.1tno.cached/all.html#summary
r3) 1 table, 4 clients -
https://mdcallag.github.io/reports/23_05_04_ibench.beelink.pg16b.4u.1tyes.cached/all.html#summary

Database cached by OS but not by Postgres:
r4) 1 table, 1 client -
https://mdcallag.github.io/reports/23_05_04_ibench.beelink.pg16b.1u.1tno.1g/all.html#summary
r5) 4 tables, 4 clients -
https://mdcallag.github.io/reports/23_05_04_ibench.beelink.pg16b.4u.1tno.1g/all.html#summary
r6) 1 table, 4 clients -
https://mdcallag.github.io/reports/23_05_04_ibench.beelink.pg16b.4u.1tyes.1g/all.html#summary


-- 
Mark Callaghan
mdcal...@gmail.com