Re: track_planning causing performance regression

2021-07-07 Thread Julien Rouhaud
On Wed, Jul 7, 2021 at 8:57 PM Fujii Masao  wrote:
>
> Pushed. Thanks!

Thanks!




Re: track_planning causing performance regression

2021-07-07 Thread Fujii Masao




On 2021/07/07 18:09, Julien Rouhaud wrote:

On Thu, Jul 1, 2021 at 4:28 PM Fujii Masao  wrote:


I'm fine with this. So what about the following diff? I added  tag.

 pg_stat_statements.track_planning controls whether
 planning operations and duration are tracked by the module.
 Enabling this parameter may incur a noticeable performance penalty,
-  especially when a fewer kinds of queries are executed on many
-  concurrent connections.
+  especially when statements with identical query structure are executed
+  by many concurrent connections which compete to update a small number of
+  pg_stat_statements entries.
 The default value is off.
 Only superusers can change this setting.


It seems perfect, thanks!


Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2021-07-07 Thread Julien Rouhaud
On Thu, Jul 1, 2021 at 4:28 PM Fujii Masao  wrote:
>
> I'm fine with this. So what about the following diff? I added  tag.
>
> pg_stat_statements.track_planning controls whether
> planning operations and duration are tracked by the module.
> Enabling this parameter may incur a noticeable performance penalty,
> -  especially when a fewer kinds of queries are executed on many
> -  concurrent connections.
> +  especially when statements with identical query structure are executed
> +  by many concurrent connections which compete to update a small number of
> +  pg_stat_statements entries.
> The default value is off.
> Only superusers can change this setting.

It seems perfect, thanks!




Re: track_planning causing performance regression

2021-07-01 Thread Fujii Masao




On 2021/06/30 0:12, Julien Rouhaud wrote:

Enabling this parameter may incur a noticeable performance penalty,
-  especially when a fewer kinds of queries are executed on many
-  concurrent connections.
+  especially when queries with identical structure are executed by many
+  concurrent connections which compete to update a small number of
+  pg_stat_statements entries.

It could say "identical structure" or "the same queryid" or "identical queryid".


I think we should try to reuse the previous formulation.  How about
"statements with identical query structure"?


I'm fine with this. So what about the following diff? I added  tag.

   pg_stat_statements.track_planning controls whether
   planning operations and duration are tracked by the module.
   Enabling this parameter may incur a noticeable performance penalty,
-  especially when a fewer kinds of queries are executed on many
-  concurrent connections.
+  especially when statements with identical query structure are executed
+  by many concurrent connections which compete to update a small number of
+  pg_stat_statements entries.
   The default value is off.
   Only superusers can change this setting.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2021-06-29 Thread Julien Rouhaud
On Tue, Jun 29, 2021 at 10:45 AM Justin Pryzby  wrote:
>
> We borrowed that language from the previous text:
>
> | Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) are 
> combined into a single pg_stat_statements entry whenever they have identical 
> query structures according to an internal hash calculation

Yes, but here it's "identical query structure", which seems less
ambiguous than "identical structure", as I think one could take the
latter to refer to the internal representation as much as to the query
text.  And any remaining doubt is removed by the final "internal hash
calculation".

> Really, I'm only trying to fix where it currently says "a fewer kinds".

I agree that "fewer kinds" should be improved.

>Enabling this parameter may incur a noticeable performance penalty,
> -  especially when a fewer kinds of queries are executed on many
> -  concurrent connections.
> +  especially when queries with identical structure are executed by many
> +  concurrent connections which compete to update a small number of
> +  pg_stat_statements entries.
>
> It could say "identical structure" or "the same queryid" or "identical 
> queryid".

I think we should try to reuse the previous formulation.  How about
"statements with identical query structure"?  Or replace query
structure with "internal representation", in both places?




Re: track_planning causing performance regression

2021-06-28 Thread Justin Pryzby
On Tue, Jun 29, 2021 at 10:29:43AM +0800, Julien Rouhaud wrote:
> On Tue, Jun 29, 2021 at 10:09 AM Justin Pryzby  wrote:
> 
> Is "identical structure" really accurate here?  For instance a multi
> tenant application could rely on the search_path and only use
> unqualified relation name.  So while they have queries with identical
> structure, those will generate a large number of different query_id.

We borrowed that language from the previous text:

| Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) are combined 
into a single pg_stat_statements entry whenever they have identical query 
structures according to an internal hash calculation

Note that it continues to say:
|In some cases, queries with visibly different texts might get merged into a 
single pg_stat_statements entry. Normally this will happen only for 
semantically equivalent queries, but there is a small chance of hash collisions 
causing unrelated queries to be merged into one entry. (This cannot happen for 
queries belonging to different users or databases, however.)
|
|Since the queryid hash value is computed on the post-parse-analysis 
representation of the queries, the opposite is also possible: queries with 
identical texts might appear as separate entries, if they have different 
meanings as a result of factors such as different search_path settings.

Really, I'm only trying to fix where it currently says "a fewer kinds".

It looks like I'd sent the wrong diff (git diff with a previous patch applied).

I think this is the latest proposal:

   Enabling this parameter may incur a noticeable performance penalty,
-  especially when a fewer kinds of queries are executed on many
-  concurrent connections.
+  especially when queries with identical structure are executed by many
+  concurrent connections which compete to update a small number of
+  pg_stat_statements entries.

It could say "identical structure" or "the same queryid" or "identical queryid".

-- 
Justin




Re: track_planning causing performance regression

2021-06-28 Thread Julien Rouhaud
On Tue, Jun 29, 2021 at 10:09 AM Justin Pryzby  wrote:
>
> Checking back - here's the latest patch.
>
> diff --git a/doc/src/sgml/pgstatstatements.sgml 
> b/doc/src/sgml/pgstatstatements.sgml
> index 930081c429..9e98472c5c 100644
> --- a/doc/src/sgml/pgstatstatements.sgml
> +++ b/doc/src/sgml/pgstatstatements.sgml
> @@ -696,8 +696,9 @@
>pg_stat_statements.track_planning controls whether
>planning operations and duration are tracked by the module.
>Enabling this parameter may incur a noticeable performance penalty,
> -  especially when queries with the same queryid are executed on many
> -  concurrent connections.
> +  especially when queries with identical structure are executed by many
> +  concurrent connections which compete to update a small number of
> +  pg_stat_statements entries.
>The default value is off.
>Only superusers can change this setting.
>   

Is "identical structure" really accurate here?  For instance a multi
tenant application could rely on the search_path and only use
unqualified relation name.  So while they have queries with identical
structure, those will generate a large number of different query_id.




Re: track_planning causing performance regression

2021-06-28 Thread Justin Pryzby
On Wed, Apr 21, 2021 at 10:40:07AM -0500, Justin Pryzby wrote:
> On Thu, Apr 22, 2021 at 12:13:17AM +0900, Fujii Masao wrote:
> > On 2021/04/21 23:53, Justin Pryzby wrote:
> > > Or:
> > > 
> > > Enabling this parameter may incur a noticeable performance 
> > > penalty,
> > > especially similar queries are executed by many concurrent 
> > > connections
> > > and compete to update a small number of pg_stat_statements 
> > > entries.
> > 
> > I prefer this. But what about using "identical" instead of "similar"
> > because pg_stat_statements docs already uses "identical" in some places?
> 
> I also missed "when", again...
> 
> > > Enabling this parameter may incur a noticeable performance 
> > > penalty,
> > > especially when queries with identical structure are executed by 
> > > many concurrent connections
> > > which compete to update a small number of pg_stat_statements 
> > > entries.

Checking back - here's the latest patch.

diff --git a/doc/src/sgml/pgstatstatements.sgml 
b/doc/src/sgml/pgstatstatements.sgml
index 930081c429..9e98472c5c 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -696,8 +696,9 @@
   pg_stat_statements.track_planning controls whether
   planning operations and duration are tracked by the module.
   Enabling this parameter may incur a noticeable performance penalty,
-  especially when queries with the same queryid are executed on many
-  concurrent connections.
+  especially when queries with identical structure are executed by many
+  concurrent connections which compete to update a small number of
+  pg_stat_statements entries.
   The default value is off.
   Only superusers can change this setting.
  




Re: track_planning causing performance regression

2021-04-21 Thread Justin Pryzby
On Thu, Apr 22, 2021 at 12:13:17AM +0900, Fujii Masao wrote:
> On 2021/04/21 23:53, Justin Pryzby wrote:
> > Or:
> > 
> > Enabling this parameter may incur a noticeable performance penalty,
> > especially similar queries are executed by many concurrent 
> > connections
> > and compete to update a small number of pg_stat_statements entries.
> 
> I prefer this. But what about using "identical" instead of "similar"
> because pg_stat_statements docs already uses "identical" in some places?

I also missed "when", again...

> > Enabling this parameter may incur a noticeable performance penalty,
> > especially when queries with identical structure are executed by 
> > many concurrent connections
> > which compete to update a small number of pg_stat_statements 
> > entries.




Re: track_planning causing performance regression

2021-04-21 Thread Fujii Masao




On 2021/04/21 23:53, Justin Pryzby wrote:

Or:

Enabling this parameter may incur a noticeable performance penalty,
especially similar queries are executed by many concurrent connections
and compete to update a small number of pg_stat_statements entries.


I prefer this. But what about using "identical" instead of "similar"
because pg_stat_statements docs already uses "identical" in some places?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2021-04-21 Thread Justin Pryzby
On Wed, Apr 21, 2021 at 11:38:52PM +0900, Fujii Masao wrote:
> On 2021/04/19 23:55, Justin Pryzby wrote:
> > What does "kind" mean ?  I think it means a "normalized" query or a "query
> > structure".
> > 
> > "a fewer kinds" is wrong, so I think the docs should say "a small number of
> > queries" or maybe:
> 
> Okay, I agree to update the description.
> 
> > > > >   Enabling this parameter may incur a noticeable performance 
> > > > > penalty,
> > > > >   especially similar queries are run by many concurrent 
> > > > > connections and
> > > > >   compete to update the same pg_stat_statements entry
> 
> "a small number of" is better than "similar" at the above because
> "similar" sounds a bit unclear in this case?
> 
> It's better to use "entries" rather than "entry" at the above?

How about like this?

   Enabling this parameter may incur a noticeable performance penalty,
-  especially when a fewer kinds of queries are executed on many
+  especially when queries with the same queryid are executed by many
   concurrent connections.

Or:

   Enabling this parameter may incur a noticeable performance penalty,
   especially similar queries are executed by many concurrent connections
   and compete to update a small number of pg_stat_statements entries.

-- 
Justin




Re: track_planning causing performance regression

2021-04-21 Thread Fujii Masao




On 2021/04/19 23:55, Justin Pryzby wrote:

What does "kind" mean ?  I think it means a "normalized" query or a "query
structure".

"a fewer kinds" is wrong, so I think the docs should say "a small number of
queries" or maybe:


Okay, I agree to update the description.


  Enabling this parameter may incur a noticeable performance penalty,
  especially similar queries are run by many concurrent connections and
  compete to update the same pg_stat_statements entry


"a small number of" is better than "similar" at the above because
"similar" sounds a bit unclear in this case?

It's better to use "entries" rather than "entry" at the above?

Regards,
 


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2021-04-19 Thread Justin Pryzby
On Mon, Apr 19, 2021 at 11:44:05PM +0900, Fujii Masao wrote:
> On 2021/04/19 8:36, Justin Pryzby wrote:
> > Reviewing this change which was committed last year as
> > 321fa6a4a26c9b649a0fbec9fc8b019f19e62289
> > 
> > On Fri, Jul 03, 2020 at 03:57:38PM +0900, Fujii Masao wrote:
> > > On 2020/07/03 13:05, Pavel Stehule wrote:
> > > > On Fri, Jul 3, 2020 at 4:39, Fujii Masao  wrote:
> > > > 
> > > > Maybe there can be documented so enabling this option can have a 
> > > > negative impact on performance.
> > > 
> > > Yes. What about adding either of the followings into the doc?
> > > 
> > >  Enabling this parameter may incur a noticeable performance penalty.
> > > 
> > > or
> > > 
> > >  Enabling this parameter may incur a noticeable performance penalty,
> > >  especially when a fewer kinds of queries are executed on many
> > >  concurrent connections.
> > 
> > Something seems wrong with this sentence, and I'm not sure what it's
> > trying to say.  Is this right?
> 
> pg_stat_statements uses a different spinlock for each kind of query.
> So the fewer kinds of queries many sessions execute, the fewer spinlocks
> they try to acquire. This can lead to spinlock contention and
> significant performance degradation. This is what the statement is
> trying to say.

What does "kind" mean ?  I think it means a "normalized" query or a "query
structure".

"a fewer kinds" is wrong, so I think the docs should say "a small number of
queries" or maybe:

> > >  Enabling this parameter may incur a noticeable performance penalty,
> > >  especially similar queries are run by many concurrent connections and
> > >  compete to update the same pg_stat_statements entry

-- 
Justin




Re: track_planning causing performance regression

2021-04-19 Thread Fujii Masao




On 2021/04/19 8:36, Justin Pryzby wrote:

Reviewing this change which was committed last year as
321fa6a4a26c9b649a0fbec9fc8b019f19e62289

On Fri, Jul 03, 2020 at 03:57:38PM +0900, Fujii Masao wrote:

On 2020/07/03 13:05, Pavel Stehule wrote:

On Fri, Jul 3, 2020 at 4:39, Fujii Masao  wrote:

Maybe there can be documented so enabling this option can have a negative 
impact on performance.


Yes. What about adding either of the followings into the doc?

 Enabling this parameter may incur a noticeable performance penalty.

or

 Enabling this parameter may incur a noticeable performance penalty,
 especially when a fewer kinds of queries are executed on many
 concurrent connections.


Something seems wrong with this sentence, and I'm not sure what it's trying
to say.  Is this right?


pg_stat_statements uses a different spinlock for each kind of query.
So the fewer kinds of queries many sessions execute, the fewer spinlocks
they try to acquire. This can lead to spinlock contention and
significant performance degradation. This is what the statement is
trying to say.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2021-04-18 Thread Justin Pryzby
Reviewing this change which was committed last year as
321fa6a4a26c9b649a0fbec9fc8b019f19e62289

On Fri, Jul 03, 2020 at 03:57:38PM +0900, Fujii Masao wrote:
> On 2020/07/03 13:05, Pavel Stehule wrote:
> > On Fri, Jul 3, 2020 at 4:39, Fujii Masao  wrote:
> > 
> > Maybe there can be documented so enabling this option can have a negative 
> > impact on performance.
> 
> Yes. What about adding either of the followings into the doc?
> 
> Enabling this parameter may incur a noticeable performance penalty.
> 
> or
> 
> Enabling this parameter may incur a noticeable performance penalty,
> especially when a fewer kinds of queries are executed on many
> concurrent connections.

Something seems wrong with this sentence, and I'm not sure what it's trying
to say.  Is this right?

> Enabling this parameter may incur a noticeable performance penalty,
> especially when a small number of queries are executed on many
> concurrent connections.

-- 
Justin




Re: track_planning causing performance regression

2020-09-30 Thread Michael Paquier
On Fri, Sep 11, 2020 at 03:32:54PM -0700, Andres Freund wrote:
> The piece about a single shared lwlocks is/was about protecting the set
> of entries that are currently in-memory - which can't easily be
> implemented just using atomics (at least without the risk of increasing
> the counters of an entry since replaced with another query).

This discussion has stalled, and the patch proposed is incorrect, so I
have marked it as RwF in the CF app.
--
Michael




Re: track_planning causing performance regression

2020-09-17 Thread Tom Lane
"Tharakan, Robins"  writes:
>> During fully-cached SELECT-only test using pgbench, Postgres v13Beta1
>> shows ~45% performance drop [2] at high DB connection counts (when
>> compared with v12.3)

> It'd be great if we could also give credit to "Sean Massey" for this find.

I poked through my Postgres inbox, and couldn't find any previous
mail from or mentioning a Sean Massey.

While there's not much of a formal policy around this, we usually limit
ourselves to crediting people who have directly interacted with the PG
community.  I'm aware that there's more than a few cases where someone
talks to the community on behalf of a company-internal team ... but
since we have little visibility into such situations, whoever's doing
the talking gets all the credit.

Given that background, it seems like crediting Sean here would be
slightly unfair special treatment.  I'd encourage him to join the
mailing lists and be part of the community directly --- then we'll
definitely know what he's contributed.

regards, tom lane




Re: track_planning causing performance regression

2020-09-11 Thread Andres Freund
Hi,

On 2020-09-11 19:10:05 -0300, Alvaro Herrera wrote:
> Andres suggested in [1] to use atomics for the counters together with a
> single lwlock to be used in shared mode only.  I didn't quite understand
> what the lwlock is *for*, but maybe you do.
> 
> [1] https://postgr.es/m/20200629231015.qlej5b3qpfe4u...@alap3.anarazel.de

Just to be clear - I am saying that in the first iteration I would just
straight up replace the spinlock with an lwlock, i.e. having many
lwlocks.

The piece about a single shared lwlock is/was about protecting the set
of entries that are currently in-memory - which can't easily be
implemented just using atomics (at least not without the risk of
increasing the counters of an entry that has since been replaced with
another query).

Greetings,

Andres Freund




Re: track_planning causing performance regression

2020-09-11 Thread Alvaro Herrera
On 2020-Sep-11, Fujii Masao wrote:

> Ok, so my proposed patch degrated the performance in this case :(
> This means that replacing spinlock with lwlock in pgss is not proper
> approach for the lock contention issue on pgss...
> 
> I proposed to split the spinlock for each pgss entry into two
> to reduce the lock contention, upthread. One is for planner stats,
> and the other is for executor stats. Is it worth working on
> this approach as an alternative idea? Or does anyone have any better idea?

It does seem that the excl-locked section in pgss_store is rather large.
(I admit I don't understand why would a LWLock decrease performance.)

Andres suggested in [1] to use atomics for the counters together with a
single lwlock to be used in shared mode only.  I didn't quite understand
what the lwlock is *for*, but maybe you do.

[1] https://postgr.es/m/20200629231015.qlej5b3qpfe4u...@alap3.anarazel.de

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: track_planning causing performance regression

2020-09-11 Thread Julien Rouhaud
On Fri, Sep 11, 2020 at 4:04 PM Fujii Masao  wrote:
>
> On 2020/09/11 16:23, bttanakahbk wrote:
> >
> > pgbench:
> >   initialization: pgbench -i -s 100
> >   benchmarking  : pgbench -j16 -c128 -T180 -r -n -f 

Re: track_planning causing performance regression

2020-09-11 Thread Fujii Masao




On 2020/09/11 16:23, bttanakahbk wrote:

Hi,

On 2020-08-19 00:43, Fujii Masao wrote:

Yes, I pushed the document_overhead_by_track_planning.patch, but this
CF entry is for pgss_lwlock_v1.patch, which replaces spinlocks with lwlocks
in pg_stat_statements. The latter patch has not been committed yet.
Probably attaching the different patches in the same thread caused
this confusion... Anyway, thanks for your comment!


To avoid further confusion, I attached the rebased version of
the patch that was registered at CF. I'd appreciate it if
you review this version.


I tested pgss_lwlock_v2.patch with 3 workloads. And I couldn't observe a
performance improvement in our environment, and I'm afraid it was even worse
in some cases.
  - Workload1: pgbench select-only mode
  - Workload2: pgbench custom scripts which run "SELECT 1;"
  - Workload3: pgbench custom scripts which run 1000 types of different
simple queries


Thanks for running the benchmarks!




- Workload1
First we set pg_stat_statements.track_planning to on/off and ran the
fully-cached pgbench select-only mode on pg14head, installed on an
on-premises server (32 CPUs, 256GB mem). However, in this environment we
couldn't reproduce the 45% performance drop due to s_lock contention
(which Tharakan-san mentioned in his post on
2895b53b033c47ccb22972b589050...@ex13d05uwc001.ant.amazon.com).

- Workload2
Then we adopted a pgbench custom script running "SELECT 1;", which is
supposed to increase s_lock contention and make it easier to reproduce the
issue. In this case there was around a 10% performance decrease, which also
shows a slight increase in s_lock (~10%). With this scenario, despite the
absence of s_lock, the patch shows more than 50% performance degradation
regardless of track_planning. And we also couldn't see a performance
improvement in this workload.

pgbench:
  initialization: pgbench -i -s 100
  benchmarking  : pgbench -j16 -c128 -T180 -r -n -f 

Re: track_planning causing performance regression

2020-09-11 Thread bttanakahbk

Hi,

On 2020-08-19 00:43, Fujii Masao wrote:

Yes, I pushed the document_overhead_by_track_planning.patch, but this
CF entry is for pgss_lwlock_v1.patch, which replaces spinlocks with lwlocks
in pg_stat_statements. The latter patch has not been committed yet.
Probably attaching the different patches in the same thread caused
this confusion... Anyway, thanks for your comment!


To avoid further confusion, I attached the rebased version of
the patch that was registered at CF. I'd appreciate it if
you review this version.


I tested pgss_lwlock_v2.patch with 3 workloads. And I couldn't observe a
performance improvement in our environment, and I'm afraid it was even worse
in some cases.

 - Workload1: pgbench select-only mode
 - Workload2: pgbench custom scripts which run "SELECT 1;"
 - Workload3: pgbench custom scripts which run 1000 types of different
simple queries


- Workload1
First we set pg_stat_statements.track_planning to on/off and ran the
fully-cached pgbench select-only mode on pg14head, installed on an
on-premises server (32 CPUs, 256GB mem). However, in this environment we
couldn't reproduce the 45% performance drop due to s_lock contention
(which Tharakan-san mentioned in his post on
2895b53b033c47ccb22972b589050...@ex13d05uwc001.ant.amazon.com).

- Workload2
Then we adopted a pgbench custom script running "SELECT 1;", which is
supposed to increase s_lock contention and make it easier to reproduce the
issue. In this case there was around a 10% performance decrease, which also
shows a slight increase in s_lock (~10%). With this scenario, despite the
absence of s_lock, the patch shows more than 50% performance degradation
regardless of track_planning.

And we also couldn't see a performance improvement in this workload.

pgbench:
 initialization: pgbench -i -s 100
 benchmarking  : pgbench -j16 -c128 -T180 -r -n -f 

Re: track_planning causing performance regression

2020-08-19 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Overall, the patch works fine. However, I have a few observations:

(1) Code Comments:
- Code comments should be added for the 2 new macros, in particular for
PGSS_NUM_LOCK_PARTITIONS. As you explained in your email, this may be used to
limit the number of locks if a very large value for pgss_max is specified.
- From the code I inferred that the number of locks may in the future be less
than pgss_max (per your email, where in the future this macro could be used to
limit the number of locks). I suggest adding some notes to help future changes
in this code area.

(2) It seems that the "pgss->lock = &(pgss->base + pgss_max)->lock;" statement
should not use pgss_max directly and instead use the PGSS_NUM_LOCK_PARTITIONS
macro, as when a limit is imposed on the number of locks, this statement will
cause an overrun.


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus

The new status of this patch is: Waiting on Author


Re: track_planning causing performance regression

2020-08-18 Thread Hamid Akhtar
On Tue, Aug 18, 2020 at 8:43 PM Fujii Masao 
wrote:

>
> > Yes, I pushed the document_overhead_by_track_planning.patch, but this
> > CF entry is for pgss_lwlock_v1.patch, which replaces spinlocks with lwlocks
> > in pg_stat_statements. The latter patch has not been committed yet.
> > Probably attaching the different patches in the same thread caused
> > this confusion... Anyway, thanks for your comment!
>
> To avoid further confusion, I attached the rebased version of
> the patch that was registered at CF. I'd appreciate it if
> you review this version.
>

Thank you. Reviewing it now.


>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: track_planning causing performance regression

2020-08-18 Thread Fujii Masao



Yes, I pushed the document_overhead_by_track_planning.patch, but this
CF entry is for pgss_lwlock_v1.patch, which replaces spinlocks with lwlocks
in pg_stat_statements. The latter patch has not been committed yet.
Probably attaching the different patches in the same thread caused
this confusion... Anyway, thanks for your comment!


To avoid further confusion, I attached the rebased version of
the patch that was registered at CF. I'd appreciate it if
you review this version.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6b91c62c31..3035c14ed5 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -39,7 +39,7 @@
  * in an entry except the counters requires the same.  To look up an entry,
  * one must hold the lock shared.  To read or update the counters within
  * an entry, one must hold the lock shared or exclusive (so the entry doesn't
- * disappear!) and also take the entry's mutex spinlock.
+ * disappear!) and also take the entry's partition lock.
  * The shared state variable pgss->extent (the next free spot in the external
  * query-text file) should be accessed only while holding either the
  * pgss->mutex spinlock, or exclusive lock on pgss->lock.  We use the mutex to
@@ -115,6 +115,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+#define PGSS_NUM_LOCK_PARTITIONS()	(pgss_max)
+#define PGSS_HASH_PARTITION_LOCK(key)	\
+	(&(pgss->base + \
+	   (get_hash_value(pgss_hash, key) % PGSS_NUM_LOCK_PARTITIONS()))->lock)
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -207,7 +212,7 @@ typedef struct pgssEntry
 	Size		query_offset;	/* query text offset in external file */
 	int			query_len;		/* # of valid bytes in query string, or -1 */
 	int			encoding;		/* query text encoding */
-	slock_t		mutex;			/* protects the counters only */
+	LWLock	   *lock;			/* protects the counters only */
 } pgssEntry;
 
 /*
@@ -216,6 +221,7 @@ typedef struct pgssEntry
 typedef struct pgssSharedState
 {
 	LWLock	   *lock;			/* protects hashtable search/modification */
+	LWLockPadded *base;			/* base address of this LWLock */
 	double		cur_median_usage;	/* current median usage in hashtable */
 	Size		mean_query_len; /* current mean entry text length */
 	slock_t		mutex;			/* protects following fields only: */
@@ -468,7 +474,8 @@ _PG_init(void)
 	 * resources in pgss_shmem_startup().
 	 */
 	RequestAddinShmemSpace(pgss_memsize());
-	RequestNamedLWLockTranche("pg_stat_statements", 1);
+	RequestNamedLWLockTranche("pg_stat_statements",
+							  PGSS_NUM_LOCK_PARTITIONS() + 1);
 
 	/*
 	 * Install hooks.
@@ -547,7 +554,8 @@ pgss_shmem_startup(void)
 	if (!found)
 	{
 		/* First time through ... */
-		pgss->lock = &(GetNamedLWLockTranche("pg_stat_statements"))->lock;
+		pgss->base = GetNamedLWLockTranche("pg_stat_statements");
+		pgss->lock = &(pgss->base + pgss_max)->lock;
 		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
 		pgss->mean_query_len = ASSUMED_LENGTH_INIT;
 		SpinLockInit(&pgss->mutex);
@@ -1384,14 +1392,14 @@ pgss_store(const char *query, uint64 queryId,
 	if (!jstate)
 	{
 		/*
-		 * Grab the spinlock while updating the counters (see comment about
-		 * locking rules at the head of the file)
+		 * Grab the partition lock while updating the counters (see comment
+		 * about locking rules at the head of the file)
 		 */
 		volatile pgssEntry *e = (volatile pgssEntry *) entry;
 
 		Assert(kind == PGSS_PLAN || kind == PGSS_EXEC);
 
-		SpinLockAcquire(&e->mutex);
+		LWLockAcquire(e->lock, LW_EXCLUSIVE);
 
 		/* "Unstick" entry if it was previously sticky */
 		if (IS_STICKY(e->counters))
@@ -1443,7 +1451,7 @@ pgss_store(const char *query, uint64 queryId,
 		e->counters.wal_fpi += walusage->wal_fpi;
 		e->counters.wal_bytes += walusage->wal_bytes;
 
-		SpinLockRelease(&e->mutex);
+		LWLockRelease(e->lock);
 	}
 
 done:
@@ -1770,9 +1778,9 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo

Re: track_planning causing performance regression

2020-08-17 Thread Fujii Masao



On 2020/08/17 18:34, Hamid Akhtar wrote:



On Mon, Aug 17, 2020 at 2:21 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
[earlier quoted discussion trimmed; the full messages appear below]

Re: track_planning causing performance regression

2020-08-17 Thread Hamid Akhtar
On Mon, Aug 17, 2020 at 2:21 PM Fujii Masao 
wrote:

>
>
> > On 2020/07/31 21:40, Hamid Akhtar wrote:
> > [earlier quoted discussion trimmed; the full messages appear below]
> > You might also want to update this patch's status in the commitfest:
> > https://commitfest.postgresql.org/29/2634/
>
> The patch added into this CF entry has not been committed yet.
> So I was thinking that there is no need to update the status yet. No?
>

Your previous email suggested that i

Re: track_planning causing performance regression

2020-08-17 Thread Fujii Masao




On 2020/07/31 21:40, Hamid Akhtar wrote:



On Mon, Jul 6, 2020 at 10:29 AM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
[earlier quoted discussion trimmed; the full messages appear below]
Thanks for the review! Pushed.



You might also want to update this patch's status in the commitfest:
https://commitfest.postgresql.org/29/2634/


The patch added into this CF entry has not been committed yet.
So I was thinking that there is no need to update the status yet. No?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-07-31 Thread Hamid Akhtar
 

On Mon, Jul 6, 2020 at 10:29 AM Fujii Masao 
wrote:

> [earlier quoted discussion trimmed; the full messages appear below]
>
> Thanks for the review! Pushed.
>
You might also want to update this patch's status in the commitfest:
https://commitfest.postgresql.org/29/2634/

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL: +923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: track_planning causing performance regression

2020-07-05 Thread Fujii Masao




On 2020/07/04 12:22, Pavel Stehule wrote:



On Fri, Jul 3, 2020 at 1:02 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
 > [earlier quoted discussion trimmed; the full messages appear below]
 >
 > This second variant looks perfect for this case.

Ok, so patch attached.


+1


Thanks for the review! Pushed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-07-03 Thread Pavel Stehule
On Fri, Jul 3, 2020 at 1:02 PM Fujii Masao wrote:

> > [earlier quoted discussion trimmed; the full messages appear below]
> >
> > This second variant looks perfect for this case.
>
> Ok, so patch attached.
>

+1

Thank you

Pavel


> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


Re: track_planning causing performance regression

2020-07-03 Thread Fujii Masao



On 2020/07/03 16:02, Pavel Stehule wrote:



On Fri, Jul 3, 2020 at 8:57 AM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
[earlier quoted discussion trimmed; the full messages appear below]
 > Maybe there can be documented so enabling this option can have a negative impact on performance.

Yes. What about adding either of the followings into the doc?

      Enabling this parameter may incur a noticeable performance penalty.

or

      Enabling this parameter may incur a noticeable performance penalty,
      especially when a fewer kinds of queries are executed on many
      concurrent connections.


This second variant looks perfect for this case.


Ok, so patch attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/doc/src/sgml/pgstatstatements.sgml 
b/doc/src/sgml/pgstatstatements.sgml
index 430d8bf07c..cf2d25b7b2 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -607,6 +607,9 @@
  
   pg_stat_statements.track_planning controls whether
   planning operations and duration are tracked by the module.
+  Enabling this parameter may incur a noticeable performance penalty,
+  especially when a fewer kinds of queries are executed on many
+  concurrent connections.
   The default value is off.
   Only superusers can change this setting.
  


Re: track_planning causing performance regression

2020-07-03 Thread Pavel Stehule
On Fri, Jul 3, 2020 at 8:57 AM Fujii Masao wrote:

> > [earlier quoted discussion trimmed; the full messages appear below]
> >
> > Maybe there can be documented so enabling this option can have a negative impact on performance.
>
> Yes. What about adding either of the followings into the doc?
>
>  Enabling this parameter may incur a noticeable performance penalty.
>
> or
>
>  Enabling this parameter may incur a noticeable performance penalty,
>  especially when a fewer kinds of queries are executed on many
>  concurrent connections.
>

This second variant looks perfect for this case.

Thank you

Pavel




> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


Re: track_planning causing performance regression

2020-07-02 Thread Fujii Masao




On 2020/07/03 13:05, Pavel Stehule wrote:

Hi

On Fri, Jul 3, 2020 at 4:39 AM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
[earlier quoted discussion trimmed; the full messages appear below]
 > I agree that it's too late for v13.

Thanks for the comment!

So I pushed the patch and changed default of track_planning to off.


Maybe there can be documented so enabling this option can have a negative 
impact on performance.


Yes. What about adding either of the followings into the doc?

Enabling this parameter may incur a noticeable performance penalty.

or

Enabling this parameter may incur a noticeable performance penalty,
especially when a fewer kinds of queries are executed on many
concurrent connections.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-07-02 Thread Pavel Stehule
Hi

On Fri, Jul 3, 2020 at 4:39 AM Fujii Masao wrote:

> > [earlier quoted discussion trimmed; the full messages appear below]
> >
> > I agree that it's too late for v13.
>
> Thanks for the comment!
>
> So I pushed the patch and changed default of track_planning to off.
>

Maybe there can be documented so enabling this option can have a negative
impact on performance.

Regards

Pavel

>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>
>
>


Re: track_planning causing performance regression

2020-07-02 Thread Fujii Masao




On 2020/07/03 11:43, Peter Geoghegan wrote:

On Thu, Jul 2, 2020 at 7:39 PM Fujii Masao  wrote:

So I pushed the patch and changed default of track_planning to off.


I have closed out the open item I created for this.


Thanks!!

I added the patch that replaces the spinlock with an lwlock in pgss into CF-2020-09.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-07-02 Thread Peter Geoghegan
On Thu, Jul 2, 2020 at 7:39 PM Fujii Masao  wrote:
> So I pushed the patch and changed default of track_planning to off.

I have closed out the open item I created for this.

Thanks!
-- 
Peter Geoghegan




Re: track_planning causing performance regression

2020-07-02 Thread Fujii Masao




On 2020/07/01 7:37, Peter Geoghegan wrote:

On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao  wrote:

Ants and Andres suggested to replace the spinlock used in pgss_store() with
LWLock. I agreed with them and posted the POC patch doing that. But I think
the patch is an item for v14. The patch may address the reported performance
issue, but may cause other performance issues in other workloads. We would
need to measure how the patch affects the performance in various workloads.
It seems too late to do that at this stage of v13. Thought?


I agree that it's too late for v13.


Thanks for the comment!

So I pushed the patch and changed default of track_planning to off.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-07-02 Thread Fujii Masao




On 2020/07/02 1:54, Andres Freund wrote:

Hi,

On 2020-07-01 22:20:50 +0900, Fujii Masao wrote:

On 2020/07/01 4:03, Andres Freund wrote:

Why did you add the hashing here? It seems a lot better to just add an
lwlock in-place instead of the spinlock? The added size is negligible
compared to the size of pgssEntry.


Because pgssEntry is not an array entry but a hashtable entry. At first I was
thinking to assign an lwlock to each entry in the array at startup. But each
entry is created whenever a new entry is required, so its lwlock would need
to be assigned at that creation time. We cannot easily assign lwlocks to all
the entries at startup.


But why not just do it exactly at the place the SpinLockInit() is done
currently?


Sorry I failed to understand your point... You mean that new lwlock should
be initialized at the place the SpinLockInit() is done currently instead of
requesting postmaster to initialize all the lwlocks required for pgss
at _PG_init()?

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-07-01 Thread Andres Freund
Hi,

On 2020-07-01 22:20:50 +0900, Fujii Masao wrote:
> On 2020/07/01 4:03, Andres Freund wrote:
> > Why did you add the hashing here? It seems a lot better to just add an
> > lwlock in-place instead of the spinlock? The added size is negligible
> > compared to the size of pgssEntry.
> 
> Because pgssEntry is not an array entry but a hashtable entry. At first I was
> thinking to assign an lwlock to each entry in the array at startup. But each
> entry is created whenever a new entry is required, so its lwlock would need
> to be assigned at that creation time. We cannot easily assign lwlocks to all
> the entries at startup.

But why not just do it exactly at the place the SpinLockInit() is done
currently?

Greetings,

Andres Freund




Re: track_planning causing performance regression

2020-07-01 Thread Fujii Masao




On 2020/07/01 4:03, Andres Freund wrote:

Hi,

On 2020-06-30 14:43:39 +0900, Fujii Masao wrote:

I did a prototype patch that replaces spinlocks with futexes, but was not able 
to find a workload where it mattered.


I'm not familiar with futex, but could you tell me why you used futex instead
of LWLock that we already have? Is futex portable?


We can't rely on futexes, they're linux only.


Understood. Thanks!




I also don't see much of a
reason to use spinlocks (rather than lwlocks) here in the first place.



diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index cef8bb5a49..aa506f6c11 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -39,7 +39,7 @@
  * in an entry except the counters requires the same.  To look up an entry,
  * one must hold the lock shared.  To read or update the counters within
  * an entry, one must hold the lock shared or exclusive (so the entry doesn't
- * disappear!) and also take the entry's mutex spinlock.
+ * disappear!) and also take the entry's partition lock.
  * The shared state variable pgss->extent (the next free spot in the external
  * query-text file) should be accessed only while holding either the
  * pgss->mutex spinlock, or exclusive lock on pgss->lock.  We use the mutex to
@@ -115,6 +115,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE			1024	/* query serialization buffer size */
 
+#define	PGSS_NUM_LOCK_PARTITIONS()		(pgss_max)
+#define	PGSS_HASH_PARTITION_LOCK(key)	\
+	(&(pgss->base +	\
+	   (get_hash_value(pgss_hash, key) % PGSS_NUM_LOCK_PARTITIONS()))->lock)
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -207,7 +212,7 @@ typedef struct pgssEntry
 	Size		query_offset;	/* query text offset in external file */
 	int			query_len;		/* # of valid bytes in query string, or -1 */
 	int			encoding;		/* query text encoding */
-	slock_t		mutex;			/* protects the counters only */
+	LWLock	   *lock;			/* protects the counters only */
 } pgssEntry;


Why did you add the hashing here? It seems a lot better to just add an
lwlock in-place instead of the spinlock? The added size is negligible
compared to the size of pgssEntry.


Because pgssEntry is not an array entry but a hashtable entry. At first I was
thinking to assign an lwlock to each entry in the array at startup. But each
entry is created whenever a new entry is required, so its lwlock would need
to be assigned at that creation time. We cannot easily assign lwlocks to all
the entries at startup.

Also, each entry can be dropped from the hashtable. In that case, the
already-assigned lwlock would need to be moved back to a "freelist" so that
it could be assigned again to a new entry later. We could probably implement
this, but it looks a bit complicated.

Since the hashing addresses these issues, I just used it in the POC patch.
But I'd like to hear a better idea!


+#define  PGSS_NUM_LOCK_PARTITIONS()  (pgss_max)


Currently pgss_max is used as the number of lwlocks for entries.
But if too large a number of lwlocks is useless (or a bit harmful?), we can
set an upper limit here, e.g., max(pgss_max, 1).

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-06-30 Thread Peter Geoghegan
On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao  wrote:
> Ants and Andres suggested to replace the spinlock used in pgss_store() with
> LWLock. I agreed with them and posted the POC patch doing that. But I think
> the patch is an item for v14. The patch may address the reported performance
> issue, but may cause other performance issues in other workloads. We would
> need to measure how the patch affects the performance in various workloads.
> It seems too late to do that at this stage of v13. Thought?

I agree that it's too late for v13.

Thanks
--
Peter Geoghegan




Re: track_planning causing performance regression

2020-06-30 Thread Andres Freund
Hi,

On 2020-06-30 14:30:03 +0300, Ants Aasma wrote:
> Futex is a Linux kernel call that allows building a lock whose
> uncontended cases work fully in user space almost exactly like a spinlock,
> while falling back to syscalls that wait for wakeup in case of contention.
> It's not portable, but probably something similar could be implemented for
> other operating systems. I did not pursue this further because it became
> apparent that every performance-critical spinlock had already been removed.

Our lwlock implementation does have that property already, though. While
the kernel wait is implemented using semaphores, those are implemented
using futexes internally (posix ones, not sysv ones, so only after
whatever version we switched the default to posix semas on linux).

I'd rather move towards removing spinlocks from postgres than making
their implementation more complicated...

Greetings,

Andres Freund




Re: track_planning causing performance regression

2020-06-30 Thread Andres Freund
Hi,

On 2020-06-30 14:43:39 +0900, Fujii Masao wrote:
> > I did a prototype patch that replaces spinlocks with futexes, but was not 
> > able to find a workload where it mattered.
> 
> I'm not familiar with futex, but could you tell me why you used futex instead
> of LWLock that we already have? Is futex portable?

We can't rely on futexes, they're linux only. I also don't see much of a
reason to use spinlocks (rather than lwlocks) here in the first place.


> diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
> index cef8bb5a49..aa506f6c11 100644
> --- a/contrib/pg_stat_statements/pg_stat_statements.c
> +++ b/contrib/pg_stat_statements/pg_stat_statements.c
> @@ -39,7 +39,7 @@
>   * in an entry except the counters requires the same.  To look up an entry,
>   * one must hold the lock shared.  To read or update the counters within
>   * an entry, one must hold the lock shared or exclusive (so the entry doesn't
> - * disappear!) and also take the entry's mutex spinlock.
> + * disappear!) and also take the entry's partition lock.
>   * The shared state variable pgss->extent (the next free spot in the external
>   * query-text file) should be accessed only while holding either the
>   * pgss->mutex spinlock, or exclusive lock on pgss->lock.  We use the mutex to
> @@ -115,6 +115,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
>  
>  #define JUMBLE_SIZE			1024	/* query serialization buffer size */
>  
> +#define	PGSS_NUM_LOCK_PARTITIONS()		(pgss_max)
> +#define	PGSS_HASH_PARTITION_LOCK(key)	\
> +	(&(pgss->base +	\
> +	   (get_hash_value(pgss_hash, key) % PGSS_NUM_LOCK_PARTITIONS()))->lock)
> +
>  /*
>   * Extension version number, for supporting older extension versions' objects
>   */
> @@ -207,7 +212,7 @@ typedef struct pgssEntry
>  	Size		query_offset;	/* query text offset in external file */
>  	int			query_len;		/* # of valid bytes in query string, or -1 */
>  	int			encoding;		/* query text encoding */
> -	slock_t		mutex;			/* protects the counters only */
> +	LWLock	   *lock;			/* protects the counters only */
>  } pgssEntry;

Why did you add the hashing here? It seems a lot better to just add an
lwlock in-place instead of the spinlock? The added size is neglegible
compared to the size of pgssEntry.

Greetings,

Andres Freund




Re: track_planning causing performance regression

2020-06-30 Thread Fujii Masao




On 2020/06/30 7:29, Peter Geoghegan wrote:

On Mon, Jun 29, 2020 at 3:23 PM Peter Geoghegan  wrote:

+1 -- this regression seems unacceptable to me.


I added an open item to track this.


Thanks!
I'm thinking to change the default value of track_planning to off for v13.

Ants and Andres suggested to replace the spinlock used in pgss_store() with
LWLock. I agreed with them and posted the POC patch doing that. But I think
the patch is an item for v14. The patch may address the reported performance
issue, but may cause other performance issues in other workloads. We would
need to measure how the patch affects the performance in various workloads.
It seems too late to do that at this stage of v13. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-06-30 Thread Fujii Masao




On 2020/06/30 20:30, Ants Aasma wrote:

On Tue, 30 Jun 2020 at 08:43, Fujii Masao <masao.fu...@oss.nttdata.com> wrote:

 > The problem looks to be that spinlocks behave terribly when the CPU is 
overloaded and the spinlock is contended. A process holding the spinlock 
might easily get scheduled out, leading to excessive spinning by everybody. 
I think a simple thing to try would be to replace the spinlock with an LWLock.

Yes. Attached is the POC patch that replaces per-counter spinlock with 
LWLock.


Great. I think this is the one that should get considered for testing.

 > I did a prototype patch that replaces spinlocks with futexes, but was 
not able to find a workload where it mattered.

I'm not familiar with futex, but could you tell me why you used futex 
instead
of LWLock that we already have? Is futex portable?


Futex is a Linux kernel call that allows building a lock whose uncontended 
case works fully in user space, almost exactly like a spinlock, while falling 
back to syscalls that wait for wakeup in case of contention. It's not portable, 
but something similar could probably be implemented for other operating 
systems. I did not pursue this further because it became apparent that every 
performance-critical spinlock had already been removed.

To be clear, I am not advocating for this patch to get included. I just had the 
patch immediately available and it could have confirmed that using a better 
lock fixes things.


Understood. Thanks for the explanation!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-06-30 Thread Ants Aasma
On Tue, 30 Jun 2020 at 08:43, Fujii Masao 
wrote:

> > The problem looks to be that spinlocks behave terribly when the CPU is
> overloaded and the spinlock is contended. A process holding the spinlock
> might easily get scheduled out, leading to excessive spinning by everybody.
> I think a simple thing to try would be to replace the spinlock with an LWLock.
>
> Yes. Attached is the POC patch that replaces per-counter spinlock with
> LWLock.
>

Great. I think this is the one that should get considered for testing.


> > I did a prototype patch that replaces spinlocks with futexes, but was
> not able to find a workload where it mattered.
>
> I'm not familiar with futex, but could you tell me why you used futex
> instead
> of LWLock that we already have? Is futex portable?
>

Futex is a Linux kernel call that allows building a lock whose uncontended
case works fully in user space, almost exactly like a spinlock, while
falling back to syscalls that wait for wakeup in case of contention.
It's not portable, but something similar could probably be implemented for
other operating systems. I did not pursue this further because it became
apparent that every performance-critical spinlock had already been removed.

To be clear, I am not advocating for this patch to get included. I just had
the patch immediately available and it could have confirmed that using a
better lock fixes things.

-- 
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com


Re: track_planning causing performance regression

2020-06-29 Thread Fujii Masao



On 2020/06/29 22:23, Ants Aasma wrote:

On Mon, 29 Jun 2020 at 12:17, Julien Rouhaud <rjuju...@gmail.com> wrote:

On Mon, Jun 29, 2020 at 10:55 AM Fujii Masao
<masao.fu...@oss.nttdata.com> wrote:
 >
 > On 2020/06/29 16:05, Julien Rouhaud wrote:
 > > On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins <tha...@amazon.com> wrote:
 > >>
 > >> During fully-cached SELECT-only test using pgbench, Postgres v13Beta1 
shows
 >
 > Thanks for the benchmark!
 >
 >
 > >> ~45% performance drop [2] at high DB connection counts (when compared 
with v12.3)
 >
 > That's bad :(
 >
 >
 > >>
 > >> Disabling pg_stat_statements.track_planning (which is 'On' by default)
 > >> brings the TPS numbers up to v12.3 levels.
 > >>
 > >> The inflection point (in this test-case) is 128 Connections, beyond 
which the
 > >> TPS numbers are consistently low. Looking at the mailing list [1], 
this issue
 > >> didn't surface earlier possibly since the regression is trivial at 
low connection counts.
 > >>
 > >> It would be great if this could be optimized further, or 
track_planning
 > >> disabled (by default) so as to not trip users upgrading from v12 with 
pg_stat_statement
 > >> enabled (but otherwise not particularly interested in track_planning).
 >
 > Your benchmark result seems to suggest that the cause of the problem is
 > the contention of per-query spinlock in pgss_store(). Right?
 > This lock contention is likely to happen when multiple sessions run
 > the same queries.
 >
 > One idea to reduce that lock contention is to separate per-query spinlock
 > into two; one is for planning, and the other is for execution. 
pgss_store()
 > determines which lock to use based on the given "kind" argument.
 > To make this idea work, also every pgss counters like shared_blks_hit
 > need to be separated into two, i.e., for planning and execution.

This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?
It'll also considerably increase the shared memory consumption.

I'm wondering if we could instead use atomics to store the counters.
The only downside is that we won't guarantee per-row consistency
anymore, which may be problematic.



The problem looks to be that spinlocks behave terribly when the CPU is 
overloaded and the spinlock is contended. A process holding the spinlock 
might easily get scheduled out, leading to excessive spinning by everybody. 
I think a simple thing to try would be to replace the spinlock with an LWLock.


Yes. Attached is the POC patch that replaces per-counter spinlock with LWLock.



I did a prototype patch that replaces spinlocks with futexes, but was not able 
to find a workload where it mattered.


I'm not familiar with futex, but could you tell me why you used futex instead
of LWLock that we already have? Is futex portable?


We have done a great job at eliminating spinlocks from contended code paths. 
Robins, perhaps you could try it to see if it reduces the regression you are 
observing.


Yes. Also we need to check that this change doesn't increase performance
overhead in other workloads.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index cef8bb5a49..aa506f6c11 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -39,7 +39,7 @@
  * in an entry except the counters requires the same.  To look up an entry,
  * one must hold the lock shared.  To read or update the counters within
  * an entry, one must hold the lock shared or exclusive (so the entry doesn't
- * disappear!) and also take the entry's mutex spinlock.
+ * disappear!) and also take the entry's partition lock.
  * The shared state variable pgss->extent (the next free spot in the external
  * query-text file) should be accessed only while holding either the
  * pgss->mutex spinlock, or exclusive lock on pgss->lock.  We use the mutex to
@@ -115,6 +115,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 
 #define JUMBLE_SIZE			1024	/* query serialization buffer size */
 
+#define	PGSS_NUM_LOCK_PARTITIONS()	(pgss_max)
+#define	PGSS_HASH_PARTITION_LOCK(key)	\
+	(&(pgss->base + \
+	   (get_hash_value(pgss_hash, key) % PGSS_NUM_LOCK_PARTITIONS()))->lock)
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -207,7 +212,7 @@ typedef struct pgssEntry
 	Size		query_offset;	/* query text offset in external file */
 	int			query_len;		/* # of valid bytes in query string, or -1 */

Re: track_planning causing performance regression

2020-06-29 Thread Andres Freund
Hi,

On 2020-06-29 17:55:28 +0900, Fujii Masao wrote:
> One idea to reduce that lock contention is to separate per-query spinlock
> into two; one is for planning, and the other is for execution. pgss_store()
> determines which lock to use based on the given "kind" argument.
> To make this idea work, also every pgss counters like shared_blks_hit
> need to be separated into two, i.e., for planning and execution.

I suspect that the best thing would be to just turn the spinlock into an
lwlock. Spinlocks deal terribly with contention. I suspect it'd solve
the performance issue entirely. And it might even be possible, further
down the line, to just use a shared lock, and use atomics for the
counters.

Greetings,

Andres Freund




Re: track_planning causing performance regression

2020-06-29 Thread Andres Freund
Hi,

On 2020-06-29 09:05:18 +0200, Julien Rouhaud wrote:
> I can't reproduce this on my laptop, but I can certainly believe that
> running the same 3 queries using more connections than available cores
> will lead to extra overhead.

> I disagree with the conclusion though.  It seems to me that if you
> really have this workload that consists in these few queries and want
> to get better performance, you'll anyway use a connection pooler
> and/or use prepared statements, which will make this overhead
> disappear entirely, and will also yield an even bigger performance
> improvement.

It's extremely common to have times when there are more active
queries than CPUs. And a pooler won't avoid that fully, at least not
without drastically reducing overall throughput.

Greetings,

Andres Freund




Re: track_planning causing performance regression

2020-06-29 Thread Peter Geoghegan
On Mon, Jun 29, 2020 at 3:23 PM Peter Geoghegan  wrote:
> +1 -- this regression seems unacceptable to me.

I added an open item to track this.

Thanks
-- 
Peter Geoghegan




Re: track_planning causing performance regression

2020-06-29 Thread Peter Geoghegan
On Mon, Jun 29, 2020 at 1:55 AM Fujii Masao  wrote:
> > I disagree with the conclusion though.  It seems to me that if you
> > really have this workload that consists in these few queries and want
> > to get better performance, you'll anyway use a connection pooler
> > and/or use prepared statements, which will make this overhead
> > disappear entirely, and will also yield an even bigger performance
> > improvement.  A quick test using pgbench -M prepared, with
> > track_planning enabled, with still way too many connections already
> > shows a 25% improvement over the -M simple without track_planning.
>
> I understand your point. But IMO the default setting basically should
> be safer value, i.e., off at least until the problem disappears.

+1 -- this regression seems unacceptable to me.

-- 
Peter Geoghegan




Re: track_planning causing performance regression

2020-06-29 Thread Ants Aasma
On Mon, 29 Jun 2020 at 12:17, Julien Rouhaud  wrote:

> On Mon, Jun 29, 2020 at 10:55 AM Fujii Masao
>  wrote:
> >
> > On 2020/06/29 16:05, Julien Rouhaud wrote:
> > > On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins 
> wrote:
> > >>
> > >> During fully-cached SELECT-only test using pgbench, Postgres v13Beta1
> shows
> >
> > Thanks for the benchmark!
> >
> >
> > >> ~45% performance drop [2] at high DB connection counts (when compared
> with v12.3)
> >
> > That's bad :(
> >
> >
> > >>
> > >> Disabling pg_stat_statements.track_planning (which is 'On' by default)
> > >> brings the TPS numbers up to v12.3 levels.
> > >>
> > >> The inflection point (in this test-case) is 128 Connections, beyond
> which the
> > >> TPS numbers are consistently low. Looking at the mailing list [1],
> this issue
> > >> didn't surface earlier possibly since the regression is trivial at
> low connection counts.
> > >>
> > >> It would be great if this could be optimized further, or
> track_planning
> > >> disabled (by default) so as to not trip users upgrading from v12 with
> pg_stat_statement
> > >> enabled (but otherwise not particularly interested in track_planning).
> >
> > Your benchmark result seems to suggest that the cause of the problem is
> > the contention of per-query spinlock in pgss_store(). Right?
> > This lock contention is likely to happen when multiple sessions run
> > the same queries.
> >
> > One idea to reduce that lock contention is to separate per-query spinlock
> > into two; one is for planning, and the other is for execution.
> pgss_store()
> > determines which lock to use based on the given "kind" argument.
> > To make this idea work, also every pgss counters like shared_blks_hit
> > need to be separated into two, i.e., for planning and execution.
>
> This can probably remove some overhead, but won't it eventually hit
> the same issue when multiple connections try to plan the same query,
> given the number of different queries and very low execution runtime?
> It'll also considerably increase the shared memory consumption.
>
> I'm wondering if we could instead use atomics to store the counters.
> The only downside is that we won't guarantee per-row consistency
> anymore, which may be problematic.
>


The problem looks to be that spinlocks behave terribly when the CPU is
overloaded and the spinlock is contended. A process holding the spinlock
might easily get scheduled out, leading to excessive spinning by everybody.
I think a simple thing to try would be to replace the spinlock with an LWLock.

I did a prototype patch that replaces spinlocks with futexes, but was not
able to find a workload where it mattered. We have done a great job at
eliminating spinlocks from contended code paths. Robins, perhaps you could
try it to see if it reduces the regression you are observing. The patch is
against v13 stable branch.

-- 
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index 7fac0703419..56d45b7cfce 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -90,6 +90,7 @@ s_lock_stuck(const char *file, int line, const char *func)
 int
 s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
 {
+#ifndef HAS_FUTEX
 	SpinDelayStatus delayStatus;
 
 	init_spin_delay(&delayStatus, file, line, func);
@@ -102,6 +103,8 @@ s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
 	finish_spin_delay(&delayStatus);
 
 	return delayStatus.delays;
+#endif
+	elog(FATAL, "Should not be called");
 }
 
 #ifdef USE_DEFAULT_S_UNLOCK
@@ -218,6 +221,71 @@ update_spins_per_delay(int shared_spins_per_delay)
 	return (shared_spins_per_delay * 15 + spins_per_delay) / 16;
 }
 
+#ifdef HAS_FUTEX
+#include <linux/futex.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+static int
+futex(volatile uint32 *uaddr, int futex_op, int val,
+	  const struct timespec *timeout, int *uaddr2, int val3)
+{
+	return syscall(SYS_futex, uaddr, futex_op, val,
+   timeout, uaddr, val3);
+}
+
+int
+futex_lock(volatile slock_t *lock, uint32 current, const char *file, int line, const char *func)
+{
+	int i, s;
+	/*
+	 * First lets wait for a bit without involving the kernel, it is quite likely
+	 * the lock holder is still running.
+	 **/
+	if (likely(current < 2))
+	{
+		uint32 expected;
+		for (i = 0; i < DEFAULT_SPINS_PER_DELAY; i++)
+		{
+			SPIN_DELAY();
+			expected = lock->value;
+			if (expected == 0 && pg_atomic_compare_exchange_u32(lock, &expected, 1))
+return i;
+		}
+
+		while (expected != 2 && !pg_atomic_compare_exchange_u32(lock, &expected, 2)) {
+			if (expected == 0 && pg_atomic_compare_exchange_u32(lock, &expected, 2))
+return i;
+		}
+	}
+
+	/* At this point lock value is 2 and we will get waken up */
+	while (true)
+	{
+		uint32 expected = 0;
+		s = futex(&(lock->value), FUTEX_WAIT, 2, NULL, NULL, 0);
+		if (s == -1 && errno != EAGAIN)
+			elog(FATAL, "Futex wait failed with error: %m");
+
+		/* Maybe someone else was waiting too

Re: track_planning causing performance regression

2020-06-29 Thread Julien Rouhaud
On Mon, Jun 29, 2020 at 1:14 PM Fujii Masao  wrote:
>
> On 2020/06/29 18:56, Fujii Masao wrote:
> >
> >
> > On 2020/06/29 18:53, Julien Rouhaud wrote:
> >> On Mon, Jun 29, 2020 at 11:38 AM Fujii Masao
> >>  wrote:
> >>>
> > Your benchmark result seems to suggest that the cause of the problem is
> > the contention of per-query spinlock in pgss_store(). Right?
> > This lock contention is likely to happen when multiple sessions run
> > the same queries.
> >
> > One idea to reduce that lock contention is to separate per-query 
> > spinlock
> > into two; one is for planning, and the other is for execution. 
> > pgss_store()
> > determines which lock to use based on the given "kind" argument.
> > To make this idea work, also every pgss counters like shared_blks_hit
> > need to be separated into two, i.e., for planning and execution.
> 
>  This can probably remove some overhead, but won't it eventually hit
>  the same issue when multiple connections try to plan the same query,
>  given the number of different queries and very low execution runtime?
> >>>
> >>> Yes. But maybe we can expect that the idea would improve
> >>> the performance to nearly the same level as v12?
> >>
> >> A POC patch should be easy to do and see how much it solves this
> >> problem.  However I'm not able to reproduce the issue, and IMHO unless
> >> we specifically want to be able to distinguish planner-time counters
> >> from execution-time counters, I'd prefer to disable track_planning by
> >> default than going this way, so that users with a sane usage won't
> >> have to suffer from a memory increase.
> >
> > Agreed. +1 to change that default to off.
>
> Attached patch does this.

Patch looks good to me.

> I also add the following into the description about each *_plan_time column
> in the docs. IMO this is helpful for users when they see that those columns
> report zero by default and try to understand why.
>
> (if pg_stat_statements.track_planning is enabled, otherwise zero)

+1

Do you intend to wait for other input before pushing?  FWIW I'm still
not convinced that the exposed problem is representative of any
realistic workload.  I of course entirely agree with the other
documentation changes.




Re: track_planning causing performance regression

2020-06-29 Thread Fujii Masao



On 2020/06/29 18:56, Fujii Masao wrote:



On 2020/06/29 18:53, Julien Rouhaud wrote:

On Mon, Jun 29, 2020 at 11:38 AM Fujii Masao
 wrote:



Your benchmark result seems to suggest that the cause of the problem is
the contention of per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.

One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.


This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?


Yes. But maybe we can expect that the idea would improve
the performance to nearly the same level as v12?


A POC patch should be easy to do and see how much it solves this
problem.  However I'm not able to reproduce the issue, and IMHO unless
we specifically want to be able to distinguish planner-time counters
from execution-time counters, I'd prefer to disable track_planning by
default than going this way, so that users with a sane usage won't
have to suffer from a memory increase.


Agreed. +1 to change that default to off.


Attached patch does this.

I also add the following into the description about each *_plan_time column
in the docs. IMO this is helpful for users when they see that those columns
report zero by default and try to understand why.

(if pg_stat_statements.track_planning is enabled, otherwise zero)

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index f615f8c2bf..c3f013860a 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -3,6 +3,7 @@ CREATE EXTENSION pg_stat_statements;
 -- simple and compound statements
 --
 SET pg_stat_statements.track_utility = FALSE;
+SET pg_stat_statements.track_planning = TRUE;
 SELECT pg_stat_statements_reset();
  pg_stat_statements_reset 
 --
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index cef8bb5a49..65ac301b99 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -442,7 +442,7 @@ _PG_init(void)
							 "Selects whether planning duration is tracked by pg_stat_statements.",
 NULL,
 &pgss_track_planning,
-true,
+false,
 PGC_SUSET,
 0,
 NULL,
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 75c10554a8..6ed8e38028 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -4,6 +4,7 @@ CREATE EXTENSION pg_stat_statements;
 -- simple and compound statements
 --
 SET pg_stat_statements.track_utility = FALSE;
+SET pg_stat_statements.track_planning = TRUE;
 SELECT pg_stat_statements_reset();
 
 SELECT 1 AS "int";
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index a13e28a84c..430d8bf07c 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -101,6 +101,8 @@
   
   
Number of times the statement was planned
+   (if pg_stat_statements.track_planning is enabled,
+   otherwise zero)
   
  
 
@@ -110,6 +112,8 @@
   
   
Total time spent planning the statement, in milliseconds
+   (if pg_stat_statements.track_planning is enabled,
+   otherwise zero)
   
  
 
@@ -119,6 +123,8 @@
   
   
Minimum time spent planning the statement, in milliseconds
+   (if pg_stat_statements.track_planning is enabled,
+   otherwise zero)
   
  
 
@@ -128,6 +134,8 @@
   
   
Maximum time spent planning the statement, in milliseconds
+   (if pg_stat_statements.track_planning is enabled,
+   otherwise zero)
   
  
 
@@ -137,6 +145,8 @@
   
   
Mean time spent planning the statement, in milliseconds
+   (if pg_stat_statements.track_planning is enabled,
+   otherwise zero)

Re: track_planning causing performance regression

2020-06-29 Thread Fujii Masao




On 2020/06/29 18:53, Julien Rouhaud wrote:

On Mon, Jun 29, 2020 at 11:38 AM Fujii Masao
 wrote:



Your benchmark result seems to suggest that the cause of the problem is
the contention of per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.

One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.


This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?


Yes. But maybe we can expect that the idea would improve
the performance to nearly the same level as v12?


A POC patch should be easy to do and see how much it solves this
problem.  However I'm not able to reproduce the issue, and IMHO unless
we specifically want to be able to distinguish planner-time counters
from execution-time counters, I'd prefer to disable track_planning by
default than going this way, so that users with a sane usage won't
have to suffer from a memory increase.


Agreed. +1 to change that default to off.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-06-29 Thread Julien Rouhaud
On Mon, Jun 29, 2020 at 11:38 AM Fujii Masao
 wrote:
>
> >> Your benchmark result seems to suggest that the cause of the problem is
> >> the contention of per-query spinlock in pgss_store(). Right?
> >> This lock contention is likely to happen when multiple sessions run
> >> the same queries.
> >>
> >> One idea to reduce that lock contention is to separate per-query spinlock
> >> into two; one is for planning, and the other is for execution. pgss_store()
> >> determines which lock to use based on the given "kind" argument.
> >> To make this idea work, also every pgss counters like shared_blks_hit
> >> need to be separated into two, i.e., for planning and execution.
> >
> > This can probably remove some overhead, but won't it eventually hit
> > the same issue when multiple connections try to plan the same query,
> > given the number of different queries and very low execution runtime?
>
> Yes. But maybe we can expect that the idea would improve
> the performance to nearly the same level as v12?

A POC patch should be easy to do and see how much it solves this
problem.  However I'm not able to reproduce the issue, and IMHO unless
we specifically want to be able to distinguish planner-time counters
from execution-time counters, I'd prefer to disable track_planning by
default than going this way, so that users with a sane usage won't
have to suffer from a memory increase.

> > I'm wondering if we could instead use atomics to store the counters.
> > The only downside is that we won't guarantee per-row consistency
> > anymore, which may be problematic.
>
> Yeah, we can consider more improvements against this issue.
> But I'm afraid these (maybe including my idea) basically should
> be items for v14...

Yes, that's clearly not something I'd vote to push in v13 at this point.




Re: track_planning causing performance regression

2020-06-29 Thread Fujii Masao




On 2020/06/29 18:17, Julien Rouhaud wrote:

On Mon, Jun 29, 2020 at 10:55 AM Fujii Masao
 wrote:


On 2020/06/29 16:05, Julien Rouhaud wrote:

On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins  wrote:


During fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows


Thanks for the benchmark!



~45% performance drop [2] at high DB connection counts (when compared with 
v12.3)


That's bad :(




Disabling pg_stat_statements.track_planning (which is 'On' by default)
brings the TPS numbers up to v12.3 levels.

The inflection point (in this test-case) is 128 Connections, beyond which the
TPS numbers are consistently low. Looking at the mailing list [1], this issue
didn't surface earlier possibly since the regression is trivial at low 
connection counts.

It would be great if this could be optimized further, or track_planning
disabled (by default) so as to not trip users upgrading from v12 with 
pg_stat_statement
enabled (but otherwise not particularly interested in track_planning).


Your benchmark result seems to suggest that the cause of the problem is
the contention of per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.

One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.


This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?


Yes. But maybe we can expect that the idea would improve
the performance to nearly the same level as v12?



It'll also considerably increase the shared memory consumption.


Yes.



I'm wondering if we could instead use atomics to store the counters.
The only downside is that we won't guarantee per-row consistency
anymore, which may be problematic.


Yeah, we can consider more improvements against this issue.
But I'm afraid these (maybe including my idea) basically should
be items for v14...

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-06-29 Thread Julien Rouhaud
On Mon, Jun 29, 2020 at 10:55 AM Fujii Masao
 wrote:
>
> On 2020/06/29 16:05, Julien Rouhaud wrote:
> > On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins  wrote:
> >>
> >> During fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows
>
> Thanks for the benchmark!
>
>
> >> ~45% performance drop [2] at high DB connection counts (when compared with 
> >> v12.3)
>
> That's bad :(
>
>
> >>
> >> Disabling pg_stat_statements.track_planning (which is 'On' by default)
> >> brings the TPS numbers up to v12.3 levels.
> >>
> >> The inflection point (in this test-case) is 128 Connections, beyond which 
> >> the
> >> TPS numbers are consistently low. Looking at the mailing list [1], this 
> >> issue
> >> didn't surface earlier possibly since the regression is trivial at low 
> >> connection counts.
> >>
> >> It would be great if this could be optimized further, or track_planning
> >> disabled (by default) so as to not trip users upgrading from v12 with 
> >> pg_stat_statement
> >> enabled (but otherwise not particularly interested in track_planning).
>
> Your benchmark result seems to suggest that the cause of the problem is
> the contention on the per-query spinlock in pgss_store(). Right?
> This lock contention is likely to happen when multiple sessions run
> the same queries.
>
> One idea to reduce that lock contention is to separate per-query spinlock
> into two; one is for planning, and the other is for execution. pgss_store()
> determines which lock to use based on the given "kind" argument.
> To make this idea work, every pgss counter like shared_blks_hit also needs
> to be split in two, i.e., one for planning and one for execution.

This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the small number of distinct queries and the very low execution
runtime?
It'll also considerably increase shared memory consumption.

I'm wondering if we could instead use atomics to store the counters.
The only downside is that we won't guarantee per-row consistency
anymore, which may be problematic.




Re: track_planning causing performance regression

2020-06-29 Thread Fujii Masao




On 2020/06/29 16:05, Julien Rouhaud wrote:

Hi,

On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins  wrote:


During fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows


Thanks for the benchmark!



~45% performance drop [2] at high DB connection counts (when compared with 
v12.3)


That's bad :(




Disabling pg_stat_statements.track_planning (which is 'On' by default)
brings the TPS numbers up to v12.3 levels.

The inflection point (in this test-case) is 128 Connections, beyond which the
TPS numbers are consistently low. Looking at the mailing list [1], this issue
didn't surface earlier possibly since the regression is trivial at low 
connection counts.

It would be great if this could be optimized further, or track_planning
disabled (by default) so as to not trip users upgrading from v12 with 
pg_stat_statement
enabled (but otherwise not particularly interested in track_planning).


Your benchmark result seems to suggest that the cause of the problem is
the contention on the per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.

One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, every pgss counter like shared_blks_hit also needs
to be split in two, i.e., one for planning and one for execution.



These are some details around the above test:

pgbench: scale - 100 / threads - 16
test-duration - 30s each
server - 96 vCPUs / 768GB - r5.24xl (AWS EC2 instance)
client - 72 vCPUs / 144GB - c5.18xl (AWS EC2 instance) (co-located with the DB 
server - Same AZ)
v12 - REL_12_STABLE (v12.3)
v13Beta1 - REL_13_STABLE (v13Beta1)
max_connections = 1
shared_preload_libraries = 'pg_stat_statements'
shared_buffers 128MB


I can't reproduce this on my laptop, but I can certainly believe that
running the same 3 queries using more connections than available cores
will lead to extra overhead.

I disagree with the conclusion though.  It seems to me that if you
really have a workload that consists of these few queries and want
to get better performance, you'll use a connection pooler anyway
and/or use prepared statements, which will make this overhead
disappear entirely, and will also yield an even bigger performance
improvement.  A quick test using pgbench -M prepared, with
track_planning enabled and still way too many connections, already
shows a 25% improvement over -M simple without track_planning.


I understand your point. But IMO the default setting should basically
be the safer value, i.e., off, at least until the problem disappears.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: track_planning causing performance regression

2020-06-29 Thread Julien Rouhaud
Hi,

On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins  wrote:
>
> During fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows
> ~45% performance drop [2] at high DB connection counts (when compared with 
> v12.3)
>
> Disabling pg_stat_statements.track_planning (which is 'On' by default)
> brings the TPS numbers up to v12.3 levels.
>
> The inflection point (in this test-case) is 128 Connections, beyond which the
> TPS numbers are consistently low. Looking at the mailing list [1], this issue
> didn't surface earlier possibly since the regression is trivial at low 
> connection counts.
>
> It would be great if this could be optimized further, or track_planning
> disabled (by default) so as to not trip users upgrading from v12 with 
> pg_stat_statement
> enabled (but otherwise not particularly interested in track_planning).
>
> These are some details around the above test:
>
> pgbench: scale - 100 / threads - 16
> test-duration - 30s each
> server - 96 vCPUs / 768GB - r5.24xl (AWS EC2 instance)
> client - 72 vCPUs / 144GB - c5.18xl (AWS EC2 instance) (co-located with the 
> DB server - Same AZ)
> v12 - REL_12_STABLE (v12.3)
> v13Beta1 - REL_13_STABLE (v13Beta1)
> max_connections = 1
> shared_preload_libraries = 'pg_stat_statements'
> shared_buffers 128MB

I can't reproduce this on my laptop, but I can certainly believe that
running the same 3 queries using more connections than available cores
will lead to extra overhead.

I disagree with the conclusion though.  It seems to me that if you
really have a workload that consists of these few queries and want
to get better performance, you'll use a connection pooler anyway
and/or use prepared statements, which will make this overhead
disappear entirely, and will also yield an even bigger performance
improvement.  A quick test using pgbench -M prepared, with
track_planning enabled and still way too many connections, already
shows a 25% improvement over -M simple without track_planning.