On 07.10.2024 03:41, Michael Paquier wrote:
On Mon, Oct 07, 2024 at 12:43:18AM +0300, Alena Rybakina wrote:
Maybe I'm not aware of the whole context of the thread and maybe my
questions will seem a bit stupid, but honestly
it's not entirely clear to me how this statistics will help to adjust the
number of parallel workers.
We may have situations when during overestimation of the cardinality during
query optimization a several number of parallel workers were unjustifiably
generated and vice versa -
due to a high workload only a few number of workers were generated.
How do we identify such cases so as not to increase or decrease the number
of parallel workers when it is not necessary?
Well.  For spiky workloads, only these numbers are not going to help.
If you can map them with the number of times a query related to these
tables has been called, something that pg_stat_statements would be
able to show more information about.

FWIW, I have doubts that these numbers attached to this portion of the
system are always useful.  For OLTP workloads, parallel workers would
unlikely be spawned because even with JOINs we won't work with a high
number of tuples that require them.  This could be interested with
analytics, however complex query sequences mean that we'd still need
to look at all the plans involving the relations where there is an
unbalance of planned/spawned workers, because these can usually
involve quite a few gather nodes.  At the end of the day, it seems to
me that we would still need data that involves statements to track
down specific plans that are starving.  If your application does not
have that many statements, looking at individial plans is OK, but if
you have hundreds of them to dig into, this is time-consuming and
stats at table/index level don't offer data in terms of stuff that
stands out and needs adjustments.

And this is without the argument of bloating more the stats entries
for each table, even if it matters less now that these stats are in
shmem lately.

To be honest, it’s not entirely clear to me how these statistics will help in setting up parallel workers.

As I understand, we need additional tools for analytics, which are available in pg_stat_statements, but how then does it work? maybe you have the opportunity to demonstrate this?

--
Regards,
Alena Rybakina
Postgres Professional



Reply via email to