Re: Flush some statistics within running transactions

Michael Paquier Wed, 18 Feb 2026 19:58:37 -0800

On Wed, Feb 18, 2026 at 05:40:46AM +0000, Bertrand Drouvot wrote:
> PFA a mandatory rebase (nothing that needs review) due to a92b809f9da1.


I don't find the design of this patch appealing, and two pieces of it
bother me in particular:
1) The new requirement that a stats kind call
pgstat_schedule_anytime_update() to enable a timeout.  This partially
duplicates pgstat_report_fixed.  I also suspect that this extra set of
requirements, which introduces a new level of complexity for in-core
stats kinds as well as for extension developers, would be the source of
more bugs.
2) The timeout requirement itself, which relies on a timeout threshold
controlled by a backend-side configuration.

With that in mind, wouldn't it be simpler to introduce an API that
could be used from client applications instead, following a model
similar to what we do in procsignal.c/h?  One such example is
LOG_MEMORY_CONTEXT, where a SQL function (pg_log_backend_memory_contexts())
is able to tell a backend that it needs to do something.  I could see
various benefits to this approach, because it gives more flexibility in
the timing of the stats flushes, which does not need to be a
backend-side-only policy:
- Use a cron bgworker in the backend that scans pg_stat_activity, for
example to find long-running transactions based on a threshold.
- Do the same periodic scan of pg_stat_activity, but from a client
application.
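To make the scanning option concrete, here is a minimal C sketch of the selection step only, assuming the caller has already fetched `pid` and `xact_start`, for example with `SELECT pid, xact_start FROM pg_stat_activity`. The `BackendActivity` struct and `select_long_running()` are hypothetical, and so is the per-PID "please flush your stats" SQL function the selected PIDs would be passed to; no such function exists today.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * One row of a hypothetical snapshot of
 * "SELECT pid, xact_start FROM pg_stat_activity", as fetched by a client
 * application or a background worker.  has_xact is 0 when xact_start
 * IS NULL (no transaction in progress).
 */
typedef struct BackendActivity
{
    int    pid;
    int    has_xact;
    time_t xact_start;
} BackendActivity;

/*
 * Collect the PIDs of backends whose current transaction has been running
 * for at least threshold_secs.  Each returned PID would then be handed to a
 * hypothetical SQL function asking that backend to flush its stats.
 * Returns the number of PIDs written to out_pids (at most max_out).
 */
size_t
select_long_running(const BackendActivity *rows, size_t nrows,
                    time_t now, double threshold_secs,
                    int *out_pids, size_t max_out)
{
    size_t nout = 0;

    assert(rows != NULL || nrows == 0);
    assert(out_pids != NULL || max_out == 0);

    for (size_t i = 0; i < nrows && nout < max_out; i++)
    {
        if (!rows[i].has_xact)
            continue;           /* no transaction running, skip */
        if (difftime(now, rows[i].xact_start) >= threshold_secs)
            out_pids[nout++] = rows[i].pid;
    }
    return nout;
}
```

The same loop works whether it runs in a bgworker or in a client; only the transport used to deliver the request differs, and the threshold stays a policy of whoever runs the scan.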

The PROCSIG would need a new SIGUSR1 handler that sets a flag, which
would then trigger the flush for the stats kinds that have the
out-of-transaction property set once we go through
ProcessInterrupts().  We already have a pgstats report call there,
hence it is a matter of removing the timeout requirements as presented
in the patch, and letting client applications decide when this should
happen.
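The flag-based flow described above can be sketched in isolation, loosely modeled on how LOG_MEMORY_CONTEXT is handled: the sender marks a reason in the target's slot and sends SIGUSR1, the handler only sets pending flags, and the real work happens later at a safe point standing in for ProcessInterrupts().  Everything prefixed with `sketch_`/`SKETCH_`, plus `FlushStatsPending` and `InterruptPendingSketch`, is hypothetical, and `raise()` stands in for signaling another backend so that the example runs in a single process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical reasons multiplexed over SIGUSR1, in the spirit of procsignal.h. */
typedef enum
{
    SKETCH_PROCSIG_FLUSH_STATS,     /* hypothetical: "please flush your stats" */
    SKETCH_NUM_PROCSIGNALS
} SketchProcSignalReason;

/* Per-backend slot flags; in PostgreSQL these would live in shared memory. */
static volatile sig_atomic_t pss_signal_flags[SKETCH_NUM_PROCSIGNALS];

/* Flags set by the handler and consumed at the next safe point. */
static volatile sig_atomic_t InterruptPendingSketch = 0;
static volatile sig_atomic_t FlushStatsPending = 0;

/* Counts how many times the (simulated) stats flush ran. */
static int stats_flush_count = 0;

/* SIGUSR1 handler: only sets flags and does no real work. */
static void
sketch_sigusr1_handler(int signo)
{
    (void) signo;
    if (pss_signal_flags[SKETCH_PROCSIG_FLUSH_STATS])
    {
        pss_signal_flags[SKETCH_PROCSIG_FLUSH_STATS] = 0;
        FlushStatsPending = 1;
        InterruptPendingSketch = 1;
    }
}

/* Install the handler; returns 0 on success, like sigaction(). */
static int
sketch_install_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = sketch_sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Sender side: mark the reason in the target's slot, then signal it. */
static int
sketch_send_flush_request(void)
{
    pss_signal_flags[SKETCH_PROCSIG_FLUSH_STATS] = 1;
    return raise(SIGUSR1);      /* stands in for kill(target_pid, SIGUSR1) */
}

/* Stand-in for flushing only the stats kinds allowed to flush mid-transaction. */
static void
sketch_flush_anytime_stats(void)
{
    stats_flush_count++;
}

/* Safe point, playing the role of ProcessInterrupts(). */
static void
sketch_process_interrupts(void)
{
    if (!InterruptPendingSketch)
        return;
    InterruptPendingSketch = 0;
    if (FlushStatsPending)
    {
        FlushStatsPending = 0;
        sketch_flush_anytime_stats();
    }
}
```

Keeping the handler down to flag assignments is what keeps it async-signal-safe; the flush itself, which in a real backend touches shared memory and takes locks, only ever runs from the safe point.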
Tracking which stats kinds have this property is surely important:
Sami reminded us a couple of hours ago that there are some stats that
we should not flush even if we get an async request.  Another thing
I have doubts about is whether using the same async flush threshold
makes sense for everything.  Long-running transactions, for example,
would mostly not care even if we used an interval less aggressive
than what a WAL sender sees.
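The two concerns above, an opt-in property per stats kind and a pacing interval that need not be shared by every kind, could be expressed as per-kind fields consulted when an async request arrives.  This is a hypothetical sketch: `SketchStatsKind`, `flush_anytime` and `min_async_interval_ms` are illustrative names only, and "flushing" here merely records a timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-kind description; field names are illustrative only. */
typedef struct SketchStatsKind
{
    const char *name;
    bool        flush_anytime;          /* may be flushed inside a transaction */
    int64_t     min_async_interval_ms;  /* per-kind pacing of async flushes */
    int64_t     last_flush_ms;          /* when this kind was last flushed */
} SketchStatsKind;

/*
 * Called when an async flush request arrives at time now_ms.  "Flushes"
 * (here: records now_ms for) every kind that opted in and whose own
 * interval has elapsed.  Returns how many kinds were flushed.
 */
int
sketch_handle_async_flush(SketchStatsKind *kinds, size_t nkinds, int64_t now_ms)
{
    int flushed = 0;

    assert(kinds != NULL || nkinds == 0);

    for (size_t i = 0; i < nkinds; i++)
    {
        SketchStatsKind *k = &kinds[i];

        if (!k->flush_anytime)
            continue;       /* must wait for the end of the transaction */
        if (now_ms - k->last_flush_ms < k->min_async_interval_ms)
            continue;       /* flushed too recently for this kind */
        k->last_flush_ms = now_ms;
        flushed++;
    }
    return flushed;
}
```

With this shape, an async request is a hint rather than an order: each kind decides, through its own property and interval, whether it honors the request at that moment.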

Not a fan of the hardcoded sleeps in the tests, either.  On fast
machines, these waste runtime because a process sits idle doing
nothing.  On slow machines, tests could be unstable if the environment
takes longer to react to a condition of the test than the sleep lasts.
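The usual alternative to a fixed sleep is to poll for the expected state with a short interval and an upper bound; PostgreSQL's TAP framework already provides `poll_query_until()` in PostgreSQL::Test::Cluster for this.  The C sketch below only illustrates the technique: `sketch_wait_for()` returns as soon as the condition holds, and `countdown_done()` is a hypothetical stand-in for something like checking a stats view.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

typedef bool (*sketch_condition_fn)(void *arg);

/* Milliseconds from a monotonic clock, immune to wall-clock changes. */
static long long
monotonic_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Check cond(arg) every interval_ms until it holds or timeout_ms elapses.
 * Returns true as soon as the condition is met, false on timeout.
 */
bool
sketch_wait_for(sketch_condition_fn cond, void *arg,
                long timeout_ms, long interval_ms)
{
    assert(cond != NULL && interval_ms > 0 && interval_ms < 1000);

    long long deadline = monotonic_ms() + timeout_ms;
    struct timespec nap = {.tv_sec = 0, .tv_nsec = interval_ms * 1000000L};

    for (;;)
    {
        if (cond(arg))
            return true;
        if (monotonic_ms() >= deadline)
            return false;
        nanosleep(&nap, NULL);
    }
}

/* Hypothetical condition: becomes true after `remaining` checks. */
typedef struct CountdownCondition
{
    int remaining;
} CountdownCondition;

bool
countdown_done(void *arg)
{
    CountdownCondition *c = arg;

    if (c->remaining > 0)
        c->remaining--;
    return c->remaining == 0;
}
```

A fast machine returns right after the first successful check, while a slow one gets the whole timeout before the test reports a failure.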
--
Michael
