Hello everybody,

On 12.06.2024 20:13, Igor V.Gnatyuk wrote:
Hello.

Before the advent of procedures in PostgreSQL 11, which can manage transactions, there could be only one transaction per statement. Hence the end of a transaction also meant the end of the statement. Apparently, this is why the corresponding restriction is described differently in different places in the documentation:

https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS "...so a query or transaction still in progress does not affect the displayed totals..."
"...counts actions taken so far within the current transaction..."

But now several transactions can be performed within a single SQL statement. At the same time, the current implementation transfers the accumulated statistics to shared memory only at the end of the statement. These statistics are used by autovacuum. Thus, when a data-modifying procedure runs for a long time (e.g. an infinite event-processing loop, such as one implementing a queue), the changes made and committed inside it will not affect the statistics in shared memory until the CALL statement finishes. This prevents autovacuum from making the right cleanup decision in time. To illustrate the described behavior, I suggest considering the example below.


It would be nice to know if this is considered desired behavior or an oversight.

If it's OK that transaction statistics are not accumulated in shared memory until the end of the SQL statement, we should at least improve the documentation to better reflect this.

Although, from my POV, statistics should be sent to shared memory after the end of each transaction, regardless of the boundaries of SQL statements. With the current implementation, it's not possible to build an infinite-loop query-processing routine entirely in Postgres; we have to rely on external tools either to implement the processing loop (issuing a separate SQL statement for each event) or to schedule vacuum.
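For completeness, the workaround looks roughly like this: move the loop body into a procedure that handles a single event, and drive the loop from the client, so that every iteration is a separate statement whose statistics are flushed on completion. A sketch (process_one_event is a hypothetical name; it reuses the 'test' table from the example below):

  -- hypothetical single-step variant: one event per CALL,
  -- so each iteration's statistics are flushed at statement end
  CREATE PROCEDURE process_one_event() AS $$
  BEGIN
    UPDATE test SET changes = changes + 1;
  END
  $$ LANGUAGE plpgsql;

Then, e.g. from psql:

  CALL process_one_event() \watch 1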

What's your opinion on this?



Example.

We process the data in the 'test' table. The 'changes' column will show the number of row updates:

  CREATE TABLE test (changes int);

Let's insert a row into the table:

  INSERT INTO test VALUES (0);

At each processing step, the value of the 'changes' column is incremented. The processing is performed in an endless loop inside the 'process' procedure (see below), and each iteration's changes are committed.

  CREATE PROCEDURE process() AS $$
  DECLARE
    l_chs int;
  BEGIN
    LOOP
      UPDATE test SET changes = changes + 1 RETURNING changes INTO l_chs;
      COMMIT;
      RAISE NOTICE 'changes % -- upd_shared = %, upd_local = %', l_chs,
                   (SELECT n_tup_upd FROM pg_stat_all_tables
                    WHERE relname = 'test'),  -- statistics in shared memory (considered by autovacuum)
                   (SELECT n_tup_upd FROM pg_stat_xact_all_tables
                    WHERE relname = 'test');  -- statistics local to the current transaction
    END LOOP;
  END
  $$ LANGUAGE plpgsql;

Let's call the procedure:

  CALL process();

NOTICE:  changes 1 -- upd_shared = 0, upd_local = 1
NOTICE:  changes 2 -- upd_shared = 0, upd_local = 2
NOTICE:  changes 3 -- upd_shared = 0, upd_local = 3
NOTICE:  changes 4 -- upd_shared = 0, upd_local = 4
NOTICE:  changes 5 -- upd_shared = 0, upd_local = 5
NOTICE:  changes 6 -- upd_shared = 0, upd_local = 6
NOTICE:  changes 7 -- upd_shared = 0, upd_local = 7
NOTICE:  changes 8 -- upd_shared = 0, upd_local = 8
...

If we now observe the cumulative statistics for the 'test' table from another session, we see that although updates are happening and dead tuples are accumulating, none of this information reaches shared memory:

  SELECT n_tup_upd, n_dead_tup, n_ins_since_vacuum, vacuum_count, autovacuum_count
  FROM pg_stat_all_tables WHERE relname = 'test' \gx

  -[ RECORD 1 ]------+--
  n_tup_upd          | 0
  n_dead_tup         | 0
  n_ins_since_vacuum | 1
  vacuum_count       | 0
  autovacuum_count   | 0
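(If the CALL in the first session is then cancelled, e.g. with Ctrl+C or pg_cancel_backend(), the pending counters should finally be flushed once the backend goes idle, and re-running the query above would show the accumulated updates. That is, the data is not lost; it just cannot reach autovacuum while the statement runs.)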

It would be logical to remove this restriction, that is, to update the statistics right after each
transaction completes, even if the statement is still running.


