On 16.06.2025 17:41, Andres Freund wrote:
TBH, I don't see a point in continuing with this thread without something that
others can test.  I rather doubt that the right fix here is to just change the
lock model over, but without a repro I can't evaluate that.


Hello,

I think I can reproduce the issue with pgbench on a multi-core server. I start a regular select-only test with 64 clients, and while it's running, I start a plpgsql loop creating and dropping temporary tables from a single psql session. I observe a ~25% drop in the tps reported by pgbench until I cancel the query in psql.


$ pgbench -n -S -c64 -j64 -T300 -P1

progress: 10.0 s, 1249724.7 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 11.0 s, 1248289.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 12.0 s, 1246001.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 13.0 s, 1247832.5 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 14.0 s, 1248205.8 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 15.0 s, 1247737.3 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 16.0 s, 1219444.3 tps, lat 0.052 ms stddev 0.039, 0 failed
progress: 17.0 s, 893943.4 tps, lat 0.071 ms stddev 0.159, 0 failed
progress: 18.0 s, 927861.3 tps, lat 0.069 ms stddev 0.150, 0 failed
progress: 19.0 s, 886317.1 tps, lat 0.072 ms stddev 0.163, 0 failed
progress: 20.0 s, 877200.1 tps, lat 0.073 ms stddev 0.164, 0 failed
progress: 21.0 s, 875424.4 tps, lat 0.073 ms stddev 0.163, 0 failed
progress: 22.0 s, 877693.0 tps, lat 0.073 ms stddev 0.165, 0 failed
progress: 23.0 s, 897202.8 tps, lat 0.071 ms stddev 0.158, 0 failed
progress: 24.0 s, 917853.4 tps, lat 0.070 ms stddev 0.153, 0 failed
progress: 25.0 s, 907865.1 tps, lat 0.070 ms stddev 0.154, 0 failed

Here I started the following loop in psql around 17s and tps dropped by ~25%:

do $$
begin
  for i in 1..1000000 loop
    create temp table tt1 (a bigserial primary key, b text);
    drop table tt1;
    commit;
  end loop;
end;
$$;
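
For anyone who wants to check the size of the drop against their own run, here is a rough sketch in Python that averages the tps from the progress lines above before and after the loop starts. The window boundaries (baseline up to 15 s, loaded from 17 s, with the 16 s sample skipped as a transition) and the helper names are my assumptions for illustration, not part of pgbench.

```python
import re
from statistics import mean

# Progress lines copied verbatim from the pgbench output in this message.
PROGRESS = """\
progress: 10.0 s, 1249724.7 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 11.0 s, 1248289.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 12.0 s, 1246001.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 13.0 s, 1247832.5 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 14.0 s, 1248205.8 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 15.0 s, 1247737.3 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 16.0 s, 1219444.3 tps, lat 0.052 ms stddev 0.039, 0 failed
progress: 17.0 s, 893943.4 tps, lat 0.071 ms stddev 0.159, 0 failed
progress: 18.0 s, 927861.3 tps, lat 0.069 ms stddev 0.150, 0 failed
progress: 19.0 s, 886317.1 tps, lat 0.072 ms stddev 0.163, 0 failed
progress: 20.0 s, 877200.1 tps, lat 0.073 ms stddev 0.164, 0 failed
progress: 21.0 s, 875424.4 tps, lat 0.073 ms stddev 0.163, 0 failed
progress: 22.0 s, 877693.0 tps, lat 0.073 ms stddev 0.165, 0 failed
progress: 23.0 s, 897202.8 tps, lat 0.071 ms stddev 0.158, 0 failed
progress: 24.0 s, 917853.4 tps, lat 0.070 ms stddev 0.153, 0 failed
progress: 25.0 s, 907865.1 tps, lat 0.070 ms stddev 0.154, 0 failed
"""

# Matches the elapsed time and tps fields of a "pgbench -P" progress line.
LINE_RE = re.compile(r"progress: ([\d.]+) s, ([\d.]+) tps")


def parse_progress(text):
    """Return (elapsed_seconds, tps) pairs found in pgbench progress output."""
    return [(float(t), float(tps)) for t, tps in LINE_RE.findall(text)]


def tps_drop_percent(samples, baseline_until, loaded_from):
    """Percentage drop of mean tps in the loaded window vs. the baseline window.

    baseline_until / loaded_from are hypothetical window boundaries chosen
    by eye; samples strictly between them are ignored as a transition.
    """
    base = mean(tps for t, tps in samples if t <= baseline_until)
    loaded = mean(tps for t, tps in samples if t >= loaded_from)
    return 100.0 * (1.0 - loaded / base)


samples = parse_progress(PROGRESS)
drop = tps_drop_percent(samples, baseline_until=15.0, loaded_from=17.0)
print(f"tps drop under concurrent DDL: {drop:.1f}%")
```

The same helper can be pointed at the output of a patched build to compare the drop with and without the spinlock.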

Now, if I simply remove the spinlock in SIGetDataEntries, I see a drop of just ~6% under concurrent DDL. I think this strongly suggests that the spinlock is the bottleneck.

Before that, I tried removing the `if (!hasMessages) return` optimization in SIGetDataEntries to stress the spinlock, and observed a ~35% drop in select-only tps with an empty sinval queue (no DDL running in the background). Then I also removed the spinlock in SIGetDataEntries, and the loss was just ~4%, which may be noise. I think this also suggests that the spinlock could be the bottleneck.

I'm running this on a 2-socket AMD EPYC 9654 96-Core server with postgres and pgbench bound to distinct CPUs. PGDATA is placed on tmpfs. postgres is running with the default settings. The pgbench tables are at scale 1. pgbench is connecting via loopback/127.0.0.1.

Does this sound convincing?

Best regards,

--
Sergey Shinderuk                https://postgrespro.com/


