Hi, Maxim!

On Mon, 23 Jan 2023 at 18:40, Maxim Orlov <orlo...@gmail.com> wrote:
>
> Hi!
>
> One of our customers stumbled onto a significant performance degradation while 
> running multiple OLAP-like queries on a replica.
> After some investigation, it became clear that the problem is in accessing 
> the old_snapshot_threshold parameter.
>
> Access to the old_snapshot_threshold parameter is guarded by the 
> mutex_threshold spinlock. This is not a problem on the primary 
> server, since we rarely call GetOldSnapshotThresholdTimestamp:
>
> void
> TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
> {
>     if (RelationAllowsEarlyPruning(relation)
>         && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
>         ereport(ERROR,
>                 (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
>                  errmsg("snapshot too old")));
> }
>
> But in the case of a replica, we have to call GetOldSnapshotThresholdTimestamp 
> much more often, so it becomes a
> bottleneck. The customer worked around the issue by setting 
> old_snapshot_threshold to 0, but I think we can
> do something about it.
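>
> For reference, the reader side in snapmgr.c takes the spinlock on
> every single call, roughly:
>
> TimestampTz
> GetOldSnapshotThresholdTimestamp(void)
> {
>     TimestampTz threshold_timestamp;
>
>     SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
>     threshold_timestamp = oldSnapshotControl->threshold_timestamp;
>     SpinLockRelease(&oldSnapshotControl->mutex_threshold);
>
>     return threshold_timestamp;
> }
>
> On a replica this runs for every visibility check that allows early
> pruning, so with 100 concurrent clients the spinlock is hammered.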
>
> Some more investigation:
>
> -- On primary --
> $ ./bin/psql postgres -c "create database benchmark"
> CREATE DATABASE
> $ ./bin/pgbench -i -Uorlov -s300 benchmark
> dropping old tables...
> NOTICE:  table "pgbench_accounts" does not exist, skipping
> ...
> creating tables...
> generating data (client-side)...
> 30000000 of 30000000 tuples (100%) done (elapsed 142.37 s, remaining 0.00 s)
> vacuuming...
> creating primary keys...
> done in 177.67 s (drop tables 0.00 s, create tables 0.01 s, client-side 
> generate 144.45 s, vacuum 0.59 s, primary keys 32.61 s).
>
> -- On secondary --
> $ touch 1.sql
> $ vim 1.sql
> $ cat 1.sql
> \set bid random(1, 300)
> BEGIN;
> SELECT sum(aid) FROM pgbench_accounts where bid = :bid GROUP BY bid;
> END;
> $ ./bin/pgbench -f 1.sql -p5433 -Uorlov -j10 -c100 -T720 -P1 -n benchmark
> pgbench (16devel)
> progress: 1.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
> ...
> progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
>
> $ perf record -F 99 -a -g --call-graph=dwarf sleep 5
> $ perf script --header --fields comm,pid,tid,time,event,ip,sym,dso > file
> $ grep s_lock file | wc -l
> 3486
>
>
> My proposal is to use atomics for threshold_timestamp and threshold_xid. PFA 
> patch 0001.
> With patch 0001 we got:
>
> $ grep s_lock file2 | wc -l
> 8
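>
> A minimal sketch of the idea (not the literal patch text): since
> TimestampTz is an int64, the timestamp fits in a pg_atomic_uint64,
> so readers need no spinlock at all:
>
> #include "port/atomics.h"
>
> /* in OldSnapshotControlData, replacing the spinlocked field: */
> pg_atomic_uint64 threshold_timestamp;
>
> TimestampTz
> GetOldSnapshotThresholdTimestamp(void)
> {
>     /* lock-free read; no s_lock contention on hot paths */
>     return (TimestampTz)
>         pg_atomic_read_u64(&oldSnapshotControl->threshold_timestamp);
> }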
>
>
> Maybe we should go even further and remove mutex_threshold entirely? 
> This would allow threshold_timestamp and threshold_xid to become 
> inconsistent with each other, but is that really a problem?
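>
> For context, the place where the pair is read together is the fast
> path of TransactionIdLimitedForOldSnapshots(), which currently does
> roughly:
>
>     SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
>     if (ts == oldSnapshotControl->threshold_timestamp)
>         xlimit = oldSnapshotControl->threshold_xid;
>     SpinLockRelease(&oldSnapshotControl->mutex_threshold);
>
> With two independent atomics, a reader there could see a fresh
> threshold_timestamp paired with a stale threshold_xid.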
>
> Thoughts?

I think optimizing locking and switching to atomics wherever it
improves performance is a good direction. If the performance
improvement could be demonstrated more directly, that would be a
strong argument for committing it. Personally, I like TPS plots like
the ones in [1].

[1] https://www.postgresql.org/message-id/CALT9ZEHSX1Hpz5xjDA62yHAHtpinkA6hg8Zt-odyxqppmKbQFA%40mail.gmail.com

Kind regards,
Pavel Borisov,
Supabase

