Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

Peter Smith Tue, 15 Aug 2023 20:54:37 -0700

On Fri, Aug 11, 2023 at 11:45 PM Melih Mutlu <m.melihmu...@gmail.com> wrote:
>
> Again, I couldn't reproduce the cases where you saw significantly degraded 
> performance. I wonder if I'm missing something. Did you do anything not 
> included in the test scripts you shared? Do you think v26-0001 will perform 
> 84% worse than HEAD, if you try again? I just want to be sure that it was not 
> a random thing.
> Interestingly, I also don't see an improvement in above results as big as in 
> your results when inserts/tx ratio is smaller. Even though it certainly is 
> improved in such cases.
>


TEST ENVIRONMENTS

I am running the tests on a high-spec machine:

-- NOTE: Nobody else is using this machine during our testing, so
there are no unexpected influences messing up the results.


Linix

Architecture:          x86_64
CPU(s):                120
Thread(s) per core:    2
Core(s) per socket:    15

              total        used        free      shared  buff/cache   available
Mem:           755G        5.7G        737G         49M         12G        748G
Swap:          4.0G          0B        4.0G

~~~

The results I am seeing are not random. HEAD+v26-0001 is consistently
worse than HEAD but only for some settings. With these settings, I see
bad results (i.e. worse than HEAD) consistently every time using the
dedicated test machine.

Hou-san also reproduced bad results using a different high-spec machine

Vignesh also reproduced bad results using just his laptop but in his
case, it did *not* occur every time. As discussed elsewhere the
problem is timing-related, so sometimes you may be lucky and sometimes
not.

~

I expect you are running everything correctly, but if you are using
just a laptop (like Vignesh) then like him you might need to try
multiple times before you can hit the problem happening in your
environment.

Anyway, in case there is some other reason you are not seeing the bad
results I have re-attached scripts and re-described every step below.

======

BUILDING

-- NOTE: I have a very minimal configuration without any
optimization/debug flags etc. See config.log

$ ./configure --prefix=/home/peter/pg_oss

-- NOTE: Of course, make sure to be running using the correct Postgres:

echo 'set environment variables for OSS work'
export PATH=/home/peter/pg_oss/bin:$PATH

-- NOTE: Be sure to do git stash or whatever so don't accidentally
build a patched version thinking it is the HEAD version
-- NOTE: Be sure to do a full clean build and apply (or don't apply
v26-0001) according to the test you wish to run.

STEPS
1. sudo make clean
2. make
3. sudo make install

======

SCRIPTS & STEPS

SCRIPTS
testrun.sh
do_one_test_setup.sh
do_one_test_PUB.sh
do_one_test_SUB.sh

---

STEPS

Step-1. Edit the testrun.sh

tables=( 100 )
workers=( 2 4 8 16 )
size="0"
prefix="0816headbusy" <-- edit to differentiate each test run

~

Step-2. Edit the do_one_test_PUB.sh
IF commit_counter = 1000 THEN <-- edit this if needed. I wanted 1000
inserts/tx so nothing to do

~

Step-3: Check nothing else is running. If yes, then clean it up
[peter@localhost testing_busy]$ ps -eaf | grep postgres
peter    111924 100103  0 19:31 pts/0    00:00:00 grep --color=auto postgres

~

Step-4: Run the tests
[peter@localhost testing_busy]$ ./testrun.sh
num_tables=100, size=0, num_workers=2, run #1 <-- check the echo
matched the config you set in the Setp-1
waiting for server to shut down.... done
server stopped
waiting for server to shut down.... done
server stopped
num_tables=100, size=0, num_workers=2, run #2
waiting for server to shut down.... done
server stopped
waiting for server to shut down.... done
server stopped
num_tables=100, size=0, num_workers=2, run #3
...

~

Step-5: Sanity check
When the test completes the current folder will be full of .log and .dat* files.
Check for sanity that no errors happened

[peter@localhost testing_busy]$ cat *.log | grep ERROR
[peter@localhost testing_busy]$

~

Step-6: Collect the results
The results are output (by the do_one_test_SUB.sh) into the *.dat_SUB files
Use grep to extract them

[peter@localhost testing_busy]$ cat 0816headbusy_100t_0_2w_*.dat_SUB |
grep RESULT | grep -v duration | awk '{print $3}'
11742.019
12157.355
11773.807
11582.981
12220.962
12546.325
12210.713
12614.892
12015.489
13527.05

Repeat grep for other files:
$ cat 0816headbusy_100t_0_4w_*.dat_SUB | grep RESULT | grep -v
duration | awk '{print $3}'
$ cat 0816headbusy_100t_0_8w_*.dat_SUB | grep RESULT | grep -v
duration | awk '{print $3}'
$ cat 0816headbusy_100t_0_16w_*.dat_SUB | grep RESULT | grep -v
duration | awk '{print $3}'

~

Step-7: Summarise the results
Now I just cut/paste the results from Step-6 into a spreadsheet and
report the median of the runs.

For example, for the above HEAD run, it was:
             2w        4w       8w      16w
1           11742   5996   1919   1582
2           12157   5960   1871   1469
3           11774   5926   2101   1571
4           11583   6155   1883   1671
5           12221   6310   1895   1707
6           12546   6166   1900   1470
7           12211   6114   2477   1587
8           12615   6173   2610   1715
9           12015   5869   2110   1673
10         13527   5913   2144   1227
Median 12184   6055   2010   1584

~

Step-8: REPEAT
-- repeat all above for different size transactions (editing do_one_test_PUB.sh)
-- repeat all above after rebuilding again with HEAD+v26-0001

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#!/bin/bash

tables=( 100 )

#tables=( 10 \
#100 \
#1000 \
#2000
#)

workers=( 2 4 8 16 )

#workers=( 2 \
#4 \
#8 \
#16
#)

prefix="0816headbusy"

# For now, either of "0" or "10kB" could be accepted
size="0"
#size="10rec"
#size="50rec"
#size="500rec"
#size="1000rec"
#size="10kB"

for j in "${workers[@]}"
do
    for i in "${tables[@]}"
    do
	for k in {1..10}
	do
	    echo -e "num_tables=${i}, size=${size}, num_workers=${j}, run #${k}"
	
	    # Run the pub and sub scripts in parallel, after doing the common setup
	    ./do_one_test_setup.sh $i ${size} $j $k $prefix &> ${prefix}_${i}t_${size}_${j}w_${k}.dat
#	    ./do_one_test_setup_SYNC.sh $i ${size} $j $k $prefix &> ${prefix}_${i}t_${size}_${j}w_${k}.dat
	    #
	    ./do_one_test_PUB.sh $i ${size} $j $k &> ${prefix}_${i}t_${size}_${j}w_${k}.dat_PUB & 
#	    ./do_one_test_SUB_SYNCDONE.sh $i ${size} $j $k &> ${prefix}_${i}t_${size}_${j}w_${k}.dat_SUB_SYNCDONE &
	    ./do_one_test_SUB.sh $i ${size} $j $k &> ${prefix}_${i}t_${size}_${j}w_${k}.dat_SUB &
            wait
#
	    pg_ctl --wait -D datapub stop
	    pg_ctl --wait -D datasub stop
#	
#	    Backup the data directory if needed
#	    cp -r datapub BAK_datapub_${prefix}_${i}t_${size}_${j}w_${k}
	done
    done
done

#!/bin/bash
#
# First argument : number of tables
# Second argument : size[Byte] of each tables
# Third argument : max_sync_workers
# Fourth argument : execution numbers
# Fifth argument: test prefix
#

port_pub=5431
data_pub=datapub

port_sub=5432
data_sub=datasub

echo '########################'
echo '# Check configurations #'
echo '########################'

declare num_tables
if [ -n "$1" ]; then
    num_tables=$1
else
    num_tables=10
fi

echo "$num_tables tables will be used while testing"

declare table_size
if [ -n "$2" ]; then
    table_size=$2
else
    table_size=0
fi

num_sync_workers=$3
run_no=$4
prefix=$5

# logname must be defined here

log_pub=pub_${prefix}_${num_tables}_${table_size}_${num_sync_workers}_${run_no}.log
log_sub=sub_${prefix}_${num_tables}_${table_size}_${num_sync_workers}_${run_no}.log

#
# Convert from table_size to number of tuples. The equation was
# found by my tests...
#

declare num_tuples
if [ $table_size == "10kB" ]
then
    num_tuples=3250
else
    num_tuples=0
fi

echo "$num_tuples tuples will be inserted to each tables"


echo '############'
echo '# Clean up #'
echo '############'

pg_ctl stop -D $data_pub -w
pg_ctl stop -D $data_sub -w

rm -rf $data_pub $data_sub

echo '##########'
echo '# Set up #'
echo '##########'

initdb -D $data_pub -U postgres
initdb -D $data_sub -U postgres

cat << EOF >> $data_pub/postgresql.conf
wal_level = logical
wal_sender_timeout = 0
wal_receiver_timeout = 0
port = $port_pub
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
max_sync_workers_per_subscription = 8
log_line_prefix = '%n [%p] '
max_wal_senders = 200
max_replication_slots = 200
EOF

cat << EOF >> $data_sub/postgresql.conf
wal_level = logical
wal_sender_timeout = 0
wal_receiver_timeout = 0
port = $port_sub
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
max_sync_workers_per_subscription = $num_sync_workers
max_logical_replication_workers = 50
log_line_prefix = '%n [%p] '
max_replication_slots = 200
EOF

pg_ctl -D $data_pub start -w -l $log_pub
pg_ctl -D $data_sub start -w -l $log_sub

#
# Setup the IPC tables and Procedures
#
(
    echo "CREATE TABLE ipc(msg text);"
    echo "ALTER TABLE ipc ADD COLUMN ts TIMESTAMP;"
    echo "ALTER TABLE ipc ALTER COLUMN ts SET DEFAULT now();"
    echo "CREATE OR REPLACE PROCEDURE ipc_wait_for(state TEXT) AS \$\$
    DECLARE
    BEGIN
        RAISE NOTICE 'wait for state %', state;
        WHILE NOT EXISTS (SELECT 1 FROM ipc WHERE msg = state) LOOP
            RAISE NOTICE 'waiting until state %', state;
            EXECUTE 'SELECT pg_sleep(0.1);';
         END LOOP;
         RAISE NOTICE 'ok, reached state %', state;
    END;
    \$\$ LANGUAGE plpgsql;
"
) | psql -U postgres -p $port_pub -a

(
    echo "CREATE TABLE ipc(msg text);"
    echo "ALTER TABLE ipc ADD COLUMN ts TIMESTAMP;"
    echo "ALTER TABLE ipc ALTER COLUMN ts SET DEFAULT now();"
    echo "CREATE OR REPLACE PROCEDURE ipc_wait_for(state TEXT) AS \$\$
    DECLARE
    BEGIN
        RAISE NOTICE 'wait for state %', state;
        WHILE NOT EXISTS (SELECT 1 FROM ipc WHERE msg = state) LOOP
            RAISE NOTICE 'waiting until state %', state;
             EXECUTE 'SELECT pg_sleep(0.1);';
        END LOOP;
        RAISE NOTICE 'ok, reached state %', state;
    END;
    \$\$ LANGUAGE plpgsql;
"
) | psql -U postgres -p $port_sub -a

#!/bin/bash
#
# PUB
#
# First argument : number of tables
# Second argument : size[Byte] of each tables
# Third argument : max_sync_workers
# Fourth argument : execution numbers
#

port_pub=5431
data_pub=datapub

port_sub=5432
data_sub=datasub

echo '########################'
echo '# Check configurations #'
echo '########################'

declare num_tables
if [ -n "$1" ]; then
    num_tables=$1
else
    num_tables=10
fi

echo "$num_tables tables will be used while testing"

declare table_size
if [ -n "$2" ]; then
    table_size=$2
else
    table_size=0
fi

num_sync_workers=$3
run_no=$4

#
# Convert from table_size to number of tuples. The equation was
# found by my tests...
#

declare num_tuples
if [ $table_size == "10kB" ]
then
    num_tuples=3250
elif [ $table_size == "10rec" ]
then
    num_tuples=10
elif [ $table_size == "50rec" ]
then
    num_tuples=50
elif [ $table_size == "500rec" ]
then
    num_tuples=500
elif [ $table_size == "1000rec" ]
then
    num_tuples=1000
else
    num_tuples=0
fi

echo "$num_tuples tuples will be inserted to each tables"


echo '#########################'
echo '# IPC at publisher-side #'
echo '#########################'

psql -U postgres -p $port_pub -a -c "CREATE PUBLICATION ipc_at_publisher FOR TABLE ipc;"

# wait a bit for the subscriber-side to connect to this publication
sleep 5s

psql -U postgres -p $port_pub -a -c "CREATE SUBSCRIPTION ipc_from_subscriber CONNECTION 'host=localhost user=postgres port=$port_sub' PUBLICATION ipc_at_subscriber WITH(origin=NONE);"

psql -U postgres -p $port_pub -a -c "INSERT INTO ipc VALUES('pub ipc ready');"
psql -U postgres -p $port_pub -a -c "CALL ipc_wait_for('sub ipc ready');"

psql -U postgres -p $port_pub -a -c "SELECT * FROM ipc ORDER BY ts;"


echo '#############################################'
echo '# Create tables, and populate the test data #'
echo '#############################################'
(
    echo "CREATE TABLE busy_tbl(a text);"

    echo "CREATE SCHEMA test_tables;"	
    echo -e "SELECT 'CREATE TABLE test_tables.manytables_'||i||'(i int);' FROM generate_series(1, $num_tables) g(i) \gexec"
    echo -e "SELECT 'INSERT INTO test_tables.manytables_'||i||' VALUES (generate_series(1, $num_tuples))' FROM generate_series(1, $num_tables) g(i) \gexec"

) | psql -U postgres -p $port_pub -a


echo '##############'
echo '# Busy table #'
echo '##############'

psql -U postgres -p $port_pub -a -c "CREATE PUBLICATION mypub FOR TABLE busy_tbl;"

# wait a bit for the subscriber-side to connect to this publication
sleep 5s

psql -U postgres -p $port_pub -a -c "CALL ipc_wait_for('mysub is created');"
psql -U postgres -p $port_pub -a -c "SELECT * FROM ipc ORDER BY ts;"

echo '#####################'
echo '# Alter publication #'
echo '#####################'

psql -U postgres -p $port_pub -a -c "ALTER PUBLICATION mypub ADD TABLES IN SCHEMA test_tables;"

echo '################'
echo '# Stay busy... #'
echo '################'
(
    echo -e "CREATE OR REPLACE PROCEDURE stay_busy() AS \$\$
    DECLARE
      counter INTEGER := 0;
      commit_counter INTEGER := 0;
    BEGIN
      EXECUTE 'INSERT INTO ipc VALUES(''pub busy started'');';
      RAISE NOTICE 'START: stay_busy';
      WHILE NOT EXISTS (SELECT 1 FROM ipc WHERE msg = 'test finished') LOOP
        EXECUTE 'INSERT INTO busy_tbl VALUES(''some data'');';
        counter := counter + 1;
        commit_counter := commit_counter + 1;
        IF commit_counter = 1000 THEN
          COMMIT;
          commit_counter := 0;
        END IF;
      END LOOP;
      COMMIT;
      EXECUTE 'INSERT INTO ipc VALUES(''pub busy finished'');';
      RAISE NOTICE 'END: stay_busy inserted % records', counter;
    END;
    \$\$ LANGUAGE plpgsql;
"
) | psql -U postgres -p $port_pub -a

psql -U postgres -p $port_pub -a -c "CALL stay_busy();"
psql -U postgres -p $port_pub -a -c "SELECT * FROM ipc ORDER BY ts;"

# wait a bit for the subscriber-side to drop the subscription
sleep 5s

psql -U postgres -p $port_pub -a -c "DROP PUBLICATION mypub;"
psql -U postgres -p $port_pub -a -c "DROP SUBSCRIPTION ipc_from_subscriber;"
psql -U postgres -p $port_pub -a -c "DROP PUBLICATION ipc_at_publisher;"

#!/bin/bash
#
# SUB
#
# First argument : number of tables
# Second argument : size[Byte] of each tables
# Third argument : max_sync_workers
# Fourth argument : execution numbers
#

port_pub=5431
data_pub=datapub

port_sub=5432
data_sub=datasub

echo '########################'
echo '# Check configurations #'
echo '########################'

declare num_tables
if [ -n "$1" ]; then
    num_tables=$1
else
    num_tables=10
fi

echo "$num_tables tables will be used while testing"

declare table_size
if [ -n "$2" ]; then
    table_size=$2
else
    table_size=0
fi

num_sync_workers=$3
run_no=$4

#
# Convert from table_size to number of tuples. The equation was
# found by my tests...
#

declare num_tuples
if [ $table_size == "10kB" ]
then
    num_tuples=3250
else
    num_tuples=0
fi

echo "$num_tuples tuples will be inserted to each tables"


echo '##########################'
echo '# IPC at subscriber-side #'
echo '##########################'

psql -U postgres -p $port_sub -a -c "CREATE SUBSCRIPTION ipc_from_publisher CONNECTION 'host=localhost user=postgres port=$port_pub' PUBLICATION ipc_at_publisher WITH(origin=NONE);"
psql -U postgres -p $port_sub -a -c "CREATE PUBLICATION ipc_at_subscriber FOR TABLE ipc;"

# wait a bit for the publisher-side to connect to this publication
sleep 5s

psql -U postgres -p $port_sub -a -c "INSERT INTO ipc VALUES('sub ipc ready');"
psql -U postgres -p $port_sub -a -c "CALL ipc_wait_for('pub ipc ready');"

echo '#################'
echo '# Create tables #'
echo '#################'
(
    echo "CREATE TABLE busy_tbl(a text);"

    echo "CREATE SCHEMA test_tables;"
    echo -e "SELECT 'CREATE TABLE test_tables.manytables_'||i||'(i int);' FROM generate_series(1, $num_tables) g(i) \gexec"

) | psql -U postgres -p $port_sub -a

sleep 1s

echo '######################################'
echo '# Create subscription for busy table #'
echo '######################################'
(
    echo "CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost user=postgres port=$port_pub' PUBLICATION mypub;"
    echo "INSERT INTO ipc VALUES ('mysub is created');"

) | psql -U postgres -p $port_sub -a


echo '########'
echo '# Test #'
echo '########'
(
    echo -e "CREATE OR REPLACE PROCEDURE log_rep_test(max INTEGER) AS \$\$
    DECLARE
        total_duration INTERVAL := '0';
        avg_duration FLOAT := 0.0;
        start_time TIMESTAMP;
        end_time TIMESTAMP;
    BEGIN
        start_time := clock_timestamp();

        -- time how long it takes for all the tablesyncs to become "ready"
        WHILE EXISTS (SELECT 1 FROM pg_subscription_rel WHERE srsubstate != 'r') LOOP
            COMMIT;
        END LOOP;

        end_time := clock_timestamp();

        total_duration := total_duration + (end_time - start_time);
    
        IF max > 0 THEN
            avg_duration := EXTRACT(EPOCH FROM total_duration) / max * 1000;
        END IF;
    
        RAISE NOTICE 'RESULT: %', avg_duration;
    END;
    \$\$ LANGUAGE plpgsql;
"
) | psql -U postgres -p $port_sub -a


psql -U postgres -p $port_sub -a -c "CALL ipc_wait_for('pub busy started');"

# This alter/refresh will cause all the test tables to participate in the subscription
psql -U postgres -p $port_sub -a -c "INSERT INTO ipc VALUES ('test started');"
psql -U postgres -p $port_sub -a -c "ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;"
psql -U postgres -p $port_sub -a -c "CALL log_rep_test(1);"
psql -U postgres -p $port_sub -a -c "INSERT INTO ipc VALUES ('test finished');"

psql -U postgres -p $port_sub -a -c "CALL ipc_wait_for('pub busy finished');"
psql -U postgres -p $port_sub -a -c "SELECT count(*) FROM busy_tbl;"

psql -U postgres -p $port_sub -a -c "DROP SUBSCRIPTION mysub;"
psql -U postgres -p $port_sub -a -c "DROP SUBSCRIPTION ipc_from_publisher;"
psql -U postgres -p $port_sub -a -c "DROP PUBLICATION ipc_at_subscriber;"

Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

Reply via email to