On Fri, 27 Jun 2025 at 07:05, Michael Paquier <mich...@paquier.xyz> wrote: > > On Thu, Jun 26, 2025 at 05:25:42PM +0530, vignesh C wrote: > > On Thu, 26 Jun 2025 at 06:22, Michael Paquier <mich...@paquier.xyz> wrote: > >> So you are suggesting the addition of an extra ReadPageInternal() that > >> forces a read of only the read, perform the checks on the header, then > >> read the rest. After reading SizeOfXLogShortPHD worth of data, > >> shouldn't the checks on xlp_rem_len be done a bit earlier than what > >> you are proposing in this patch? > > > > Modified > > It seems to me that this assert can be moved after the second page > read: > Assert(SizeOfXLogShortPHD <= readOff);
I felt this Assert can be here to ensure we’ve read SizeOfXLogShortPHD before checking the page header contents. But for the second page read, the following existing Assert should be enough: Assert(pageHeaderSize <= readOff); > > Coming back to the point of Kuroda-san about performance, did you do > some checks related to that and did you measure any difference? I > suspect none of that because in most cases we are just going to fetch > the next page and we would trigger the fast-exit path of > ReadPageInternal() on the second call when fetching the rest. I still > need to get an idea of all that by myself, probably with various > lengths of logical message records. The test execution times are measured in microseconds. In the results table below, the first row indicates the message size, and each value represents the median of 5 test runs. The attached script was used to run these tests. In the attached script the MSG_SIZE variable in the script should be changed to 1000(1 page), 10000 (2 pages approx), 25000 (3 pages approx) , 50000 (6 pages approx), 100000 (12 page approx), 1000000 (122 pages approx), 10000000 (1220 pages approx) and 100000000 (12207 pages approx) and be run. Test execution time can be taken from run_*.dat files that will be generated. Size | 1000 | 10000 | 25000 | 50000 | 100000 | 1000000 --------|-----------|-----------|------------|-------------|------------|-------------- Head | 9297.1 | 9895.4 | 10844.2 | 12946.5 | 16945.1 | 86187.1 Patch | 9222.7 | 9889 | 10897.1 | 12904.2 | 16858.4 | 87115.5 Size | 10000000 | 100000000 ---------|----------------|----------------- HEAD | 804965.6 | 331639.7 Patch | 804942.6 | 321198.6 The performance results show that the patch does not introduce any noticeable overhead across varying message sizes, I felt there was no impact because of the additional page header read. > > Perhaps this code could be improved in the future with less page > reads. Anyway, what you are doing here is simple enough that it is a > no-brainer for the back-branches because we are just forcing our way > through with a new short header validation, so logically that's sound > as far as I can see. I was waiting for the review of the master branch patch to complete. I will post the back branch patches soon. Regards, Vignesh
#!/bin/bash ##### PORT=8000 SLOT_NAME=test PLUGIN_NAME=test_decoding MSG_SIZE=1000000 LOOP=5 ##### for i in `seq 1 $LOOP` do # Cleanup previous result pg_ctl stop -D data rm -rf data rm logfile # Initialize an instance initdb -D data -U postgres -c wal_level=logical cat << EOF >> data/postgresql.conf listen_addresses = '*' wal_level = logical port = $PORT shared_buffers = 8GB checkpoint_timeout = 30min max_wal_size = 20GB min_wal_size = 10GB #max_logical_replication_workers = 16 #max_parallel_apply_workers_per_subscription = 12 autovacuum = off EOF # Start the instance pg_ctl -D data -l logfile start # Create a replication slot psql -U postgres -p $PORT -c "SELECT * FROM pg_create_logical_replication_slot('$SLOT_NAME', '$PLUGIN_NAME')" psql -U postgres -p $PORT -c "SELECT * FROM pg_current_wal_lsn()" # emit a message psql -U postgres -p $PORT -c "SELECT 'msg5' FROM pg_logical_emit_message(true, 'test', repeat('a', $MSG_SIZE));" psql -U postgres -p $PORT -c "SELECT * FROM pg_current_wal_lsn()" t1=$(($(date +%s%N)/1000)) echo $t1 > run_${i}.dat (time psql -d postgres -p $PORT -U postgres -c "SELECT data FROM pg_logical_slot_get_changes('test', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1')") >> run_${i}.dat t2=$(($(date +%s%N)/1000)) echo $t2 >> run_${i}.dat t3=$((t2-t1)) echo "execution time=$t3" >> run_${i}.dat done