Performance tests done on the v8-0001 and v8-0002 patches, available at [1].
The purpose of the performance tests is to measure the impact on logical replication with track_commit_timestamp enabled, as this involves fetching the commit_ts data to determine delete_differ/update_differ conflicts. Fortunately, we did not see any noticeable overhead from the new commit_ts fetch and comparison logic. The only notable impact is potential overhead from logging conflicts if they occur frequently. Therefore, enabling conflict detection by default seems feasible, and introducing a new detect_conflict option may not be necessary. Please refer to the following for detailed test results: For all the tests, created two server nodes, one publisher and one as subscriber. Both the nodes are configured with below settings - wal_level = logical shared_buffers = 40GB max_worker_processes = 32 max_parallel_maintenance_workers = 24 max_parallel_workers = 32 synchronous_commit = off checkpoint_timeout = 1d max_wal_size = 24GB min_wal_size = 15GB autovacuum = off ~~~~ Test 1: create conflicts on Sub using pgbench. ---------------------------------------------------------------- Setup: - Both publisher and subscriber have pgbench tables created as- pgbench -p $node1_port postgres -qis 1 - At Sub, a subscription created for all the changes from Pub node. Test Run: - To test, ran pgbench for 15 minutes on both nodes simultaneously, which led to concurrent updates and update_differ conflicts on the Subscriber node. Command used to run pgbench on both nodes- ./pgbench postgres -p 8833 -c 10 -j 3 -T 300 -P 20 Results: For each case, note the “tps” and total time taken by the apply-worker on Sub to apply the changes coming from Pub. Case1: track_commit_timestamp = off, detect_conflict = off Pub-tps = 9139.556405 Sub-tps = 8456.787967 Time of replicating all the changes: 19min 28s Case 2 : track_commit_timestamp = on, detect_conflict = on Pub-tps = 8833.016548 Sub-tps = 8389.763739 Time of replicating all the changes: 20min 20s Case3: track_commit_timestamp = on, detect_conflict = off Pub-tps = 8886.101726 Sub-tps = 8374.508017 Time of replicating all the changes: 19min 35s Case 4: track_commit_timestamp = off, detect_conflict = on Pub-tps = 8981.924596 Sub-tps = 8411.120808 Time of replicating all the changes: 19min 27s **The difference of TPS between each case is small. While I can see a slight increase of the replication time (about 5%), when enabling both track_commit_timestamp and detect_conflict. Test2: create conflict using a manual script ---------------------------------------------------------------- - To measure the precise time taken by the apply-worker in all cases, create a test with a table having 10 million rows. - To record the total time taken by the apply-worker, dump the current time in the logfile for apply_handle_begin() and apply_handle_commit(). Setup: Pub : has a table ‘perf’ with 10 million rows. Sub : has the same table ‘perf’ with its own 10 million rows (inserted by 1000 different transactions). This table is subscribed for all changes from Pub. Test Run: At Pub: run UPDATE on the table ‘perf’ to update all its rows in a single transaction. (this will lead to update_differ conflict for all rows on Sub when enabled). At Sub: record the time(from log file) taken by the apply-worker to apply all updates coming from Pub. Results: Below table shows the total time taken by the apply-worker (apply_handle_commit Time - apply_handle_begin Time ). (Two test runs for each of the four cases) Case1: track_commit_timestamp = off, detect_conflict = off Run1 - 2min 42sec 579ms Run2 - 2min 41sec 75ms Case 2 : track_commit_timestamp = on, detect_conflict = on Run1 - 6min 11sec 602ms Run2 - 6min 25sec 179ms Case3: track_commit_timestamp = on, detect_conflict = off Run1 - 2min 34sec 223ms Run2 - 2min 33sec 482ms Case 4: track_commit_timestamp = off, detect_conflict = on Run1 - 2min 35sec 276ms Run2 - 2min 38sec 745ms ** In the case-2 when both track_commit_timestamp and detect_conflict are enabled, the time taken by the apply-worker is ~140% higher. Test3: Case when no conflict is detected. ---------------------------------------------------------------- To measure the time taken by the apply-worker when there is no conflict detected. This test is to confirm if the time overhead in Test1-Case2 is due to the new function GetTupleCommitTs() which fetches the origin and timestamp information for each row in the table before applying the update. Setup: - The Publisher and Subscriber both have an empty table to start with. - At Sub, the table is subscribed for all changes from Pub. - At Pub: Insert 10 million rows and the same will be replicated to the Sub table as well. Test Run: At Pub: run an UPDATE on the table to update all rows in a single transaction. (This will NOT hit the update_differ on Sub because now all the tuples have the Pub’s origin). Results: Case1: track_commit_timestamp = off, detect_conflict = off Run1 - 2min 39sec 261ms Run2 - 2min 30sec 95ms Case 2 : track_commit_timestamp = on, detect_conflict = on Run1 - 2min 38sec 985ms Run2 - 2min 46sec 624ms Case3: track_commit_timestamp = on, detect_conflict = off Run1 - 2min 59sec 887ms Run2 - 2min 34sec 336ms Case 4: track_commit_timestamp = off, detect_conflict = on Run1 - 2min 33sec 477min Run2 - 2min 37sec 677ms Test Summary - -- The duration for case-2 was reduced to 2-3 minutes, matching the times of the other cases. -- The test revealed that the overhead in case-2 was not due to commit_ts fetching (GetTupleCommitTs). -- The additional action in case-2 was the error logging of all 10 million update_differ conflicts. -- To confirm that the additional time was due to logging, I conducted another test. I removed the "ReportApplyConflict()" call for update_differ in the code and re-ran test1-case2 (which initially took ~6 minutes). Without conflict logging, the duration was reduced to "2min 56sec 758 ms". Test4 - Code Profiling ---------------------------------------------------------------- To narrow down the cause of the time overhead in Test2-case2, did code profiling patches. Used same setup and test script as Test2. The overhead in (track_committs=on and detect_conflict=on) case is not introduced by the commit timestamp fetching(e.g. GetTupleCommitTs). The main overhead comes from the log reporting which happens when applying each change: |--16.57%--ReportApplyConflict |--13.17%--errfinish --11.53%--EmitErrorReport --11.41%--send_message_to_server_log ... ... ... |--0.74%--GetTupleCommitTs" Thank you Hou-San for helping in Test1 and conducting Test4. [1] https://www.postgresql.org/message-id/OS0PR01MB57162919F1D6C55D82D4D89D94B12%40OS0PR01MB5716.jpnprd01.prod.outlook.com Thanks, Nisha