On Mon, Aug 5, 2024 at 9:19 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Fri, Aug 2, 2024 at 6:28 PM Nisha Moond <nisha.moond...@gmail.com> wrote:
> >
> > Performance tests done on the v8-0001 and v8-0002 patches, available at [1].
> >
>
> Thanks for doing the detailed tests for this patch.
>
> > The purpose of the performance tests is to measure the impact on
> > logical replication with track_commit_timestamp enabled, as this
> > involves fetching the commit_ts data to determine
> > delete_differ/update_differ conflicts.
> >
> > Fortunately, we did not see any noticeable overhead from the new
> > commit_ts fetch and comparison logic. The only notable impact is
> > potential overhead from logging conflicts if they occur frequently.
> > Therefore, enabling conflict detection by default seems feasible, and
> > introducing a new detect_conflict option may not be necessary.
> >
> ...
> >
> > Test 1: create conflicts on Sub using pgbench.
> > ----------------------------------------------------------------
> > Setup:
> > - Both publisher and subscriber have pgbench tables created as-
> > pgbench -p $node1_port postgres -qis 1
> > - At Sub, a subscription created for all the changes from Pub node.
> >
> > Test Run:
> > - To test, ran pgbench for 15 minutes on both nodes simultaneously,
> > which led to concurrent updates and update_differ conflicts on the
> > Subscriber node.
> > Command used to run pgbench on both nodes-
> > ./pgbench postgres -p 8833 -c 10 -j 3 -T 300 -P 20
> >
> > Results:
> > For each case, note the “tps” and total time taken by the apply-worker
> > on Sub to apply the changes coming from Pub.
> >
> > Case 1: track_commit_timestamp = off, detect_conflict = off
> > Pub-tps = 9139.556405
> > Sub-tps = 8456.787967
> > Time of replicating all the changes: 19min 28s
> > Case 2: track_commit_timestamp = on, detect_conflict = on
> > Pub-tps = 8833.016548
> > Sub-tps = 8389.763739
> > Time of replicating all the changes: 20min 20s
> >
>
> Why is there a noticeable tps (~3%) reduction in publisher TPS? Is it
> the impact of track_commit_timestamp = on or something else?
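
For reference, the size of that drop can be recomputed from the Case 1 and Case 2 figures quoted above. This is only a minimal Python sketch of the arithmetic; pct_change is a hypothetical helper name and is not part of the test scripts.

```python
# Figures copied from Test 1, Case 1 (both settings off) and Case 2
# (both settings on), as quoted above.

def pct_change(baseline: float, value: float) -> float:
    """Relative change of `value` versus `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0

pub_tps_case1, pub_tps_case2 = 9139.556405, 8833.016548
sub_tps_case1, sub_tps_case2 = 8456.787967, 8389.763739

# Replication times converted to seconds: 19min 28s and 20min 20s.
repl_secs_case1 = 19 * 60 + 28
repl_secs_case2 = 20 * 60 + 20

pub_change = pct_change(pub_tps_case1, pub_tps_case2)
sub_change = pct_change(sub_tps_case1, sub_tps_case2)
repl_change = pct_change(repl_secs_case1, repl_secs_case2)

print(f"Pub TPS change:          {pub_change:+.2f}%")
print(f"Sub TPS change:          {sub_change:+.2f}%")
print(f"Replication time change: {repl_change:+.2f}%")
```

By this arithmetic the publisher TPS drop is a little over 3%, the subscriber TPS drop is under 1%, and the replication time grows by between 4% and 5%.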

Was track_commit_timestamp enabled only on subscriber (as needed) or
on both publisher and subscriber? Nisha, can you please confirm from
your logs?

> > Case 3: track_commit_timestamp = on, detect_conflict = off
> > Pub-tps = 8886.101726
> > Sub-tps = 8374.508017
> > Time of replicating all the changes: 19min 35s
> > Case 4: track_commit_timestamp = off, detect_conflict = on
> > Pub-tps = 8981.924596
> > Sub-tps = 8411.120808
> > Time of replicating all the changes: 19min 27s
> >
> > **The difference of TPS between each case is small. While I can see a
> > slight increase of the replication time (about 5%), when enabling both
> > track_commit_timestamp and detect_conflict.
> >
>
> The difference in TPS between case 1 and case 2 is quite visible.
> IIUC, the replication time difference is due to the logging of
> conflicts, right?
>
> > Test2: create conflict using a manual script
> > ----------------------------------------------------------------
> > - To measure the precise time taken by the apply-worker in all cases,
> > create a test with a table having 10 million rows.
> > - To record the total time taken by the apply-worker, dump the
> > current time in the logfile for apply_handle_begin() and
> > apply_handle_commit().
> >
> > Setup:
> > Pub : has a table ‘perf’ with 10 million rows.
> > Sub : has the same table ‘perf’ with its own 10 million rows (inserted
> > by 1000 different transactions). This table is subscribed for all
> > changes from Pub.
> >
> > Test Run:
> > At Pub: run UPDATE on the table ‘perf’ to update all its rows in a
> > single transaction. (this will lead to update_differ conflict for all
> > rows on Sub when enabled).
> > At Sub: record the time(from log file) taken by the apply-worker to
> > apply all updates coming from Pub.
> >
> > Results:
> > Below table shows the total time taken by the apply-worker
> > (apply_handle_commit Time - apply_handle_begin Time ).
> > (Two test runs for each of the four cases)
> >
> > Case 1: track_commit_timestamp = off, detect_conflict = off
> > Run1 - 2min 42sec 579ms
> > Run2 - 2min 41sec 75ms
> > Case 2: track_commit_timestamp = on, detect_conflict = on
> > Run1 - 6min 11sec 602ms
> > Run2 - 6min 25sec 179ms
> > Case 3: track_commit_timestamp = on, detect_conflict = off
> > Run1 - 2min 34sec 223ms
> > Run2 - 2min 33sec 482ms
> > Case 4: track_commit_timestamp = off, detect_conflict = on
> > Run1 - 2min 35sec 276ms
> > Run2 - 2min 38sec 745ms
> >
> > ** In the case-2 when both track_commit_timestamp and detect_conflict
> > are enabled, the time taken by the apply-worker is ~140% higher.
> >
> > Test3: Case when no conflict is detected.
> > ----------------------------------------------------------------
> > To measure the time taken by the apply-worker when there is no
> > conflict detected. This test is to confirm if the time overhead in
> > Test1-Case2 is due to the new function GetTupleCommitTs() which
> > fetches the origin and timestamp information for each row in the table
> > before applying the update.
> >
> > Setup:
> > - The Publisher and Subscriber both have an empty table to start with.
> > - At Sub, the table is subscribed for all changes from Pub.
> > - At Pub: Insert 10 million rows and the same will be replicated to
> > the Sub table as well.
> >
> > Test Run:
> > At Pub: run an UPDATE on the table to update all rows in a single
> > transaction. (This will NOT hit the update_differ on Sub because now
> > all the tuples have the Pub’s origin).
> >
> > Results:
> > Case 1: track_commit_timestamp = off, detect_conflict = off
> > Run1 - 2min 39sec 261ms
> > Run2 - 2min 30sec 95ms
> > Case 2: track_commit_timestamp = on, detect_conflict = on
> > Run1 - 2min 38sec 985ms
> > Run2 - 2min 46sec 624ms
> > Case 3: track_commit_timestamp = on, detect_conflict = off
> > Run1 - 2min 59sec 887ms
> > Run2 - 2min 34sec 336ms
> > Case 4: track_commit_timestamp = off, detect_conflict = on
> > Run1 - 2min 33sec 477ms
> > Run2 - 2min 37sec 677ms
> >
> > Test Summary -
> > -- The duration for case-2 was reduced to 2-3 minutes, matching the
> > times of the other cases.
> > -- The test revealed that the overhead in case-2 was not due to
> > commit_ts fetching (GetTupleCommitTs).
> > -- The additional action in case-2 was the error logging of all 10
> > million update_differ conflicts.
> >
>
> According to me, this last point is key among all tests which will
> decide whether we should have a new subscription option like
> detect_conflict or not. I feel this is the worst case where all the
> row updates have conflicts and the majority of time is spent writing
> LOG messages. Now, for this specific case, if one wouldn't have
> enabled track_commit_timestamp then there would be no difference as
> seen in case-4. So, I don't see this as a reason to introduce a new
> subscription option like detect_conflicts, if one wants to avoid such
> an overhead, she shouldn't have enabled track_commit_timestamp in the
> first place to detect conflicts. Also, even without this, we would see
> similar overhead in the case of update/delete_missing where we LOG
> when the tuple to modify is not found.
>

Overall, it looks okay to get rid of the 'detect_conflict' parameter.
My only concern here is the purpose/use-cases of
'track_commit_timestamp'. Is the only purpose of enabling
'track_commit_timestamp' to detect conflicts? I couldn't find much in
the doc on this.
Can there be a case where a user wants to enable
'track_commit_timestamp' for any other purpose without enabling
subscription's conflict detection?

thanks
Shveta
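
For reference, the ~140% figure from Test 2 can be cross-checked from the quoted run times. The sketch below is only an illustration in Python: the helper names (to_seconds, mean_seconds, overhead_vs_case1) are hypothetical, and averaging the two runs per case is an assumption about how the figure was derived, not something stated in the report.

```python
import re

def to_seconds(text: str) -> float:
    """Parse a duration such as '2min 42sec 579ms' into seconds."""
    m = re.fullmatch(r"(\d+)min (\d+)sec (\d+)ms", text)
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds, millis = map(int, m.groups())
    return minutes * 60 + seconds + millis / 1000.0

# Apply-worker times from Test 2 (every row hits update_differ when enabled).
test2 = {
    "case1": ["2min 42sec 579ms", "2min 41sec 75ms"],
    "case2": ["6min 11sec 602ms", "6min 25sec 179ms"],
    "case3": ["2min 34sec 223ms", "2min 33sec 482ms"],
    "case4": ["2min 35sec 276ms", "2min 38sec 745ms"],
}

def mean_seconds(runs: list[str]) -> float:
    """Average duration of the given runs, in seconds."""
    return sum(to_seconds(r) for r in runs) / len(runs)

def overhead_vs_case1(results: dict[str, list[str]]) -> dict[str, float]:
    """Percent change of each case's mean time relative to case1."""
    base = mean_seconds(results["case1"])
    return {
        case: (mean_seconds(runs) / base - 1.0) * 100.0
        for case, runs in results.items()
    }

overheads = overhead_vs_case1(test2)
for case, pct in overheads.items():
    print(f"{case}: {pct:+.1f}% vs case1")
```

Under that averaging assumption, the Case 2 mean comes out at roughly 2.3 times the Case 1 mean, which is in the same range as the quoted ~140%, while Cases 3 and 4 stay within a few percent of Case 1.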