On Tue, Jul 1, 2025 at 6:10 PM Zhijie Hou (Fujitsu) wrote:
> Here is V45 patch set.

With the main patch set now stable, I am summarizing the performance tests conducted earlier for reference.

In earlier tests [1], we confirmed that in a pub-sub cluster with a high workload on the publisher (via pgbench), the patch had no impact on TPS (Transactions Per Second) on the publisher. This indicates that the modifications to the walsender responsible for replying with the publisher status do not introduce noticeable overhead.

Additionally, we confirmed that the patch, with its latest mechanism for dynamically tuning the frequency of advancing slot.xmin, does not affect TPS on the subscriber when minimal changes occur on the publisher. This test [2] involved creating a pub-sub cluster and running pgbench on the subscriber to monitor TPS. It further suggests that the logic for maintaining the non-removable XID in the apply worker does not introduce noticeable overhead for concurrent user DMLs.

Furthermore, we tested running pgbench on both the publisher and the subscriber [3]. Some regression in TPS was observed on the subscriber, because the workload on the publisher is pretty high and the apply workers must wait for the transactions with earlier timestamps to be applied and flushed before advancing the non-removable XID to remove dead tuples. This is the expected behavior of this approach, since the patch's main goal is to retain dead tuples for reliable conflict detection.

When discussing the regression, we considered providing a workaround for users to recover from it (0002 of the latest patch set). We introduced a GUC option, max_conflict_retention_duration, designed to prevent excessive accumulation of dead tuples when a subscription with retain_conflict_info enabled is present and the apply worker cannot catch up with the publisher's workload. In short, the conflict detection replication slot will be invalidated if the lag time exceeds the specified GUC value.

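To make the invalidation rule concrete, here is a minimal sketch in Python. It is not the patch's code: the function name, the millisecond unit, and the sample lag values are all assumptions for illustration. Only the comparison of the lag time against max_conflict_retention_duration comes from the description above.

```python
def slot_should_be_invalidated(lag_ms: int, max_retention_ms: int) -> bool:
    """Toy model of the rule: invalidate once the lag exceeds the limit.

    The function name and the millisecond unit are hypothetical;
    max_retention_ms stands in for max_conflict_retention_duration.
    """
    return lag_ms > max_retention_ms


# Made-up lag samples: the apply worker falls behind, then catches up.
lag_samples_ms = [200, 900, 1500, 400]
limit_ms = 1000

decisions = [slot_should_be_invalidated(lag, limit_ms) for lag in lag_samples_ms]
print(decisions)
```

Only the third sample exceeds the limit. The sketch deliberately does not model what happens to the slot after it has been invalidated.
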
In performance tests [4], we confirmed that the slot would be invalidated as expected when the workload on the publisher was high, and that it would no longer be invalidated after the workload was reduced. This shows that even if the slot has been invalidated once, users can continue to detect the update_deleted conflict by reducing the workload on the publisher.

The design of the patch set has not changed since the last performance test; only some code enhancements have been made. Therefore, I think the results and findings from the previous performance tests are still valid. However, if necessary, we can rerun all the tests on the latest patch set to verify this.

[1] https://www.postgresql.org/message-id/CABdArM5SpMyGvQTsX0-d%3Db%2BJAh0VQjuoyf9jFqcrQ3JLws5eOw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/TYAPR01MB5692B0182356F041DC9DE3B5F53E2%40TYAPR01MB5692.jpnprd01.prod.outlook.com
[3] https://www.postgresql.org/message-id/CABdArM4OEwmh_31dQ8_F__VmHwk2ag_M%3DYDD4H%2ByYQBG%2BbHGzg%40mail.gmail.com
[4] https://www.postgresql.org/message-id/OSCPR01MB14966F39BE1732B9E433023BFF5E72%40OSCPR01MB14966.jpnprd01.prod.outlook.com

Best Regards,
Hou zj