RE: Conflict detection for update_deleted in logical replication

Zhijie Hou (Fujitsu) Tue, 01 Jul 2025 23:03:28 -0700

On Tue, Jul 1, 2025 at 6:10 PM Zhijie Hou (Fujitsu) wrote:
> Here is V45 patch set.

With the main patch set now stable, I am summarizing the performance tests
conducted so far, for reference.

In earlier tests[1], we confirmed that in a pub-sub cluster with high workload
on the publisher (via pgbench), the patch had no impact on TPS (Transactions
Per Second) on the publisher. This indicates that the walsender changes for
replying with the publisher status do not introduce noticeable overhead.

Additionally, we confirmed that the patch, with its latest mechanism for
dynamically tuning the frequency of advancing slot.xmin, does not affect TPS on
the subscriber when minimal changes occur on the publisher. This test[2]
involved creating a pub-sub cluster and running pgbench on the subscriber to
monitor TPS. It further suggests that the logic for maintaining non-removable
xid in the apply worker does not introduce noticeable overhead for concurrent
user DMLs.

Furthermore, we tested running pgbench on both publisher and subscriber[3].
Some TPS regression was observed on the subscriber, because the workload on
the publisher is high and the apply workers must wait for the transactions
with earlier timestamps to be applied and flushed before advancing the
non-removable XID, which is what allows dead tuples to be removed. This is the
expected behavior of this approach, since the patch's main goal is to retain
dead tuples for reliable conflict detection.

When discussing the regression, we considered providing a workaround for users
to recover from it (patch 0002 of the latest patch set). We introduced a GUC
option, max_conflict_retention_duration, designed to prevent excessive
accumulation of dead tuples when a subscription with retain_conflict_info
enabled is present and the apply worker cannot catch up with the publisher's
workload. In short, the conflict detection replication slot will be
invalidated if the lag time exceeds the specified GUC value.
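
To illustrate the rule only, here is a minimal Python sketch of the
invalidation check. It is not the patch's code: the function name, the
parameter names, and the use of milliseconds as the unit are assumptions made
for this example.

```python
def should_invalidate_slot(lag_ms: int, max_conflict_retention_duration_ms: int) -> bool:
    """Return True when the apply worker's lag exceeds the configured limit.

    Hypothetical helper: it models only the rule "invalidate the conflict
    detection slot if the lag time exceeds the GUC value". The unit
    (milliseconds) and both parameter names are assumptions.
    """
    return lag_ms > max_conflict_retention_duration_ms


# Lag within the limit: the slot is kept and dead tuples stay retained.
print(should_invalidate_slot(lag_ms=500, max_conflict_retention_duration_ms=1000))

# Lag beyond the limit: the slot would be invalidated.
print(should_invalidate_slot(lag_ms=1500, max_conflict_retention_duration_ms=1000))
```

The sketch deliberately leaves out how the lag is measured and what happens
to the slot afterwards; those details are in the 0002 patch itself.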

In performance tests[4], we confirmed that the slot would be invalidated as
expected when the workload on the publisher was high, and that it would no
longer be invalidated after the workload was reduced. This shows that even if
the slot has been invalidated once, users can continue to detect the
update_deleted conflict by reducing the workload on the publisher.

The design of the patch set has not changed since the last performance test;
only some code enhancements have been made. Therefore, I think the results and
findings from the previous performance tests are still valid. However, if
necessary, we can rerun all the tests on the latest patch set to confirm this.

[1]
https://www.postgresql.org/message-id/CABdArM5SpMyGvQTsX0-d%3Db%2BJAh0VQjuoyf9jFqcrQ3JLws5eOw%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/TYAPR01MB5692B0182356F041DC9DE3B5F53E2%40TYAPR01MB5692.jpnprd01.prod.outlook.com
[3]
https://www.postgresql.org/message-id/CABdArM4OEwmh_31dQ8_F__VmHwk2ag_M%3DYDD4H%2ByYQBG%2BbHGzg%40mail.gmail.com
[4]
https://www.postgresql.org/message-id/OSCPR01MB14966F39BE1732B9E433023BFF5E72%40OSCPR01MB14966.jpnprd01.prod.outlook.com

Best Regards,
Hou zj
