Re: Conflict detection for update_deleted in logical replication

Amit Kapila Wed, 04 Dec 2024 02:59:50 -0800

On Tue, Dec 3, 2024 at 9:17 AM Zhijie Hou (Fujitsu)
<[email protected]> wrote:
>
> Also, I removed a change in wait_for_local_flush() which was mis-added in 
> V13_2
> patch.
>


1.
+ if (can_advance_nonremovable_xid(&data, last_recv_timestamp))
+ maybe_advance_nonremovable_xid(&data);

In can_advance_nonremovable_xid(), we determine whether to advance the
oldest xid based on 'last_recv_timestamp' and then again in
maybe_advance_nonremovable_xid()->get_candidate_xid(), we compare it
with the current time. How does that make sense? Shall we use
'last_recv_timestamp' directly in get_candidate_xid() as that will
avoid the additional time check in can_advance_nonremovable_xid()?

2.
+ TimestampTz next_attempt_time; /* when to attemp to advance the xid during
+ * change application */
+} RetainConflictInfoData;

This new variable introduced in this version is not used in the patch.
Any reason or just a leftover?

Apart from the above, I have made a few updates in the comments in the
attached. Please include those after review.

-- 
With Regards,
Amit Kapila.

diff --git a/src/backend/replication/logical/worker.c 
b/src/backend/replication/logical/worker.c
index 97f4d9fcba..06ba6d3a64 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3736,9 +3736,8 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 
                                                /*
                                                 * Attempt to advance the 
non-removable transaction ID
-                                                * during change application to 
prevent it from
-                                                * remaining unchanged for long 
periods when the worker
-                                                * is busy.
+                                                * to avoid accumulating dead 
rows when the worker is
+                                                * busy.
                                                 */
                                                if 
(can_advance_nonremovable_xid(&data, last_recv_timestamp))
                                                        
maybe_advance_nonremovable_xid(&data);
@@ -3770,6 +3769,12 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
                                                data.remote_epoch = 
pq_getmsgint(&s, 4);
                                                data.reply_time = 
pq_getmsgint64(&s);
 
+                                               /*
+                                                * This should never happen, see
+                                                * 
ProcessStandbyPSRequestMessage. But if it happens
+                                                * due to a bug, we don't want 
to proceed as it can
+                                                * incorrectly advance 
oldest_nonremovable_xid.
+                                                */
                                                if 
(XLogRecPtrIsInvalid(data.remote_lsn))
                                                        elog(ERROR, "cannot get 
the latest WAL position from the publisher");
 
@@ -4164,8 +4169,8 @@ request_publisher_status(RetainConflictInfoData *data)
        data->phase = RCI_WAIT_FOR_PUBLISHER_STATUS;
 
        /*
-        * Skip calling maybe_advance_nonremovable_xid() since further actions
-        * cannot proceed until the publisher status is received.
+        * Skip calling maybe_advance_nonremovable_xid() since further 
transition
+        * is possible only once we receive the publisher status message.
         */
 }
 
@@ -4244,7 +4249,14 @@ wait_for_local_flush(RetainConflictInfoData *data)
        if (!AllTablesyncsReady())
                return;
 
-       /* Return to wait for the changes to be applied */
+       /*
+        * Return to wait for the changes to be applied.
+        *
+        * XXX The remote flush location (last_flushpos) is updated only when
+        * feedback is sent to the server. So, the advancement of
+        * oldest_nonremovable_xid may be delayed. We can always update
+        * last_flushpos here if we notice such a delay.
+        */
        if (last_flushpos < data->remote_lsn)
                return;
 
@@ -4268,7 +4280,7 @@ wait_for_local_flush(RetainConflictInfoData *data)
 }
 
 /*
- * Determine if the next round of transaction ID advancement can be attempted.
+ * Determine if we can attempt to advance transaction ID.
  *
  * TODO: The remote flush location (last_flushpos) is currently not updated
  * during change application, making it impossible to satisfy the condition of

Re: Conflict detection for update_deleted in logical replication

Reply via email to