On 2020/06/25 12:57, Alvaro Herrera wrote:
On 2020-Jun-25, Fujii Masao wrote:

        /*
         * Find the oldest extant segment file. We get 1 until checkpoint removes
         * the first WAL segment file since startup, which causes the status to be
         * wrong under certain abnormal conditions, but that does no actual harm.
         */
        oldestSeg = XLogGetLastRemovedSegno() + 1;

I see the point of the above comment, but this can cause wal_status to
change from "lost" to "unreserved" after a server restart. Isn't that
really confusing? At the very least, it seems better to document that behavior.
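To make the flip concrete, here is a toy model of the classification. It assumes a simplified rule: a slot is "lost" when its restart segment is older than XLogGetLastRemovedSegno() + 1, and "reserved" when it is at or beyond a hypothetical oldest_safe_seg. classify_slot() and its parameters are illustrative only, not PostgreSQL code; the point is just that resetting the last-removed counter to 0 at startup turns a "lost" slot into an "unreserved" one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified stand-in for XLogSegNo. */
typedef uint64_t SegNo;

typedef enum
{
    SLOT_RESERVED,
    SLOT_UNRESERVED,
    SLOT_LOST
} SlotWalStatus;

/*
 * Toy classification.  last_removed_seg plays the role of
 * XLogGetLastRemovedSegno(); oldest_safe_seg (hypothetical) is the oldest
 * segment the server still promises to keep for the slot.
 */
static SlotWalStatus
classify_slot(SegNo restart_seg, SegNo last_removed_seg, SegNo oldest_safe_seg)
{
    /* Same derivation as the quoted code: oldest extant = last removed + 1. */
    SegNo oldest_seg = last_removed_seg + 1;

    /* A segment we promise to keep cannot already have been removed. */
    assert(oldest_safe_seg >= oldest_seg);

    if (restart_seg < oldest_seg)
        return SLOT_LOST;        /* WAL needed by the slot is gone */
    if (restart_seg >= oldest_safe_seg)
        return SLOT_RESERVED;    /* WAL is guaranteed to be kept */
    return SLOT_UNRESERVED;      /* still on disk, but may be removed */
}
```

For example, with segments up to 10 removed and oldest_safe_seg = 20, a slot at segment 5 is classified as lost; after a restart the counter starts from 0 again, so the very same slot comes back as unreserved.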

Hmm.

Or, if we *can ensure* that a slot with invalidated_at set is always a
"lost" slot, we can conclude that wal_status is "lost" without relying on the
fragile XLogGetLastRemovedSegno(). Thoughts?

Hmm, this sounds compelling -- I think it just means we need to ensure
we reset invalidated_at to zero if the slot's restart_lsn is set to a
correct position afterwards.

Yes.

I don't think we have any operation that
does that, so it should be safe -- hopefully I didn't overlook anything?
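A minimal sketch of that invariant, assuming a stripped-down slot that carries only restart_lsn and invalidated_at (SlotModel and the slot_* helpers are hypothetical, not the real slot API): invalidation records the old position, any later assignment of a valid restart_lsn clears invalidated_at, and "lost" then depends only on the slot itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr / InvalidXLogRecPtr. */
typedef uint64_t RecPtr;
#define INVALID_RECPTR ((RecPtr) 0)

typedef struct
{
    RecPtr restart_lsn;
    RecPtr invalidated_at;   /* valid => slot was invalidated ("lost") */
} SlotModel;

/* Like the quoted hunk: remember where the slot was, then drop it. */
static void
slot_invalidate(SlotModel *s)
{
    assert(s->restart_lsn != INVALID_RECPTR);
    s->invalidated_at = s->restart_lsn;
    s->restart_lsn = INVALID_RECPTR;
}

/*
 * Hypothetical: any operation that later gives the slot a correct
 * restart_lsn must also clear invalidated_at, or the invariant breaks.
 */
static void
slot_set_restart_lsn(SlotModel *s, RecPtr lsn)
{
    assert(lsn != INVALID_RECPTR);
    s->restart_lsn = lsn;
    s->invalidated_at = INVALID_RECPTR;
}

/* With the invariant in place, no XLogGetLastRemovedSegno() is needed. */
static bool
slot_is_lost(const SlotModel *s)
{
    return s->invalidated_at != INVALID_RECPTR;
}
```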

Don't we need to call ReplicationSlotMarkDirty() and ReplicationSlotSave()
just after setting invalidated_at and restart_lsn in
InvalidateObsoleteReplicationSlots()?
Otherwise, restart_lsn can revert to its previous value after a restart.

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index e8761f3a18..5584e5dd2c 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1229,6 +1229,13 @@ restart:
                s->data.invalidated_at = s->data.restart_lsn;
                s->data.restart_lsn = InvalidXLogRecPtr;
                SpinLockRelease(&s->mutex);
+
+               /*
+                * Save this invalidated slot to disk, to ensure that the slot
+                * is still invalid even after the server restart.
+                */
+               ReplicationSlotMarkDirty();
+               ReplicationSlotSave();
                ReplicationSlotRelease();
 
                /* if we did anything, start from scratch */

Maybe we don't need to do this if the slot is temporary?
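A toy model of why the invalidated state should be written out, assuming a slot has an in-memory copy and an on-disk copy and that only the latter survives a restart. PersistedSlot, mark_dirty(), save() and simulate_restart() are hypothetical stand-ins loosely patterned on ReplicationSlotMarkDirty()/ReplicationSlotSave(); temporary slots are left out, since they do not survive a restart at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr / InvalidXLogRecPtr. */
typedef uint64_t RecPtr;
#define INVALID_RECPTR ((RecPtr) 0)

typedef struct
{
    RecPtr restart_lsn;
    RecPtr invalidated_at;
} SlotData;

/* Hypothetical: one persistent slot with a shared-memory and an on-disk copy. */
typedef struct
{
    SlotData in_memory;
    SlotData on_disk;
    bool     dirty;
} PersistedSlot;

static void
mark_dirty(PersistedSlot *s)
{
    s->dirty = true;
}

/* Write the in-memory state to "disk", but only if it was marked dirty. */
static void
save(PersistedSlot *s)
{
    if (!s->dirty)
        return;
    s->on_disk = s->in_memory;
    s->dirty = false;
}

/*
 * Invalidation as in the hunk; persist selects whether the proposed
 * mark-dirty + save pair runs afterwards.
 */
static void
invalidate(PersistedSlot *s, bool persist)
{
    assert(s->in_memory.restart_lsn != INVALID_RECPTR);
    s->in_memory.invalidated_at = s->in_memory.restart_lsn;
    s->in_memory.restart_lsn = INVALID_RECPTR;
    if (persist)
    {
        mark_dirty(s);
        save(s);
    }
}

/* A restart discards shared memory; only the on-disk copy is reloaded. */
static void
simulate_restart(PersistedSlot *s)
{
    s->in_memory = s->on_disk;
    s->dirty = false;
}
```

Calling invalidate(&s, false) followed by simulate_restart(&s) brings back the old restart_lsn with invalidated_at cleared; with persist = true the slot stays invalidated across the restart.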

Neither copy nor advance seems to work with a slot that has an invalid
restart_lsn.

Or should XLogGetLastRemovedSegno() be fixed so that it returns a valid
value even after a restart?

This seems like more work to implement.

Yes.
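For comparison, that alternative could look roughly like this, assuming the oldest segment still present in pg_wal is a good enough proxy for the last removed one at startup. The file-name layout (timeline, xlogid and segment within xlogid, 8 uppercase hex digits each) follows the standard WAL segment naming; the helper names and the fallback to 0 are hypothetical, and a real version would still need to read the directory itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t SegNo;

/*
 * Parse a 24-character WAL segment file name into a segment number.
 * Anything else (e.g. .history or .partial files) is rejected.
 */
static bool
parse_wal_segment_name(const char *name, uint64_t wal_segsz_bytes, SegNo *segno)
{
    unsigned int tli, xlogid, seg_in_id;

    if (strlen(name) != 24 || strspn(name, "0123456789ABCDEF") != 24)
        return false;
    if (sscanf(name, "%8X%8X%8X", &tli, &xlogid, &seg_in_id) != 3)
        return false;
    /* Segments per xlogid = 4 GB / segment size (256 for 16 MB). */
    *segno = (SegNo) xlogid * (UINT64_C(0x100000000) / wal_segsz_bytes) + seg_in_id;
    return true;
}

/*
 * Hypothetical startup helper: "last removed" = oldest segment still
 * present minus one, instead of always starting from 0.
 */
static SegNo
last_removed_from_names(const char *const *names, size_t n, uint64_t wal_segsz_bytes)
{
    SegNo oldest = UINT64_MAX;

    assert(wal_segsz_bytes > 0 && UINT64_C(0x100000000) % wal_segsz_bytes == 0);
    for (size_t i = 0; i < n; i++)
    {
        SegNo segno;

        if (parse_wal_segment_name(names[i], wal_segsz_bytes, &segno) && segno < oldest)
            oldest = segno;
    }
    /* No segment found: fall back to today's behaviour (0). */
    if (oldest == UINT64_MAX || oldest == 0)
        return 0;
    return oldest - 1;
}
```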

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
