On 26.12.2019 11:33, Kyotaro Horiguchi wrote:
At Wed, 25 Dec 2019 20:28:04 +0300, Alexey Kondratov 
<a.kondra...@postgrespro.ru> wrote in
Yep, it helps with physical replication slot persistence after
advance, but the whole validation (moveto <= endlsn) does not make
sense to me. The value of moveto should be >= minlsn ==
confirmed_flush / restart_lsn, while endlsn == retlsn is also always
initialized with confirmed_flush / restart_lsn. Thus, your condition
seems to be true in any case, even for a no-op, which is what we
intended to catch.
...
If I get it correctly, we already keep the previous slot position in
minlsn, so we just have to compare endlsn with minlsn and treat
endlsn <= minlsn as a no-op without flushing the slot state.
I think you're right about the condition. (endlsn cannot be less than
minlsn, though.) But I have come to think that we shouldn't use
locations in that decision.

Attached is a patch that does this, so it fixes the bug without
affecting any user-facing behavior. A detailed comment section and
DEBUG output are also added. What do you think now?

I have also forgotten to mention that all versions down to 11.0
should be affected by this bug.
pg_replication_slot_advance is the only caller of
pg_logical/physical_replication_slot_advance, so there's no clear
division of responsibility for dirtying and for other housekeeping
such as the *ComputeRequired*() functions, but the current shape
seems somewhat inconsistent between logical and physical.

I think pg_logical/physical_replication_slot_advance should dirty the
slot if they actually changed anything, and
pg_replication_slot_advance should do the housekeeping if the slots
are dirtied.  (Otherwise the caller function should dirty the slot in
lieu of the two.)

The attached does that.

Both approaches look fine to me: my last patch, which keeps the intervention as minimal as possible, and your refactoring. I think it is the right direction to let everyone who modifies slot->data also mark the slot as dirty.

I found one comment section in your code rather misleading:

+        /*
+         * We don't need to dirty the slot only for the above change, but dirty
+         * this slot for the same reason with
+         * pg_logical_replication_slot_advance.
+         */

We have just modified MyReplicationSlot->data, which is the "On-Disk data of a replication slot, preserved across restarts", so it definitely should be marked as dirty for that reason, not because pg_logical_replication_slot_advance does the same.

Also, I think that the transient variable in ReplicationSlotIsDirty is unnecessary: MyReplicationSlot is already a pointer to the slot in shared memory.

+    ReplicationSlot *slot = MyReplicationSlot;
+
+    Assert(MyReplicationSlot != NULL);
+
+    SpinLockAcquire(&slot->mutex);

Otherwise it looks fine to me, so attached is the same diff with these proposed corrections.

Another concern is that ReplicationSlotIsDirty is added with only one user. It also cannot be used inside SaveSlotToPath, since that function has to handle both flags, dirty and just_dirtied, within the same spinlock section.
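
Just for clarity, the pattern SaveSlotToPath relies on is roughly the following (a simplified sketch, not the exact upstream code):

	bool	was_dirty;

	/*
	 * Both flags have to be handled within a single critical section:
	 * dirty is read and just_dirtied is reset atomically, so that a
	 * concurrent ReplicationSlotMarkDirty() is not lost between the
	 * read and the reset.
	 */
	SpinLockAcquire(&slot->mutex);
	was_dirty = slot->dirty;
	slot->just_dirtied = false;
	SpinLockRelease(&slot->mutex);

	/* Nothing to write if the slot was not dirtied since the last save. */
	if (!was_dirty)
		return;

A read-only ReplicationSlotIsDirty() cannot express the reset of just_dirtied, so SaveSlotToPath has to keep its own spinlock section.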

Along these lines, I think that we should call ReplicationSlotSave unconditionally in pg_replication_slot_advance, so the slot will be saved or not automatically based on the slot->dirty flag. At the same time, ReplicationSlotsComputeRequiredXmin and ReplicationSlotsComputeRequiredLSN should be called by anyone who modifies the xmin and LSN fields in the slot; otherwise, we end up with leaky abstractions.
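
To illustrate the idea (just a rough sketch, not a concrete patch), the dirtying and the required-LSN recomputation would live next to the code that changes the fields, while the caller saves unconditionally:

	/* In pg_physical_replication_slot_advance (sketch) */
	SpinLockAcquire(&MyReplicationSlot->mutex);
	MyReplicationSlot->data.restart_lsn = moveto;
	SpinLockRelease(&MyReplicationSlot->mutex);
	ReplicationSlotMarkDirty();
	ReplicationSlotsComputeRequiredLSN();

	/*
	 * In pg_replication_slot_advance (sketch): save unconditionally;
	 * SaveSlotToPath() already skips the on-disk write when the slot
	 * is not dirty, so no dirtiness check is needed at this level.
	 */
	ReplicationSlotSave();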


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 21ae8531b3..edf661521a 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -672,6 +672,23 @@ ReplicationSlotMarkDirty(void)
 	SpinLockRelease(&slot->mutex);
 }
 
+/*
+ * Check whether the currently acquired slot is dirty.
+ */
+bool
+ReplicationSlotIsDirty(void)
+{
+	bool dirty;
+
+	Assert(MyReplicationSlot != NULL);
+
+	SpinLockAcquire(&MyReplicationSlot->mutex);
+	dirty = MyReplicationSlot->dirty;
+	SpinLockRelease(&MyReplicationSlot->mutex);
+
+	return dirty;
+}
+
 /*
  * Convert a slot that's marked as RS_EPHEMERAL to a RS_PERSISTENT slot,
  * guaranteeing it will be there after an eventual crash.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 6683fc3f9b..d7a16a9071 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -370,6 +370,12 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 		MyReplicationSlot->data.restart_lsn = moveto;
 		SpinLockRelease(&MyReplicationSlot->mutex);
 		retlsn = moveto;
+
+		/*
+		 * Dirty the slot as we updated data that is meant to be
+		 * persistent on disk.
+		 */
+		ReplicationSlotMarkDirty();
 	}
 
 	return retlsn;
@@ -574,9 +580,8 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 	nulls[0] = false;
 
 	/* Update the on disk state when lsn was updated. */
-	if (XLogRecPtrIsInvalid(endlsn))
+	if (ReplicationSlotIsDirty())
 	{
-		ReplicationSlotMarkDirty();
 		ReplicationSlotsComputeRequiredXmin(false);
 		ReplicationSlotsComputeRequiredLSN();
 		ReplicationSlotSave();
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 3a5763fb07..f76b84571f 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -189,6 +189,7 @@ extern void ReplicationSlotRelease(void);
 extern void ReplicationSlotCleanup(void);
 extern void ReplicationSlotSave(void);
 extern void ReplicationSlotMarkDirty(void);
+extern bool ReplicationSlotIsDirty(void);
 
 /* misc stuff */
 extern bool ReplicationSlotValidateName(const char *name, int elevel);
