On 09.01.2020 09:36, Kyotaro Horiguchi wrote:
Hello.

At Sun, 29 Dec 2019 15:12:16 +0300, Alexey Kondratov 
<a.kondra...@postgrespro.ru> wrote in
On 2019-12-26 16:35, Alexey Kondratov wrote:
Another concern is that ReplicationSlotIsDirty is added with only one
user.  It also cannot be used by SaveSlotToPath due to the simultaneous
usage of both flags, dirty and just_dirtied, there.
Given that, I think we should call ReplicationSlotSave unconditionally
in pg_replication_slot_advance, so the slot will be saved or not
automatically, based on the slot->dirty flag.  At the same time,
ReplicationSlotsComputeRequiredXmin and ReplicationSlotsComputeRequiredLSN
should be called by anyone who modifies the xmin and LSN fields in the
slot.  Otherwise we end up with some leaky abstractions.
Sounds reasonable.

Great, so it seems that we have reached some agreement about who should mark the slot as dirty, at least for now.
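
To spell the contract out (a rough sketch for illustration only, not part
of the patch; the helper name is made up), whoever modifies a slot's xmin
or restart_lsn marks the slot dirty and recomputes the global values
derived from all slots, while writing to disk is left to
ReplicationSlotSave, which acts on the slot->dirty flag:

static void
example_advance_my_slot(TransactionId xmin, XLogRecPtr restart_lsn)
{
	/* Modify the in-memory state under the slot's spinlock. */
	SpinLockAcquire(&MyReplicationSlot->mutex);
	MyReplicationSlot->data.xmin = xmin;
	MyReplicationSlot->data.restart_lsn = restart_lsn;
	SpinLockRelease(&MyReplicationSlot->mutex);

	/* The modifier flags the change... */
	ReplicationSlotMarkDirty();

	/* ...and updates the globals computed over all slots. */
	ReplicationSlotsComputeRequiredXmin(false);
	ReplicationSlotsComputeRequiredLSN();

	/*
	 * No explicit save here: a later ReplicationSlotSave() or the next
	 * checkpoint writes the slot out because it is dirty.
	 */
}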


If someone utilises the old WAL and after that a crash happens between
steps 2) and 3), then we start with the old value of restart_lsn, but
without the required WAL.  I do not know how to properly reproduce it
without gdb and a power-off, so the chance is pretty low, but it could
still be the case.
In the first place, we advance the required LSN for every reply message,
but save the slot data only at checkpoint for physical replication.  Such
a strict guarantee seems too much.

...

I think we shouldn't touch the paths used by the replication protocol.
And shouldn't we focus on how we make a change to a replication slot from
the SQL interface persistent?  It seems to me that generally we don't
need to save dirty slots other than at checkpoint, but the SQL function
seems to want the change to be saved immediately.

As a result, please find the attached, which follows only the first
paragraph cited above.

OK, I have definitely overthought that, thanks. This looks like a minimal subset of changes that actually solves the bug. I would only prefer to keep some additional comments (something like the attached); otherwise, after half a year it will be unclear again why we save the slot unconditionally here.
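
For the archives, the reason the unconditional ReplicationSlotSave is
harmless for an unchanged slot is the dirty/just_dirtied handling inside
the save path.  Roughly (a simplified sketch from memory, not the exact
slot.c code):

static void
SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
{
	bool		was_dirty;

	/* First check whether there is anything to write out at all. */
	SpinLockAcquire(&slot->mutex);
	was_dirty = slot->dirty;
	slot->just_dirtied = false;
	SpinLockRelease(&slot->mutex);

	/* A clean slot makes the whole call a no-op. */
	if (!was_dirty)
		return;

	/* ... otherwise serialize slot->data to disk and fsync it ... */
}

So callers that go through pg_*_replication_slot_advance without actually
moving anything pay essentially nothing for the extra save.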


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index bb69683e2a..084e0c2960 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -370,6 +370,11 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 		MyReplicationSlot->data.restart_lsn = moveto;
 		SpinLockRelease(&MyReplicationSlot->mutex);
 		retlsn = moveto;
+
+		ReplicationSlotMarkDirty();
+
+		/* We moved restart_lsn, update the global value. */
+		ReplicationSlotsComputeRequiredLSN();
 	}
 
 	return retlsn;
@@ -564,7 +569,10 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 						(uint32) (moveto >> 32), (uint32) moveto,
 						(uint32) (minlsn >> 32), (uint32) minlsn)));
 
-	/* Do the actual slot update, depending on the slot type */
+	/*
+	 * Do the actual slot update, depending on the slot type.  Slot will be
+	 * marked as dirty by pg_*_replication_slot_advance if changed.
+	 */
 	if (OidIsValid(MyReplicationSlot->data.database))
 		endlsn = pg_logical_replication_slot_advance(moveto);
 	else
@@ -573,14 +581,11 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 	values[0] = NameGetDatum(&MyReplicationSlot->data.name);
 	nulls[0] = false;
 
-	/* Update the on disk state when lsn was updated. */
-	if (XLogRecPtrIsInvalid(endlsn))
-	{
-		ReplicationSlotMarkDirty();
-		ReplicationSlotsComputeRequiredXmin(false);
-		ReplicationSlotsComputeRequiredLSN();
-		ReplicationSlotSave();
-	}
+	/*
+	 * Update the on disk state.  No work is done here if the
+	 * pg_*_replication_slot_advance call above was a no-op.
+	 */
+	ReplicationSlotSave();
 
 	ReplicationSlotRelease();
 
