On Thu, Nov 9, 2023 at 7:29 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:
>
> On 11/9/23 3:41 AM, Amit Kapila wrote:
> > On Wed, Nov 8, 2023 at 8:09 PM Drouvot, Bertrand
> > <bertranddrouvot...@gmail.com> wrote:
> >>
> >>> Unrelated to above, if there is a user slot on standby with the same
> >>> name which the slot-sync worker is trying to create, then shall it
> >>> emit a warning and skip the sync of that slot or shall it throw an
> >>> error?
> >>>
> >>
> >> I'd vote for emit a warning and move on to the next slot if any.
> >>
> >
> > But then it could take time for users to know the actual problem and
> > they probably notice it after failover.
>
> Right, that's not appealing....
>
> OTOH the slot has already been created manually on the standby so there is
> probably already a "use case" for it (that is probably unrelated to the
> failover story then).
>
> In V32, the following states have been introduced:
>
> "
> 'n': none for user slots,
> 'i': sync initiated for the slot but waiting for primary to catch up.
> 'r': ready for periodic syncs.
> "
>
> Should we introduce a new state that indicates that a sync slot creation
> has failed because the slot already existed? That would probably
> be simple to monitor instead of looking at the log file.
>

Are you saying that we change the state of the already existing slot
on standby? And, such a state would indicate that we are trying to
sync the slot with the same name from the primary. Is that what you
have in mind? If so, it appears quite odd to me to have such a state
and also set it in some unrelated slot that just has the same name.

I understand your point that we can allow other slots to proceed but
it is also important to not create any sort of inconsistency that can
surprise user after failover. Also, the current coding doesn't ensure
we will always give WARNING. If we see the below code that deals with
this WARNING,

+  /* User created slot with the same name exists, emit WARNING. */
+  else if (found && s->data.sync_state == SYNCSLOT_STATE_NONE)
+  {
+    ereport(WARNING,
+        errmsg("not synchronizing slot %s; it is a user created slot",
+             remote_slot->name));
+  }
+  /* Otherwise create the slot first. */
+  else
+  {
+    TransactionId xmin_horizon = InvalidTransactionId;
+    ReplicationSlot *slot;
+
+    ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL,
+                remote_slot->two_phase, false);

I think this is not a solid check to ensure that the slot existed
before. Because it could be created as soon as the slot sync worker
invokes ReplicationSlotCreate() here. So, depending on the timing, we
can either get an ERROR or WARNING. I feel giving an ERROR in this
case should be okay.

-- 
With Regards,
Amit Kapila.


Reply via email to