Le samedi 28 août 2021, 14:10:25 CEST Bharath Rupireddy a écrit :
> On Thu, Aug 26, 2021 at 5:45 PM Ronan Dunklau <ronan.dunk...@aiven.io> 
wrote:
> > order to fail early if the replication slot doesn't exist, so please find
> > attached v2 for that.
> 
> Thanks for the patches. Here are some comments:
> 

Thank you for this review ! Please see the other side of the thread where I 
answered Michael Paquier with a new patchset, which includes some of your 
proposed modifications.

> 1) While the intent of these patches looks good, I have following
> concern with new replication command READ_REPLICATION_SLOT: what if
> the pg_receivewal exits (because user issued a SIGINT or for some
> reason) after flushing  the received WAL to disk, before it sends
> sendFeedback to postgres server's walsender so that it doesn't get a
> chance to update the restart_lsn in the replication slot via
> PhysicalConfirmReceivedLocation. If the pg_receivewal is started
> again, isn't it going to get the previous restart_lsn and receive the
> last chunk of flushed WAL again?

I've kept the existing directory as the source of truth if we have any WAL 
there already. If we don't, we fallback to the restart_lsn on the replication 
slot.
So in the event that we start it again from somewhere else where we don't have 
access to those WALs anymore, we could be receiving it again, which in my 
opinion is better than losing everything in between in that case. 

> 
> 2) What is the significance of READ_REPLICATION_SLOT for logical
> replication slots? I read above that somebody suggested to restrict
> the walsender to handle READ_REPLICATION_SLOT for physical replication
> slots so that the callers will see a command failure. But I tend to
> think that it is clean to have this command common for both physical
> and logical replication slots and the callers can have an Assert(type
> == 'physical').

I've updated the patch to make it easy for the caller to check the slot's type 
and added a verification for those cases.

In general, I tried to implement the meaning of the different fields exactly as 
it's done in the pg_replication_slots view.

> 
> 3) Isn't it useful to send active, active_pid info of the replication
> slot via READ_REPLICATION_SLOT? pg_receivewal can use Assert(active ==
> true && active_pid == getpid()) as an assertion to ensure that it is
> the sole owner of the replication slot? Also, is it good send
> wal_status info

Typically we read the slot before attaching to it, since what we want to do is 
check if it exists. It may be worthwile to check that it's not already used by 
another backend though.

> 
> 4) I think below messages should start with lower case letter and also
> there are some typos:
> + pg_log_warning("Could not fetch the replication_slot \"%s\" information "
> + pg_log_warning("Server does not suport fetching the slot's position, "
> something like:
> + pg_log_warning("could not fetch replication slot \"%s\" information, "
> +    "resuming from current server position instead", replication_slot);
> + pg_log_warning("server does not support fetching replication slot
> information, "
> +    "resuming from current server position instead");
> 
I've rephrased it a bit in v3, let me know if that's what you had in mind.


> 5) How about emitting the above messages in case of "verbose"?

I think it is useful to warn the user even if not in the verbose case, but if 
the consensus is to move it to verbose only output I can change it.

> 
> 6) How about an assertion like below?
> + if (stream.startpos == InvalidXLogRecPtr)
> + {
> + stream.startpos = serverpos;
> + stream.timeline = servertli;
> + }
> +
> +Assert(stream.startpos != InvalidXLogRecPtr)>>

Good idea.

> 
> 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

From my point of view, I already expected it to use something like that when 
using a replication slot. Maybe an option to turn it off could be useful ? 

> 
> 8) Just an idea, how about we store pg_receivewal's lastFlushPosition
> in a file before pg_receivewal exits and compare it with the
> restart_lsn that it received from the replication slot, if
> lastFlushPosition == received_restart_lsn well and good, if not, then
> something would have happened and we always start at the
> lastFlushPosition ?

The patch idea originally came from the fact that some utility use 
pg_receivewal to fetch WALs, and move them elsewhere. In that sense I don't 
really see what value this brings compared to the existing (and unmodified) way 
of computing the restart position from the already stored files ?

Best regards,

-- 
Ronan Dunklau




Reply via email to