On Fri, Sep 08, 2023 at 09:04:43AM +0530, Amit Kapila wrote: > I have added something on these lines and also changed the other > comment pointed out by you. In the passing, I made minor cosmetic > changes as well.
+ * We can flush dirty replication slots at regular intervals by any + * background process like bgwriter but checkpoint is a convenient location. I don't see a need to refer to the bgwriter here. On the contrary, it can be confusing as one could think that this flush happens in the bgwriter, but that's not the case currently as only the checkpointer does that. + * We won't ensure that the slot is persisted after the s/persisted/flushed/? Or just refer to the "slot's data being flushed", or refer to "the slot's data is made durable" instead? The use of "persist" here is confusing, because a slot's persistence refers to it as being a *candidate* for flush (compared to an ephemeral slot), and it does not refer to the *fact* of flushing its data to make sure that it survives a crash. In the context of this patch, the LSN value tracked in the slot's in-memory data refers to the last point where the slot's data has been flushed. + /* + * This is used to track the last persisted confirmed_flush LSN value to + * detect any divergence in the in-memory and on-disk values for the same. + */ "This value tracks is the last LSN where this slot's data has been flushed to disk. This is used during a checkpoint shutdown to decide if a logical slot's data should be forcibly flushed or not." Hmm. WAL senders are shut down *after* the checkpointer and *after* the shutdown checkpoint is finished (see PM_SHUTDOWN and PM_SHUTDOWN_2) because we want the WAL senders to acknowledge the checkpoint record before shutting down the primary. In order to limit the number of records to work on after a restart, what this patch is proposing is an improvement. Perhaps it would be better to document that we don't care about the potential concurrent activity of logical WAL senders in this case and that the LSN we are saving at is a best effort, meaning that last_saved_confirmed_flush is just here to reduce the damage on a follow-up restart? The comment added in CheckPointReplicationSlots() goes in this direction, but perhaps this potential concurrent activity should be mentioned? -- Michael
signature.asc
Description: PGP signature