On 2025/06/30 20:32, Nisha Moond wrote:
On Fri, Jun 27, 2025 at 5:40 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:



On 2025/06/27 15:32, Nisha Moond wrote:
On Thu, Jun 26, 2025 at 1:33 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:



On 2025/06/26 15:46, Nisha Moond wrote:
On Wed, Jun 25, 2025 at 9:56 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:

Hi,

The pg_replication_slots documentation mentions only max_slot_wal_keep_size
as a condition under which the wal_status column can show unreserved or lost.
However, since commit ac0e33136ab, idle_replication_slot_timeout can also
cause this behavior when it is set. This has not been documented yet.
https://www.postgresql.org/docs/devel/view-pg-replication-slots.html


+1 to the doc update.

Thanks for the review!


So, how about updating the documentation to also mention
idle_replication_slot_timeout as a factor that can cause wal_status to
become unreserved or lost? Patch attached.


Since idle_replication_slot_timeout can only cause wal_status to
become 'lost' and not 'unreserved', perhaps we can reword the sentence
slightly for clarity, suggestion -
"The last two states are seen when max_slot_wal_keep_size is
non-negative and, the 'lost' state may also appear when
idle_replication_slot_timeout is greater than zero."

I was thinking that when idle_replication_slot_timeout triggers,
the following functions are called, and that wal_status can become
"unreserved" before ReplicationSlotRelease() runs. It's very short
period, though. Am I wrong?

ReplicationSlotMarkDirty();
ReplicationSlotSave();
ReplicationSlotRelease();


Thank you for pointing it out.
You are correct that while the checkpointer is in the process of
invalidating a slot, it sets its PID as the slot’s active_pid. During
this short window, if a user queries pg_replication_slot, the
underlying function pg_get_replication_slots will compute the
wal_status as 'unreserved' for the invalidated slot because the slot
has a valid active_pid.

That said, it's reasonable to mention in the doc that 'unreserved' may
appear when idle_replication_slot_timeout is greater than zero, as
this can indeed happen. So, let's retain the current description.

However, this behavior isn’t specific to
idle_replication_slot_timeout. For example, when a slot is being
invalidated due to a different cause "wal_level_insufficient",
'unreserved' may also briefly appear in wal_status.

Yes, and "lost" can appear for various reasons, including 
wal_level_insufficient,
so it seems odd to highlight max_slot_wal_keep_size as the cause of the "lost"
status in the note. It would probably be better to remove the mention of "lost"
from that note.


+1

Is this true starting from v16, when logical replication from standby was 
introduced?
In other words, in v15 and earlier, only max_slot_wal_keep_size could cause
the wal_status to become "unreserved" or "lost"? I'm wondering where to 
back-patch
this fix to.


As for "unreserved", it can also occur for different reasons, but typically,
it happens when max_slot_wal_keep_size is set to a non-negative value.
So it might make sense to keep the explanation focused just on "unreserved"
and max_slot_wal_keep_size. For example:

----------------------
           <listitem>
            <para>
             <literal>unreserved</literal> means that the slot no longer
             retains the required WAL files and some of them are to be removed 
at
-          the next checkpoint.  This state can return
+          the next checkpoint.  This can occur when
+          <xref linkend="guc-max-slot-wal-keep-size"/> is set to
+          a non-negative value.  This state can return
             to <literal>reserved</literal> or <literal>extended</literal>.
            </para>
           </listitem>
           <listitem>
----------------------

What do you think?


The change LGTM, only a minor suggestion to add "typically", as “This
can typically occur when…” to indicate that max_slot_wal_keep_size is
one possible reason, not the only one.

OK.


Also, I noticed the note that says “If <structfield>restart_lsn</structfield>
is NULL, this field is null” seems inaccurate. For example, when "wal_removed"
happens, restart_lsn is NULL but wal_status is "lost". So maybe we should remove
that note as well?

You're right, the statement is not accurate.
We could rephrase it as: "If <structfield>restart_lsn</structfield> is
NULL, this field is either null or lost." But since 'unreserved' can
also appear briefly during invalidation, it might be better to remove
it altogether.
I agree with removing the description. Unless I'm missing something,
it has been incorrect since at least v13, so we should back-patch this fix
to all supported versions.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation



Reply via email to