Hi,

The documentation explains that the "lost" value reported by
pg_replication_slots.wal_status means

    some WAL files are definitely lost and this slot cannot be used to resume
    replication anymore.

However, I observed the "lost" value while inserting lots of records,
even though replication could continue normally. So I wonder if
pg_replication_slots.wal_status has a bug.

wal_status is calculated in GetWALAvailability(), and I think I found
some issues in it.


        keepSegs = ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
                                                          wal_segment_size) + 1;

max_wal_size_mb is a number of megabytes, whereas wal_keep_segments is
a number of WAL segment files. So it's strange to take the max of the two.
Shouldn't the above be the following?

    Max(ConvertToXSegs(max_wal_size_mb, wal_segment_size), wal_keep_segments) + 1
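
To make the unit mismatch concrete, here is a minimal standalone sketch.
It is not the xlog.c code: mb_to_segs() is a local stand-in that assumes
ConvertToXSegs() simply divides megabytes by the segment size in megabytes,
and the function names and sample values are hypothetical.

```c
#include <stdint.h>

#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

/* Assumed stand-in for ConvertToXSegs(): megabytes -> number of segments. */
uint64_t
mb_to_segs(uint64_t mb, uint64_t seg_size_bytes)
{
    return mb / (seg_size_bytes / (1024 * 1024));
}

/* As quoted: the max mixes megabytes with a segment count, then converts. */
uint64_t
keep_segs_quoted(uint64_t max_wal_size_mb, uint64_t wal_keep_segments,
                 uint64_t seg_size_bytes)
{
    return mb_to_segs(MAX_OF(max_wal_size_mb, wal_keep_segments),
                      seg_size_bytes) + 1;
}

/* As proposed: convert megabytes to segments first, then take the max. */
uint64_t
keep_segs_proposed(uint64_t max_wal_size_mb, uint64_t wal_keep_segments,
                   uint64_t seg_size_bytes)
{
    return MAX_OF(mb_to_segs(max_wal_size_mb, seg_size_bytes),
                  wal_keep_segments) + 1;
}
```

With 16MB segments, max_wal_size_mb = 1024 and wal_keep_segments = 2000,
the quoted form treats 2000 as megabytes and yields 126 segments, while the
proposed form yields 2001.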



                if ((max_slot_wal_keep_size_mb <= 0 ||
                         max_slot_wal_keep_size_mb >= max_wal_size_mb) &&
                        oldestSegMaxWalSize <= targetSeg)
                        return WALAVAIL_NORMAL;

This code means that wal_status reports "normal" only when
max_slot_wal_keep_size is zero or negative, or is at least max_wal_size.
Why is this condition necessary? The documentation explains that "normal
means that the claimed files are within max_wal_size". So whatever the
max_slot_wal_keep_size value is, IMO "normal" should be
reported if the WAL files claimed by the slot are within max_wal_size.
Thoughts?

Or, if that condition is really necessary, the documentation should be
updated to mention it.
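
As an illustration of the difference, the sketch below puts the quoted
condition next to the reading suggested by the documentation. Both
functions are hypothetical standalone models, not the actual code, and the
parameters are plain values rather than the real GUCs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the quoted condition for reporting "normal". */
bool
normal_as_quoted(int max_slot_wal_keep_size_mb, int max_wal_size_mb,
                 uint64_t oldestSegMaxWalSize, uint64_t targetSeg)
{
    return (max_slot_wal_keep_size_mb <= 0 ||
            max_slot_wal_keep_size_mb >= max_wal_size_mb) &&
           oldestSegMaxWalSize <= targetSeg;
}

/* Documentation wording: "normal" whenever the slot is within max_wal_size. */
bool
normal_as_documented(uint64_t oldestSegMaxWalSize, uint64_t targetSeg)
{
    return oldestSegMaxWalSize <= targetSeg;
}
```

For example, with max_slot_wal_keep_size_mb = 512, max_wal_size_mb = 1024,
oldestSegMaxWalSize = 100 and targetSeg = 150, the first function returns
false while the second returns true.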



If the WAL files claimed by the slot exceed max_slot_wal_keep_size
but none of those WAL files have been removed yet, wal_status seems
to report "lost". Is this the expected behavior? Per the meaning of "lost"
described in the documentation, "lost" should be reported only when
some claimed files have been removed, I think. Thoughts?

Or is this behavior expected and the documentation incorrect?
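
The fall-through can be modeled as below. This is a simplified sketch of
the pre-patch control flow as it appears in the attached diff, not the real
function: the enum and function names are hypothetical, oldestSlotSeg is
assumed to be whatever KeepLogSeg() would return, and the "normal" test is
collapsed into a single boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the status values; not the real enum. */
typedef enum
{
    MODEL_NORMAL,
    MODEL_RESERVED,
    MODEL_LOST
} ModelStatus;

/* Simplified model of the current decision order. */
ModelStatus
status_current(uint64_t targetSeg, uint64_t oldestSeg,
               uint64_t oldestSlotSeg, bool normalCondition)
{
    if (targetSeg >= oldestSeg)
    {
        if (normalCondition)
            return MODEL_NORMAL;
        if (oldestSlotSeg <= targetSeg)
            return MODEL_RESERVED;
    }
    /* Reached even when targetSeg still exists on disk. */
    return MODEL_LOST;
}

/* Documentation wording: "lost" only once a claimed segment is removed. */
bool
lost_as_documented(uint64_t targetSeg, uint64_t oldestSeg)
{
    return targetSeg < oldestSeg;
}
```

With targetSeg = 50, oldestSeg = 10 and oldestSlotSeg = 60, segment 50 is
still present, yet status_current() returns MODEL_LOST, while
lost_as_documented() returns false.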



BTW, if we want to implement GetWALAvailability() as the documentation
advertises, we can simplify it as in the attached POC patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 55cac186dc..0b9cca2173 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9504,62 +9504,29 @@ GetWALAvailability(XLogRecPtr targetLSN)
        XLogSegNo       currSeg;                /* segid of currpos */
        XLogSegNo       targetSeg;              /* segid of targetLSN */
        XLogSegNo       oldestSeg;              /* actual oldest segid */
-       XLogSegNo       oldestSegMaxWalSize;    /* oldest segid kept by max_wal_size */
-       XLogSegNo       oldestSlotSeg = InvalidXLogRecPtr;      /* oldest segid kept by
-                                                                                                        * slot */
        uint64          keepSegs;
 
        /* slot does not reserve WAL. Either deactivated, or has never been active */
        if (XLogRecPtrIsInvalid(targetLSN))
                return WALAVAIL_INVALID_LSN;
 
-       currpos = GetXLogWriteRecPtr();
-
        /* calculate oldest segment currently needed by slots */
        XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
-       KeepLogSeg(currpos, &oldestSlotSeg);
 
-       /*
-        * Find the oldest extant segment file. We get 1 until checkpoint removes
-        * the first WAL segment file since startup, which causes the status being
-        * wrong under certain abnormal conditions but that doesn't actually harm.
-        */
-       oldestSeg = XLogGetLastRemovedSegno() + 1;
+       /* Find the oldest extant segment file */
+       oldestSeg = XLogGetLastRemovedSegno();
 
-       /* calculate oldest segment by max_wal_size and wal_keep_segments */
+       if (targetSeg <= oldestSeg)
+               return WALAVAIL_REMOVED;
+
+       currpos = GetXLogWriteRecPtr();
        XLByteToSeg(currpos, currSeg, wal_segment_size);
-       keepSegs = ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
-                                                         wal_segment_size) + 1;
+       keepSegs = ConvertToXSegs(max_wal_size_mb, wal_segment_size);
 
-       if (currSeg > keepSegs)
-               oldestSegMaxWalSize = currSeg - keepSegs;
-       else
-               oldestSegMaxWalSize = 1;
-
-       /*
-        * If max_slot_wal_keep_size has changed after the last call, the segment
-        * that would been kept by the current setting might have been lost by the
-        * previous setting. No point in showing normal or keeping status values
-        * if the targetSeg is known to be lost.
-        */
-       if (targetSeg >= oldestSeg)
-       {
-               /*
-                * show "normal" when targetSeg is within max_wal_size, even if
-                * max_slot_wal_keep_size is smaller than max_wal_size.
-                */
-               if ((max_slot_wal_keep_size_mb <= 0 ||
-                        max_slot_wal_keep_size_mb >= max_wal_size_mb) &&
-                       oldestSegMaxWalSize <= targetSeg)
+       if (currSeg - targetSeg <= keepSegs)
                        return WALAVAIL_NORMAL;
 
-               /* being retained by slots */
-               if (oldestSlotSeg <= targetSeg)
-                       return WALAVAIL_RESERVED;
-       }
-
-       /* Definitely lost */
-       return WALAVAIL_REMOVED;
+       return WALAVAIL_RESERVED;
 }
 
 
