Hi,
The documentation says that the "lost" value reported by
pg_replication_slots.wal_status means that some WAL files are
definitely lost and that the slot can no longer be used to resume
replication.
However, I observed the "lost" value while inserting lots of records,
yet replication continued normally. So I wonder whether
pg_replication_slots.wal_status has a bug.
wal_status is calculated in GetWALAvailability(), and I think I found
some issues in it.
keepSegs = ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
wal_segment_size) + 1;
max_wal_size_mb is a number of megabytes, while wal_keep_segments is
a number of WAL segment files. So it's strange to take the max of them.
Shouldn't the above be the following?
Max(ConvertToXSegs(max_wal_size_mb, wal_segment_size), wal_keep_segments) + 1
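To make the unit mismatch concrete, here is a small standalone sketch that compares the two formulas. The ConvertToXSegs() and Max() definitions below are assumed to mirror the ones in the PostgreSQL source, and keep_segs_current() and keep_segs_proposed() are hypothetical helper names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the macros in the PostgreSQL source: MB -> segments. */
#define ConvertToXSegs(x, segsize) ((x) / ((segsize) / (1024 * 1024)))
#define Max(a, b) ((a) > (b) ? (a) : (b))

/* Current code: Max() compares a megabyte count with a segment count. */
uint64_t
keep_segs_current(int max_wal_size_mb, int wal_keep_segments,
                  int wal_segment_size)
{
    /* the conversion divides by the segment size in MB, so require >= 1MB */
    assert(wal_segment_size >= 1024 * 1024);
    return ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
                          wal_segment_size) + 1;
}

/* Proposed: convert megabytes to segments first, then compare segments. */
uint64_t
keep_segs_proposed(int max_wal_size_mb, int wal_keep_segments,
                   int wal_segment_size)
{
    assert(wal_segment_size >= 1024 * 1024);
    return Max(ConvertToXSegs(max_wal_size_mb, wal_segment_size),
               wal_keep_segments) + 1;
}
```

With max_wal_size = 1024MB, wal_keep_segments = 2000 and 16MB segments, the current formula yields 126 segments, because 2000 is treated as megabytes, while the proposed one yields 2001.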
if ((max_slot_wal_keep_size_mb <= 0 ||
max_slot_wal_keep_size_mb >= max_wal_size_mb) &&
oldestSegMaxWalSize <= targetSeg)
return WALAVAIL_NORMAL;
This code means that wal_status reports "normal" only when
max_slot_wal_keep_size is zero or negative, or is not smaller than
max_wal_size.
Why is this condition necessary? The documentation explains that
"normal means that the claimed files are within max_wal_size". So
whatever the max_slot_wal_keep_size value is, IMO "normal" should be
reported if the WAL files claimed by the slot are within max_wal_size.
Thoughts?
Or, if that condition is really necessary, the documentation should be
updated to add a note about the condition.
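As an illustration, the quoted condition can be isolated into a standalone predicate. This is only a sketch: reports_normal() is a hypothetical name, and the segment numbers are passed in as plain arguments instead of being computed from the server state.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical standalone copy of the quoted "normal" condition.
 * oldestSegMaxWalSize is the oldest segment still within max_wal_size.
 */
bool
reports_normal(int max_slot_wal_keep_size_mb, int max_wal_size_mb,
               uint64_t oldestSegMaxWalSize, uint64_t targetSeg)
{
    return (max_slot_wal_keep_size_mb <= 0 ||
            max_slot_wal_keep_size_mb >= max_wal_size_mb) &&
           oldestSegMaxWalSize <= targetSeg;
}
```

For example, reports_normal(-1, 1024, 100, 120) is true, but reports_normal(512, 1024, 100, 120) is false even though segment 120 is within max_wal_size in both cases.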
If the WAL files claimed by the slot exceed max_slot_wal_keep_size
but none of those WAL files have been removed yet, wal_status seems
to report "lost". Is this expected behavior? Per the meaning of "lost"
described in the documentation, I think "lost" should be reported only
when some claimed files have actually been removed. Thoughts?
Or is this behavior expected and the documentation incorrect?
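The following standalone model of the current decision logic shows how "lost" can be reported while the claimed segment is still on disk. It is a sketch under assumptions: ModelStatus and model_current_status() are hypothetical names, and oldestSlotSeg is passed in directly, assuming that KeepLogSeg() has already capped it according to max_slot_wal_keep_size.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the normal / reserved / lost states. */
typedef enum
{
    MODEL_NORMAL,
    MODEL_RESERVED,
    MODEL_REMOVED
} ModelStatus;

/* Model of the decision at the end of the current GetWALAvailability(). */
ModelStatus
model_current_status(uint64_t targetSeg,
                     uint64_t oldestSeg,           /* oldest file on disk */
                     uint64_t oldestSlotSeg,       /* oldest kept for slots */
                     uint64_t oldestSegMaxWalSize, /* oldest in max_wal_size */
                     int max_slot_wal_keep_size_mb,
                     int max_wal_size_mb)
{
    if (targetSeg >= oldestSeg)
    {
        if ((max_slot_wal_keep_size_mb <= 0 ||
             max_slot_wal_keep_size_mb >= max_wal_size_mb) &&
            oldestSegMaxWalSize <= targetSeg)
            return MODEL_NORMAL;

        /* being retained by slots */
        if (oldestSlotSeg <= targetSeg)
            return MODEL_RESERVED;
    }

    /* reported as "lost", even if targetSeg >= oldestSeg */
    return MODEL_REMOVED;
}
```

With targetSeg = 50, oldestSeg = 10, oldestSlotSeg = 80, oldestSegMaxWalSize = 60, max_slot_wal_keep_size = 256MB and max_wal_size = 1024MB, the model returns MODEL_REMOVED although segment 50 has not been removed (the oldest file on disk is segment 10).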
BTW, if we want to implement GetWALAvailability() as the documentation
advertises, we can simplify it as in the attached POC patch.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 55cac186dc..0b9cca2173 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9504,62 +9504,29 @@ GetWALAvailability(XLogRecPtr targetLSN)
XLogSegNo currSeg; /* segid of currpos */
XLogSegNo targetSeg; /* segid of targetLSN */
XLogSegNo oldestSeg; /* actual oldest segid */
- XLogSegNo oldestSegMaxWalSize; /* oldest segid kept by max_wal_size */
- XLogSegNo oldestSlotSeg = InvalidXLogRecPtr; /* oldest segid kept by
- * slot */
uint64 keepSegs;
/* slot does not reserve WAL. Either deactivated, or has never been active */
if (XLogRecPtrIsInvalid(targetLSN))
return WALAVAIL_INVALID_LSN;
- currpos = GetXLogWriteRecPtr();
-
/* calculate oldest segment currently needed by slots */
XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
- KeepLogSeg(currpos, &oldestSlotSeg);
- /*
- * Find the oldest extant segment file. We get 1 until checkpoint removes
- * the first WAL segment file since startup, which causes the status being
- * wrong under certain abnormal conditions but that doesn't actually harm.
- */
- oldestSeg = XLogGetLastRemovedSegno() + 1;
+ /* Find the oldest extant segment file */
+ oldestSeg = XLogGetLastRemovedSegno();
- /* calculate oldest segment by max_wal_size and wal_keep_segments */
+ if (targetSeg <= oldestSeg)
+ return WALAVAIL_REMOVED;
+
+ currpos = GetXLogWriteRecPtr();
XLByteToSeg(currpos, currSeg, wal_segment_size);
- keepSegs = ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
- wal_segment_size) + 1;
+ keepSegs = ConvertToXSegs(max_wal_size_mb, wal_segment_size);
- if (currSeg > keepSegs)
- oldestSegMaxWalSize = currSeg - keepSegs;
- else
- oldestSegMaxWalSize = 1;
-
- /*
- * If max_slot_wal_keep_size has changed after the last call, the segment
- * that would been kept by the current setting might have been lost by the
- * previous setting. No point in showing normal or keeping status values
- * if the targetSeg is known to be lost.
- */
- if (targetSeg >= oldestSeg)
- {
- /*
- * show "normal" when targetSeg is within max_wal_size, even if
- * max_slot_wal_keep_size is smaller than max_wal_size.
- */
- if ((max_slot_wal_keep_size_mb <= 0 ||
- max_slot_wal_keep_size_mb >= max_wal_size_mb) &&
- oldestSegMaxWalSize <= targetSeg)
+ if (currSeg - targetSeg <= keepSegs)
return WALAVAIL_NORMAL;
- /* being retained by slots */
- if (oldestSlotSeg <= targetSeg)
- return WALAVAIL_RESERVED;
- }
-
- /* Definitely lost */
- return WALAVAIL_REMOVED;
+ return WALAVAIL_RESERVED;
}