On Mon, Nov 19, 2018 at 01:39:58PM +0900, Michael Paquier wrote: > I was just coming by to look at bit at the patch series, and bumped > into that:
So I have been looking at the last patch series 0001-0004 posted on this thread, and coming from here: https://postgr.es/m/20181025.215518.189844649.horiguchi.kyot...@lab.ntt.co.jp /* check that the slot is gone */ SELECT * FROM pg_replication_slots It could be an idea to switch to the expanded mode here, not that it matters much still.. +IsLsnStillAvaiable(XLogRecPtr targetLSN, uint64 *restBytes) You mean Available here, not Avaiable. This function is only used when scanning for slot information with pg_replication_slots, so wouldn't it be better to just return the status string in this case? Not sure I see the point of the "remain" field, which can be found with a simple calculation using the current insertion LSN, the segment size and the amount of WAL that the slot is retaining. It may be interesting to document a query to do that though. GetOldestXLogFileSegNo() has race conditions if WAL recycling runs in parallel, no? How is it safe to scan pg_wal on a process querying pg_replication_slots while another process may manipulate its contents (aka the checkpointer or just the startup process with an end-of-recovery checkpoint.). This routine relies on unsafe assumptions as this is not concurrent-safe. You can avoid problems by making sure instead that lastRemovedSegNo is initialized correctly at startup, which would be normally one segment older than what's in pg_wal, which feels a bit hacky to rely on to track the oldest segment. It seems to me that GetOldestXLogFileSegNo() should also check for segments matching the current timeline, no? + if (prev_lost_segs != lost_segs) + ereport(WARNING, + (errmsg ("some replication slots have lost required WAL segments"), + errdetail_plural( + "The mostly affected slot has lost %ld segment.", + "The mostly affected slot has lost %ld segments.", + lost_segs, lost_segs))); This can become very noisy with the time, and it would be actually useful to know which replication slot is impacted by that. + slot doesn't have valid restart_lsn, this field Missing a determinant here, and restart_lsn should have a <literal> markup. + many WAL segments that they fill up the space allotted s/allotted/allocated/. + available. The last two states are seen only when + <xref linkend="guc-max-slot-wal-keep-size"/> is non-negative. If the + slot doesn't have valid restart_lsn, this field + is <literal>unknown</literal>. I am a bit confused by this statement. The last two states are "lost" and "keeping", but shouldn't "keeping" be the state showing up by default as it means that all WAL segments are kept around. +# Advance WAL by ten segments (= 160MB) on master +advance_wal($node_master, 10); +$node_master->safe_psql('postgres', "CHECKPOINT;"); This makes the tests very costly, which is something we should avoid as much as possible. One trick which could be used here, on top of reducing the number of segment switches, is to use initdb --wal-segsize=1. -- Michael
signature.asc
Description: PGP signature