Hi, On 2023-10-23 16:21:45 -0700, Andres Freund wrote: > 1) For some reason I haven't yet debugged, the ReadRecord(PANIC) in > PerformWalRecovery() doesn't PANIC, but instead just returns NULL > > We *do* output a DEBUG message, but well, that's insufficient.
The debug is from this backtrace: #0 XLogFileReadAnyTLI (segno=6476, emode=13, source=XLOG_FROM_PG_WAL) at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:4291 #1 0x000055d7b3949db0 in WaitForWALToBecomeAvailable (RecPtr=108649259008, randAccess=true, fetching_ckpt=false, tliRecPtr=108665421680, replayTLI=1, replayLSN=108665421680, nonblocking=false) at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3697 #2 0x000055d7b39494ff in XLogPageRead (xlogreader=0x55d7b472c470, targetPagePtr=108649250816, reqLen=8192, targetRecPtr=108665421680, readBuf=0x55d7b47ba5d8 "\024\321\005") at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3278 #3 0x000055d7b3941bb1 in ReadPageInternal (state=0x55d7b472c470, pageptr=108665413632, reqLen=8072) at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:1014 #4 0x000055d7b3940f43 in XLogDecodeNextRecord (state=0x55d7b472c470, nonblocking=false) at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:571 #5 0x000055d7b3941a41 in XLogReadAhead (state=0x55d7b472c470, nonblocking=false) at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:947 #6 0x000055d7b393f5fa in XLogPrefetcherNextBlock (pgsr_private=94384934340072, lsn=0x55d7b47cfeb8) at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:496 #7 0x000055d7b393efcd in lrq_prefetch (lrq=0x55d7b47cfe88) at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:256 #8 0x000055d7b393f190 in lrq_complete_lsn (lrq=0x55d7b47cfe88, lsn=0) at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:294 #9 0x000055d7b39401ba in XLogPrefetcherReadRecord (prefetcher=0x55d7b47bc5e8, errmsg=0x7ffc23505920) at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:1041 #10 0x000055d7b3948ff8 in ReadRecord (xlogprefetcher=0x55d7b47bc5e8, emode=23, fetching_ckpt=false, replayTLI=1) at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3078 #11 0x000055d7b3946749 in PerformWalRecovery () at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:1640 The source of the emode=13=DEBUG2 is that that's hardcoded in WaitForWALToBecomeAvailable(). I guess the error ought to come from XLogPageRead(), but all that happens is this: case XLREAD_FAIL: if (readFile >= 0) close(readFile); readFile = -1; readLen = 0; readSource = XLOG_FROM_ANY; return XLREAD_FAIL; which *does* error out for some other failures: errno = save_errno; ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen), (errcode_for_file_access(), errmsg("could not read from WAL segment %s, LSN %X/%X, offset %u: %m", fname, LSN_FORMAT_ARGS(targetPagePtr), readOff))); but not for a file that couldn't be opened. Which wouldn't have to be due to ENOENT, could also be EACCESS... xlogreader has undergone a fair bit of changes in the last few releases. As of now, I can't really understand who is responsible for reporting what kind of error. /* * Data input callback * * This callback shall read at least reqLen valid bytes of the xlog page * starting at targetPagePtr, and store them in readBuf. The callback * shall return the number of bytes read (never more than XLOG_BLCKSZ), or * -1 on failure. The callback shall sleep, if necessary, to wait for the * requested bytes to become available. The callback will not be invoked * again for the same page unless more than the returned number of bytes * are needed. * * targetRecPtr is the position of the WAL record we're reading. Usually * it is equal to targetPagePtr + reqLen, but sometimes xlogreader needs * to read and verify the page or segment header, before it reads the * actual WAL record it's interested in. In that case, targetRecPtr can * be used to determine which timeline to read the page from. * * The callback shall set ->seg.ws_tli to the TLI of the file the page was * read from. */ XLogPageReadCB page_read; /* * Callback to open the specified WAL segment for reading. ->seg.ws_file * shall be set to the file descriptor of the opened segment. In case of * failure, an error shall be raised by the callback and it shall not * return. * * "nextSegNo" is the number of the segment to be opened. * * "tli_p" is an input/output argument. WALRead() uses it to pass the * timeline in which the new segment should be found, but the callback can * use it to return the TLI that it actually opened. */ WALSegmentOpenCB segment_open; My reading of this is that page_read isn't ever supposed to error out - yet we do it all over (e.g. WALDumpReadPage(), XLogPageRead()) and that segment_open is supposed to error out - but we don't even use segment_open for xlogrecovery.c. Who is supposed to error out then? And why is it ok for the read callbacks to error out directly, if they're supposed to return -1 in case of failure? Somewhat of an aside: It also seems "The callback shall sleep, if necessary, to wait for the requested bytes to become available." is outdated, given that we *explicitly* don't do so in some cases and support that via XLREAD_WOULDBLOCK? I dug through recent changes, expecting to find the problem. But uh, no. I reproduced this in 9.4, and I think the behaviour might have been introduced in 9.3 (didn't have a build of that around, hence didn't test that), as part of: commit abf5c5c9a4f Author: Heikki Linnakangas <heikki.linnakan...@iki.fi> Date: 2013-02-22 11:43:04 +0200 If recovery.conf is created after "pg_ctl stop -m i", do crash recovery. Before that, XLogPageRead() called XLogFileReadAnyTLI() with the "incoming" emode, when in crash recovery -> PANIC in this case. After that commit, the call to XLogFileReadAnyTLI() was moved to WaitForWALToBecomeAvailable(), but the emode was changed to unconditionally be DEBUG2. /* * Try to restore the file from archive, or read an existing * file from pg_xlog. */ readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY : currentSource); In 9.4 the server actually came up ok after encountering the problem (destroying data in the process!), not sure where we started to bogusly initialize shared memory... Greetings, Andres Freund