Re: Various bugs if segment containing redo pointer does not exist

Andres Freund Mon, 23 Oct 2023 17:44:13 -0700

Hi,

On 2023-10-23 16:21:45 -0700, Andres Freund wrote:
> 1) For some reason I haven't yet debugged, the ReadRecord(PANIC) in
>    PerformWalRecovery() doesn't PANIC, but instead just returns NULL
>
>    We *do* output a DEBUG message, but well, that's insufficient.


The debug is from this backtrace:

#0  XLogFileReadAnyTLI (segno=6476, emode=13, source=XLOG_FROM_PG_WAL) at 
/home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:4291
#1  0x000055d7b3949db0 in WaitForWALToBecomeAvailable (RecPtr=108649259008, 
randAccess=true, fetching_ckpt=false, tliRecPtr=108665421680, replayTLI=1,
    replayLSN=108665421680, nonblocking=false) at 
/home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3697
#2  0x000055d7b39494ff in XLogPageRead (xlogreader=0x55d7b472c470, 
targetPagePtr=108649250816, reqLen=8192, targetRecPtr=108665421680,
    readBuf=0x55d7b47ba5d8 "\024\321\005") at 
/home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3278
#3  0x000055d7b3941bb1 in ReadPageInternal (state=0x55d7b472c470, 
pageptr=108665413632, reqLen=8072)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:1014
#4  0x000055d7b3940f43 in XLogDecodeNextRecord (state=0x55d7b472c470, 
nonblocking=false)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:571
#5  0x000055d7b3941a41 in XLogReadAhead (state=0x55d7b472c470, 
nonblocking=false) at 
/home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:947
#6  0x000055d7b393f5fa in XLogPrefetcherNextBlock (pgsr_private=94384934340072, 
lsn=0x55d7b47cfeb8)
    at 
/home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:496
#7  0x000055d7b393efcd in lrq_prefetch (lrq=0x55d7b47cfe88) at 
/home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:256
#8  0x000055d7b393f190 in lrq_complete_lsn (lrq=0x55d7b47cfe88, lsn=0) at 
/home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:294
#9  0x000055d7b39401ba in XLogPrefetcherReadRecord (prefetcher=0x55d7b47bc5e8, 
errmsg=0x7ffc23505920)
    at 
/home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:1041
#10 0x000055d7b3948ff8 in ReadRecord (xlogprefetcher=0x55d7b47bc5e8, emode=23, 
fetching_ckpt=false, replayTLI=1)
    at 
/home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3078
#11 0x000055d7b3946749 in PerformWalRecovery () at 
/home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:1640


The source of the emode=13=DEBUG2 is that that's hardcoded in
WaitForWALToBecomeAvailable(). I guess the error ought to come from
XLogPageRead(), but all that happens is this:

                        case XLREAD_FAIL:
                                if (readFile >= 0)
                                        close(readFile);
                                readFile = -1;
                                readLen = 0;
                                readSource = XLOG_FROM_ANY;
                                return XLREAD_FAIL;

which *does* error out for some other failures:
                        errno = save_errno;
                        ereport(emode_for_corrupt_record(emode, targetPagePtr + 
reqLen),
                                        (errcode_for_file_access(),
                                         errmsg("could not read from WAL 
segment %s, LSN %X/%X, offset %u: %m",
                                                        fname, 
LSN_FORMAT_ARGS(targetPagePtr),
                                                        readOff)));

but not for a file that couldn't be opened. Which wouldn't have to be due to
ENOENT, could also be EACCESS...


xlogreader has undergone a fair bit of changes in the last few releases. As of
now, I can't really understand who is responsible for reporting what kind of
error.

        /*
         * Data input callback
         *
         * This callback shall read at least reqLen valid bytes of the xlog page
         * starting at targetPagePtr, and store them in readBuf.  The callback
         * shall return the number of bytes read (never more than XLOG_BLCKSZ), 
or
         * -1 on failure.  The callback shall sleep, if necessary, to wait for 
the
         * requested bytes to become available.  The callback will not be 
invoked
         * again for the same page unless more than the returned number of bytes
         * are needed.
         *
         * targetRecPtr is the position of the WAL record we're reading.  
Usually
         * it is equal to targetPagePtr + reqLen, but sometimes xlogreader needs
         * to read and verify the page or segment header, before it reads the
         * actual WAL record it's interested in.  In that case, targetRecPtr can
         * be used to determine which timeline to read the page from.
         *
         * The callback shall set ->seg.ws_tli to the TLI of the file the page 
was
         * read from.
         */
        XLogPageReadCB page_read;

        /*
         * Callback to open the specified WAL segment for reading.  
->seg.ws_file
         * shall be set to the file descriptor of the opened segment.  In case 
of
         * failure, an error shall be raised by the callback and it shall not
         * return.
         *
         * "nextSegNo" is the number of the segment to be opened.
         *
         * "tli_p" is an input/output argument. WALRead() uses it to pass the
         * timeline in which the new segment should be found, but the callback 
can
         * use it to return the TLI that it actually opened.
         */
        WALSegmentOpenCB segment_open;

My reading of this is that page_read isn't ever supposed to error out - yet we
do it all over (e.g. WALDumpReadPage(), XLogPageRead()) and that segment_open
is supposed to error out - but we don't even use segment_open for
xlogrecovery.c.  Who is supposed to error out then?  And why is it ok for the
read callbacks to error out directly, if they're supposed to return -1 in case
of failure?

Somewhat of an aside: It also seems "The callback shall sleep, if necessary,
to wait for the requested bytes to become available." is outdated, given that
we *explicitly* don't do so in some cases and support that via
XLREAD_WOULDBLOCK?


I dug through recent changes, expecting to find the problem. But uh, no. I
reproduced this in 9.4, and I think the behaviour might have been introduced
in 9.3 (didn't have a build of that around, hence didn't test that), as part
of:

commit abf5c5c9a4f
Author: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date:   2013-02-22 11:43:04 +0200

    If recovery.conf is created after "pg_ctl stop -m i", do crash recovery.

Before that, XLogPageRead() called XLogFileReadAnyTLI() with the "incoming"
emode, when in crash recovery -> PANIC in this case. After that commit, the
call to XLogFileReadAnyTLI() was moved to WaitForWALToBecomeAvailable(), but
the emode was changed to unconditionally be DEBUG2.

                                /*
                                 * Try to restore the file from archive, or 
read an existing
                                 * file from pg_xlog.
                                 */
                                readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
                                                currentSource == 
XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
                                                                                
 currentSource);

In 9.4 the server actually came up ok after encountering the problem
(destroying data in the process!), not sure where we started to bogusly
initialize shared memory...

Greetings,

Andres Freund

Re: Various bugs if segment containing redo pointer does not exist

Reply via email to