On 3/13/26 12:33, Fujii Masao wrote: > On Fri, Mar 13, 2026 at 8:12 PM Tomas Vondra <[email protected]> wrote: >> >> On 3/5/26 21:25, Andres Freund wrote: >>> Hi, >>> >>> On 2026-03-05 10:52:03 -0800, Noah Misch wrote: >>>> On Thu, Mar 05, 2026 at 12:10:11PM -0500, Andres Freund wrote: >>>>> Tomas encountered a crash with the index prefetching patchset. One of the >>>>> patches included therein is a generalization of the gistGetFakeLSN() >>>>> mechanism, which is then used by other indexes as well. That triggered an >>>>> occasional, hard to locally reproduce, ERROR or PANIC in CI, about >>>>> >>>>> ERROR: xlog flush request 0/01BD2018 is not satisfied --- flushed only >>>>> to 0/01BD2000 >>>> >>>>> To be safe, this code would need to use a version of GetXLogInsertRecPtr() >>>>> that does use XLogBytePosToEndRecPtr() instead of XLogBytePosToRecPtr(). >>>> >>>> I agree. Thanks for diagnosing it. Feel free to move forward with that >>>> strategy, or let me know if you'd like me to do it. >>> >>> I'd appreciate if you could do it. >>> >> >> Here's a fix for master (and backpatching). It introduces a new function >> GetXLogInsertEndRecPtr() and then uses that in gistGetFakeLSN(). I still >> need to test this a bit more, but it did fix the issue in our dev branch >> (where we saw regular failures). So I'm 99% sure it's fine. >> >> After writing the fix I had the idea to grep for GetXLogInsertRecPtr >> calls that might have similar issue (being passed to XLogFlush), and >> sure enough walsender does this: >> >> XLogFlush(GetXLogInsertRecPtr()); >> >> Which AFAICS has the same issue, right? Funnily enough, this is a very >> new call, from 2026/03/06. Before 6eedb2a5fd88 walsender might flush not >> far enough, now it may be flushing too far ;-) AFAIK it should call the >> same GetXLogInsertEndRecPtr() once we have it. > > Yes, this is already being discussed at [1], and Anthonin has also proposed > a patch there that adds GetXLogInsertEndRecPtr(). >
Thanks, I didn't notice the other thread, to much stuff happening.
I've now pushed the fix (and backpatched all the way to 14), so it
should be possible to use the new function in walsender too. That should
do the trick, I think.
It took me quite a bit of time to reproduce the failure well enough to
make sure the fix is correct. I was hoping it'd allow me to write a
regression test for this bug (and similar ones), but that turned out to
be way trickier because it requires a very precise sequence of WAL
records, aligned just the right way.
I reproduced it on three different platforms (x86-64, 32/64-bit arm),
and every platform requires *completely different* parameters (number of
rows inserted, etc.). On some platforms I had to disable hint bits or
FPWs, etc.
It was good enough for validating the fix, but I gave up on the
regression test. A single extra WAL record breaks the test - an
unexpected hint bit, an autovacuum run updating something, or even a
tiny change from an artibrary commit, and the test stops working.
I'm actually quite amazed it started failing so reliably on our branch
with prefetching. But it only ever failed on CI, which just shows how
finicky the test would be.
I'm attaching the script I used (derived from 018_wal_optimize.tap). To
reproduce the issue you'll have to experiment with the various
parameters (number of rows, FPW, hint bits). What helped me was patching
gistGetFakeLSN() to show calls where currlsn points right after page
header, and using pg_waldump to check there's nothing between the last
ASSIGN_LSN and COMMIT.
After I gave up on a regression test, I had the idea we might add an
assert to XLogFlush() to check we're not flushing LSNs in the page
header. So something like:
Assert((record % XLOG_BLCKSZ == 0) ||
(record % XLOG_BLCKSZ > SizeOfXLogShortPHD));
Surprisingly, that fails because GiST decided to use "1" as a special
LSN during build.
#define GistBuildLSN ((XLogRecPtr) 1)
Of course, GetVictimBuffer() is unaware of that and ends up flushing
this buffers with XLogFlush(record=1). I wonder if this might lead to
some funny issues elsewhere, in code that uses LSN in some way. Backups,
checksums, ... that kind of stuff. I wonder if other index AMs do more
stuff like this.
With (record == 1) added to the assert it seems to work, or at least it
passes check-world for me. But I decided not to include it in the fix.
regards
--
Tomas Vondra
repro.sh
Description: application/shellscript
