On 01/09/2025 10:18 PM, Peter Geoghegan wrote:
On Mon, Sep 1, 2025 at 3:04 PM Peter Geoghegan <[email protected]> wrote:
There's just no reason to think that we'd ever be able to tie back one
of these LOG messages from VACUUM to the problem within _bt_split.
There's too many other forms of corruption that might result in VACUUM
logging this same error (e.g., breaking changes to a glibc collation).

Thinking about this some more, I guess it's generally fairly unlikely
that VACUUM would actually even attempt to delete such a page. The
only reason why it happens with Konstantin's test case is because the
whole inserting transaction aborts, leaving behind many garbage tuples
that VACUUM will remove, leading to an empty page. Without that,
VACUUM won't think to even try to delete a leftover junk right
sibling page.
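
To make the condition above concrete, here is a toy model in C of why the junk right sibling only becomes a deletion candidate once every tuple on it is garbage. This is a sketch under stated assumptions, not PostgreSQL source: `ToyLeafPage`, `toy_vacuum_page` and `toy_should_attempt_delete` are hypothetical names, and the real nbtree page layout and locking are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TUPLES 8

/* Hypothetical stand-in for a leaf page: it only tracks which tuples are dead. */
typedef struct ToyLeafPage
{
    size_t ntuples;               /* number of tuples currently on the page */
    bool   dead[TOY_MAX_TUPLES];  /* dead[i] is true if tuple i is garbage */
} ToyLeafPage;

/* Remove dead tuples, keeping the live ones; returns how many remain. */
static size_t
toy_vacuum_page(ToyLeafPage *page)
{
    size_t live = 0;

    assert(page->ntuples <= TOY_MAX_TUPLES);
    for (size_t i = 0; i < page->ntuples; i++)
    {
        if (!page->dead[i])
            page->dead[live++] = false;   /* compact live tuples to the front */
    }
    page->ntuples = live;
    return live;
}

/* In this model, deletion is attempted only when the page ends up empty. */
static bool
toy_should_attempt_delete(ToyLeafPage *page)
{
    return toy_vacuum_page(page) == 0;
}
```

In this model, a page filled entirely by an aborted transaction (all tuples dead) is the only kind that reaches the deletion path; a page that keeps even one live tuple is cleaned up but never considered for deletion.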


But sooner or later vacuum will be called for this index and will traverse this page, won't it? There is no other way to reuse this page without deleting it, or am I missing something?



An important case where this weakness will make life worse for users
is a checksum failure against the existing right sibling page -- since
those are not once off, transient errors (unlike, say, OOMs). Once you
have an index page with a bad checksum, there's a decent chance that
the application will attempt to insert onto the page to the immediate
left of that bad page. That'll trigger a split, sooner or later.

Also rethinking this aspect: a checksum failure probably *isn't* going
to make much difference. Since that'll also cause bigger problems for
VACUUM than logging one of these "failed to re-find parent key"
messages.


But vacuum is not just logging this message. It throws an error, which means that vacuum for this relation will not be performed any more.



