On 1/8/24 2:10 PM, Robert Haas wrote:
On Fri, Jan 5, 2024 at 3:57 PM Andres Freund <and...@anarazel.de> wrote:
I will be astonished if you can make this work well enough to avoid
huge regressions in plausible cases. There are plenty of cases where
we do a very thorough job opportunistically removing index tuples.

These days the AM is often involved with that, via
table_index_delete_tuples()/heap_index_delete_tuples(). That IIRC has to
happen before physically removing the already-marked-killed index entries. We
can't rely on being able to actually prune the heap page at that point, there
might be other backends pinning it, but often we will be able to. If we were
to prune below heap_index_delete_tuples(), we wouldn't need to recheck that
index again during "individual tuple pruning", if the to-be-marked-unused heap
tuple is one of the tuples passed to heap_index_delete_tuples(). Which
presumably will be very commonly the case.

At least for nbtree, we are much more aggressive about marking index entries
as killed, than about actually removing the index entries. "individual tuple
pruning" would have to look for killed-but-still-present index entries, not
just for "live" entries.
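Here is a toy Python model of the killed-versus-removed distinction. It is not PostgreSQL code, and IndexEntry, ToyIndex, can_mark_unused and references_live_only are hypothetical names. It assumes two things. First, a heap line pointer may only be marked unused once no index entry still references its TID, whether that entry is live or killed. Second, following the heap_index_delete_tuples() idea above, the index whose deletion pass just removed the entry can skip its recheck, while every other index must still be probed:

```python
from dataclasses import dataclass, field
from typing import Optional

TID = tuple[int, int]  # toy (heap block, line pointer offset)


@dataclass
class IndexEntry:
    key: int
    tid: TID
    killed: bool = False  # like an nbtree "killed" hint: marked, but still on the page


@dataclass
class ToyIndex:
    name: str
    entries: list[IndexEntry] = field(default_factory=list)

    def references(self, tid: TID) -> bool:
        # Killed entries count: until physically removed, they still point at tid.
        return any(e.tid == tid for e in self.entries)

    def references_live_only(self, tid: TID) -> bool:
        # The unsafe variant: ignores killed-but-still-present entries.
        return any(e.tid == tid and not e.killed for e in self.entries)


def can_mark_unused(
    tid: TID,
    indexes: list[ToyIndex],
    source: Optional[ToyIndex] = None,
    removed_from_source: frozenset = frozenset(),
) -> bool:
    """True only if no index still physically references tid (hypothetical check)."""
    for idx in indexes:
        if idx is source and tid in removed_from_source:
            continue  # entry was just removed by this index's deletion pass; skip the probe
        if idx.references(tid):
            return False
    return True


tid = (0, 1)
a = ToyIndex("a")  # its entry for tid was just removed by the deletion pass
b = ToyIndex("b", [IndexEntry(key=99, tid=tid, killed=True)])

print(can_mark_unused(tid, [a, b], source=a, removed_from_source=frozenset({tid})))  # False
print(b.references_live_only(tid))  # False: a live-only check would miss b's entry
b.entries.clear()  # b's killed entry is finally removed
print(can_mark_unused(tid, [a, b], source=a, removed_from_source=frozenset({tid})))  # True
```

The skip only saves a probe of the source index. The other indexes still cost one lookup each, and a check that counted only live entries would have wrongly allowed (0, 1) to be marked unused while index b still pointed at it.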

I don't want to derail this thread, but I don't really see what you
have in mind here. The first paragraph sounds like you're imagining
that while pruning the index entries we might jump over to the heap
and clean things up there, too, but that seems like it wouldn't work
if the table has more than one index. I thought you were talking about
starting with a heap tuple and bouncing around to every index to see
if we can find index pointers to kill in every one of them. That
*could* work out, but you only need one index to have been
opportunistically cleaned up in order for it to fail to work out.
There might well be some workloads where that's often the case, but
the regressions in the workloads where it isn't the case seem like
they would be rather substantial, because doing an extra lookup in
every index for each heap tuple visited sounds pricey.

The idea of probing indexes for tuples that are now dead has come up in the past, and the concern has always been whether it's actually safe to do so. An obvious example is an expression index whose function has since been changed: you can no longer reliably determine whether a particular tuple is present in the index. That's bad enough during an index scan, but potentially worse during heap cleanup. Given that operators are functions, this risk exists to some degree even for simple indexes.

Depending on the gains, this might still be worth doing, at least in some cases. It's hard to conceive of this breaking for indexes on integers, for example, but we'd still need to be cautious.
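Here is a toy Python model of the expression-index hazard. It is illustrative only: lower_v1, lower_v2, build_index and find_pointer are hypothetical, and real index probes are far more involved. The index stores keys computed with the function as it was at insert time. If the function is later redefined, re-deriving the key from the dead heap tuple probes the wrong place. The probe then reports no pointer even though one exists:

```python
from typing import Callable

Index = dict[str, set[int]]  # toy: key -> set of heap TIDs


def lower_v1(s: str) -> str:
    return s.lower()


def lower_v2(s: str) -> str:
    # The "same" function after a redefinition: it now also strips whitespace.
    return s.lower().strip()


def build_index(heap: dict[int, str], f: Callable[[str], str]) -> Index:
    # Keys are computed once, at insert time, with whatever f was then.
    index: Index = {}
    for tid, value in heap.items():
        index.setdefault(f(value), set()).add(tid)
    return index


def find_pointer(index: Index, value: str, tid: int, f: Callable[[str], str]) -> bool:
    # Re-derive the key from the heap tuple and look for an entry pointing at tid.
    return tid in index.get(f(value), set())


heap = {1: " Alice", 2: "bob"}
idx = build_index(heap, lower_v1)  # keys: " alice" -> {1}, "bob" -> {2}

print(find_pointer(idx, heap[1], 1, lower_v1))  # True
print(find_pointer(idx, heap[1], 1, lower_v2))  # False, yet the entry under " alice" still exists
```

Suppose cleanup trusted that False and marked TID 1 unused. The stale entry would then point at whatever tuple later reused the slot. With a plain integer column compared by the built-in integer operators, the key derivation is hard to imagine changing, which matches the point above about integer indexes.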
--
Jim Nasby, Data Architect, Austin TX


