Re: tests against running server occasionally fail, postgres_fdw & tenk1

Peter Geoghegan Wed, 08 Feb 2023 18:38:26 -0800

On Wed, Feb 8, 2023 at 4:29 PM Andres Freund <and...@anarazel.de> wrote:
> I find it useful information when debugging problems. Without it, the log
> doesn't tell you which index was processed when a problem started to occur. Or
> even that we were scanning indexes at all.


I guess it might have some limited value when doing some sort of
top-down debugging of the regression tests, where (say) some random
buildfarm issue is the starting point, and you don't really know what
you're looking for at all. I cannot remember ever approaching things
that way myself, though.

> > I generally don't care about the details when VACUUM runs out of space
> > for dead_items -- these days the important thing is to just avoid it
> > completely.
>
> I wonder if it'd possibly make sense to log more verbosely if we do end up
> running out of space, but not otherwise? Particularly because it's important
> to avoid multi-pass vacuums? Right now there's basically no log output until
> the vacuum finished, which in a bad situation could take days, with many index
> scan cycles. Logging enough to be able to see such things happening IMO is
> important.

That definitely could make sense, but ISTM that it should be some
totally orthogonal thing.

> > I also generally don't care about how many index tuples
> > were deleted by each index's ambulkdelete() call, since VACUUM VERBOSE
> > already shows me the number of LP_DEAD stubs encountered/removed in
> > the heap.
>
> Isn't it actually quite useful to see how many entries were removed from the
> index relative to the number of LP_DEAD removed in the heap? And relative to
> other indexes? E.g. to understand how effective killtuple style logic is?

Way less than you'd think. I'd even go so far as to say that showing
users the number of index tuples deleted by VACUUM at the level of
individual indexes could be very misleading. The simplest way to see
that this is true is with an example:

Assume that we have two indexes on the same table: A UUID v4 index,
and a traditional serial/identity column style primary key index.
Further assume that index tuple deletion does a perfect job with both
indexes, in the sense that no leaf page is ever split while the
pre-split leaf page has even one single remaining delete-safe index
tuple remaining. So the index deletion stuff is equally effective in
both indexes, in a way that could be measured exactly (by custom
instrumentation code, say). What does the instrumentation of the
number of index tuples deleted by VACUUM reveal here?

I would expect the UUID index to have *many* more index tuples deleted
by VACUUM than the traditional serial/identity column style primary
key index did in the same period (unless perhaps there was an
unrealistically uniform distribution of updates across the PK's key
space). We're talking about a 3x difference here, possibly much more.
In the case of the UUID index, we'll have needed fewer opportunistic
index deletion passes because there was naturally more free space on
each leaf page due to generic B-Tree overhead -- allowing
opportunistic index tuple deletion to be much more lazy overall,
relative to how things went with the other index. In reality we get
the same optimal outcome for each index, but
IndexBulkDeleteResult.tuples_removed suggests that just the opposite
has happened. That's just perverse.

This isn't just a cute example. If anything it *understates* the
extent to which these kinds of differences are possible. I could come
up with a case where the difference was far larger still, just by
adding a few more details. Users ought to focus on the picture over
time, and the space utilization for remaining live tuples. To a large
degree it doesn't actually matter whether it's VACUUM or opportunistic
deletion that does most of the deleting, provided it happens and is
reasonably efficient. They're two sides of the same coin.

Space utilization over time for live tuples matters most. Ideally it
can be normalized to account for the effects of these workload
differences, and things like nbtree deduplication. But even just
dividing the size of the index in pages by the number of live tuples
in the index tells me plenty, with no noise from VACUUM implementation
details.

We care about signal to noise ratio. Managing the noise is no less
important than increasing the signal. It might even be more important.

> I think we should emit most of the non-error/warning messages in vacuum with
> errhidestmt(true), errhidecontext(true) to avoid that. The error context is
> quite helpful for error messages due to corruption, cancellations and such,
> but not for messages where we already are careful to include relation names.

Agreed.

> I'd thus like to:
>
> 1) Use errhidestmt(true), errhidecontext(true) for vacuum
>    ereport(non-error-or-warning)

Makes sense.

> 2) Add a message to lazy_vacuum() or lazy_vacuum_all_indexes(), that includes
>    - num_index_scans
>    - how many indexes we'll scan
>    - how many dead tids we're working on removing

It's not obvious how you can know the number of index scans at this
point. Well, it depends on how you define "index scan". It's true that
the number shown as "index scans" by VACUUM VERBOSE could be shown
here instead, earlier on. However, there are cases where VACUUM
VERBOSE shows 0 index scans, but also shows that it has scanned one or
more indexes (usually not all the indexes, just a subset). This
happens whenever an amvacuumcleanup() routine decides it needs to scan
an index to do stuff like recycle previously deleted pages.

After 14, nbtree does a pretty good job of avoiding that when it
doesn't really make sense. But it's still possible. It's also quite
common with GIN indexes, I think -- in fact it can be quite painful
there. This is a good thing for performance, of course, but it also
makes VACUUM VERBOSE show information that makes sense to users, since
things actually happen in a way that makes a lot more sense. I'm quite
happy about the fact that the new VACUUM VERBOSE allows users to
mostly ignore obscure details like whether an index was scanned by
amvacuumcleanup() or by ambulkdelete() -- stuff that basically nobody
understands. That seems worth preserving.

> 3) Add a log at the end of lazy_vacuum_heap_rel() that's logged only (or more
>    verbosely) when lazy_vacuum() was run due to running out of space
>
>    If we just do the heap scan once, this can be easily inferred from the
>    other messages.

I don't mind adding something that makes it easier to notice the
number of index scans early. However, the ambulkdelete vs
amvacuumcleanup index scan situation needs more thought.

> 4) When we run out of space for dead tids, increase the log level for the rest
>    of vacuum. It's sufficiently bad if that happens that we really ought to
>    include it in the log by default.

That makes sense. Same could be done when the failsafe triggers.

> 2) and 3) together allow to figure out how long individual scan / vacuum
> phases are taking. 1) should reduce log verbosity sufficiently to make it
> easier to actually read the output.

It's not just verbosity. It's also showing the same details
consistently for the same table over time, so that successive VACUUMs
can be compared to each other easily. The worst thing about the old
VACUUM VERBOSE was that it was inconsistent about how much it showed
in a way that made little sense, based on low level details like the
order that things happen in, not the order that actually made sense.

As I said, I don't mind making VACUUM VERBOSE behave a little bit more
like a progress indicator, which is how it used to work. Maybe I went
a little too far in the direction of neatly summarizing the whole
VACUUM operation in one go. But I doubt that I went too far with it by
all that much. Overall, the old VACUUM VERBOSE was extremely hard to
use, and was poorly maintained -- let's not go back to that. (See
commit ec196930 for evidence of how sloppily it was maintained.)


--
Peter Geoghegan

Re: tests against running server occasionally fail, postgres_fdw & tenk1

Reply via email to