On Tue, Nov 2, 2021 at 1:47 PM Robert Haas <robertmh...@gmail.com> wrote:
> Yeah. I have only very rarely run into cases where people actually end
> up needing multiple passes, but it's always something I need to rule
> out as part of the troubleshooting process, and it's hard to do that
> without the log_autovacuum_min_duration output. It's pretty common for
> me to see cases where, for example, the I/O performed by autovacuum
> read a bunch of data off disk, which shoved a bunch of hot data out of
> cache, and then performance tanked really hard.

Right. Indexes are a big source of the variability IME. I agree that
having to do more than one pass isn't all that common. More often it's
something to do with the fact that there are 20 indexes, or the fact
that they use UUID indexes. VACUUM can very easily end up dirtying far
more pages than might be expected, just because of these kinds of
variations.

Variability in the duration of VACUUM due to issues on the heapam side
tends to be due to things like the complicated relationship between
work done and XIDs consumed, the fact that autovacuum scheduling makes
VACUUM kick in at geometric intervals/in geometric series (the scale
factor is applied to the size of the table at the end of the last
autovacuum), and problems with maintaining accurate information about
the number of dead tuples in the table.

> Or where the vacuum
> cost limit is ridiculously low relative to the table size and the
> vacuums take an unreasonably long time. In those kinds of cases what
> you really need to know is that there was a vacuum on a certain table,
> and how long it took, and when that happened.

In short: very often everything is just fine, but when things do go
wrong, the number of plausible sources of trouble is vast. There is no
substitute for already having reasonably good instrumentation of
VACUUM in place, to show how things have changed over time. The time
dimension is often very important, for experts and non-experts alike.

The more success we have with improving VACUUM (e.g., the VM/freeze
map stuff), the more likely it is that the notable remaining problems
will be complicated and kind of weird. That's what I see.

--
Peter Geoghegan