On Fri, Oct 8, 2021 at 8:13 AM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote: > On 2021-Oct-06, Masahiko Sawada wrote: > > A customer reported that during parallel index vacuum, the oldest xmin > > doesn't advance. Normally, the calculation of oldest xmin > > (ComputeXidHorizons()) ignores xmin/xid of processes having > > PROC_IN_VACUUM flag in MyProc->statusFlags. But since parallel vacuum > > workers don’t set their statusFlags, the xmin of the parallel vacuum > > worker is considered to calculate the oldest xmin. This issue happens > > from PG13 where the parallel vacuum was introduced. I think it's a > > bug. > > Augh, yeah, I agree this is a pretty serious problem.
So is this comparable problem, which happens to be much older: https://postgr.es/m/cah2-wzkjrk556envtflmyxedw91xguwiyzvep2kp5yqt_-3...@mail.gmail.com In both cases we see bugs (or implementation deficiencies) that accidentally block ComputeXidHorizons() for hours, when that isn't truly necessary. Practically all users are not sure of whether or not VACUUM behaves like a long running transaction already, in general, so we shouldn't be surprised that it takes so long for us to hear about issues like this. I think that we should try to find a way of making this whole class of problems easier to identify in production. There needs to be greater visibility into what process holds back VACUUM, and how long that lasts -- something easy to use, and *obvious*. That would be a very useful feature in general. It would also make catching these issues early far more likely. It's just *not okay* that you have to follow long and complicated instructions [1] to get just some of this information. How can something this important just be an afterthought? [1] https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/ -- Peter Geoghegan