On Tue, Jul 14, 2020 at 12:28 AM Peter Geoghegan <p...@bowt.ie> wrote:
> On Mon, Jul 13, 2020 at 2:12 PM Robert Haas <robertmh...@gmail.com> wrote: > > 1. There's nothing to identify the tuple that has the problem, and no > > way to know how many more of them there might be. Back-patching > > b61d161c146328ae6ba9ed937862d66e5c8b035a would help with the first > > part of this. > > I am in favor of backpatching such changes in cases where senior > community members feel that it could help with hypothetical > undiscovered data corruption issues -- if they're willing to take > responsibility for the change. It certainly wouldn't be the first > time. A "defense in depth" mindset seems like the right one when it > comes to data corruption bugs. Early detection is really important. > > > Moreover, not everyone is as > > interested in an extended debugging exercise as they are in getting > > the system working again, and VACUUM failing repeatedly is a pretty > > serious problem. > > That's absolutely consistent with my experience. Most users want to > get back to business as usual now, while letting somebody else do the > hard work of debugging. > Also even if you do trace the problem you still have to recover. And sometimes I have found latent corruption from times when dbs were running on older versions and older servers, making debugging largely a futile exercise. > > > Therefore, one of my colleagues has - at my request - created a couple > > of functions called heap_force_kill() and heap_force_freeze() which > > take an array of TIDs. > > > So I have these questions: > > > > - Do people think it would me smart/good/useful to include something > > like this in PostgreSQL? > > I'm in favor of it. > +1 Would be worth extending it with some functions to grab rows that have various TOAST oids too. > > > - If so, how? I would propose a new contrib module that we back-patch > > all the way, because the VACUUM errors were back-patched all the way, > > and there seems to be no advantage in making people wait 5 years for a > > new version that has some kind of tooling in this area. > > I'm in favor of it being *possible* to backpatch tooling that is > clearly related to correctness in a fundamental way. Obviously this > would mean that we'd be revising our general position on backpatching > to allow some limited exceptions around corruption. I'm not sure that > this meets that standard, though. It's hardly something that we can > expect all that many users to be able to use effectively. > > I may be biased, but I'd be inclined to permit it in the case of > something like amcheck, or pg_visibility, on the grounds that they're > more or less the same as the new VACUUM errcontext instrumentation you > mentioned. The same cannot be said of something like this new > heap_force_kill() stuff. > > > - Any ideas for additional things we should include, or improvements > > on the sketch above? > > Clearly you should work out a way of making it very hard to > accidentally (mis)use. For example, maybe you make the functions check > for the presence of a sentinel file in the data directory. > Agreed. > > > -- > Peter Geoghegan > > > -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin