On 2020-May-13, Peter Geoghegan wrote:

> On Wed, May 13, 2020 at 3:10 PM Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> > Hmm.  I think we should (try to?) write code that avoids all crashes
> > with production builds, but not extend that to assertion failures.
>
> Assertions are only a problem at all because Mark would like to write
> tests that involve a selection of truly corrupt data.  That's a new
> requirement, and one that I have my doubts about.
I agree that this (a test tool that exercises our code against
arbitrarily corrupted data pages) is not going to work as a test that
all buildfarm members run -- it seems like something for specialized
buildfarm members to run, or even something that runs outside the
buildfarm entirely, like sqlsmith.  Obviously such a tool could not run
against an assertion-enabled build, and we shouldn't even try.

> I would be willing to make a larger effort to avoid crashing a
> backend, since that affects production.  I might go to some effort to
> not crash with downright adversarial inputs, for example.  But it seems
> inappropriate to take extreme measures just to avoid a crash with
> extremely contrived inputs that will probably never occur.  My sense is
> that this is subject to sharply diminishing returns.  Completely
> nailing down hard crashes from corrupt data seems like the wrong
> priority, at the very least.  Pursuing that objective over other
> objectives sounds like zero-risk bias.

I think my initial approach for this would be to use a fuzzing tool
that generates data blocks semi-randomly, then uses them as Postgres
data pages somehow, and see what happens -- examine any resulting
crashes and make individual judgement calls about the fix(es) necessary
to prevent each of them.  I expect that many such pages would be
rejected as corrupt by the page header checks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
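P.S. For illustration, the "most random pages fail the header checks"
expectation can be sketched roughly like this.  This is only a
simplified Python stand-in for the pd_lower/pd_upper/pd_special sanity
checks that PageIsVerified() performs (checksum validation and the
all-zeroes-page special case are omitted); the constants and 24-byte
header layout are assumptions matching the common 8kB-block build:

```python
import random
import struct

BLCKSZ = 8192                 # assumed default PostgreSQL block size
SIZE_OF_PAGE_HEADER = 24      # assumed sizeof(PageHeaderData)
PD_VALID_FLAG_BITS = 0x0007   # assumed: the three defined pd_flags bits

def header_looks_sane(page: bytes) -> bool:
    """Rough stand-in for PageIsVerified()'s header sanity checks;
    checksum and all-zeroes-page handling are deliberately omitted."""
    if len(page) != BLCKSZ:
        return False
    # Assumed PageHeaderData layout: pd_lsn (8 bytes), pd_checksum (2),
    # then five 16-bit fields starting at offset 10:
    # pd_flags, pd_lower, pd_upper, pd_special, pd_pagesize_version.
    (pd_flags, pd_lower, pd_upper,
     pd_special, pd_pagesize_version) = struct.unpack_from("<5H", page, 10)
    if pd_flags & ~PD_VALID_FLAG_BITS:
        return False
    # Free space pointers must be ordered and within the block.
    if not (SIZE_OF_PAGE_HEADER <= pd_lower <= pd_upper
            <= pd_special <= BLCKSZ):
        return False
    # Low bits encode the page layout version, high bits the page size.
    if pd_pagesize_version != (BLCKSZ | 4):
        return False
    return True

# Feed semi-random blocks through the check and count the survivors;
# nearly all random pages are rejected before any deeper code runs.
random.seed(0)
survivors = sum(header_looks_sane(random.randbytes(BLCKSZ))
                for _ in range(10_000))
print(survivors)
```

A real fuzzer would of course mutate mostly-valid pages rather than use
pure noise, precisely because uniformly random blocks rarely get past
this first gate.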