On 2019-03-26 18:22:55 +0100, Michael Banck wrote: > Hi, > > Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund: > > CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, > > 1000000) g(i); > > SELECT pg_relation_size('corruptme'); > > postgres[22890][1]=# SELECT current_setting('data_directory') || '/' || > > pg_relation_filepath('corruptme'); > > ┌─────────────────────────────────────┐ > > │ ?column? │ > > ├─────────────────────────────────────┤ > > │ /srv/dev/pgdev-dev/base/13390/16384 │ > > └─────────────────────────────────────┘ > > (1 row) > > dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1 > > conv=notrunc > > > > Try a basebackup and see how many times it'll detect the corrupt > > data. In the vast majority of cases you're going to see checksum > > failures when reading the data for normal operation, but not when using > > basebackup (or this new tool). > > > > At the very very least this would need to do > > > > a) checks that the page is all zeroes if PageIsNew() (like > > PageIsVerified() does for the backend). That avoids missing cases > > where corruption just zeroed out the header, but not the whole page. > > b) Check that pd_lsn is between startlsn and the insertion pointer. That > > avoids accepting just about all random data. > > > > And that'd *still* be less strenuous than what normal backends > > check. And that's already not great (due to not noticing zeroed out > > data). > > I've done the above in the attached patch now. Well, literally like an > hour ago, then went jogging and came back to see you outlined about > fixing this differently in a separate thread. Still might be helpful for > the TAP test changes at least.
Sorry, I just hadn't seen much movement on this, and I'm a bit concerned about such a critical issue not being addressed. > /* > - * Only check pages which have not been > modified since the > - * start of the base backup. Otherwise, they > might have been > - * written only halfway and the checksum would > not be valid. > - * However, replaying WAL would reinstate the > correct page in > - * this case. We also skip completely new > pages, since they > - * don't have a checksum yet. > + * We skip completely new pages after checking > they are > + * all-zero, since they don't have a checksum > yet. > */ > - if (!PageIsNew(page) && PageGetLSN(page) < > startptr) > + if (PageIsNew(page)) > { > - checksum = pg_checksum_page((char *) > page, blkno + segmentno * RELSEG_SIZE); > - phdr = (PageHeader) page; > - if (phdr->pd_checksum != checksum) > + all_zeroes = true; > + pagebytes = (size_t *) page; > + for (int i = 0; i < (BLCKSZ / > sizeof(size_t)); i++) Can we please abstract the zeroeness check into a separate function to be used both by PageIsVerified() and this? > + if (!all_zeroes) > + { > + /* > + * pd_upper is zero, but the > page is not all zero. We > + * cannot run > pg_checksum_page() on the page as it > + * would throw an assertion > failure. Consider this a > + * checksum failure. > + */ I don't think the assertion failure is the relevant bit here, it's htat the page is corrupted, no? > + /* > + * Only check pages which have not been > modified since the > + * start of the base backup. Otherwise, > they might have been > + * written only halfway and the > checksum would not be valid. > + * However, replaying WAL would > reinstate the correct page in > + * this case. If the page LSN is larger > than the current redo > + * pointer then we assume a bogus LSN > due to random page header > + * corruption and do verify the > checksum. > + */ > + if (PageGetLSN(page) < startptr || > PageGetLSN(page) > GetRedoRecPtr()) I don't think GetRedoRecPtr() is the right check? Wouldn't it need to be GetInsertRecPtr()? Greetings, Andres Freund