On 2019-03-26 18:22:55 +0100, Michael Banck wrote:
> Hi,
>
> Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund:
> > CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1,
> > 1000000) g(i);
> > SELECT pg_relation_size('corruptme');
> > postgres[22890][1]=# SELECT current_setting('data_directory') || '/' ||
> > pg_relation_filepath('corruptme');
> > ┌─────────────────────────────────────┐
> > │ ?column? │
> > ├─────────────────────────────────────┤
> > │ /srv/dev/pgdev-dev/base/13390/16384 │
> > └─────────────────────────────────────┘
> > (1 row)
> > dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1
> > conv=notrunc
> >
> > Try a basebackup and see how many times it'll detect the corrupt
> > data. In the vast majority of cases you're going to see checksum
> > failures when reading the data for normal operation, but not when using
> > basebackup (or this new tool).
> >
> > At the very very least this would need to do
> >
> > a) checks that the page is all zeroes if PageIsNew() (like
> > PageIsVerified() does for the backend). That avoids missing cases
> > where corruption just zeroed out the header, but not the whole page.
> > b) Check that pd_lsn is between startlsn and the insertion pointer. That
> > avoids accepting just about all random data.
> >
> > And that'd *still* be less strenuous than what normal backends
> > check. And that's already not great (due to not noticing zeroed out
> > data).
>
> I've done the above in the attached patch now. Well, literally like an
> hour ago, then went jogging and came back to see you outlined about
> fixing this differently in a separate thread. Still might be helpful for
> the TAP test changes at least.
Sorry, I just hadn't seen much movement on this, and I'm a bit concerned
about such a critical issue not being addressed.
> /*
> - * Only check pages which have not been
> modified since the
> - * start of the base backup. Otherwise, they
> might have been
> - * written only halfway and the checksum would
> not be valid.
> - * However, replaying WAL would reinstate the
> correct page in
> - * this case. We also skip completely new
> pages, since they
> - * don't have a checksum yet.
> + * We skip completely new pages after checking
> they are
> + * all-zero, since they don't have a checksum
> yet.
> */
> - if (!PageIsNew(page) && PageGetLSN(page) <
> startptr)
> + if (PageIsNew(page))
> {
> - checksum = pg_checksum_page((char *)
> page, blkno + segmentno * RELSEG_SIZE);
> - phdr = (PageHeader) page;
> - if (phdr->pd_checksum != checksum)
> + all_zeroes = true;
> + pagebytes = (size_t *) page;
> + for (int i = 0; i < (BLCKSZ /
> sizeof(size_t)); i++)
Can we please abstract the zeroeness check into a separate function to
be used both by PageIsVerified() and this?
> + if (!all_zeroes)
> + {
> + /*
> + * pd_upper is zero, but the
> page is not all zero. We
> + * cannot run
> pg_checksum_page() on the page as it
> + * would throw an assertion
> failure. Consider this a
> + * checksum failure.
> + */
I don't think the assertion failure is the relevant bit here, it's htat
the page is corrupted, no?
> + /*
> + * Only check pages which have not been
> modified since the
> + * start of the base backup. Otherwise,
> they might have been
> + * written only halfway and the
> checksum would not be valid.
> + * However, replaying WAL would
> reinstate the correct page in
> + * this case. If the page LSN is larger
> than the current redo
> + * pointer then we assume a bogus LSN
> due to random page header
> + * corruption and do verify the
> checksum.
> + */
> + if (PageGetLSN(page) < startptr ||
> PageGetLSN(page) > GetRedoRecPtr())
I don't think GetRedoRecPtr() is the right check? Wouldn't it need to be
GetInsertRecPtr()?
Greetings,
Andres Freund