On 11/12/2012 05:55 AM, Greg Smith wrote:
> Adding an initdb option to start out with everything checksummed seems
> an uncontroversial good first thing to have available.

+1

So the following discussion really is for a future patch extending that
initial checksum support.

> One of the really common cases I was expecting here is that conversions
> are done by kicking off a slow background VACUUM CHECKSUM job that might
> run in pieces.  I was thinking of an approach like this:
>
> -Initialize a last_checked_block value for each table
> -Loop:
> --Grab the next block after the last checked one
> --When on the last block of the relation, grab an exclusive lock to
> protect against race conditions with extension
> --If it's marked as checksummed and the checksum matches, skip it
> ---Otherwise, add a checksum and write it out
> --When that succeeds, update last_checked_block
> --If that was the last block, save some state saying the whole table is
> checksummed

Perfect, thanks. That's the rough idea I had in mind as well, written
out in detail and catching the extension case.

> With that logic, there is at least a forward moving pointer that removes
> the uncertainty around whether pages have been updated or not.  It will
> keep going usefully if interrupted too.  One obvious way this can
> fail is if:
>
> 1) A late page in the relation is updated and a checksummed page written
> 2) The page is corrupted such that the "is this checksummed?" bits are
> not consistent anymore, along with other damage to it
> 3) The conversion process gets to this page eventually
> 4) The corruption of (2) isn't detected

IMO this just shows how limited the use of the "is this checksummed"
bit in the page itself is. It just doesn't catch all cases.

Is it worth having that bit at all, given your block-wise approach
above? It really only serves to catch corruption of *newly* dirtied
pages *during* the migration phase that *keeps* that single bit set.
Everything else is covered by the last_checked_block variable. Sounds
narrow enough to be negligible. Then again, it's just a single bit per
page...
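
For illustration, the loop quoted above could be sketched roughly as
follows. This is a toy in-memory model in Python, not PostgreSQL code:
Page, Relation, compute_checksum and convert_relation are made-up names,
CRC-32 truncated to 16 bits merely stands in for whatever checksum
algorithm ends up being used, and the exclusive lock for the extension
case is only noted in a comment.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    data: bytes
    has_checksum: bool = False   # the per-page "is this checksummed?" bit
    checksum: int = 0


@dataclass
class Relation:
    pages: List[Page]
    last_checked_block: int = -1     # forward-moving pointer, -1 = none yet
    fully_checksummed: bool = False  # state saved once the sweep completes


def compute_checksum(data: bytes) -> int:
    # Assumption: truncated CRC-32 as a placeholder checksum.
    return zlib.crc32(data) & 0xFFFF


def convert_relation(rel: Relation, max_blocks: Optional[int] = None) -> int:
    """Checksum up to max_blocks pages, resuming after last_checked_block.

    Returns the number of pages that had to be rewritten.
    """
    rewritten = 0
    processed = 0
    while rel.last_checked_block + 1 < len(rel.pages):
        if max_blocks is not None and processed >= max_blocks:
            break  # simulate the job being stopped; state is kept in rel
        blkno = rel.last_checked_block + 1
        page = rel.pages[blkno]
        # A real implementation would take an exclusive lock when blkno is
        # the last block, to guard against concurrent extension. Omitted.
        expected = compute_checksum(page.data)
        if not (page.has_checksum and page.checksum == expected):
            page.checksum = expected
            page.has_checksum = True
            rewritten += 1  # stands in for writing the page out
        # Advance the pointer only after the "write" has succeeded.
        rel.last_checked_block = blkno
        processed += 1
    if rel.last_checked_block == len(rel.pages) - 1:
        rel.fully_checksummed = True
    return rewritten
```

Calling convert_relation(rel, max_blocks=N) repeatedly mimics a job that
runs in pieces: each call resumes after last_checked_block, so an
interrupted sweep does not start over.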

> The only guarantee I see that we can give for online upgrades is that
> after a VACUUM CHECKSUM sweep is done, and every page is known to both
> have a valid checksum on it and have its checksum bits set, *then* any
> page that doesn't have both set bits and a matching checksum is garbage.

From that point in time on, we'd theoretically be better off using that
bit as an additional checksum bit rather than requiring it to be set at
all times. Really just theoretically, I'm certainly not advocating a
33 bit checksum :-)

Regards

Markus Wanner