Greg Smith wrote:
> On 11/11/12 2:56 PM, Jeff Davis wrote:
> >We could have a separate utility, pg_checksums, that can
> >alter the state and/or do an offline verification. And initdb would take
> >an option that would start everything out fully protected with
> >checksums.
>
> Adding an initdb option to start out with everything checksummed
> seems an uncontroversial good first thing to have available.

+1

> Won't a pg_checksums program just grow until it looks like a limited
> version of vacuum though? It's going to iterate over most of the
> table; it needs the same cost controls as autovacuum (and to respect
> the load of concurrent autovacuum work) to keep I/O under control;
> and those cost control values might change if there's a SIGHUP to
> reload parameters. It looks so much like vacuum that I think there
> needs to be a really compelling reason to split it into something
> new. Why can't this be yet another autovacuum worker that does its
> thing?

I agree that much of what it's going to do is pretty much the same
as vacuum, but vacuum does so many other things that I think it should
be kept separate. Sure, we can have it invoked from autovacuum in the
background according to some (yet to be devised) scheduling heuristics.
But I don't see that it needs to share any vacuum code.

A couple of thoughts about autovacuum: it's important to figure out
whether checksumming can run concurrently with vacuuming the same table
and, if not, which one defers to the other in case of lock conflict.
Also, can checksumming be ignored by concurrent transactions when
computing Xmin? (I don't see any reason not to ...)

> One of the really common cases I was expecting here is that
> conversions are done by kicking off a slow background VACUUM
> CHECKSUM job that might run in pieces. I was thinking of an
> approach like this:
>
> -Initialize a last_checked_block value for each table
> -Loop:
> --Grab the next block after the last checked one
> --When on the last block of the relation, grab an exclusive lock to
> protect against race conditions with extension

Note that we have a separate lock type for relation extension, so we can
use that to avoid a conflict here.

> --If it's marked as checksummed and the checksum matches, skip it
> ---Otherwise, add a checksum and write it out
> --When that succeeds, update last_checked_block
> --If that was the last block, save some state saying the whole table
> is checksummed

"Some state" can be a pg_class field that's updated via
heap_inplace_update.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
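
For illustration only, the conversion loop quoted above can be modelled
as a small in-memory simulation. This is a rough sketch, not PostgreSQL
code: the Page and Relation classes, checksum_pass, and the use of
CRC-32 as the checksum are all assumptions made up for the example, a
threading.Lock stands in for the relation extension lock, and the
fully_checksummed flag stands in for the proposed pg_class field.

```python
import threading
import zlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    data: bytes
    checksum: Optional[int] = None  # None: page is not marked as checksummed


@dataclass
class Relation:
    pages: List[Page]
    last_checked_block: int = -1     # resume point, survives between passes
    fully_checksummed: bool = False  # stand-in for the "some state" flag
    extension_lock: threading.Lock = field(default_factory=threading.Lock)


def page_checksum(data: bytes) -> int:
    # Assumption: CRC-32 is only a placeholder checksum for this sketch.
    return zlib.crc32(data)


def fix_page(page: Page) -> bool:
    """Skip the page if its checksum matches; otherwise add one. True if rewritten."""
    expected = page_checksum(page.data)
    if page.checksum == expected:
        return False
    page.checksum = expected  # "add a checksum and write it out"
    return True


def checksum_pass(rel: Relation, max_blocks: int) -> int:
    """Process at most max_blocks blocks; return the number of pages rewritten."""
    rewritten = 0
    for _ in range(max_blocks):
        if rel.fully_checksummed:
            break
        blkno = rel.last_checked_block + 1
        if blkno >= len(rel.pages):  # empty relation, nothing to convert
            rel.fully_checksummed = True
            break
        if blkno == len(rel.pages) - 1:
            # Possibly the last block: hold the extension lock so the
            # relation cannot grow while we decide whether we are done.
            with rel.extension_lock:
                rewritten += int(fix_page(rel.pages[blkno]))
                rel.last_checked_block = blkno
                if blkno == len(rel.pages) - 1:  # re-check under the lock
                    rel.fully_checksummed = True
        else:
            rewritten += int(fix_page(rel.pages[blkno]))
            rel.last_checked_block = blkno
    return rewritten
```

The max_blocks argument is a crude stand-in for cost-based throttling:
a caller can run a short pass, sleep, and call checksum_pass again, and
the pass resumes from last_checked_block.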