On Wed, Mar 13, 2019 at 08:56:33PM +0900, Michael Paquier wrote: > On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote: > > The attached patch should do the above, on top of Michael's last > > patchset. > > What you are doing here looks like a good defense in itself.
More thoughts on that, and here is a short summary of the thread. + /* Check if control file has changed */ + if (controlfile_last_updated != ControlFile->time) + { + fprintf(stderr, _("%s: control file has changed since startup\n"), progname); + exit(1); + } Actually, under the conditions discussed on this thread that Postgres is started in parallel of pg_checksums, imagine the following scenario: - pg_checksums starts to enable checksums, it reads a block to calculate its checksum, then calculates it. - Postgres has been started in parallel, writes the same block. - pg_checksums finishes the block calculation, writes back the block it has just read. - Postgres stops, some data is lost. - At the end of pg_checksums, we complain that the control file has been updated since the start of pg_checksums. I think that we should be way more noisy about this error message document properly that Postgres should not be started while checksums are enabled. Basically, I guess that it should mention that there is a risk of corruption because of this parallel operation. Hence, based on what I could read on this thread, we'd like to have the following things added to the core patch set: 1) When enabling checksums, fsync the data folder. Then update the control file, and finally fsync the control file (+ flush of the parent folder to make the whole durable). This way a host crash never actually means that we finish in an inconsistent state. 2) When checksums are disabled, update the control file, then fsync it + its parent folder. 3) Add tracking of the control file data, and complain loudly before trying to update the file if there are any inconsistencies found. 4) Document with a big-fat-red warning that postgres should not be worked on while the tool is enabling or disabling checksums. There is a part discussed about standbys and primaries with not the same checksum settings, but I commented on that in [1]. There is a secondary bug fix to prevent the tool to run if the data folder has been initialized with a block size different than what pg_checksums has been compiled with in [2]. The patch looks good, still the error message could be better per my lookup. [1]: https://www.postgresql.org/message-id/20190314002342.gc3...@paquier.xyz [2]: https://www.postgresql.org/message-id/20190313224742.ga3...@paquier.xyz Am I missing something? -- Michael
signature.asc
Description: PGP signature