On Wed, Jul 20, 2022 at 8:21 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote: > > Hi, > > I'd like to propose to remove "whichChkpt" and "report" arguments in > ReadCheckpointRecord(). "report" is obviously useless because it's always > true, i.e., there are two callers of the function and they always specify > true as "report".
Yes, the report parameter is obvious to delete. The commit 1d919de5eb removed the only call with the report parameter as false. > "whichChkpt" indicates where the specified checkpoint location came from, > pg_control or backup_label. This information is used to log different > messages as follows. > > switch (whichChkpt) > { > case 1: > ereport(LOG, > (errmsg("invalid primary > checkpoint link in control file"))); > break; > default: > ereport(LOG, > (errmsg("invalid checkpoint > link in backup_label file"))); > break; > } > return NULL; > ... > switch (whichChkpt) > { > case 1: > ereport(LOG, > (errmsg("invalid primary > checkpoint record"))); > break; > default: > ereport(LOG, > (errmsg("invalid checkpoint > record"))); > break; > } > return NULL; > ... > > But the callers of ReadCheckpointRecord() already output different log > messages depending on where the invalid checkpoint record came from. So even > if ReadCheckpointRecord() doesn't use "whichChkpt", i.e., use the same log > message in both pg_control and backup_label cases, users can still identify > where the invalid checkpoint record came from, by reading the log message. > > Also when whichChkpt = 0, "primary checkpoint" is used in the log message and > sounds confusing because, as far as I recall correctly, we removed the > concept of primary and secondary checkpoints before. Yes, using "primary checkpoint" confuses, as we usually refer primary in the context of replication and HA. > Therefore I think that it's better to remove "whichChkpt" argument in > ReadCheckpointRecord(). > > Patch attached. Thought? How about we transform the following messages into something like below? (errmsg("could not locate a valid checkpoint record"))); after ReadCheckpointRecord() for control file cases to "could not locate valid checkpoint record in control file" (errmsg("could not locate required checkpoint record"), after ReadCheckpointRecord() for backup_label case to "could not locate valid checkpoint record in backup_label file" The above messages give more meaningful and unique info to the users. May be unrelated, IIRC, for the errors like ereport(PANIC, (errmsg("could not locate a valid checkpoint record"))); we wanted to add a hint asking users to consider running pg_resetwal to fix the issue. The error for ReadCheckpointRecord() in backup_label file cases, already gives a hint errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" ...... Adding the hint of running pg_resetwal (of course with a caution that it can cause inconsistency in the data and use it as a last resort as described in the docs) helps users and support engineers a lot to mitigate the server down cases quickly. Regards, Bharath Rupireddy.