On Wed, Jul 20, 2022 at 8:21 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
>
> Hi,
>
> I'd like to propose to remove "whichChkpt" and "report" arguments in 
> ReadCheckpointRecord(). "report" is obviously useless because it's always 
> true, i.e., there are two callers of the function and they always specify 
> true as "report".

Yes, the report parameter is obvious to delete. The commit 1d919de5eb
removed the only call with the report parameter as false.

> "whichChkpt" indicates where the specified checkpoint location came from, 
> pg_control or backup_label. This information is used to log different 
> messages as follows.
>
>                 switch (whichChkpt)
>                 {
>                         case 1:
>                                 ereport(LOG,
>                                                 (errmsg("invalid primary 
> checkpoint link in control file")));
>                                 break;
>                         default:
>                                 ereport(LOG,
>                                                 (errmsg("invalid checkpoint 
> link in backup_label file")));
>                                 break;
>                 }
>                 return NULL;
>                 ...
>                 switch (whichChkpt)
>                 {
>                         case 1:
>                                 ereport(LOG,
>                                                 (errmsg("invalid primary 
> checkpoint record")));
>                                 break;
>                         default:
>                                 ereport(LOG,
>                                                 (errmsg("invalid checkpoint 
> record")));
>                                 break;
>                 }
>                 return NULL;
>                 ...
>
> But the callers of ReadCheckpointRecord() already output different log 
> messages depending on where the invalid checkpoint record came from. So even 
> if ReadCheckpointRecord() doesn't use "whichChkpt", i.e., use the same log 
> message in both pg_control and backup_label cases, users can still identify 
> where the invalid checkpoint record came from, by reading the log message.
>
> Also when whichChkpt = 0, "primary checkpoint" is used in the log message and 
> sounds confusing because, as far as I recall correctly, we removed the 
> concept of primary and secondary checkpoints before.

Yes, using "primary checkpoint" confuses, as we usually refer primary
in the context of replication and HA.

>   Therefore I think that it's better to remove "whichChkpt" argument in 
> ReadCheckpointRecord().
>
> Patch attached. Thought?

How about we transform the following messages into something like below?

(errmsg("could not locate a valid checkpoint record"))); after
ReadCheckpointRecord() for control file cases to "could not locate
valid checkpoint record in control file"
(errmsg("could not locate required checkpoint record"), after
ReadCheckpointRecord() for backup_label case to "could not locate
valid checkpoint record in backup_label file"

The above messages give more meaningful and unique info to the users.

May be unrelated, IIRC, for the errors like ereport(PANIC,
(errmsg("could not locate a valid checkpoint record"))); we wanted to
add a hint asking users to consider running pg_resetwal to fix the
issue. The error for ReadCheckpointRecord() in backup_label file
cases, already gives a hint errhint("If you are restoring from a
backup, touch \"%s/recovery.signal\" ...... Adding the hint of running
pg_resetwal (of course with a caution that it can cause inconsistency
in the data and use it as a last resort as described in the docs)
helps users and support engineers a lot to mitigate the server down
cases quickly.

Regards,
Bharath Rupireddy.


Reply via email to