> On 26 May 2026, at 20:12, Tomas Vondra <[email protected]> wrote:

> I suppose this means we should not be updating the checksum state
> without emitting the barrier? I think all other places do that.

Good catch, it's indeed a bug: any state change must emit a procsignalbarrier
to maintain cluster consistency.  I ended up writing a test for this very case
as well.

> I'm still not sure if it really is an issue or just an annoyance,
> because I've not been able to find a case where it'd lead to checksum
> failures (or obviously incorrect final state after recovery).

I've tried to get it to reach an incorrect end state and failed, but I do agree
that we may need an improved locking protocol around state updates.  I need to
spend some more time thinking about this.

> I still don't understand why this needs DELAY_CHKPT_START ...

Having stared at this for some time, and gone over old threads, I think this
is a mistake.  AFAICT it cannot cause any error though, so I'd lean towards
erring on the safe side by leaving it as is for now and looking at removing it
in v20.  What do you think?

> I also noticed a couple minor comment issues, per attached patch (this
> may need pgindent).

I ended up splitting this into two, one for the comment fixes and one for the
data type change.

I propose applying the three patches below to v19 to fix the promotion issue
before we wrap beta1.

--
Daniel Gustafsson

Attachment: 0003-Use-correct-datatype-for-PID.patch
Description: Binary data

Attachment: 0002-Improve-comments-in-online-checksums-code.patch
Description: Binary data

Attachment: 0001-Fix-checksum-state-transition-during-promotion.patch
Description: Binary data
