On 30/10/2025 08:13, Maxim Orlov wrote:
On Tue, 28 Oct 2025 at 17:17, Heikki Linnakangas <[email protected]> wrote:

    On 27/10/2025 17:54, Maxim Orlov wrote:


    If backend C looks up multixid 101 in between steps 3 and 4, it would
    read the offset incorrectly, because 'base' isn't set yet.

Hmm, maybe I'm missing something? We set the page base on the first
write of any offset on the page, not only the page's first offset. In
other words, there should never be a case where we read an offset
without a previously defined page base. Correct me if I'm wrong:
1. Backend A assigned mxact=100, offset=1000.
2. Backend B assigned mxact=101, offset=1010.
3. Backend B calls RecordNewMultiXact()/MXOffsetWrite() and
     sets page base=1010 and offset 0 ^ 0x80000000 (i.e. 0 plus the
     0x80000000 bit) while holding the lock on the page.
4. Backend C looks up mxact=101 by calling MXOffsetRead()
     and should get exactly what it's looking for:
     base (1010) + offset (0) minus the 0x80000000 bit.
5. Backend A calls RecordNewMultiXact() and sets its offset using
     the existing base from step 3.

Oh I see, the 'base' is not necessarily the base offset of the first multixact on the page, it's the base offset of the first multixid that is written to the page. And the (short) offsets can be negative. That's a frighteningly clever encoding scheme. One upshot of that is that WAL redo might construct the page with a different 'base'. I guess that works, but it scares me. Could we come up with a more deterministic scheme?
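
To make sure I'm reading the scheme right, here is a rough standalone sketch of how I understand it. This is not code from the patch: SketchPage, sketch_offset_write(), sketch_offset_read(), SLOTS_PER_PAGE and SET_BIAS are all made-up names, and the exact use of the 0x80000000 bit is my assumption based on the steps above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SLOTS_PER_PAGE 8        /* hypothetical; tiny for illustration */
#define SET_BIAS 0x80000000u    /* assumption: a stored value of 0 means "unset" */

/* Hypothetical page: one 64-bit base plus 32-bit biased deltas. */
typedef struct
{
    bool     base_set;
    uint64_t base;
    uint32_t slot[SLOTS_PER_PAGE];
} SketchPage;

static void
sketch_page_init(SketchPage *p)
{
    memset(p, 0, sizeof(*p));
}

/* The first write to the page, whichever slot it hits, fixes the base. */
static void
sketch_offset_write(SketchPage *p, int idx, uint64_t offset)
{
    int32_t delta;

    if (!p->base_set)
    {
        p->base = offset;
        p->base_set = true;
    }
    /* delta is negative when an earlier multixid is written later */
    delta = (int32_t) (int64_t) (offset - p->base);
    p->slot[idx] = (uint32_t) delta ^ SET_BIAS;
}

/* Returns false if the slot has not been written yet. */
static bool
sketch_offset_read(const SketchPage *p, int idx, uint64_t *offset)
{
    int32_t delta;

    if (p->slot[idx] == 0)
        return false;
    delta = (int32_t) (p->slot[idx] ^ SET_BIAS);
    *offset = p->base + (int64_t) delta;
    return true;
}

/* Write the two offsets in either order; the decoded values must agree. */
static void
sketch_demo(bool b_first, uint64_t *base_out)
{
    SketchPage p;
    uint64_t   v;

    sketch_page_init(&p);
    if (b_first)
    {
        sketch_offset_write(&p, 1, 1010);   /* backend B, mxact=101 */
        sketch_offset_write(&p, 0, 1000);   /* backend A, mxact=100 */
    }
    else
    {
        sketch_offset_write(&p, 0, 1000);
        sketch_offset_write(&p, 1, 1010);
    }
    assert(sketch_offset_read(&p, 0, &v) && v == 1000);
    assert(sketch_offset_read(&p, 1, &v) && v == 1010);
    *base_out = p.base;
}
```

With B writing first, the page ends up with base=1010 and a delta of -10 for mxact=100; with A writing first, it gets base=1000 and deltas 0 and +10. Both decode to the same offsets, but the page bytes differ, which is the non-determinism I'm worried about in redo.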

- Heikki


