Hello, Peter.

Thanks for your review!

> I doubt that the patch's use of pg_memory_barrier() in places like
> _bt_killitems() is correct. There is no way to know for sure if this
> novel new lockless algorithm is correct or not, since it isn't
> explained anywhere.

The memory barrier is used only to ensure memory ordering in case of
clearing LP_DEAD bits. Just to make sure the flag allowing the use
LP_DEAD is seen AFTER bits are cleared.
Yes, it should be described in more detail.
The flapping test is one added in the patch and not related to memory
ordering. I have already tried to make it stable once before, but it
depends on minRecoveryLSN propagation. I’ll think about how to make it
stable.

> If there is an LP_DEAD bit set on a posting list on the primary, and
> we need to do a posting list split against the posting tuple, we need
> to be careful -- we cannot allow our new TID to look like it's LP_DEAD
> immediately, before our transaction even commits/aborts. We cannot
> swap out our new TID with an old LP_DEAD TID, because we'll think that
> our new TID is LP_DEAD when we shouldn't.

Oh, good catch! I was thinking it is safe to have additional hint bits
on primary, but it seems like no. BTW I am wondering if it is possible
to achieve the same situation by pg_rewind and standby promotion…

> Overall, I think that this patch has serious design flaws, and that
> this issue is really just a symptom of a bigger problem.

Could you please advise me on something? The ways I see:
* give up :)
* try to fix this concept
* go back to concept with LP_DEAD horizon WAL and optional cancellation
* try to limit scope on “allow standby to use LP_DEAD set on primary
in some cases” (by marking something in btree page probably)
* look for some new way

Best regards,
Michail.


Reply via email to