Hello, Peter. Thanks for your review!
> I doubt that the patch's use of pg_memory_barrier() in places like > _bt_killitems() is correct. There is no way to know for sure if this > novel new lockless algorithm is correct or not, since it isn't > explained anywhere. The memory barrier is used only to ensure memory ordering in case of clearing LP_DEAD bits. Just to make sure the flag allowing the use LP_DEAD is seen AFTER bits are cleared. Yes, it should be described in more detail. The flapping test is one added in the patch and not related to memory ordering. I have already tried to make it stable once before, but it depends on minRecoveryLSN propagation. I’ll think about how to make it stable. > If there is an LP_DEAD bit set on a posting list on the primary, and > we need to do a posting list split against the posting tuple, we need > to be careful -- we cannot allow our new TID to look like it's LP_DEAD > immediately, before our transaction even commits/aborts. We cannot > swap out our new TID with an old LP_DEAD TID, because we'll think that > our new TID is LP_DEAD when we shouldn't. Oh, good catch! I was thinking it is safe to have additional hint bits on primary, but it seems like no. BTW I am wondering if it is possible to achieve the same situation by pg_rewind and standby promotion… > Overall, I think that this patch has serious design flaws, and that > this issue is really just a symptom of a bigger problem. Could you please advise me on something? The ways I see: * give up :) * try to fix this concept * go back to concept with LP_DEAD horizon WAL and optional cancellation * try to limit scope on “allow standby to use LP_DEAD set on primary in some cases” (by marking something in btree page probably) * look for some new way Best regards, Michail.