On Mon, Mar 28, 2022 at 1:23 PM Peter Geoghegan <p...@bowt.ie> wrote:
> I doubt that the patch's use of pg_memory_barrier() in places like
> _bt_killitems() is correct.

I also doubt that posting list splits are handled correctly.

If there is an LP_DEAD bit set on a posting list tuple on the primary, and we need to do a posting list split against that posting tuple, we need to be careful -- we cannot allow our new TID to look like it's LP_DEAD immediately, before our transaction even commits/aborts. We cannot swap out our new TID with an old LP_DEAD TID, because we'll think that our new TID is LP_DEAD when we shouldn't.

This is currently handled by having the inserter do an early round of simple/LP_DEAD index tuple deletion, using the "simpleonly" argument from _bt_delete_or_dedup_one_page(). Obviously the primary cannot be expected to know that one of its standbys has independently set a posting list's LP_DEAD bit, though. At the very least you need to teach the posting list split path in btree_xlog_insert() about all this -- it's not necessarily sufficient to clear LP_DEAD bits in the index AM's fpi_mask() routine.

Overall, I think that this patch has serious design flaws, and that this issue is really just a symptom of a bigger problem.

--
Peter Geoghegan