At Thu, 14 May 2020 10:19:42 +0500, "Andrey M. Borodin" <x4...@yandex-team.ru> wrote in > >> I'm looking more at MultiXact and it seems to me that we have a race > >> condition there. > >> > >> When we create a new MultiXact we do: > >> 1. Generate new MultiXactId under MultiXactGenLock > >> 2. Record new mxid with members and offset to WAL > >> 3. Write offset to SLRU under MultiXactOffsetControlLock > >> 4. Write members to SLRU under MultiXactMemberControlLock > > > > But, don't we hold exclusive lock on the buffer through all the steps > > above? > Yes...Unless MultiXact is observed on StandBy. This could lead to observing > inconsistent snapshot: one of lockers committed tuple delete, but standby > sees it as alive.
Ah, right. I looked from GetNewMultiXactId. Actually XLOG_MULTIXACT_CREATE_ID is not protected from concurrent reference to the creating mxact id. And GetMultiXactIdMembers is considering that case. > >> When we read MultiXact we do: > >> 1. Retrieve offset by mxid from SLRU under MultiXactOffsetControlLock > >> 2. If offset is 0 - it's not filled in at step 4 of previous algorithm, we > >> sleep and goto 1 > >> 3. Retrieve members from SLRU under MultiXactMemberControlLock > >> 4. ..... what we do if there are just zeroes because step 4 is not > >> executed yet? Nothing, return empty members list. > > > > So transactions never see such incomplete mxids, I believe. > I've observed sleep in step 2. I believe it's possible to observe special > effects of step 4 too. > Maybe we could add lock on standby to dismiss this 1000us wait? Sometimes it > hits hard on Standbys: if someone is locking whole table on primary - all seq > scans on standbys follow him with MultiXactOffsetControlLock contention. GetMultiXactIdMembers believes that 4 is successfully done if 2 returned valid offset, but actually that is not obvious. If we add a single giant lock just to isolate ,say, GetMultiXactIdMember and RecordNewMultiXact, it reduces concurrency unnecessarily. Perhaps we need finer-grained locking-key for standby that works similary to buffer lock on primary, that doesn't cause confilicts between irrelevant mxids. regards. -- Kyotaro Horiguchi NTT Open Source Software Center