> On 14 May 2020, at 06:25, Kyotaro Horiguchi <horikyota....@gmail.com>
> wrote:
>
> At Wed, 13 May 2020 23:08:37 +0500, "Andrey M. Borodin"
> <x4...@yandex-team.ru> wrote in
>>
>>
>>> On 11 May 2020, at 16:17, Andrey M. Borodin <x4...@yandex-team.ru>
>>> wrote:
>>>
>>> I've gone ahead and created 3 patches:
>>> 1. Configurable SLRU buffer sizes for MultiXactOffsets and MultiXactMembers
>>> 2. Reduce locking level to shared on read of MultiXactId members
>>> 3. Configurable cache size
>>
>> I'm looking more at MultiXact and it seems to me that we have a race
>> condition there.
>>
>> When we create a new MultiXact we do:
>> 1. Generate new MultiXactId under MultiXactGenLock
>> 2. Record new mxid with members and offset to WAL
>> 3. Write offset to SLRU under MultiXactOffsetControlLock
>> 4. Write members to SLRU under MultiXactMemberControlLock
>
> But, don't we hold exclusive lock on the buffer through all the steps
> above?
Yes... unless the MultiXact is observed on a standby. This can lead to an
inconsistent snapshot being observed: one of the lockers has committed a tuple
deletion, but the standby still sees the tuple as alive.
>> When we read MultiXact we do:
>> 1. Retrieve offset by mxid from SLRU under MultiXactOffsetControlLock
>> 2. If the offset is 0 - it has not been filled in yet by the previous
>> algorithm - we sleep and goto 1
>> 3. Retrieve members from SLRU under MultiXactMemberControlLock
>> 4. What do we do if there are just zeroes there because step 4 of the
>> previous algorithm has not executed yet? Nothing - we return an empty
>> members list.
>
> So transactions never see such incomplete mxids, I believe.
I've observed the sleep at step 2, and I believe it's possible to observe the
side effects of step 4 too.
Maybe we could add a lock on the standby to avoid this 1000us wait? Sometimes
it hits standbys hard: if someone locks a whole table on the primary, all seq
scans on the standbys pile up behind it with MultiXactOffsetControlLock
contention.
It looks like this:
#0 0x00007fcd56896ff7 in __GI___select (nfds=nfds@entry=0,
readfds=readfds@entry=0x0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffd83376fe0) at
../sysdeps/unix/sysv/linux/select.c:41
#1 0x000056186e0d54bd in pg_usleep (microsec=microsec@entry=1000) at
./build/../src/port/pgsleep.c:56
#2 0x000056186dd5edf2 in GetMultiXactIdMembers (from_pgupgrade=0 '\000',
onlyLock=<optimized out>, members=0x7ffd83377080, multi=3106214809) at
./build/../src/backend/access/transam/multixact.c:1370
#3 GetMultiXactIdMembers () at
./build/../src/backend/access/transam/multixact.c:1202
#4 0x000056186dd2d2d9 in MultiXactIdGetUpdateXid (xmax=<optimized out>,
t_infomask=<optimized out>) at ./build/../src/backend/access/heap/heapam.c:7039
#5 0x000056186dd35098 in HeapTupleGetUpdateXid
(tuple=tuple@entry=0x7fcba3b63d58) at
./build/../src/backend/access/heap/heapam.c:7080
#6 0x000056186e0cd0f8 in HeapTupleSatisfiesMVCC (htup=<optimized out>,
snapshot=0x56186f44a058, buffer=230684) at
./build/../src/backend/utils/time/tqual.c:1091
#7 0x000056186dd2d922 in heapgetpage (scan=scan@entry=0x56186f4c8e78,
page=page@entry=3620) at ./build/../src/backend/access/heap/heapam.c:439
#8 0x000056186dd2ea7c in heapgettup_pagemode (key=0x0, nkeys=0,
dir=ForwardScanDirection, scan=0x56186f4c8e78) at
./build/../src/backend/access/heap/heapam.c:1034
#9 heap_getnext (scan=scan@entry=0x56186f4c8e78,
direction=direction@entry=ForwardScanDirection) at
./build/../src/backend/access/heap/heapam.c:1801
#10 0x000056186de84f51 in SeqNext (node=node@entry=0x56186f4a4f78) at
./build/../src/backend/executor/nodeSeqscan.c:81
#11 0x000056186de6a3f1 in ExecScanFetch (recheckMtd=0x56186de84ef0
<SeqRecheck>, accessMtd=0x56186de84f20 <SeqNext>, node=0x56186f4a4f78) at
./build/../src/backend/executor/execScan.c:97
#12 ExecScan (node=0x56186f4a4f78, accessMtd=0x56186de84f20 <SeqNext>,
recheckMtd=0x56186de84ef0 <SeqRecheck>) at
./build/../src/backend/executor/execScan.c:164
Best regards, Andrey Borodin.