> On 14 May 2020, at 06:25, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> 
> At Wed, 13 May 2020 23:08:37 +0500, "Andrey M. Borodin" 
> <x4...@yandex-team.ru> wrote in 
>> 
>> 
>>> On 11 May 2020, at 16:17, Andrey M. Borodin <x4...@yandex-team.ru> wrote:
>>> 
>>> I've gone ahead and created 3 patches:
>>> 1. Configurable SLRU buffer sizes for MultiXactOffsets and MultiXactMembers
>>> 2. Reduce locking level to shared on read of MultiXactId members
>>> 3. Configurable cache size
>> 
>> I'm looking more at MultiXact and it seems to me that we have a race 
>> condition there.
>> 
>> When we create a new MultiXact we do:
>> 1. Generate new MultiXactId under MultiXactGenLock
>> 2. Record new mxid with members and offset to WAL
>> 3. Write offset to SLRU under MultiXactOffsetControlLock
>> 4. Write members to SLRU under MultiXactMemberControlLock
> 
> But, don't we hold exclusive lock on the buffer through all the steps
> above?
Yes... unless the MultiXact is observed on a standby. This can lead to observing 
an inconsistent snapshot: one of the lockers has committed a tuple delete, but 
the standby still sees the tuple as alive.

>> When we read MultiXact we do:
>> 1. Retrieve offset by mxid from SLRU under MultiXactOffsetControlLock
>> 2. If offset is 0 - it's not filled in at step 4 of previous algorithm, we 
>> sleep and goto 1
>> 3. Retrieve members from SLRU under MultiXactMemberControlLock
>> 4. ..... what we do if there are just zeroes because step 4 is not executed 
>> yet? Nothing, return empty members list.
> 
> So transactions never see such incomplete mxids, I believe.
I've observed the sleep at step 2, and I believe it's possible to observe the 
side effects of step 4 too.
Maybe we could add a lock on the standby to avoid this 1000us wait? Sometimes it 
hits standbys hard: if someone locks a whole table on the primary, all seq scans 
on the standbys pile up behind it on MultiXactOffsetControlLock contention.

It looks like this:
#0  0x00007fcd56896ff7 in __GI___select (nfds=nfds@entry=0, 
readfds=readfds@entry=0x0, writefds=writefds@entry=0x0, 
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffd83376fe0) at 
../sysdeps/unix/sysv/linux/select.c:41
#1  0x000056186e0d54bd in pg_usleep (microsec=microsec@entry=1000) at 
./build/../src/port/pgsleep.c:56
#2  0x000056186dd5edf2 in GetMultiXactIdMembers (from_pgupgrade=0 '\000', 
onlyLock=<optimized out>, members=0x7ffd83377080, multi=3106214809) at 
./build/../src/backend/access/transam/multixact.c:1370
#3  GetMultiXactIdMembers () at 
./build/../src/backend/access/transam/multixact.c:1202
#4  0x000056186dd2d2d9 in MultiXactIdGetUpdateXid (xmax=<optimized out>, 
t_infomask=<optimized out>) at ./build/../src/backend/access/heap/heapam.c:7039
#5  0x000056186dd35098 in HeapTupleGetUpdateXid 
(tuple=tuple@entry=0x7fcba3b63d58) at 
./build/../src/backend/access/heap/heapam.c:7080
#6  0x000056186e0cd0f8 in HeapTupleSatisfiesMVCC (htup=<optimized out>, 
snapshot=0x56186f44a058, buffer=230684) at 
./build/../src/backend/utils/time/tqual.c:1091
#7  0x000056186dd2d922 in heapgetpage (scan=scan@entry=0x56186f4c8e78, 
page=page@entry=3620) at ./build/../src/backend/access/heap/heapam.c:439
#8  0x000056186dd2ea7c in heapgettup_pagemode (key=0x0, nkeys=0, 
dir=ForwardScanDirection, scan=0x56186f4c8e78) at 
./build/../src/backend/access/heap/heapam.c:1034
#9  heap_getnext (scan=scan@entry=0x56186f4c8e78, 
direction=direction@entry=ForwardScanDirection) at 
./build/../src/backend/access/heap/heapam.c:1801
#10 0x000056186de84f51 in SeqNext (node=node@entry=0x56186f4a4f78) at 
./build/../src/backend/executor/nodeSeqscan.c:81
#11 0x000056186de6a3f1 in ExecScanFetch (recheckMtd=0x56186de84ef0 
<SeqRecheck>, accessMtd=0x56186de84f20 <SeqNext>, node=0x56186f4a4f78) at 
./build/../src/backend/executor/execScan.c:97
#12 ExecScan (node=0x56186f4a4f78, accessMtd=0x56186de84f20 <SeqNext>, 
recheckMtd=0x56186de84ef0 <SeqRecheck>) at 
./build/../src/backend/executor/execScan.c:164


Best regards, Andrey Borodin.
