On Tue, Dec 12, 2023 at 6:58 PM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
>
> [Added Andrey again in CC, because as I understand they are using this
> code or something like it in production.  Please don't randomly remove
> people from CC lists.]
Oh, glad to know that.  Yeah, I generally do not remove people, but I
have noticed that in this mail chain some of the reviewers replied only
to me and the hackers' list, and from that point onwards I lost track
of the CC list.

> I've been looking at this some more, and I'm not confident in that the
> group clog update stuff is correct.  I think the injection points test
> case was good enough to discover a problem, but it's hard to get peace
> of mind that there aren't other, more subtle problems.

Yeah, I agree.

> The problem I see is that the group update mechanism is designed around
> contention of the global xact-SLRU control lock; it uses atomics to
> coordinate a single queue when the lock is contended.  So if we split up
> the global SLRU control lock using banks, then multiple processes using
> different bank locks might not contend.  OK, this is fine, but what
> happens if two separate groups of processes encounter contention on two
> different bank locks?  I think they will both try to update the same
> queue, and coordinate access to that *using different bank locks*.  I
> don't see how can this work correctly.

Let's back up a bit and start from the current design with the
centralized control lock.  There, if one process is holding the lock,
the other committers try to perform a group update.  If a group is
already forming but has not yet got the lock, and it is trying to
update a different CLOG page than the one this process wants to
update, then the process does not add itself to that group; instead it
falls back to the normal lock wait.  In another situation it may
happen that the leader of the other group has already got the control
lock, in which case it would have cleared
'procglobal->clogGroupFirst', and we would simply start forming a new
group.

So, talking only about the optimization part: the assumption is that
when a lot of concurrent xids are committing at the same time, those
xids are mostly from the same range and will fall on the same SLRU
page, so the group update helps them.  If we get an out-of-range xid
from some long-running transaction, it may not even go for the group
update because its page number is different.  The situation might
actually be better with bank-wise locks, because if those xids fall in
an altogether different bank they might not even contend.

Now, about correctness: even though processes may be contending on
different bank locks, we still ensure that within a group all the
processes are trying to update the same SLRU page (and hence the same
bank; the exception is discussed later [1]).  One of the processes
becomes the leader, and as soon as the leader gets the lock it
detaches the queue from 'procglobal->clogGroupFirst' by setting it to
INVALID_PGPROCNO, so that other group-update requesters can form
another group in parallel.  I do not see a correctness problem here.
Someone might object that, since different groups can now form for
different bank locks, no new group can start forming until we get the
lock for our bank, because we do not clear
'procglobal->clogGroupFirst' before acquiring the lock; other
requesters might want to update pages in different banks, so why block
them?  But the group-update design is optimized for the case where all
requesters are updating the status of xids generated in roughly the
same range.
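Just to make sure we are talking about the same structure, below is a
rough, standalone model of how I understand the queue handling.  It
uses C11 atomics and invented names (grp_first, join_group(),
leader_update(), MAX_PROCS) instead of the real pg_atomic_* calls and
PGPROC fields, so please treat it only as an illustration of the idea
and not as the patch code:

/* Toy model of the clog group-update queue (illustration only). */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define INVALID_PROC  (-1)
#define MAX_PROCS     64

static int          next_member[MAX_PROCS];    /* stand-in for clogGroupNext */
static int          member_page[MAX_PROCS];    /* stand-in for clogGroupMemberPage */
static _Atomic int  grp_first = INVALID_PROC;  /* stand-in for procglobal->clogGroupFirst */

/*
 * Push ourselves onto the group queue with a CAS loop.  Returns true if we
 * became the leader (the queue was empty), false if we are a follower.  The
 * real code additionally refuses to join when the already-forming group is
 * for a different page and falls back to the normal lock-and-update path.
 */
static bool
join_group(int me, int page)
{
    int head = atomic_load(&grp_first);

    member_page[me] = page;
    for (;;)
    {
        next_member[me] = head;
        if (atomic_compare_exchange_weak(&grp_first, &head, me))
            return head == INVALID_PROC;
        /* CAS failed: 'head' has been refreshed with the current value, retry */
    }
}

/*
 * Leader: take the bank lock for our page, detach the whole queue so the
 * next group can start forming in parallel, then update every member.
 */
static void
leader_update(int me)
{
    /* the real code acquires the bank lock for member_page[me] here */
    int head = atomic_exchange(&grp_first, INVALID_PROC);

    for (int p = head; p != INVALID_PROC; p = next_member[p])
        printf("update xid status of proc %d on page %d\n", p, member_page[p]);
    /* release the bank lock and wake the followers */
}

int
main(void)
{
    if (join_group(0, 42))          /* only member, so it becomes the leader */
        leader_update(0);
    return 0;
}

The point I was trying to make is visible here: grp_first is reset
only inside leader_update(), after the bank lock has been taken, so
until then any new requester keeps attaching to the same group.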
> I suspect the first part of that algorithm, where atomics are used to
> create the list without a lock, might work fine.  But will each "leader"
> process, each of which is potentially using a different bank lock,
> coordinate correctly?  Maybe this works correctly because only one
> process will find the queue head not empty?  If this is what happens,
> then there needs to be comments about it.

Yes, you got it right, I will try to comment on it better.

> Without any explanation,
> this seems broken and potentially dangerous, as some transaction commit
> bits might become lost given high enough concurrency and bad luck.
>
> Maybe this can be made to work by having one more lwlock that we use
> solely to coordinate this task.

Do you mean a different lock for adding/removing entries in the list,
instead of the atomic operation?  I think then we would lose the
benefit of the group update by introducing contention on yet another
lock.

[1] About the exception case: as I have explained in the code comments
as well, in some cases requests for different clog pages may get added
to the same group.  To handle that, the leader checks the respective
bank lock for each page, and if the page belongs to a different bank
it releases the old bank lock and acquires the new one.  This path may
not perform well, because after getting the first lock the leader
might have to wait again on another bank lock, but it is an extremely
rare case, so performance should not be a worry, and I do not see any
correctness issue there either.  A rough sketch of this lock-switching
loop is in the PS below.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
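PS: Here is a rough standalone sketch of the exceptional case in [1],
just to illustrate the lock-switching loop I described.  The names
(bank_of(), leader_walk(), NBANKS, BANK_SIZE) are made up for this
mail, and pthread mutexes stand in for the bank LWLocks, so it is only
a model of the idea, not the patch code:

#include <pthread.h>
#include <stdio.h>

#define NBANKS     8
#define BANK_SIZE  16               /* pages per bank; invented for the model */

static pthread_mutex_t bank_lock[NBANKS];   /* stand-in for the bank LWLocks */

static int
bank_of(int page)
{
    return (page / BANK_SIZE) % NBANKS;
}

/*
 * Leader walks the detached group; whenever the next member's page maps to
 * a different bank, it releases the bank lock it holds and takes the other
 * one.  This extra wait is the (rare) performance downside mentioned above.
 */
static void
leader_walk(const int *member_page, int nmembers)
{
    int held = -1;                  /* bank whose lock we currently hold */

    for (int i = 0; i < nmembers; i++)
    {
        int bank = bank_of(member_page[i]);

        if (bank != held)
        {
            if (held >= 0)
                pthread_mutex_unlock(&bank_lock[held]);
            pthread_mutex_lock(&bank_lock[bank]);   /* may block here */
            held = bank;
        }
        printf("update page %d under bank %d lock\n", member_page[i], bank);
    }
    if (held >= 0)
        pthread_mutex_unlock(&bank_lock[held]);
}

int
main(void)
{
    for (int i = 0; i < NBANKS; i++)
        pthread_mutex_init(&bank_lock[i], NULL);

    int pages[] = {3, 5, 40};       /* 40 falls in a different bank than 3 and 5 */
    leader_walk(pages, 3);
    return 0;
}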