I've continued to review this and decided that I don't like the mess this patch proposes in order to support pg_commit_ts's deletion of all files. (Yes, I know that I was the one that proposed this idea. It's still a bad one). I'd like to change that code by removing the limit that we can only have 128 bank locks in a SLRU. To recap, the reason we did this is that commit_ts sometimes wants to delete all files while running (DeactivateCommitTs), and for this it needs to acquire all bank locks. Since going above the MAX_SIMUL_LWLOCKS limit is disallowed, we added the SLRU limit making multiple banks share lwlocks.
I propose two alternative solutions: 1. The easiest is to have DeactivateCommitTs continue to hold CommitTsLock until the end, including while SlruScanDirectory does its thing. This sounds terrible, but considering that this code only runs when the module is being deactivated, I don't think it's really all that bad in practice. I mean, if you deactivate the commit_ts module and then try to read commit timestamp data, you deserve to wait for a few seconds just as a punishment for your stupidity. AFAICT the cases where anything is done in the module mostly check without locking that commitTsActive is set, so we're not slowing down any critical operations. Also, if you don't like to be delayed for a couple of seconds, just don't deactivate the module. 2. If we want some refinement, the other idea is to change SlruScanDirCbDeleteAll (the callback that SlruScanDirectory uses in this case) so that it acquires the bank lock appropriate for all the slots used by the file that's going to be deleted. This is OK because in the default compilation each file only has 32 segments, so that requires only 32 lwlocks held at once while the file is being deleted. I think we don't need to bother with this, but it's an option if we see the above as unworkable for whatever reason. The only other user of SlruScanDirCbDeleteAll is async.c (the LISTEN/ NOTIFY code), and what that does is delete all the files at server start, where nothing is running concurrently anyway. So whatever we do for commit_ts, it won't affect async.c. So, if we do away with the SLRU_MAX_BANKLOCKS idea, then we're assured one LWLock per bank (instead of sharing some lwlocks to multiple banks), and that makes the code simpler to reason about. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "In fact, the basic problem with Perl 5's subroutines is that they're not crufty enough, so the cruft leaks out into user-defined code instead, by the Conservation of Cruft Principle." (Larry Wall, Apocalypse 6)