On Wed, May 27, 2026 at 02:03:37PM +0300, Vlad Lesin wrote: > Attached is an updated patch with the test rebased onto the latest master.
I have been looking at that, and after putting my HEAD on it I have concluded that this test is an anti-pattern based on its current shape. The test relies on a wait, but the wait mechanism in injection_points will not be able to work as the latch of the backends get detached earlier after the fix done at 84b9d6bceab6 to prevent the recycle race. What your proposed test was relying on here is a statement timeout to make sure that the leader stays long enough in the second point so as the followers are awaken and can finish their termination business first. But we cannot really rely on that, and it makes the test much longer than necessary on fast machines, dependent on statement_timeout. Another thing that we could do is redesign the wait mechanism in injection_points to not rely on latches, using a shared memory flag where we don't depend on latches and conditional variables. I cannot get excited enough about that for this specific case, but perhaps somebody would like to give it a shot? The use of latches has been mentioned more than once as an annoying limitation, but I've always found that less efficient on fast machines as it would require an internal polling with a sleep or equivalent. This thread would give an extra reason to do so, perhaps not in the stable branches but HEAD once v20 opens up? prockill_attach_injection_wait_pid() is not really necessary. We could just reuse the existing attach() function and add an optional PID argument, defaulting to 0. Using two different points, one for the leader and one for the follower is indeed the correct way to do things. Another thing that one may try is to switch the test to use NOTICEs and server log lookups. That may catch the PANIC, but one really needs to be lucky so it would be mostly a waste of cycles in the buildfarm due to the low reproducibility rate. The test was still a useful exercise to prove the bug, though. If one reverts 84b9d6bceab6 and runs the test, we are able to reproduce the original bug. Anyway, if we are to move ahead with this test, I'd suggest a couple of patch pieces first: - Switch the wait mechanism to use something more primitive, with a shmem state. - Extend injection_points_attach() with an optional PID. - Add the test. All that would be quite invasise, especially for the 1st point, so introducing that only in v20 may be better. I am not sure that the case of this thread is good enough to justify these changes, TBH. -- Michael
signature.asc
Description: PGP signature
