Hi,

On 2022-05-03 01:16:46 -0400, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > On 2022-05-02 23:44:32 -0400, Tom Lane wrote:
> >> I can poke into that tomorrow, but are you sure that that isn't an
> >> expectable result?
>
> > It's not expected. But I think I might see what the problem is:
> > We wait for the FETCH (and thus the buffer pin to be acquired). But that
> > doesn't guarantee that the lock has been acquired. We can't check that with
> > pump_until() afaics, because there'll not be any output. But a query_until()
> > checking pg_locks should do the trick?
>
> Irritatingly, it doesn't reproduce (at least not easily) in a manual
> build on the same box.
Odd, given how readily it seems to reproduce on the bf. I assume you built with

> Uses -fsanitize=alignment -DWRITE_READ_PARSE_PLAN_TREES -DSTRESS_SORT_INT_MIN
> -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS

> So it's almost surely a timing issue, and your theory here seems plausible.

Unfortunately I don't think my theory holds, because I had actually added a
defense against this to the test, which I forgot about momentarily:

    # just to make sure we're waiting for lock already
    ok( $node_standby->poll_query_until(
            'postgres', qq[
    SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
    ], 'waiting'),
        "$sect: lock acquisition is waiting");

and on longfin that step completes successfully.

I think what happens is that we get a buffer pin conflict, because these days
we can actually process buffer pin conflicts while waiting for a lock.

The easiest way to get around that is to increase the replay timeout for that
test, I think? I think we need a restart, not a reload, because reloads aren't
guaranteed to be processed at any particular point in time :/.

Testing a fix in a variety of timing circumstances now...

Greetings,

Andres Freund