Hi,

On 2023-02-06 17:53:00 -0800, Andres Freund wrote:
> WRT the fdw_retry_check: I wonder if we should increase the log level of
> a) pgfdw_inval_callback deciding to disconnect
> b) ReceiveSharedInvalidMessages() deciding to reset
>
> to DEBUG1, at least temporarily?
>
> Alternatively we could add a
> SET log_min_messages=debug4;
> ...
> RESET log_min_messages;
>
> around the postgres_fdw connection re-establishment test?
>
>
> One thing nudging me towards the more global approach is that I have the vague
> feeling that there's a wider issue around hitting more sinval resets than we
> should - and it'd right now be very hard to know about that.
Luckily it proved to be easy enough to reproduce on a private branch, by
setting the test to repeat a couple of times. I set the aforementioned log
messages to LOG. And indeed:

2023-02-07 02:54:18.773 UTC [10800][client backend] [pg_regress/postgres_fdw][7/10526:0] LOG:  cache state reset
2023-02-07 02:54:18.773 UTC [10800][client backend] [pg_regress/postgres_fdw][7/10526:0] LOG:  discarding connection 0x802251f00

That was preceded by another log message less than 200 ms before:

2023-02-07 02:54:18.588 UTC [10800][client backend] [pg_regress/postgres_fdw][7/10523:55242] STATEMENT:  ALTER SERVER loopback OPTIONS (application_name 'fdw_retry_check');

The log statements in between are about isolation/reindex-concurrently-toast
and pg_regress/indexing. So the problem is indeed that we somehow quickly
overflow the sinval queue.

I guess we need a bit more logging around the size of the sinval queue and
its "fullness"? (A rough sketch of what I mean is at the end of this mail.)

I'm a bit surprised that MAXNUMMESSAGES is a hardcoded 4096. It's not
particularly surprising that that's quickly overflowed?

There's something off. Isolationtester's control connection emits *loads* of
invalidation messages:

2023-02-06 19:29:06.430 PST [2125297][client backend][6/0:121864][isolation/receipt-report/control connection] LOG:  previously emitted 7662 messages, 21 this time
2023-02-06 19:29:06.566 PST [2125297][client backend][6/0:121873][isolation/receipt-report/control connection] LOG:  previously emitted 8155 messages, 99 this time
2023-02-06 19:29:06.655 PST [2125297][client backend][6/0:121881][isolation/receipt-report/control connection] LOG:  previously emitted 8621 messages, 99 this time
2023-02-06 19:29:06.772 PST [2125297][client backend][6/0:121892][isolation/receipt-report/control connection] LOG:  previously emitted 9208 messages, 85 this time
2023-02-06 19:29:06.867 PST [2125297][client backend][6/0:121900][isolation/receipt-report/control connection] LOG:  previously emitted 9674 messages, 85 this time

and this happens with lots of other tests.

Greetings,

Andres Freund

PS: The reindex-concurrently-toast output seems to indicate something is
broken in the test... There are lots of references to non-existent tables in
the expected file, without that immediately making sense.
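
To make the "fullness" logging idea above a bit more concrete, here is a
rough, untested sketch of the kind of thing I mean - assuming it would live
somewhere like SIInsertDataEntries() in src/backend/storage/ipc/sinvaladt.c
(symbol and field names from memory, so treat it as an illustration only):

	/* untested sketch: report how full the sinval queue currently is */
	SISeg	   *segP = shmInvalBuffer;
	int			numMsgs = segP->maxMsgNum - segP->minMsgNum;

	elog(DEBUG1, "sinval queue holds %d of %d messages (%.0f%% full)",
		 numMsgs, MAXNUMMESSAGES,
		 100.0 * numMsgs / MAXNUMMESSAGES);

That, or something like it behind a higher debug level, would at least make it
visible when we get close to overflowing the queue.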