Re: Random pg_upgrade test failure on drongo

Alexander Lakhin Tue, 09 Jan 2024 01:00:16 -0800

Hello Kuroda-san,

09.01.2024 08:49, Hayato Kuroda (Fujitsu) wrote:

> Based on the suggestion by Amit, I have created a patch with the alternative
> approach. This just does GUC settings. The reported failure is only for
> 003_logical_slots, but the patch also includes changes for the recently added
> test, 004_subscription. IIUC, there is a possibility that 004 would fail as
> well.
>
> Per our understanding, this patch can stop random failures. Alexander, can you
> test for the confirmation?


Yes, the patch fixes the issue for me (without the patch I observe failures
on iterations 1-2, with 10 tests running in parallel, but with the patch
10 iterations succeeded).

But as far as I can see, 004_subscription is not affected by the issue,
because it doesn't enable streaming for nodes new_sub, new_sub1.
As I noted before, I could see the failure only with
shared_buffers = 1MB (which is set with allows_streaming => 'logical').
So I'm not sure whether we need to modify 004 (or any other test that
runs pg_upgrade).

As to checkpoint_timeout, personally I would not increase it, because it
seems unbelievable to me that pg_restore (with the cluster containing only
two empty databases) can run for longer than 5 minutes. I'd rather
investigate such a situation separately, in case we encounter it, but maybe
it's only me.
On the other hand, if a checkpoint could occur for some reason within a
shorter time span, then increasing the timeout would not matter, I suppose.
(I've also tested the bgwriter_lru_maxpages-only modification of your patch
and can confirm that it works as well.)
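
For reference, the GUC-only approach discussed here would amount to
something like the following in the new cluster's postgresql.conf. This is
only a sketch: the checkpoint_timeout value shown is an assumption (the
default is 5min), not necessarily what the patch uses, and the second
setting is the one I would personally leave out.

```conf
# Disable background writing, so bgwriter does not flush buffers
# (and open relation files) while pg_restore is running.
bgwriter_lru_maxpages = 0

# Optional, assumed value: make a timed checkpoint during the
# upgrade unlikely by raising the interval above the 5min default.
checkpoint_timeout = 1h
```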

Best regards,
Alexander
