RE: failures in t/031_recovery_conflict.pl on CI

2022-11-23 Thread Факеев Алексей
Hi, On 2022-05-03 14:23:23 -0400, Tom Lane wrote: > Andres Freund writes: > >> So it's almost surely a timing issue, and your theory here seems plausible. > > > Unfortu

Re: failures in t/031_recovery_conflict.pl on CI

2022-07-26 Thread Andres Freund
Hi, On 2022-07-26 12:47:38 -0400, Tom Lane wrote: > Alvaro Herrera writes: > > Hey, I just noticed that these tests are still disabled. The next > > minors are coming soon; should we wait until *those* are done and then > > re-enable; or re-enable them now to see how they fare and then > >

Re: failures in t/031_recovery_conflict.pl on CI

2022-07-26 Thread Tom Lane
Alvaro Herrera writes: > Hey, I just noticed that these tests are still disabled. The next > minors are coming soon; should we wait until *those* are done and then > re-enable; or re-enable them now to see how they fare and then > re-disable before the next minors if there's still problems we

Re: failures in t/031_recovery_conflict.pl on CI

2022-07-26 Thread Alvaro Herrera
On 2022-May-08, Andres Freund wrote: > On 2022-05-08 13:59:09 -0400, Tom Lane wrote: > > No one is going to thank us for shipping a known-unstable test case. > > IDK, hiding failures indicating bugs isn't really better, at least if it > doesn't look like a bug in the test. But you seem to have

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-08 Thread Tom Lane
Andres Freund writes: > On 2022-05-08 15:11:39 -0700, Andres Freund wrote: >> But you seem to have a stronger opinion on this than me, so I'll skip the >> entire test for now :/ > And done. Thanks, I appreciate that. regards, tom lane

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-08 Thread Andres Freund
On 2022-05-08 15:11:39 -0700, Andres Freund wrote: > But you seem to have a stronger opinion on this than me, so I'll skip the > entire test for now :/ And done.

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-08 Thread Andres Freund
Hi, On 2022-05-08 13:59:09 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2022-05-08 11:28:34 -0400, Tom Lane wrote: > >> Per lapwing's latest results [1], this wasn't enough. I'm again thinking > >> we should pull the whole test from the back branches. > > > That failure is different

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-08 Thread Tom Lane
Andres Freund writes: > On 2022-05-08 11:28:34 -0400, Tom Lane wrote: >> Per lapwing's latest results [1], this wasn't enough. I'm again thinking >> we should pull the whole test from the back branches. > That failure is different from the earlier failures though. I don't think it's > a timing

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-08 Thread Andres Freund
Hi, On 2022-05-08 11:28:34 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2022-05-05 23:57:28 -0400, Tom Lane wrote: > >> Are you sure there's just one test that's failing? I haven't checked > >> the buildfarm history close enough to be sure of that. But if it's > >> true, disabling

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-08 Thread Tom Lane
Andres Freund writes: > On 2022-05-05 23:57:28 -0400, Tom Lane wrote: >> Are you sure there's just one test that's failing? I haven't checked >> the buildfarm history close enough to be sure of that. But if it's >> true, disabling just that one would be fine (again, as a stopgap >> measure). >

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-06 Thread Tom Lane
Andres Freund writes: > Done. Perhaps you could trigger a run on longfin, that seems to have been the > most reliably failing animal? No need, its cron job launched already. regards, tom lane

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-06 Thread Andres Freund
On 2022-05-06 12:12:19 -0400, Tom Lane wrote: > Andres Freund writes: > > I looked through all the failures I found and it's two kinds of failures, both > > related to the deadlock test. So I'm thinking of skipping just that test as in > > the attached. > > > Working on committing /

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-06 Thread Tom Lane
Andres Freund writes: > I looked through all the failures I found and it's two kinds of failures, both > related to the deadlock test. So I'm thinking of skipping just that test as in > the attached. > Working on committing / backpatching that, unless somebody suggests changes > quickly... WFM.

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-06 Thread Andres Freund
Hi, On 2022-05-05 23:57:28 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2022-05-05 23:36:22 -0400, Tom Lane wrote: > >> So I reluctantly vote for removing 031_recovery_conflict.pl in the > >> back branches for now, with the expectation that we'll fix the > >> infrastructure and put it

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-05 Thread Tom Lane
Andres Freund writes: > On 2022-05-05 23:36:22 -0400, Tom Lane wrote: >> So I reluctantly vote for removing 031_recovery_conflict.pl in the >> back branches for now, with the expectation that we'll fix the >> infrastructure and put it back after the current release round >> is done. > What about

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-05 Thread Andres Freund
Hi, On 2022-05-05 23:36:22 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2022-05-05 22:07:40 -0400, Tom Lane wrote: > >> May I ask where we're at on this? Next week's back-branch release is > >> getting uncomfortably close, and I'm still seeing various buildfarm > >> animals erratically

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-05 Thread Tom Lane
Andres Freund writes: > On 2022-05-05 22:07:40 -0400, Tom Lane wrote: >> May I ask where we're at on this? Next week's back-branch release is >> getting uncomfortably close, and I'm still seeing various buildfarm >> animals erratically failing on 031_recovery_conflict.pl. > Looks like the

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-05 Thread Andres Freund
Hi, On 2022-05-05 22:07:40 -0400, Tom Lane wrote: > Andres Freund writes: > > Attached is a fix for the test that I think should avoid the problem. Couldn't > > repro it with it applied, under both rr and valgrind. > > May I ask where we're at on this? Next week's back-branch release is >

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-05 Thread Tom Lane
Andres Freund writes: > Attached is a fix for the test that I think should avoid the problem. Couldn't > repro it with it applied, under both rr and valgrind. May I ask where we're at on this? Next week's back-branch release is getting uncomfortably close, and I'm still seeing various buildfarm

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-03 Thread Andres Freund
Hi, On 2022-05-03 14:23:23 -0400, Tom Lane wrote: > Andres Freund writes: > >> So it's almost surely a timing issue, and your theory here seems plausible. > > > Unfortunately I don't think my theory holds, because I actually had added a > > defense against this into the test that I forgot about

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-03 Thread Tom Lane
Andres Freund writes: > On 2022-05-03 01:16:46 -0400, Tom Lane wrote: >> Irritatingly, it doesn't reproduce (at least not easily) in a manual >> build on the same box. > Odd, given how readily it seems to reproduce on the bf. I assume you built with >> Uses -fsanitize=alignment

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-03 Thread Andres Freund
Hi, On 2022-05-03 01:16:46 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2022-05-02 23:44:32 -0400, Tom Lane wrote: > >> I can poke into that tomorrow, but are you sure that that isn't an > >> expectable result? > > > It's not expected. But I think I might see what the problem is: > >

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-03 Thread Alvaro Herrera
On 2022-May-02, Andres Freund wrote: > > > pgindent uses some crazy formatting nearby: > > > SendRecoveryConflictWithBufferPin( > > > > > > PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK); > > > > I do not believe that that line break is pgindent's

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-02 Thread Tom Lane
Andres Freund writes: > On 2022-05-02 23:44:32 -0400, Tom Lane wrote: >> I can poke into that tomorrow, but are you sure that that isn't an >> expectable result? > It's not expected. But I think I might see what the problem is: > We wait for the FETCH (and thus the buffer pin to be acquired).

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-02 Thread Andres Freund
Hi, On 2022-05-02 23:44:32 -0400, Tom Lane wrote: > Andres Freund writes: > > I ended up committing the extension of the test first, before the fix. I think > > that's the cause of the failure on longfin on serinus. Let's hope the > > situation improves with the now also committed (and

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-02 Thread Tom Lane
Andres Freund writes: > I ended up committing the extension of the test first, before the fix. I think > that's the cause of the failure on longfin on serinus. Let's hope the > situation improves with the now also committed (and backpatched) fix. longfin's definitely not very happy: four out of

Re: failures in t/031_recovery_conflict.pl on CI

2022-05-02 Thread Andres Freund
Hi, On 2022-04-29 19:26:59 -0400, Tom Lane wrote: > Andres Freund writes: > > - The test uses pump_until() and wait_for_log(), which don't exist in the > > backbranches. For now I've just inlined the implementation, but I guess we > > could also backpatch their introduction? > > I'd

Re: failures in t/031_recovery_conflict.pl on CI

2022-04-29 Thread Tom Lane
Andres Freund writes: > Questions: > - I'm planning to backpatch the test as 031_recovery_conflict.pl, even though > preceding numbers are unused. It seems way more problematic to use a > different number in the backbranches than have gaps? +1 > - The test uses pump_until() and

Re: failures in t/031_recovery_conflict.pl on CI

2022-04-29 Thread Andres Freund
Hi, Attached are patches for this issue. It adds a test case for deadlock conflicts to make sure that case isn't broken. I also tested the recovery conflict tests in the back branches, and they work there with a reasonably small set of changes. Questions: - I'm planning to backpatch the test as

Re: failures in t/031_recovery_conflict.pl on CI

2022-04-12 Thread Andres Freund
Hi, On 2022-04-12 15:05:22 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2022-04-09 19:34:26 -0400, Tom Lane wrote: > >> +1. This is probably more feasible given the latch infrastructure > >> than it was when that code was first written. > > > What do you think about just reordering

Re: failures in t/031_recovery_conflict.pl on CI

2022-04-12 Thread Tom Lane
Andres Freund writes: > On 2022-04-09 19:34:26 -0400, Tom Lane wrote: >> +1. This is probably more feasible given the latch infrastructure >> than it was when that code was first written. > What do you think about just reordering the disable_all_timeouts() to be > before the

Re: failures in t/031_recovery_conflict.pl on CI

2022-04-12 Thread Andres Freund
Hi, On 2022-04-09 19:34:26 -0400, Tom Lane wrote: > Andres Freund writes: > > It's been broken in different ways all the way back to 9.0, from what I can > > see, but I didn't check every single version. > > > Afaics the fix is to nuke the idea of doing anything substantial in the > > signal >

Re: failures in t/031_recovery_conflict.pl on CI

2022-04-09 Thread Tom Lane
Andres Freund writes: > It's been broken in different ways all the way back to 9.0, from what I can > see, but I didn't check every single version. > Afaics the fix is to nuke the idea of doing anything substantial in the signal > handler from orbit, and instead just set a flag in the handler.
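
The approach described in this message (do nothing substantial in the signal handler, only set a flag) can be illustrated with a minimal, generic C sketch. This is not the actual PostgreSQL patch: the names timeout_handler, conflict_pending, process_pending_conflict and install_handler are hypothetical, SIGUSR1 is chosen arbitrarily, and the counter only stands in for the real conflict-handling work.

```c
#include <signal.h>

/* Hypothetical names throughout; this is not PostgreSQL code. */

/* Written by the handler, read and cleared by ordinary (non-signal) code. */
static volatile sig_atomic_t conflict_pending = 0;

/* Stand-in for the "substantial" work that must not run in a handler. */
static int conflicts_handled = 0;

/* Signal handler: only records that work is needed, nothing else. */
static void timeout_handler(int signo)
{
    (void) signo;
    conflict_pending = 1;
}

/*
 * Called from the main loop, outside signal context, where taking locks
 * or sending signals to other processes is safe.
 * Returns 1 if a pending request was processed, 0 otherwise.
 */
static int process_pending_conflict(void)
{
    if (!conflict_pending)
        return 0;
    conflict_pending = 0;
    conflicts_handled++;
    return 1;
}

/* Installs the handler; returns 0 on success, -1 on failure. */
static int install_handler(void)
{
    return signal(SIGUSR1, timeout_handler) == SIG_ERR ? -1 : 0;
}
```

A caller would run install_handler() once at startup and then call process_pending_conflict() from its wait loop after each wakeup. Because the handler takes no locks, it cannot self-deadlock against code it interrupted.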

Re: failures in t/031_recovery_conflict.pl on CI

2022-04-09 Thread Andres Freund
Hi, On 2022-04-09 16:10:02 -0700, Andres Freund wrote: > It's not that (although I still suspect it's a problem). It's a self-deadlock, > because StandbyTimeoutHandler(), which ResolveRecoveryConflictWithBufferPin() > *explicitly enables*, calls SendRecoveryConflictWithBufferPin(). Which does >

Re: failures in t/031_recovery_conflict.pl on CI

2022-04-09 Thread Andres Freund
Hi, On 2022-04-09 15:00:54 -0700, Andres Freund wrote: > What are we expecting to wake the startup process up, once it does > SendRecoveryConflictWithBufferPin()? > > It's likely not the problem here, because we never seem to have even reached > that path, but afaics once we've called

Re: failures in t/031_recovery_conflict.pl on CI

2022-04-09 Thread Andres Freund
Hi, On 2022-04-08 22:05:01 -0700, Andres Freund wrote: > On 2022-04-08 21:55:15 -0700, Andres Freund wrote: > > on CI [1] the new t/031_recovery_conflict.pl is failing occasionally. Which is > > interesting, because I ran it there dozens if not hundreds of times before > > commit, with - I

Re: failures in t/031_recovery_conflict.pl on CI

2022-04-08 Thread Andres Freund
Hi, On 2022-04-08 21:55:15 -0700, Andres Freund wrote: > on CI [1] the new t/031_recovery_conflict.pl is failing occasionally. Which is > interesting, because I ran it there dozens if not hundreds of times before > commit, with - I think - only cosmetic changes. Scratch that part - I found an