Re: Recent 027_streaming_regress.pl hangs

2024-06-04 Thread Alexander Lakhin
Hello Andres, So it looks like the issue is resolved, but there is another apparently performance-related issue: deadlock-parallel test failures. I reduced test concurrency a bit. I hadn't quite realized how the buildfarm config and meson test concurrency interact. But there's still something
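
For readers unfamiliar with the knobs being discussed: meson's test runner takes its default parallelism from the MESON_TESTTHREADS environment variable, which a buildfarm animal can set through the build_env section of its build-farm.conf. A minimal, purely illustrative sketch follows; the values are placeholders, not the actual settings referred to above.

    # Hypothetical excerpt from a buildfarm animal's build-farm.conf;
    # names and values are illustrative only.
    our %conf = (
        # ... other animal settings ...
        build_env => {
            # meson reads this to decide how many tests to run concurrently;
            # lowering it reduces load spikes on a shared machine
            MESON_TESTTHREADS       => '4',
            # default timeout (seconds) used by TAP helpers such as
            # wait_for_catchup; raising it buys slack when the host is busy
            PG_TEST_TIMEOUT_DEFAULT => '360',
        },
        # ...
    );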

Re: Recent 027_streaming_regress.pl hangs

2024-04-04 Thread Andres Freund
Hi, On 2024-04-04 19:00:00 +0300, Alexander Lakhin wrote: > 26.03.2024 10:59, Andres Freund wrote: > > Late, will try to look more in the next few days. > AFAICS, last 027_streaming_regress.pl failures on calliphoridae, culicidae, tamandua occurred before 2024-03-27:

Re: Recent 027_streaming_regress.pl hangs

2024-04-04 Thread Alexander Lakhin
Hello Andres, 26.03.2024 10:59, Andres Freund wrote: Late, will try to look more in the next few days. AFAICS, last 027_streaming_regress.pl failures on calliphoridae, culicidae, tamandua occurred before 2024-03-27:

Re: Recent 027_streaming_regress.pl hangs

2024-03-26 Thread Andres Freund
Hi, On 2024-03-26 00:54:54 -0400, Tom Lane wrote: > > I guess I'll try to write a buildfarm database query to extract how long that phase of the test took from all runs on my menagerie, not just the failing one, and see if there's a visible trend. > +1 Only the query for
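
The buildfarm database query Tom describes could look roughly like the sketch below. The schema details (table build_status_log, columns sysname, snapshot, branch, log_stage, stage_duration, the stage name 'recovery-check') are assumptions made purely for illustration; the real buildfarm schema may name these differently.

    #!/usr/bin/perl
    # Sketch of pulling per-stage durations out of a buildfarm database for
    # a few animals to look for a trend.  Table and column names are
    # hypothetical -- adjust to the real schema.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=pgbuildfarm', '', '',
        { RaiseError => 1 });

    my $sql = q{
        SELECT sysname, snapshot, stage_duration
        FROM build_status_log
        WHERE sysname IN ('calliphoridae', 'culicidae', 'tamandua')
          AND branch = 'HEAD'
          AND log_stage = 'recovery-check'   -- hypothetical stage name
        ORDER BY snapshot
    };

    my $rows = $dbh->selectall_arrayref($sql, { Slice => {} });
    printf "%-15s %-22s %s\n",
        $_->{sysname}, $_->{snapshot}, $_->{stage_duration}
        for @$rows;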

Re: Recent 027_streaming_regress.pl hangs

2024-03-25 Thread Tom Lane
Andres Freund writes: > On 2024-03-26 00:00:38 -0400, Tom Lane wrote: >> Are you sure it's not just that the total time to run the core regression tests has grown to a bit more than what the test timeout allows for? > You're right, that could be it - in a way at least, the issue is replay

Re: Recent 027_streaming_regress.pl hangs

2024-03-25 Thread Andres Freund
Hi, On 2024-03-26 00:00:38 -0400, Tom Lane wrote: > Andres Freund writes: > > I think there must be some actual regression involved. The frequency of failures on HEAD vs failures on 16 - both of which run the tests concurrently via meson - is just vastly different. > Are you sure

Re: Recent 027_streaming_regress.pl hangs

2024-03-25 Thread Tom Lane
Andres Freund writes: > I think there must be some actual regression involved. The frequency of failures on HEAD vs failures on 16 - both of which run the tests concurrently via meson - is just vastly different. Are you sure it's not just that the total time to run the core regression tests

Re: Recent 027_streaming_regress.pl hangs

2024-03-25 Thread Andres Freund
Hi, On 2024-03-20 17:41:45 -0700, Andres Freund wrote: > On 2024-03-14 16:56:39 -0400, Tom Lane wrote: > > Also, this is probably not helping anything: > > 'extra_config' => { ...
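
The 'extra_config' hash referred to here is the buildfarm client's way of appending extra GUC settings to postgresql.conf for every test cluster an animal creates. Its contents are elided in the preview above; the sketch below only shows the general shape, with placeholder settings rather than the ones under discussion.

    # Illustrative shape of an extra_config entry in build-farm.conf; the
    # actual settings being commented on are not shown in the preview, so
    # these lines are placeholders only.
    our %conf = (
        # ...
        extra_config => {
            DEFAULT => [
                q(log_line_prefix = '%m [%p][%b][%v:%x] '),
                q(log_connections = true),
            ],
        },
        # ...
    );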

Re: Recent 027_streaming_regress.pl hangs

2024-03-21 Thread Andres Freund
Hi, On 2024-03-20 17:41:45 -0700, Andres Freund wrote: > 2024-03-20 22:14:01.904 UTC [56343][client backend][6/1925:0] LOG: connection authorized: user=bf database=postgres application_name=027_stream_regress.pl > 2024-03-20 22:14:01.930 UTC [56343][client backend][6/1926:0] LOG:

Re: Recent 027_streaming_regress.pl hangs

2024-03-20 Thread Andres Freund
Hi, On 2024-03-20 17:41:47 -0700, Andres Freund wrote: > There's a lot of other animals on the same machine, however it's rarely fully loaded (with either CPU or IO). > I don't think the test just being slow is the issue here, e.g. in the last failing iteration > [...] > I suspect we

Re: Recent 027_streaming_regress.pl hangs

2024-03-20 Thread Andres Freund
Hi, On 2024-03-14 16:56:39 -0400, Tom Lane wrote: > Thomas Munro writes: > > On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin wrote: > >> Could it be that the timeout (360 sec?) is just not enough for the test under the current (changed due to switch to meson) conditions? > > But

Re: Recent 027_streaming_regress.pl hangs

2024-03-19 Thread Alexander Lakhin
14.03.2024 23:56, Tom Lane wrote: Thomas Munro writes: On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin wrote: Could it be that the timeout (360 sec?) is just not enough for the test under the current (changed due to switch to meson) conditions? But you're right that under meson the test

Re: Recent 027_streaming_regress.pl hangs

2024-03-14 Thread Tom Lane
Thomas Munro writes: > On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin wrote: >> Could it be that the timeout (360 sec?) is just not enough for the test under the current (changed due to switch to meson) conditions? > But you're right that under meson the test takes a lot longer, I guess
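
For context on what actually times out: after the core regression suite finishes on the primary, 027_stream_regress.pl waits for the standby to replay all of the WAL the run generated, and it is that wait which gives up after the configured timeout. A simplified sketch of the wait, using the stock PostgreSQL TAP test modules (not the test verbatim):

    # Simplified sketch of the catch-up wait in 027_stream_regress.pl,
    # using the standard TAP test API; illustrative only.
    use strict;
    use warnings;
    use PostgreSQL::Test::Cluster;
    use Test::More;

    my $primary = PostgreSQL::Test::Cluster->new('primary');
    $primary->init(allows_streaming => 1);
    $primary->start;
    $primary->backup('backup1');

    my $standby = PostgreSQL::Test::Cluster->new('standby_1');
    $standby->init_from_backup($primary, 'backup1', has_streaming => 1);
    $standby->start;

    # ... the core regression tests run against $primary here ...

    # Poll until the standby's replay position reaches the primary's insert
    # LSN.  The poll loop gives up after roughly
    # $PostgreSQL::Test::Utils::timeout_default seconds (180 unless
    # PG_TEST_TIMEOUT_DEFAULT overrides it), which appears to be the wait
    # this thread is about.
    $primary->wait_for_catchup($standby, 'replay', $primary->lsn('insert'));

    done_testing();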

Re: Recent 027_streaming_regress.pl hangs

2024-03-14 Thread Thomas Munro
On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin wrote: > Could it be that the timeout (360 sec?) is just not enough for the test under the current (changed due to switch to meson) conditions? Hmm, well it looks like he switched over to meson around 42 days ago (2024-02-01), looking at

Re: Recent 027_streaming_regress.pl hangs

2024-03-14 Thread Alexander Lakhin
Hello Thomas and Michael, 14.03.2024 06:16, Thomas Munro wrote: Yeah, I was wondering if its checkpoint delaying logic might have got the checkpointer jammed or something like that, but I don't currently see how. Yeah, the replay of bulk newpages could be relevant, but it's not exactly new

Re: Recent 027_streaming_regress.pl hangs

2024-03-13 Thread Thomas Munro
On Thu, Mar 14, 2024 at 3:27 PM Michael Paquier wrote: > Hmm. Perhaps 8af25652489? That looks like the closest thing in the list that could have played with the way WAL is generated, hence potentially impacting the records that are replayed. Yeah, I was wondering if its checkpoint delaying

Re: Recent 027_streaming_regress.pl hangs

2024-03-13 Thread Michael Paquier
On Thu, Mar 14, 2024 at 03:00:28PM +1300, Thomas Munro wrote: > Assuming it is due to a commit in master, and given the failure frequency, I think it is very likely to be a change from this 3 day window of commits, and more likely in the top half dozen or so: > d360e3cc60e Fix compiler

Re: Recent 027_streaming_regress.pl hangs

2024-03-13 Thread Thomas Munro
On Wed, Mar 13, 2024 at 10:53 AM Thomas Munro wrote: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-02-23%2015%3A44%3A35 Assuming it is due to a commit in master, and given the failure frequency, I think it is very likely to be a change from this 3 day window of commits,