RE: Random pg_upgrade test failure on drongo

2024-01-16 Thread Hayato Kuroda (Fujitsu)
Dear hackers, > > Thanks to both of you. I have pushed the patch. > I have been tracking the BF animals these days, and this failure has not been seen anymore. I think we can close the topic. Again, thanks for all your efforts. Best Regards, Hayato Kuroda FUJITSU LIMITED

Re: Random pg_upgrade test failure on drongo

2024-01-11 Thread Amit Kapila
On Thu, Jan 11, 2024 at 8:15 AM Hayato Kuroda (Fujitsu) wrote: > > > > But tomorrow it could be for other tables and if we change this > > > TRUNCATE logic for pg_largeobject (of which chances are less) then > > > there is always a chance that one misses changing this comment. I feel > > >

RE: Random pg_upgrade test failure on drongo

2024-01-10 Thread Hayato Kuroda (Fujitsu)
Dear Alexander, Amit, > > But tomorrow it could be for other tables and if we change this > > TRUNCATE logic for pg_largeobject (of which chances are less) then > > there is always a chance that one misses changing this comment. I feel > > keeping it generic in this case would be better as the

Re: Random pg_upgrade test failure on drongo

2024-01-10 Thread Alexander Lakhin
10.01.2024 13:37, Amit Kapila wrote: But tomorrow it could be for other tables and if we change this TRUNCATE logic for pg_largeobject (of which chances are less) then there is always a chance that one misses changing this comment. I feel keeping it generic in this case would be better as the

Re: Random pg_upgrade test failure on drongo

2024-01-10 Thread Amit Kapila
On Wed, Jan 10, 2024 at 3:30 PM Alexander Lakhin wrote: > > 10.01.2024 12:31, Amit Kapila wrote: > > I am slightly hesitant to add any particular system table name in the > > comments as this can happen for any other system table as well, so > > slightly adjusted the comments in the attached.

Re: Random pg_upgrade test failure on drongo

2024-01-10 Thread Alexander Lakhin
10.01.2024 12:31, Amit Kapila wrote: I am slightly hesitant to add any particular system table name in the comments as this can happen for any other system table as well, so slightly adjusted the comments in the attached. However, I think it is okay to mention the particular system table name in

Re: Random pg_upgrade test failure on drongo

2024-01-10 Thread Amit Kapila
On Tue, Jan 9, 2024 at 4:30 PM Alexander Lakhin wrote: > > 09.01.2024 13:08, Amit Kapila wrote: > > > >> As to checkpoint_timeout, personally I would not increase it, because it > >> seems unbelievable to me that pg_restore (with the cluster containing only > >> two empty databases) can run for

Re: Random pg_upgrade test failure on drongo

2024-01-09 Thread Alexander Lakhin
Hello Amit, 09.01.2024 13:08, Amit Kapila wrote: As to checkpoint_timeout, personally I would not increase it, because it seems unbelievable to me that pg_restore (with the cluster containing only two empty databases) can run for longer than 5 minutes. I'd rather investigate such situation

Re: Random pg_upgrade test failure on drongo

2024-01-09 Thread Amit Kapila
On Tue, Jan 9, 2024 at 2:30 PM Alexander Lakhin wrote: > > 09.01.2024 08:49, Hayato Kuroda (Fujitsu) wrote: > > Based on the suggestion by Amit, I have created a patch with the alternative > > approach. This just does GUC settings. The reported failure is only for > > 003_logical_slots, but the

Re: Random pg_upgrade test failure on drongo

2024-01-09 Thread Alexander Lakhin
Hello Kuroda-san, 09.01.2024 08:49, Hayato Kuroda (Fujitsu) wrote: Based on the suggestion by Amit, I have created a patch with the alternative approach. This just does GUC settings. The reported failure is only for 003_logical_slots, but the patch also includes changes for the recently added

RE: Random pg_upgrade test failure on drongo

2024-01-08 Thread Hayato Kuroda (Fujitsu)
Dear Amit, Alexander, > > We get the effect discussed when the background writer process decides to > > flush a file buffer for pg_largeobject during stage 1. > > (Thus, if a checkpoint somehow happened to occur during CREATE DATABASE, > > the result must be the same.) > > And another important

Re: Random pg_upgrade test failure on drongo

2024-01-08 Thread Amit Kapila
On Mon, Jan 8, 2024 at 9:36 PM Jim Nasby wrote: > > On 1/4/24 10:19 PM, Amit Kapila wrote: > > On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin wrote: > >> > >> 03.01.2024 14:42, Amit Kapila wrote: > >>> > >> > And the internal process is ... background writer (BgBufferSync()). > >

Re: Random pg_upgrade test failure on drongo

2024-01-08 Thread Jim Nasby
On 1/4/24 10:19 PM, Amit Kapila wrote: On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin wrote: 03.01.2024 14:42, Amit Kapila wrote: And the internal process is ... background writer (BgBufferSync()). So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and got 20 x 10

Re: Random pg_upgrade test failure on drongo

2024-01-04 Thread Amit Kapila
On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin wrote: > > 03.01.2024 14:42, Amit Kapila wrote: > > > > >> And the internal process is ... background writer (BgBufferSync()). > >> > >> So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and > >> got 20 x 10 tests passing. > >> >

Re: Random pg_upgrade test failure on drongo

2024-01-04 Thread Alexander Lakhin
Hello Amit, 03.01.2024 14:42, Amit Kapila wrote: So I started to think about other approach: to perform unlink as it's implemented now, but then wait until the DELETE_PENDING state is gone. There is a comment in the code which suggests we shouldn't wait indefinitely. See "However, we won't

Re: Random pg_upgrade test failure on drongo

2024-01-03 Thread Amit Kapila
On Tue, Jan 2, 2024 at 10:30 AM Alexander Lakhin wrote: > > 28.12.2023 06:08, Hayato Kuroda (Fujitsu) wrote: > > Dear Alexander, > > > >> I agree with your analysis and would like to propose a PoC fix (see > >> attached). With this patch applied, 20 iterations succeeded for me. > > There are no

Re: Random pg_upgrade test failure on drongo

2024-01-01 Thread Alexander Lakhin
Hello Kuroda-san, 28.12.2023 06:08, Hayato Kuroda (Fujitsu) wrote: Dear Alexander, I agree with your analysis and would like to propose a PoC fix (see attached). With this patch applied, 20 iterations succeeded for me. There are no reviewers, so I will review it again. Let's move the PoC to

RE: Random pg_upgrade test failure on drongo

2023-12-27 Thread Hayato Kuroda (Fujitsu)
Dear Alexander, > I agree with your analysis and would like to propose a PoC fix (see > attached). With this patch applied, 20 iterations succeeded for me. There are no reviewers, so I will review it again. Let's move the PoC to the concrete patch. Note that I only focused on fixes of random

Re: Random pg_upgrade test failure on drongo

2023-11-30 Thread Alexander Lakhin
Hello Andrew and Kuroda-san, 27.11.2023 16:58, Andrew Dunstan wrote: It would also be interesting to know the full version/build of the OS on drongo and fairywren. It's WS 2019 1809/17763.4252. The latest available AFAICT is 17763.5122 I've updated it to 17763.5122 now. Thank you for the

RE: Random pg_upgrade test failure on drongo

2023-11-30 Thread Hayato Kuroda (Fujitsu)
Dear Alexander, Andrew, Thanks for your analysis! > I see that behavior on: > Windows 10 Version 1607 (OS Build 14393.0) > Windows Server 2016 Version 1607 (OS Build 14393.0) > Windows Server 2019 Version 1809 (OS Build 17763.1) > > But it's not reproduced on: > Windows 10 Version 1809 (OS

Re: Random pg_upgrade test failure on drongo

2023-11-27 Thread Andrew Dunstan
On 2023-11-27 Mo 07:39, Andrew Dunstan wrote: On 2023-11-27 Mo 07:00, Alexander Lakhin wrote: Hello Kuroda-san, 25.11.2023 18:19, Hayato Kuroda (Fujitsu) wrote: Thanks for attaching a program. This helps us to understand the issue. I wanted to confirm your env - this failure occurred

Re: Random pg_upgrade test failure on drongo

2023-11-27 Thread Andrew Dunstan
On 2023-11-27 Mo 07:00, Alexander Lakhin wrote: Hello Kuroda-san, 25.11.2023 18:19, Hayato Kuroda (Fujitsu) wrote: Thanks for attaching a program. This helps us to understand the issue. I wanted to confirm your env - this failure occurred on Windows Server, right? I see that

Re: Random pg_upgrade test failure on drongo

2023-11-27 Thread Alexander Lakhin
Hello Kuroda-san, 25.11.2023 18:19, Hayato Kuroda (Fujitsu) wrote: Thanks for attaching a program. This helps us to understand the issue. I wanted to confirm your env - this failure occurred on Windows Server, right? I see that behavior on: Windows 10 Version 1607 (OS Build 14393.0)

RE: Random pg_upgrade test failure on drongo

2023-11-25 Thread Hayato Kuroda (Fujitsu)
Dear Alexander, > > Please look at the simple test program attached. It demonstrates the > failure for me when running in two sessions as follows: > unlink-open test 150 1000 > unlink-open test2 150 1000 Thanks for attaching a program. This helps us to understand the issue. I

Re: Random pg_upgrade test failure on drongo

2023-11-24 Thread Alexander Lakhin
Hello Kuroda-san, 23.11.2023 15:15, Hayato Kuroda (Fujitsu) wrote: I agree with your analysis and would like to propose a PoC fix (see attached). With this patch applied, 20 iterations succeeded for me. Thanks, here are my comments. I'm not quite sure about Windows, so I may say something

RE: Random pg_upgrade test failure on drongo

2023-11-23 Thread Hayato Kuroda (Fujitsu)
Dear Alexander, > > I can easily reproduce this failure on my workstation by running 5 tests > 003_logical_slots in parallel inside Windows VM with its CPU resources > limited to 50%, like so: > VBoxManage controlvm "Windows" cpuexecutioncap 50 > > set PGCTLTIMEOUT=180 > python3 -c

Re: Random pg_upgrade test failure on drongo

2023-11-23 Thread Alexander Lakhin
Hello Kuroda-san, 21.11.2023 13:37, Hayato Kuroda (Fujitsu) wrote: This email gives an update. The machine drongo failed the test a week ago [1], and I finally got the logfiles. PSA files. Oh, sorry. I forgot to attach the files. You can see pg_upgrade_server.log for now. I can easily reproduce this

RE: Random pg_upgrade test failure on drongo

2023-11-21 Thread Hayato Kuroda (Fujitsu)
Dear hackers, This email gives an update. The machine drongo failed the test a week ago [1], and I finally got the logfiles. PSA files. ## Observed failure pg_upgrade_server.log is a server log during the pg_upgrade command. According to it, the TRUNCATE command seems to have failed due to a "File

RE: Random pg_upgrade test failure on drongo

2023-11-15 Thread Hayato Kuroda (Fujitsu)
Dear hackers, > While tracking the buildfarm, I found that drongo failed the test > pg_upgrade/003_logical_slots [1]. > A strange point is that the test passed in the next iteration. Currently I'm > not sure of the reason, but I will keep an eye on it and will investigate if it > happens again.

Random pg_upgrade test failure on drongo

2023-11-08 Thread Hayato Kuroda (Fujitsu)
Dear hackers, While tracking the buildfarm, I found that drongo failed the test pg_upgrade/003_logical_slots [1]. A strange point is that the test passed in the next iteration. Currently I'm not sure of the reason, but I will keep an eye on it and will investigate if it happens again. I think this