Hello Kuroda-san,
21.11.2023 13:37, Hayato Kuroda (Fujitsu) wrote:
This email tells an update. The machine drongo failed the test a week ago [1]
and finally got logfiles. PSA files.
Oh, sorry. I missed to attach files. You can see pg_upgrade_server.log for now.
I can easily reproduce this failure on my workstation by running 5 tests
003_logical_slots in parallel inside Windows VM with it's CPU resources
limited to 50%, like so:
VBoxManage controlvm "Windows" cpuexecutioncap 50
set PGCTLTIMEOUT=180
python3 -c "NUMITERATIONS=20;NUMTESTS=5;import os;tsts='';exec('for i in range(1,NUMTESTS+1):
tsts+=f\"pg_upgrade_{i}/003_logical_slots \"'); exec('for i in range(1,NUMITERATIONS+1):print(f\"iteration {i}\");
assert(os.system(f\"meson test --num-processes {NUMTESTS} {tsts}\") == 0)')"
...
iteration 2
ninja: Entering directory `C:\src\postgresql\build'
ninja: no work to do.
1/5 postgresql:pg_upgrade_2 / pg_upgrade_2/003_logical_slots ERROR
60.30s exit status 25
...
pg_restore: error: could not execute query: ERROR: could not create file
"base/1/2683": File exists
...
I agree with your analysis and would like to propose a PoC fix (see
attached). With this patch applied, 20 iterations succeeded for me.
Best regards,
Alexander
diff --git a/src/port/dirmod.c b/src/port/dirmod.c
index 07dd190cbc..5185648388 100644
--- a/src/port/dirmod.c
+++ b/src/port/dirmod.c
@@ -121,6 +121,8 @@ pgunlink(const char *path)
bool is_lnk;
int loops = 0;
struct stat st;
+ char tmppath[MAX_PATH];
+ char *curpath = path;
/*
* This function might be called for a regular file or for a junction
@@ -129,9 +131,14 @@ pgunlink(const char *path)
* if we can unlink directly, since that's expected to be the most common
* case.
*/
- if (unlink(path) == 0)
- return 0;
- if (errno != EACCES)
+ snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
+ if (pgrename(path, tmppath) == 0)
+ {
+ if (unlink(tmppath) == 0)
+ return 0;
+ curpath = tmppath;
+ }
+ else if (errno != EACCES)
return -1;
/*
@@ -150,7 +157,7 @@ pgunlink(const char *path)
* fail. We want to wait until the file truly goes away so that simple
* recursive directory unlink algorithms work.
*/
- if (lstat(path, &st) < 0)
+ if (lstat(curpath, &st) < 0)
{
if (lstat_error_was_status_delete_pending())
is_lnk = false;
@@ -167,7 +174,7 @@ pgunlink(const char *path)
* someone else to close the file, as the caller might be holding locks
* and blocking other backends.
*/
- while ((is_lnk ? rmdir(path) : unlink(path)) < 0)
+ while ((is_lnk ? rmdir(curpath) : unlink(curpath)) < 0)
{
if (errno != EACCES)
return -1;
@@ -175,6 +182,14 @@ pgunlink(const char *path)
return -1;
pg_usleep(100000); /* us */
}
+
+ loops = 0;
+ while (lstat(curpath, &st) < 0 && lstat_error_was_status_delete_pending())
+ {
+ if (++loops > 100) /* time out after 10 sec */
+ return -1;
+ pg_usleep(100000); /* us */
+ }
return 0;
}