Hello Kuroda-san,

21.11.2023 13:37, Hayato Kuroda (Fujitsu) wrote:
This email tells an update. The machine drongo failed the test a week ago [1]
and finally got logfiles. PSA files.
Oh, sorry. I missed to attach files. You can see pg_upgrade_server.log for now.


I can easily reproduce this failure on my workstation by running 5 tests
003_logical_slots in parallel inside Windows VM with it's CPU resources
limited to 50%, like so:
VBoxManage controlvm "Windows" cpuexecutioncap 50

set PGCTLTIMEOUT=180
python3 -c "NUMITERATIONS=20;NUMTESTS=5;import os;tsts='';exec('for i in range(1,NUMTESTS+1): tsts+=f\"pg_upgrade_{i}/003_logical_slots \"'); exec('for i in range(1,NUMITERATIONS+1):print(f\"iteration {i}\"); assert(os.system(f\"meson test --num-processes {NUMTESTS} {tsts}\") == 0)')"
...
iteration 2
ninja: Entering directory `C:\src\postgresql\build'
ninja: no work to do.
1/5 postgresql:pg_upgrade_2 / pg_upgrade_2/003_logical_slots ERROR            
60.30s   exit status 25
...
pg_restore: error: could not execute query: ERROR:  could not create file 
"base/1/2683": File exists
...

I agree with your analysis and would like to propose a PoC fix (see
attached). With this patch applied, 20 iterations succeeded for me.

Best regards,
Alexander
diff --git a/src/port/dirmod.c b/src/port/dirmod.c
index 07dd190cbc..5185648388 100644
--- a/src/port/dirmod.c
+++ b/src/port/dirmod.c
@@ -121,6 +121,8 @@ pgunlink(const char *path)
 	bool		is_lnk;
 	int			loops = 0;
 	struct stat st;
+	char		tmppath[MAX_PATH];
+	char	   *curpath = path;
 
 	/*
 	 * This function might be called for a regular file or for a junction
@@ -129,9 +131,14 @@ pgunlink(const char *path)
 	 * if we can unlink directly, since that's expected to be the most common
 	 * case.
 	 */
-	if (unlink(path) == 0)
-		return 0;
-	if (errno != EACCES)
+	snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
+	if (pgrename(path, tmppath) == 0)
+	{
+		if (unlink(tmppath) == 0)
+			return 0;
+		curpath = tmppath;
+	}
+	else if (errno != EACCES)
 		return -1;
 
 	/*
@@ -150,7 +157,7 @@ pgunlink(const char *path)
 	 * fail. We want to wait until the file truly goes away so that simple
 	 * recursive directory unlink algorithms work.
 	 */
-	if (lstat(path, &st) < 0)
+	if (lstat(curpath, &st) < 0)
 	{
 		if (lstat_error_was_status_delete_pending())
 			is_lnk = false;
@@ -167,7 +174,7 @@ pgunlink(const char *path)
 	 * someone else to close the file, as the caller might be holding locks
 	 * and blocking other backends.
 	 */
-	while ((is_lnk ? rmdir(path) : unlink(path)) < 0)
+	while ((is_lnk ? rmdir(curpath) : unlink(curpath)) < 0)
 	{
 		if (errno != EACCES)
 			return -1;
@@ -175,6 +182,14 @@ pgunlink(const char *path)
 			return -1;
 		pg_usleep(100000);		/* us */
 	}
+
+	loops = 0;
+	while (lstat(curpath, &st) < 0 && lstat_error_was_status_delete_pending())
+	{
+		if (++loops > 100)		/* time out after 10 sec */
+			return -1;
+		pg_usleep(100000);		/* us */
+	}
 	return 0;
 }
 

Reply via email to