Dear Alexander,

> I can easily reproduce this failure on my workstation by running 5 tests
> 003_logical_slots in parallel inside Windows VM with it's CPU resources
> limited to 50%, like so:
> VBoxManage controlvm "Windows" cpuexecutioncap 50
>
> set PGCTLTIMEOUT=180
> python3 -c "NUMITERATIONS=20;NUMTESTS=5;import os;tsts='';exec('for i in
> range(1,NUMTESTS+1):
> tsts+=f\"pg_upgrade_{i}/003_logical_slots \"'); exec('for i in
> range(1,NUMITERATIONS+1):print(f\"iteration {i}\");
> assert(os.system(f\"meson test --num-processes {NUMTESTS} {tsts}\") == 0)')"
> ...
> iteration 2
> ninja: Entering directory `C:\src\postgresql\build'
> ninja: no work to do.
> 1/5 postgresql:pg_upgrade_2 / pg_upgrade_2/003_logical_slots
> ERROR 60.30s exit status 25
> ...
> pg_restore: error: could not execute query: ERROR: could not create file
> "base/1/2683": File exists
> ...
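For anyone else who wants to try this, the quoted one-liner can be unrolled into something easier to read. The following is only a sketch of my reading of it: the function names build_test_list and run_iterations are my own (hypothetical), and it assumes meson is on PATH, that it is called from the build directory, and that the pg_upgrade_1 .. pg_upgrade_5 test copies from the quoted setup exist.

```python
import os


def build_test_list(num_tests):
    """Return the space-separated names pg_upgrade_1 .. pg_upgrade_N (003_logical_slots)."""
    return " ".join(
        f"pg_upgrade_{i}/003_logical_slots" for i in range(1, num_tests + 1)
    )


def run_iterations(num_iterations, num_tests, run=os.system):
    """Run the tests in parallel repeatedly, stopping at the first failing iteration.

    `run` is a hypothetical hook (defaults to os.system) so the loop can be
    exercised without actually invoking meson.
    """
    tests = build_test_list(num_tests)
    for i in range(1, num_iterations + 1):
        print(f"iteration {i}")
        status = run(f"meson test --num-processes {num_tests} {tests}")
        assert status == 0, f"iteration {i} failed with status {status}"
```

Calling run_iterations(20, 5) should then correspond to the quoted command with NUMITERATIONS=20 and NUMTESTS=5.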
Great. I do not have such an environment, so I could not find this myself.
This seems to suggest that the failure occurred because the system was busy.

> I agree with your analysis and would like to propose a PoC fix (see
> attached). With this patch applied, 20 iterations succeeded for me.

Thanks, here are my comments. I'm not very familiar with Windows, so I may say
something wrong.

* I'm not sure why the file/directory name is changed before doing an unlink.
  Could you add a description?
* IIUC, the important point is the latter part, which waits until the status
  changes. Based on that, can we remove the double rmtree() from
  cleanup_output_dirs()? It seems to have been added for a similar motivation.

```
+    loops = 0;
+    while (lstat(curpath, &st) < 0 && lstat_error_was_status_delete_pending())
+    {
+        if (++loops > 100)    /* time out after 10 sec */
+            return -1;
+        pg_usleep(100000);    /* us */
+    }
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED