Hi Tom,
This is a great catch. I am looking into it: I will start by reproducing the
error as you suggested.
Thanks,
Badrul
-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Saturday, March 10, 2018 2:48 PM
To: pgsql-hackers@lists.postgresql.org
Subject: Intermittent pg_ctl failures on Windows
The buildfarm's Windows members occasionally show weird pg_ctl failures, for
instance this recent case:
https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbuildfarm.postgresql.org%2Fcgi-bin%2Fshow_log.pl%3Fnm%3Dbowerbird%26dt%3D2018-03-10%252020%253A30%253A20=04%7C01%7Cbachow%40microsoft.com%7C28a8094e84c74c26ecb108d586d91a9d%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636563189370087651%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwifQ%3D%3D%7C-1=qBtMsJ0EJFs4DVtkA6TZJhCDNlj392uNxsB6MHnu7po%3D=0
### Restarting node "master"
# Running: pg_ctl -D
G:/prog/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_006_logical_decoding_master_data/pgdata
-l
G:/prog/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/log/006_logical_decoding_master.log
restart waiting for server to shut down done server stopped waiting for
server to startThe process cannot access the file because it is being used
by another process.
stopped waiting
pg_ctl: could not start server
Examine the log output.
Bail out! system pg_ctl failed
or this one:
https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbuildfarm.postgresql.org%2Fcgi-bin%2Fshow_log.pl%3Fnm%3Dbowerbird%26dt%3D2017-12-29%252023%253A30%253A24=04%7C01%7Cbachow%40microsoft.com%7C28a8094e84c74c26ecb108d586d91a9d%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636563189370087651%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwifQ%3D%3D%7C-1=NdoDkZxBagXpiPDjNmhN6znHh%2BITyjEv2StPpLaabaw%3D=0
### Stopping node "subscriber" using mode fast # Running: pg_ctl -D
c:/prog/bf/root/HEAD/pgsql.build/src/test/subscription/tmp_check/t_001_rep_changes_subscriber_data/pgdata
-m fast stop waiting for server to shut downpg_ctl: could not open PID
file
"c:/prog/bf/root/HEAD/pgsql.build/src/test/subscription/tmp_check/t_001_rep_changes_subscriber_data/pgdata/postmaster.pid":
Permission denied Bail out! system pg_ctl failed
I'd been writing these off as Microsoft randomness and/or antivirus
interference, but it suddenly occurred to me that there might be a consistent
explanation: since commit f13ea95f9, when pg_ctl is waiting for server
start/stop, it is trying to read postmaster.pid more-or-less concurrently with
the postmaster writing to that file. On Unix that's not much of a problem, but
I believe that on Windows you have to specifically open the file with sharing
enabled, or you get error messages like these.
The postmaster should be enabling sharing, because port.h redirects open/fopen
to pgwin32_open/pgwin32_fopen which enable the sharing flags.
But it only does that #ifndef FRONTEND. So pg_ctl is just using naked open(),
which could explain these failures.
If this theory is accurate, it should be pretty easy to replicate the problem
if you modify the postmaster to hold postmaster.pid open longer when rewriting
it, e.g. stick fractional-second sleeps into CreateLockFile and
AddToDataDirLockFile.
I'm not in a position to investigate this in detail nor test a fix, but I think
somebody should.
regards, tom lane