I will be on vacation until August 11. I look forward to any progress you are
able to make.
Since ensuring that there are no orphaned back-end processes is vital, could we
add a check for getppid() == 1?
The patch below seemed to work on QNX (the first client command after a kill -9
of the postmaster resulted in the exit of its associated server process).
diff -rdup postgresql-9.3.5/src/backend/tcop/postgres.c postgresql-9.3.5_qnx/src/backend/tcop/postgres.c
--- postgresql-9.3.5/src/backend/tcop/postgres.c 2014-07-21 15:10:42.000000000 -0400
+++ postgresql-9.3.5_qnx/src/backend/tcop/postgres.c 2014-07-31 18:17:40.000000000 -0400
@@ -3967,6 +3967,14 @@ PostgresMain(int argc, char *argv[],
*/
firstchar = ReadCommand(&input_message);
+#ifndef WIN32
+ /* Check for death of parent */
+ if (getppid() == 1)
+ ereport(FATAL,
+ (errcode(ERRCODE_CRASH_SHUTDOWN),
+ errmsg("Parent server process has exited")));
+#endif
+
/*
* (4) disable async signal conditions again.
*/
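
For illustration only, here is a standalone sketch of the same idea outside the
backend. It is a variant, not the patch above: instead of comparing getppid()
against 1, it remembers the parent pid at startup and reports a change, which
also covers systems where an orphan is re-parented to something other than
pid 1. The names remember_parent(), parent_changed() and parent_has_exited()
are hypothetical and do not exist in PostgreSQL.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: parent pid recorded once, early in the child's life. */
static pid_t saved_parent_pid;

/* Record the current parent so later checks have something to compare to. */
static void
remember_parent(void)
{
	saved_parent_pid = getppid();
}

/* Pure comparison, split out so it can be exercised without forking. */
static bool
parent_changed(pid_t original, pid_t current)
{
	return current != original;
}

/*
 * True once the process has been re-parented, i.e. the original parent
 * has gone away.  Assumes remember_parent() was called beforehand.
 */
static bool
parent_has_exited(void)
{
	return parent_changed(saved_parent_pid, getppid());
}
```

A caller would invoke remember_parent() once after fork() and then test
parent_has_exited() at a convenient point, such as right after reading a
client command, as the patch does.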
Keith Baker
> -----Original Message-----
> From: Robert Haas [mailto:[email protected]]
> Sent: Thursday, July 31, 2014 12:58 PM
> To: Tom Lane
> Cc: Baker, Keith [OCDUS Non-J&J]; [email protected]
> Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
>
> On Wed, Jul 30, 2014 at 11:02 AM, Tom Lane <[email protected]> wrote:
> > So it seems like we could possibly go this route, assuming we can
> > think of a variant of your proposal that's race-condition-free. A
> > disadvantage compared to a true file lock is that it would not protect
> > against people trying to start postmasters from two different NFS
> > client machines --- but we don't have protection against that now.
> > (Maybe we could do this *and* do a regular file lock to offer some
> > protection against that case, even if it's not bulletproof?)
>
> That's not a bad idea. By the way, it also wouldn't be too hard to test at
> runtime whether or not flock() has first-close semantics. Not that we'd want
> this exact design, but suppose you configure shmem_interlock=flock in
> postgresql.conf. On startup, we test whether flock is reliable, determine
> that it is, and proceed accordingly.
> Now, you move your database onto an NFS volume and the semantics
> change (because, hey, breaking userspace assumptions is fun) and try to
> restart your database, and it says FATAL: flock() is broken.
> Now you can either move the database back, or set shmem_interlock to
> some other value.
>
> Now maybe, as you say, it's best to use multiple locking protocols and hope
> that at least one will catch whatever the dangerous situation is.
> I'm just trying to point out that we need not blindly assume the semantics we
> want are there (or that they are not); we can check.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> Company
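
As a rough sketch of the kind of runtime probe described in the quoted message,
the function below checks one specific property: whether an flock() lock is
still held after a duplicate of the locked descriptor has been closed. That is
only one possible reading of "first-close semantics", and the function name
flock_survives_dup_close() and the idea of probing with a scratch file are
assumptions for illustration, not an agreed design. On platforms where flock()
is emulated with fcntl() locks, a second open() in the same process does not
conflict, so the probe would report 0 there as well.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Returns 1 if the lock is still held after closing a dup()'d descriptor,
 * 0 if a second open() of the file could take the lock (lock released or
 * not enforced), and -1 if the probe itself failed.
 */
static int
flock_survives_dup_close(const char *path)
{
	int			fd, dupfd, probe, result;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) != 0)
	{
		close(fd);
		return -1;
	}

	/* Close a duplicate; the original descriptor stays open. */
	dupfd = dup(fd);
	if (dupfd < 0)
	{
		close(fd);
		return -1;
	}
	close(dupfd);

	/* A separate open() yields an independent open file description. */
	probe = open(path, O_RDWR);
	if (probe < 0)
	{
		close(fd);
		return -1;
	}

	if (flock(probe, LOCK_EX | LOCK_NB) == 0)
		result = 0;				/* lock was obtainable: not reliable */
	else if (errno == EWOULDBLOCK)
		result = 1;				/* original lock still in force */
	else
		result = -1;

	close(probe);
	close(fd);
	return result;
}
```

A startup check could call this on a scratch file inside the data directory
and refuse to rely on flock() unless it returns 1.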
--
Sent via pgsql-hackers mailing list ([email protected])