I will be on vacation until August 11. I look forward to any progress you are
able to make.

Since ensuring there are no orphaned back-end processes is vital, could we add
a check for getppid() == 1?
The patch below seemed to work on QNX (the first client command after a kill -9
of the postmaster resulted in the exit of its associated server process).

        diff -rdup postgresql-9.3.5/src/backend/tcop/postgres.c postgresql-9.3.5_qnx/src/backend/tcop/postgres.c
        --- postgresql-9.3.5/src/backend/tcop/postgres.c        2014-07-21 15:10:42.000000000 -0400
        +++ postgresql-9.3.5_qnx/src/backend/tcop/postgres.c    2014-07-31 18:17:40.000000000 -0400
        @@ -3967,6 +3967,14 @@ PostgresMain(int argc, char *argv[],
                         */
                        firstchar = ReadCommand(&input_message);
         
        +#ifndef WIN32
        +               /* Check for death of parent */
        +               if (getppid() == 1)
        +                       ereport(FATAL,
        +                               (errcode(ERRCODE_CRASH_SHUTDOWN),
        +                                errmsg("Parent server process has exited")));
        +#endif
        +
                        /*
                         * (4) disable async signal conditions again.
                         */
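
For anyone who wants to try the idea outside the backend, here is a minimal
standalone sketch of the same kind of check. It is not part of the patch:
remember_parent() and parent_has_exited() are hypothetical names. It compares
against the parent PID recorded at startup instead of the literal 1, on the
assumption that some systems may reparent an orphan to a process other than
PID 1.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Parent PID recorded at startup; -1 means "not recorded yet". */
static pid_t recorded_ppid = -1;

/* Hypothetical helper: call once, early in the child process. */
void
remember_parent(void)
{
    recorded_ppid = getppid();
}

/*
 * Hypothetical helper: returns 1 if the parent seen at startup is no
 * longer our parent (we have been reparented), 0 otherwise.
 */
int
parent_has_exited(void)
{
    assert(recorded_ppid != -1);    /* remember_parent() must run first */
    return getppid() != recorded_ppid;
}
```

A child would call remember_parent() right after fork() and then poll
parent_has_exited() at a convenient point, such as after reading each client
command, which is where the patch places its check.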

Keith Baker 

> -----Original Message-----
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Thursday, July 31, 2014 12:58 PM
> To: Tom Lane
> Cc: Baker, Keith [OCDUS Non-J&J]; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
> 
> On Wed, Jul 30, 2014 at 11:02 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > So it seems like we could possibly go this route, assuming we can
> > think of a variant of your proposal that's race-condition-free.  A
> > disadvantage compared to a true file lock is that it would not protect
> > against people trying to start postmasters from two different NFS
> > client machines --- but we don't have protection against that now.
> > (Maybe we could do this *and* do a regular file lock to offer some
> > protection against that case, even if it's not bulletproof?)
> 
> That's not a bad idea.  By the way, it also wouldn't be too hard to test at
> runtime whether or not flock() has first-close semantics.  Not that we'd want
> this exact design, but suppose you configure shmem_interlock=flock in
> postgresql.conf.  On startup, we test whether flock is reliable, determine
> that it is, and proceed accordingly.
> Now, you move your database onto an NFS volume and the semantics
> change (because, hey, breaking userspace assumptions is fun) and try to
> restart your database, and it says FATAL: flock() is broken.
> Now you can either move the database back, or set shmem_interlock to
> some other value.
> 
> Now maybe, as you say, it's best to use multiple locking protocols and hope
> that at least one will catch whatever the dangerous situation is.
> I'm just trying to point out that we need not blindly assume the semantics we
> want are there (or that they are not); we can check.
> 
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> Company
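
The runtime test described in the quoted message could look roughly like the
sketch below. It is only an illustration under stated assumptions, not
PostgreSQL code: flock_survives_close() is a hypothetical name, and the probe
file name is made up. The idea is to take an flock() lock, close a duplicate
of the locked descriptor, and then see whether a freshly opened descriptor is
still refused the lock. On a system where flock() is emulated with
per-process locks, the second lock attempt from the same process may simply
succeed, in which case the probe reports 0, which errs on the side of
distrusting flock().

```c
#define _DEFAULT_SOURCE         /* expose flock() and mkstemp() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if an flock() lock taken on a scratch file
 * in "dir" is still held after a duplicate descriptor is closed, 0 if a
 * second lock attempt succeeds, and -1 if the probe could not be run.
 */
int
flock_survives_close(const char *dir)
{
    char    path[1024];
    int     n, fd, dupfd, probe;
    int     result = -1;

    assert(dir != NULL);
    n = snprintf(path, sizeof(path), "%s/flock_probe_XXXXXX", dir);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;

    fd = mkstemp(path);         /* creates and opens a unique scratch file */
    if (fd < 0)
        return -1;

    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
    {
        dupfd = dup(fd);
        if (dupfd >= 0)
        {
            close(dupfd);       /* the "first close" under test */

            probe = open(path, O_RDWR);
            if (probe >= 0)
            {
                if (flock(probe, LOCK_EX | LOCK_NB) == 0)
                    result = 0; /* lock did not block a second locker */
                else if (errno == EWOULDBLOCK)
                    result = 1; /* original lock is still in force */
                close(probe);
            }
        }
    }

    close(fd);
    unlink(path);
    return result;
}
```

A startup check along the lines of the quoted proposal would call this on the
data directory and refuse to rely on flock() unless the result is 1.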

