Robert Haas <robertmh...@gmail.com> writes:
> On Fri, Jul 25, 2014 at 6:29 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> This isn't really acceptable for production usage; if it were, we'd have
>> done it already.  The POSIX APIs lack any way to tell how many processes
>> are attached to a shmem segment, which is *necessary* functionality for
>> us (it's a critical part of the interlock against starting multiple
>> postmasters in one data directory).

> I think it would be good to spend some energy figuring out what to do
> about this.

Well, we've been around on this multiple times before, but if we have
any new ideas, sure ...

> In our last discussion on this topic, we talked about using file locks
> as a substitute for nattch.  You concluded that fcntl was totally
> broken for this purpose because of the possibility of some other piece
> of code accidentally opening and closing the lock file.[2]  lockf
> appears to have the same problem, but flock might not, at least on
> some systems.
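
For reference, the fcntl hazard mentioned above can be reproduced in a few lines. This is only a sketch, assuming Linux-style POSIX record locks; the helper names (`fcntl_lock_visible_to_child`, `demo_fcntl_lock_lost_on_close`) are made up for illustration and are not anything in the PostgreSQL tree. It takes an fcntl-style lock, then opens and closes an unrelated descriptor for the same file, and checks from a forked child whether the lock is still there.

```python
import fcntl
import os
import tempfile


def fcntl_lock_visible_to_child(path):
    """Fork a child that tries to take an fcntl lock; True if it is refused."""
    pid = os.fork()
    if pid == 0:
        code = 2  # unexpected failure
        try:
            fd = os.open(path, os.O_RDWR)
            try:
                # fcntl.lockf() uses fcntl(2) record locks underneath.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                code = 0  # lock was free
            except OSError:
                code = 1  # somebody else holds it
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1


def demo_fcntl_lock_lost_on_close():
    fd, path = tempfile.mkstemp()
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)
        held_before = fcntl_lock_visible_to_child(path)
        # Some unrelated code opens and closes the same file ...
        os.close(os.open(path, os.O_RDONLY))
        # ... and the process's lock is dropped, although fd is still open.
        held_after = fcntl_lock_visible_to_child(path)
        return held_before, held_after
    finally:
        os.close(fd)
        os.unlink(path)
```

On a local Linux filesystem I would expect `demo_fcntl_lock_lost_on_close()` to return `(True, False)`; behavior on other platforms or over NFS is not something this sketch establishes.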

My Linux man page for flock says

       flock()  does not lock files over NFS.  Use fcntl(2) instead: that does
       work over NFS, given a sufficiently  recent  version  of  Linux  and  a
       server which supports locking.

which seems like a showstopper problem; we might try to tell people not to
put their databases on NFS, but they're not gonna listen.  It also says

       flock()  and  fcntl(2)  locks  have different semantics with respect to
       forked processes and dup(2).  On systems that implement  flock()  using
       fcntl(2),  the  semantics  of  flock()  will  be  different  from those
       described in this manual page.

which is pretty scary if it's accurate for any still-extant platforms;
we might think we're using flock and still get fcntl behavior.  It's
also of concern that (AFAICS) flock is not in POSIX, which means we
can't even expect that platforms will agree on how it *should* behave.

I also noted that flock does not support atomic downgrade of exclusive
lock to shared lock, which seems like a problem for the lock inheritance
scheme sketched in
http://www.postgresql.org/message-id/18162.1340761...@sss.pgh.pa.us
... but OTOH, it sounds like flock locks are not only inherited through
fork() but even preserved across exec(), which would mean that we don't
need that scheme for file lock inheritance, even with EXEC_BACKEND.
Still, it's not clear to me how we could put much faith in flock.
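
A small sketch of the fork() inheritance behavior, for anyone who wants to poke at it on their own platform. It assumes native flock() semantics (lock tied to the open file description, as on Linux and the BSDs) and a local filesystem; the function names are hypothetical. The parent takes the lock, forks, and closes its own descriptor; the lock should stay held until the child goes away too.

```python
import fcntl
import os
import tempfile


def flock_is_held(path):
    """Probe through a fresh open(): True if another description holds a flock."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    finally:
        os.close(fd)  # also releases the lock if the probe acquired it
    return False


def demo_flock_inherited_by_fork():
    fd, path = tempfile.mkstemp()
    fcntl.flock(fd, fcntl.LOCK_EX)
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep the inherited descriptor open until the parent
        # closes the write end of the pipe, then exit.
        os.close(w)
        os.read(r, 1)
        os._exit(0)
    os.close(r)
    os.close(fd)  # parent lets go; only the child still has the file open
    held_while_child_alive = flock_is_held(path)
    os.close(w)  # tell the child to exit
    os.waitpid(pid, 0)
    held_after_child_exit = flock_is_held(path)
    os.unlink(path)
    return held_while_child_alive, held_after_child_exit
```

With native flock() I would expect `(True, False)`. On a platform that emulates flock() with fcntl(), the first value could well come out differently, which is exactly the worry raised by the man page text quoted above.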

> Finally, how about named pipes? Linux says that trying to open a
> named pipe for write when there are no readers will return ENXIO, and
> attempting to write to an already-open pipe with no remaining readers
> will cause SIGPIPE.  So: create a permanent named pipe in the data
> directory that all PostgreSQL processes keep open.  When the
> postmaster starts, it opens the pipe for read, then for write, then
> closes it for read.  It then tries to write to the pipe.  If this
> fails to result in SIGPIPE, then somebody else has got the thing open;
> so the new postmaster should die at once.  But if it does get a SIGPIPE
> then there are as of that moment no other readers.

Hm.  That particular protocol is broken: two postmasters doing it at the
same time would both pass (because neither has it open for read at the
instant where they try to write).  But we could possibly frob the idea
until it works.  Bigger question is how portable is this behavior?
I see named pipes (fifos) in SUS v2, which is our usual baseline
assumption about what's portable across Unixen, so maybe it would work.
But does NFS support named pipes?
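
To make the race concrete, here is a sketch that interleaves two would-be postmasters by hand inside one process. It is an illustration under assumptions, not a proposal: the FIFO name is invented, the read side is opened with O_NONBLOCK (otherwise the first open would block waiting for a writer), and it relies on Python ignoring SIGPIPE by default so that the failed write shows up as BrokenPipeError instead of killing the process.

```python
import os
import tempfile


def open_for_check(path):
    """Steps 1-3 of the quoted protocol: open read, open write, close read."""
    rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    wfd = os.open(path, os.O_WRONLY)  # does not block: we are a reader ourselves
    os.close(rfd)
    return wfd


def write_hits_no_reader(wfd):
    """Step 4: True if the write fails because nobody has the FIFO open for read."""
    try:
        os.write(wfd, b"x")
    except BrokenPipeError:  # EPIPE; SIGPIPE itself is ignored by Python
        return True
    return False


def demo_fifo_race():
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "postmaster.fifo")  # hypothetical name
    os.mkfifo(path)
    try:
        # Both "postmasters" finish steps 1-3 before either one writes.
        wfd_a = open_for_check(path)
        wfd_b = open_for_check(path)
        a_thinks_alone = write_hits_no_reader(wfd_a)
        b_thinks_alone = write_hits_no_reader(wfd_b)
        os.close(wfd_a)
        os.close(wfd_b)
        return a_thinks_alone, b_thinks_alone
    finally:
        os.unlink(path)
        os.rmdir(tmpdir)
```

On Linux I would expect `demo_fifo_race()` to return `(True, True)`: both participants conclude that there are no other readers, so both would go ahead and start.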

                        regards, tom lane

