On Mon, Aug 18, 2014 at 09:01:20AM -0400, Robert Haas wrote:
> On Sat, Aug 16, 2014 at 3:28 AM, Noah Misch <n...@leadboat.com> wrote:
> >> I'd be afraid that a secondary mechanism that mostly-but-not-really
> >> works could do more harm by allowing us to miss bugs in the primary,
> >> pipe-based locking mechanism than the good it would accomplish.
> >
> > Users do corrupt their NFS- and GFS2-hosted databases today. I would rather
> > have each process hold only an fcntl() lock than hold only the FIFO file
> > descriptor. There's no such dichotomy, so let's have both.
>
> Meh. We can do that, but I think that will provide us with only the
> it-works-until-it-doesn't level of protection. Granted, that's more
> than zero, but does anyone advocate wearing seatbelts for the first 60
> minutes you're in the car and then taking them off after that? I
> think that with a sufficiently long-running server the chances of the
> lock somehow getting released approach certainty. But I'm not going
> to fight this one tooth and nail.

In case it wasn't clear, I advocate both using the FIFO defense and holding
fcntl locks throughout the life of every PostgreSQL process having a shared
memory attachment. I grant that this raises the chance of a shortcoming in
one mechanism remaining undiscovered. However, we already know that each by
itself has limitations. I don't like the prospect of accepting a known hole
to help discover unknown holes. We could have the would-be new postmaster,
when it hits a fcntl lock conflict, proceed with the FIFO check anyway. If
the FIFO check says "go" after the fcntl check said "stop", emit a message
about the apparent bug. (That's oversimplified; it needs looping to account
for the case of the old postmaster exiting concurrently.)

> A bigger question in my view is what to do with the existing
> mechanism. The main advantage of making a change like this is that we
> could finally dispense with System V shared memory completely. But we
> risk encountering systems where the battle-tested System V mechanism
> works and this new one either fails to work at all (server won't
> start) or fails to work as desired (interlock broken). So it's
> tempting to think we should have a GUC or control-file setting to
> control which mechanism gets used. Of course for QNX, the actual
> subject of this thread, System V won't be an option, but other people
> might like a big red button they can push if the new code turns out to
> be less than we're hoping.

A GUC sounds fine to me, as would using the sysv interlock unconditionally
for a couple more releases before removing it.

Thanks,
nm
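
For illustration only, the fcntl-then-FIFO cross-check described above might be sketched roughly as follows. This is a minimal Python sketch, not PostgreSQL code. The function names (fcntl_lock_is_free, fifo_has_no_readers, may_start), the retry count and delay, and the choice to refuse startup when the two checks keep disagreeing are all assumptions made for the example. It also assumes the FIFO defense works by having every server process hold the FIFO open for reading, so that a nonblocking open for writing fails with ENXIO once all of them are gone.

```python
import errno
import fcntl
import os
import time


def fcntl_lock_is_free(lock_path):
    """Probe: True if no other process holds a conflicting lock on lock_path."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            if e.errno in (errno.EACCES, errno.EAGAIN):
                return False  # someone else holds the lock: "stop"
            raise
        return True  # "go"
    finally:
        # Closing the descriptor drops the probe lock again. A real server
        # would instead keep the descriptor (and the lock) for its lifetime.
        os.close(fd)


def fifo_has_no_readers(fifo_path):
    """True if no process currently has the FIFO open for reading."""
    try:
        fd = os.open(fifo_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as e:
        if e.errno == errno.ENXIO:
            return True  # no readers left: "go"
        raise
    os.close(fd)
    return False  # the open succeeded, so a reader exists: "stop"


def may_start(fcntl_says_go, fifo_says_go, retries=5, delay=0.1, log=print):
    """Combine both checks; report when fcntl says stop but the FIFO says go.

    Both arguments are zero-argument callables returning True for "go".
    """
    for _ in range(retries):
        if fcntl_says_go():
            # No lock conflict; the FIFO check still has to agree.
            return fifo_says_go()
        if not fifo_says_go():
            return False  # both mechanisms agree: old server still alive
        # fcntl said "stop" but the FIFO said "go". The old postmaster may be
        # exiting concurrently, so wait briefly and look again.
        time.sleep(delay)
    log("fcntl lock conflict persists although the FIFO has no readers; "
        "apparent interlock bug")
    return False  # assumption: stay conservative and refuse to start
```

A caller would wire the two probes together with something like may_start(lambda: fcntl_lock_is_free(lock_path), lambda: fifo_has_no_readers(fifo_path)), where both paths are hypothetical files in the data directory. Passing the checks in as callables keeps the looping logic separate from the system calls.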