Hi,

On 2020-09-09 16:30:37 -0400, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > On 2020-09-09 16:09:00 -0400, Tom Lane wrote:
> >> We could call it startup_packet_die or something?
>
> > Yea, I think that'd be good.
>
> I'll make it so.

Thanks!

> >> We see backends going through this code on a very regular basis in the
> >> buildfarm, but complete hangs are rare as can be. I think you
> >> overestimate the severity of the problem.
>
> > I don't think the BF exercises the problematic paths to a significant
> > degree. It's mostly local socket connections, and where not it's
> > localhost. There's no slow DNS, no more complicated authentication
> > methods, no packet loss. How often do we ever actually end up even
> > getting close to any of the paths but immediate shutdowns?
>
> Since we're talking about quickdie(), immediate shutdown/crash restart
> is exactly the case of concern, and the buildfarm exercises it all the
> time.

Yea, but only in simple cases. Largely no SSL / Kerberos. Largely
untranslated. Mostly the immediate shutdowns aren't when inside plpython
or such.

> > And in the
> > SIGQUIT path, how often do we end up in the SIGKILL path, masking
> > potential deadlocks?
>
> True, we can't really tell that. I wonder if we should make the
> postmaster emit a log message when it times out and goes to SIGKILL.
> After a few months we could scrape the buildfarm logs and get a
> pretty good handle on it.

I think that'd be a good idea.

Greetings,

Andres Freund