On Thu, Apr 18, 2019 at 05:57:39PM -0400, Tom Lane wrote:
> It's the latter. I searched the buildfarm database for failure logs
> including the string "server does not shut down" within the last three
> years, and got all of the hits attached. Not all of these look like
> the failure pattern Michael pointed to, but enough of them do to say
> that the problem has existed since at least mid-2017. To be concrete,
> we have quite a sample of cases where a standby server has received a
> "fast shutdown" signal and acknowledged that in its log, but it never
> gets to the expected "shutting down" message, meaning it never starts
> the shutdown checkpoint let alone finishes it. The oldest case that
> clearly looks like that is
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2017-06-02%2018%3A54%3A29
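
For anyone wanting to sift logs for the pattern described above, here is a minimal sketch in Python. It is not what was used for the buildfarm search; the function name and the exact log substrings are assumptions for illustration, and real log wording may differ.

```python
# Assumed log substrings (illustrative); adjust to the wording in the actual logs.
FAST_SHUTDOWN = "received fast shutdown request"
SHUTTING_DOWN = "shutting down"


def stuck_after_fast_shutdown(log_text: str) -> bool:
    """Return True if the log acknowledges a fast shutdown request but
    never reaches a later "shutting down" line (hypothetical helper)."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if FAST_SHUTDOWN in line:
            # Only lines after the acknowledgement count as progress.
            return not any(SHUTTING_DOWN in later for later in lines[i + 1:])
    # No fast shutdown was acknowledged at all, so this is a different failure.
    return False
```

Running this over each standby log from a failed run would separate the hits that match this pattern from the ones that fail for other reasons.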

Interesting. I was sort of thinking about c6c3334 first, but this failed
based on 9fcf670, which does not include the former.

> This leads me to suspect that the problem is (a) some very low-level issue
> in spinlocks or latches or the like, or (b) a timing problem that just
> doesn't show up on generic Intel-oid platforms. The timing theory is
> maybe a bit stronger given that one test case shows this more often than
> others. I've not got any clear ideas beyond that.
>
> Anyway, this is *not* new in v12.

Indeed. It seems to me that v12 makes the problem appear more easily,
though, and I am starting to wonder whether c6c9474 contributes to that,
as more cases have been popping up since mid-March.
--
Michael