Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
> Do you want to take a look at move those elog calls around a bit?  That
> should do it.

It would be a good idea to have some clarity on *why* that should do it.

Looking at the original report's log, but without having actually reproduced the problem, I guess what is happening is this:

1. Subscription worker process (23117) gets a duplicate key conflict while trying to apply an update, and in consequence it exits. (Is that supposed to happen?)

2. Publication server process (23124) doesn't notice client connection loss right away. By chance, the next thing it tries to send to the client is the debug output from LogicalIncreaseRestartDecodingForSlot. Then it detects loss of connection (at 2017-06-21 14:55:12.033) and FATAL's out. But since the spinlock stuff has no tracking infrastructure, we don't know we are still holding the replication slot mutex.

3. Process exit cleanup does know that it's supposed to release the replication slot, so it tries to take the mutex spinlock ... again. Eventually that times out and we get the "stuck spinlock" panic.

All correct so far?

So, okay, the proximate cause of the crash is a blatant violation of the rule that spinlocks may only be held across straight-line code segments. But I'm wondering about the client exit having occurred in the first place. Why is that, and how would one ever recover? It sure looks like this isn't the first subscription worker process that has tried and failed to apply the update. If our attitude towards this situation is that it's okay to fork-bomb your server with worker processes continually respawning and making no progress, well, I don't think that's good enough.

regards, tom lane
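
For illustration only, here is a toy model of the failure sequence in steps 2 and 3, and of what moving the logging call out from under the lock buys. None of this is PostgreSQL code: toy_spinlock, toy_slot, toy_log, MAX_SPINS and the setjmp/longjmp "FATAL exit" are all hypothetical stand-ins, and the bounded spin count merely imitates the stuck-spinlock timeout.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy spinlock with no owner tracking (an assumption, not slock_t). */
typedef struct { atomic_flag held; } toy_spinlock;

#define MAX_SPINS 1000          /* stand-in for the stuck-spinlock timeout */

/* Spin a bounded number of times; false means "stuck spinlock". */
static bool toy_acquire(toy_spinlock *l)
{
    for (int i = 0; i < MAX_SPINS; i++)
        if (!atomic_flag_test_and_set(&l->held))
            return true;
    return false;
}

static void toy_release(toy_spinlock *l)
{
    atomic_flag_clear(&l->held);
}

typedef struct {
    toy_spinlock  mutex;
    unsigned long restart_lsn;
} toy_slot;

static jmp_buf error_exit;      /* stand-in for the FATAL exit path */

/* Hypothetical logger: if the client is gone it never returns. */
static void toy_log(bool client_gone, unsigned long lsn)
{
    if (client_gone)
        longjmp(error_exit, 1);
    printf("restart_lsn is now %lu\n", lsn);
}

/* Unsafe: the logging call sits between acquire and release. */
static void update_unsafe(toy_slot *s, unsigned long lsn, bool client_gone)
{
    bool ok = toy_acquire(&s->mutex);
    assert(ok);
    (void) ok;
    s->restart_lsn = lsn;
    toy_log(client_gone, s->restart_lsn);   /* may jump out, lock still held */
    toy_release(&s->mutex);
}

/* Safer: straight-line code under the lock, log from a local copy after. */
static void update_safe(toy_slot *s, unsigned long lsn, bool client_gone)
{
    unsigned long copy;
    bool ok = toy_acquire(&s->mutex);
    assert(ok);
    (void) ok;
    s->restart_lsn = lsn;
    copy = s->restart_lsn;
    toy_release(&s->mutex);
    toy_log(client_gone, copy);             /* may jump out, lock is free */
}

/*
 * Run an update whose log call fails, then do "exit cleanup", which needs
 * the same mutex. Returns true if cleanup could take the lock.
 */
static bool run_then_cleanup(void (*update)(toy_slot *, unsigned long, bool))
{
    static toy_slot slot;       /* static, so defined after longjmp */
    bool ok;

    atomic_flag_clear(&slot.mutex.held);
    slot.restart_lsn = 0;

    if (setjmp(error_exit) == 0)
        update(&slot, 42, true);            /* client_gone = true */

    ok = toy_acquire(&slot.mutex);          /* the cleanup path */
    if (ok)
        toy_release(&slot.mutex);
    return ok;
}
```

In this model, run_then_cleanup(update_unsafe) returns false, because cleanup spins on a lock the same process still holds, while run_then_cleanup(update_safe) returns true. It says nothing about the separate question of why the worker keeps exiting.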