This looks useful. LWLockCheckSelfDeadlock() could use LWLockHeldByMe
variant instead of copying that code with possibly a change in that
function to return the required information.

I am also seeing a pattern
Assert(LWLockHeldByMe*())
LWLockAcquire()

at some places. Should we change LWLockAcquire to do
Assert(LWLockHelpByMe()) always to detect such occurrences? Enhance
that pattern to print the information that your patch prints?

It looks weird that we can detect a self deadlock but not handle it.
But handling it requires remembering multiple LWLocks held and then
release them those many times. That may not leave LWLocks LW anymore.

On Thu, Nov 19, 2020 at 4:02 PM Craig Ringer
<craig.rin...@enterprisedb.com> wrote:
>
> Hi all
>
> Here's a patch I wrote a while ago to detect and report when a 
> LWLockAcquire() results in a simple self-deadlock due to the caller already 
> holding the LWLock.
>
> To avoid affecting hot-path performance, it only fires the check on the first 
> iteration through the retry loops in LWLockAcquire() and LWLockWaitForVar(), 
> and just before we sleep, once the fast-path has been missed.
>
> I wrote an earlier version of this when I was chasing down some hairy issues 
> with background workers deadlocking on some exit paths because ereport(ERROR) 
> or elog(ERROR) calls fired when a LWLock was held would cause a 
> before_shmem_exit or on_shmem_exit cleanup function to deadlock when it tried 
> to acquire the same lock.
>
> But it's an easy enough mistake to make and a seriously annoying one to track 
> down, so I figured I'd post it for consideration. Maybe someone else will get 
> some use out of it even if nobody likes the idea of merging it.
>
> As written the check runs only for --enable-cassert builds or when LOCK_DEBUG 
> is defined.



-- 
Best Wishes,
Ashutosh Bapat


Reply via email to