On 06/30/2015 07:05 PM, Jeff Janes wrote:
On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:

On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan <p...@heroku.com> wrote:

On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
Is there a way to use gdb to figure out who holds the lock they are
waiting
for?

Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.


I hadn't planned on running into this problem, so had not compiled
accordingly.

I thought LOCK_DEBUG would produce too much output, but now I see it
doesn't print anything unless Trace_lwlocks is on (but it still makes more
info available to gdb, as Amit mentioned), so I think that should be OK.

Since I messed up the gdb session causing the postmaster to SIGKILL all
the children and destroy the evidence, I'll repeat the run compiled with
LOCK_DEBUG and see what it looks like in the morning.


I've gotten the LWLock deadlock again.  User backend 24841 holds the
WALInsertLocks 7 and is blocked attempting to acquire 6 .  So it seems to
be violating the lock ordering rules (although I don't see that rule
spelled out in xlog.c)

The Checkpoint process, meanwhile, has acquired WALInsertLocks 0 through 6
and is blocked on 7.

I'm not sure where to go from here.

The user backend 24841 is waiting in a LWLockWaitForVar() call, on WALInsertLock 6, and oldval==0. The checkpointer is holding WALInsertLock 6, but it has set the insertingat value to PG_UINT64_MAX. The LWLockWaitForVar() call should not be blocked on that, because 0 != PG_UINT64_MAX.

Looks like the LWLock-scalability patch that made LWlock acquisiton to use atomic ops instead of a spinlock broke this. LWLockWaitForVar() is supposed to guarantee that it always wakes up from sleep, when the variable's value changes. But there is now a race condition in LWLockWaitForVar that wasn't there in 9.4:

                if (mustwait)
                {
                        /*
                         * Perform comparison using spinlock as we can't rely 
on atomic 64
                         * bit reads/stores.
                         */
#ifdef LWLOCK_STATS
                        lwstats->spin_delay_count += 
SpinLockAcquire(&lock->mutex);
#else
                        SpinLockAcquire(&lock->mutex);
#endif

                        /*
                         * XXX: We can significantly optimize this on platforms 
with 64bit
                         * atomics.
                         */
                        value = *valptr;
                        if (value != oldval)
                        {
                                result = false;
                                mustwait = false;
                                *newval = value;
                        }
                        else
                                mustwait = true;
                        SpinLockRelease(&lock->mutex);
                }
                else
                        mustwait = false;

                if (!mustwait)
                        break;                          /* the lock was free or 
value didn't match */

                /*
                 * Add myself to wait queue. Note that this is racy, somebody 
else
                 * could wakeup before we're finished queuing. NB: We're using 
nearly
                 * the same twice-in-a-row lock acquisition protocol as
                 * LWLockAcquire(). Check its comments for details.
                 */
                LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);

After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes in, acquires the lock, changes the variable's value, and releases the lock again. In 9.4, the spinlock was not released until the process was queued.

Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock while it updates the Var. That's not cool either :-(.
- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to