Hi hackers,

I found a bug that causes a crash when assertion is enabled and LWLOCK_STATS is 
defined.
I've tested with Debian 7.5 (3.2.0-4-amd64) on VMware fusion 6, but this bug 
seems to be platform-independent and should reproduce in other environments.
A patch to fix the bug is also attached.

## Reproduing a crash

You can reproduce a crash by this way:

    git co a0841ecd2518d4505b96132b764b918ab5d21ad4
    git clean -dfx
    ./configure --enable-cassert CFLAGS='-DLWLOCK_STATS'
    make check

In my environment, the following messages appeared.

    ( omit... )
    ../../../src/test/regress/pg_regress --inputdir=. 
--temp-install=./tmp_check --top-builddir=../../..    --dlpath=.  
--schedule=./parallel_schedule
    ============== creating temporary installation        ==============
    ============== initializing database system           ==============
    
    pg_regress: initdb failed

and initdb.log contained the following messages.

    reating directory /tmp/pghead/src/test/regress/./tmp_check/data ... ok
    creating subdirectories ... ok
    selecting default max_connections ... 100
    selecting default shared_buffers ... 128MB
    selecting dynamic shared memory implementation ... posix
    creating configuration files ... ok
    creating template1 database in 
/tmp/pghead/src/test/regress/./tmp_check/data/base/1 ... PID 48239 lwlock main 
142: shacq 0 exacq 1 blk 0 spindelay 0
    ( omit... )
    PID 48247 lwlock main 33058: shacq 0 exacq 1 blk 0 spindelay 0
    PID 48247 lwlock main 33005: shacq 0 exacq 48 blk 0 spindelay 0
    ok
    loading system objects' descriptions ... TRAP: 
FailedAssertion("!(CritSectionCount == 0 || (context) == ErrorContext || 
(MyAuxProcType == CheckpointerProcess))", File: "mcxt.c", Line: 594)
    Aborted (core dumped)
    child process exited with exit code 134
    initdb: data directory "/tmp/pghead/src/test/regress/./tmp_check/data" not 
removed at user's request

## The cause of crash

The failing assertion is for prohibiting memory allocation in a critical 
section, which is introduced by commit 4a170ee9 on 2014-04-04.

In my understanding, the root cause of the assertion failure is on-demand 
allocation of lwlock_stats entry.  For each LWLock, a lwlock_stats entry is 
created at the first invocation of LWLockAcquire using MemoryContextAlloc.  If 
the first invocation is in a critical section, the assertion fails.

For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush function 
was the problem.
I also confirmed the assertion failure on starting postgres on a correctly 
initialized database. In this case, locking CheckpointerCommLock in 
AbsorbFsyncRequests function was the problem.

## A solution

In order to avoid memory allocation during critical sections, lwlock_stats hash 
table should be populated at the initialization of each process.
The attached patch populate lwlock_stats entries of MainLWLockArray at the end 
of CreateLWLocks, InitProcess and InitAuxiliaryProcess.

With this patch, all regression tests can be passed so far, but I think this 
patch is not perfect because it does not cover LWLocks outside of 
MainLWLockArray.  I'm not sure where is the right place to initialize 
lwlock_stats entries for that locks.  So I feel it needs some refinements by 
you hackers.
From f96708d14ab0073abd95c463eaf8d60db42f411d Mon Sep 17 00:00:00 2001
From: Yuto Hayamizu <y.hayam...@gmail.com>
Date: Tue, 20 May 2014 16:19:56 +0900
Subject: [PATCH] pre-populating lwlock_stats entries

---
 src/backend/storage/lmgr/lwlock.c |   21 +++++++++++++++++++++
 src/backend/storage/lmgr/proc.c   |    8 ++++++++
 src/include/storage/lwlock.h      |    4 ++++
 3 files changed, 33 insertions(+)

diff --git a/src/backend/storage/lmgr/lwlock.c 
b/src/backend/storage/lmgr/lwlock.c
index d23ac62..fc97ae0 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -359,8 +359,29 @@ CreateLWLocks(void)
        MainLWLockTranche.array_base = MainLWLockArray;
        MainLWLockTranche.array_stride = sizeof(LWLockPadded);
        LWLockRegisterTranche(0, &MainLWLockTranche);
+
+#ifdef LWLOCK_STATS
+       InitializeLWLockStats();
+#endif
 }
 
+#ifdef LWLOCK_STATS
+void
+InitializeLWLockStats(void)
+{
+       int                     numLocks = NumLWLocks();
+       int                     id;
+       LWLockPadded    *lock;
+
+       if (MyProc == NULL) return;
+
+       /* Initialize all lwlock_stats entries of MainLWLockArray */
+       for (id = 0, lock = MainLWLockArray; id < numLocks; id++, lock++)
+       {
+               get_lwlock_stats_entry(&lock->lock);
+       }
+}
+#endif
 
 /*
  * LWLockAssign - assign a dynamically-allocated LWLock number
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 266b0da..e09cbf8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -415,6 +415,10 @@ InitProcess(void)
         * the deadlock checker.
         */
        InitDeadLockChecking();
+
+#ifdef LWLOCK_STATS
+       InitializeLWLockStats();
+#endif
 }
 
 /*
@@ -566,6 +570,10 @@ InitAuxiliaryProcess(void)
         * Arrange to clean up at process exit.
         */
        on_shmem_exit(AuxiliaryProcKill, Int32GetDatum(proctype));
+
+#ifdef LWLOCK_STATS
+       InitializeLWLockStats();
+#endif
 }
 
 /*
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 175fae3..f0f1c21 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -209,6 +209,10 @@ extern int LWLockNewTrancheId(void);
 extern void LWLockRegisterTranche(int, LWLockTranche *);
 extern void LWLockInitialize(LWLock *, int tranche_id);
 
+#ifdef LWLOCK_STATS
+extern void InitializeLWLockStats(void);
+#endif
+
 /*
  * Prior to PostgreSQL 9.4, we used an enum type called LWLockId to refer
  * to LWLocks.  New code should instead use LWLock *.  However, for the
-- 
1.7.10.4

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to