Hi hackers, I found a bug that causes a crash when assertion is enabled and LWLOCK_STATS is defined. I've tested with Debian 7.5 (3.2.0-4-amd64) on VMware fusion 6, but this bug seems to be platform-independent and should reproduce in other environments. A patch to fix the bug is also attached.
## Reproduing a crash You can reproduce a crash by this way: git co a0841ecd2518d4505b96132b764b918ab5d21ad4 git clean -dfx ./configure --enable-cassert CFLAGS='-DLWLOCK_STATS' make check In my environment, the following messages appeared. ( omit... ) ../../../src/test/regress/pg_regress --inputdir=. --temp-install=./tmp_check --top-builddir=../../.. --dlpath=. --schedule=./parallel_schedule ============== creating temporary installation ============== ============== initializing database system ============== pg_regress: initdb failed and initdb.log contained the following messages. reating directory /tmp/pghead/src/test/regress/./tmp_check/data ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting dynamic shared memory implementation ... posix creating configuration files ... ok creating template1 database in /tmp/pghead/src/test/regress/./tmp_check/data/base/1 ... PID 48239 lwlock main 142: shacq 0 exacq 1 blk 0 spindelay 0 ( omit... ) PID 48247 lwlock main 33058: shacq 0 exacq 1 blk 0 spindelay 0 PID 48247 lwlock main 33005: shacq 0 exacq 48 blk 0 spindelay 0 ok loading system objects' descriptions ... TRAP: FailedAssertion("!(CritSectionCount == 0 || (context) == ErrorContext || (MyAuxProcType == CheckpointerProcess))", File: "mcxt.c", Line: 594) Aborted (core dumped) child process exited with exit code 134 initdb: data directory "/tmp/pghead/src/test/regress/./tmp_check/data" not removed at user's request ## The cause of crash The failing assertion is for prohibiting memory allocation in a critical section, which is introduced by commit 4a170ee9 on 2014-04-04. In my understanding, the root cause of the assertion failure is on-demand allocation of lwlock_stats entry. For each LWLock, a lwlock_stats entry is created at the first invocation of LWLockAcquire using MemoryContextAlloc. If the first invocation is in a critical section, the assertion fails. For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush function was the problem. I also confirmed the assertion failure on starting postgres on a correctly initialized database. In this case, locking CheckpointerCommLock in AbsorbFsyncRequests function was the problem. ## A solution In order to avoid memory allocation during critical sections, lwlock_stats hash table should be populated at the initialization of each process. The attached patch populate lwlock_stats entries of MainLWLockArray at the end of CreateLWLocks, InitProcess and InitAuxiliaryProcess. With this patch, all regression tests can be passed so far, but I think this patch is not perfect because it does not cover LWLocks outside of MainLWLockArray. I'm not sure where is the right place to initialize lwlock_stats entries for that locks. So I feel it needs some refinements by you hackers.
From f96708d14ab0073abd95c463eaf8d60db42f411d Mon Sep 17 00:00:00 2001 From: Yuto Hayamizu <y.hayam...@gmail.com> Date: Tue, 20 May 2014 16:19:56 +0900 Subject: [PATCH] pre-populating lwlock_stats entries --- src/backend/storage/lmgr/lwlock.c | 21 +++++++++++++++++++++ src/backend/storage/lmgr/proc.c | 8 ++++++++ src/include/storage/lwlock.h | 4 ++++ 3 files changed, 33 insertions(+) diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index d23ac62..fc97ae0 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -359,8 +359,29 @@ CreateLWLocks(void) MainLWLockTranche.array_base = MainLWLockArray; MainLWLockTranche.array_stride = sizeof(LWLockPadded); LWLockRegisterTranche(0, &MainLWLockTranche); + +#ifdef LWLOCK_STATS + InitializeLWLockStats(); +#endif } +#ifdef LWLOCK_STATS +void +InitializeLWLockStats(void) +{ + int numLocks = NumLWLocks(); + int id; + LWLockPadded *lock; + + if (MyProc == NULL) return; + + /* Initialize all lwlock_stats entries of MainLWLockArray */ + for (id = 0, lock = MainLWLockArray; id < numLocks; id++, lock++) + { + get_lwlock_stats_entry(&lock->lock); + } +} +#endif /* * LWLockAssign - assign a dynamically-allocated LWLock number diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index 266b0da..e09cbf8 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -415,6 +415,10 @@ InitProcess(void) * the deadlock checker. */ InitDeadLockChecking(); + +#ifdef LWLOCK_STATS + InitializeLWLockStats(); +#endif } /* @@ -566,6 +570,10 @@ InitAuxiliaryProcess(void) * Arrange to clean up at process exit. */ on_shmem_exit(AuxiliaryProcKill, Int32GetDatum(proctype)); + +#ifdef LWLOCK_STATS + InitializeLWLockStats(); +#endif } /* diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h index 175fae3..f0f1c21 100644 --- a/src/include/storage/lwlock.h +++ b/src/include/storage/lwlock.h @@ -209,6 +209,10 @@ extern int LWLockNewTrancheId(void); extern void LWLockRegisterTranche(int, LWLockTranche *); extern void LWLockInitialize(LWLock *, int tranche_id); +#ifdef LWLOCK_STATS +extern void InitializeLWLockStats(void); +#endif + /* * Prior to PostgreSQL 9.4, we used an enum type called LWLockId to refer * to LWLocks. New code should instead use LWLock *. However, for the -- 1.7.10.4
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers