On Mon, Jan 6, 2014 at 9:48 AM, Stephen Frost <sfr...@snowman.net> wrote:
>> None of these ideas are a complete solution for LWLOCK_STATS. In the
>> other three cases noted above, we only need an identifier for the lock
>> "instantaneously", so that we can pass it off to the logger or dtrace
>> or whatever. But LWLOCK_STATS wants to hold on to data about the locks
>> that were visited until the end of the session, and it does that using
>> an array that is *indexed* by lwlockid. I guess we could replace that
>> with a hash table. Ugh. Any suggestions?
>
> Yeah, that's not fun. No good suggestions here offhand.
Replacing it with a hash table turns out not to be too bad, either in terms of code complexity or performance, so I think that's the way to go. I did some test runs with pgbench -S, scale factor 300, 32 clients, shared_buffers=8GB, five-minute runs, and got these results:

resultsr.lwlock-stats.32.300.300:tps = 195493.037962 (including connections establishing)
resultsr.lwlock-stats.32.300.300:tps = 189985.964658 (including connections establishing)
resultsr.lwlock-stats.32.300.300:tps = 197641.293892 (including connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 193286.066063 (including connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 191832.100877 (including connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 191899.834211 (including connections establishing)
resultsr.master.32.300.300:tps = 197939.111998 (including connections establishing)
resultsr.master.32.300.300:tps = 198641.337720 (including connections establishing)
resultsr.master.32.300.300:tps = 198675.404349 (including connections establishing)

"master" is the master branch, commit 10a82cda67731941c18256e009edad4a784a2994. "lwlock-stats" is the same, but with LWLOCK_STATS defined. "lwlock-stats-htab" is the same, with the attached patch and LWLOCK_STATS defined. The runs were interleaved, but the results are shown here grouped by test run. If we assume that the 189k result is an outlier, then there's probably some regression associated with the lwlock-stats-htab patch, but not a lot. Considering that this code isn't even compiled unless you have LWLOCK_STATS defined, I think that's OK.

This is only part of the solution, of course: a complete solution will involve making the hash table key something other than the lock ID. What I'm thinking we can do is make the lock ID consist of two unsigned 32-bit integers. One of these will be stored in the lwlock itself, which, if my calculations are correct, won't increase the size of LWLockPadded on any common platform (a 64-bit integer would). Let's call this the "tranche ID". The other will be derived from the LWLock address; let's call this the "instance ID". We'll assign tranche IDs consecutively starting with 0 and keep an array of metadata for tranches, indexed by tranche ID; each entry will have three associated pieces of information: an array base, a stride length, and a printable name.

When we need to identify an lwlock in the log or to dtrace, we'll fetch the tranche ID from the lwlock itself and use that to index into the tranche metadata array. We'll then take the address of the lwlock, subtract the array base address for the tranche, and divide by the stride length; the result is the instance ID. When reporting to the user, we can report either the tranche ID directly or the associated name for that tranche; in either case, we'll also report the instance ID.

So initially we'll probably just have tranche 0: the main LWLock array. If we move buffer content and I/O locks to the buffer headers, we'll define tranche 1 and tranche 2 with the same base address (the start of the buffer descriptor array) and the same stride length (the size of a buffer descriptor). One will have the associated name "buffer content lock" and the other "buffer I/O lock". If we want, we can split the main LWLock array into several tranches so that we can more easily identify lock manager locks, predicate lock manager locks, and buffer mapping locks.
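To make the lookup concrete, here's a minimal, self-contained sketch of that scheme. The names here (sketch_lwlock, lwlock_tranche, identify_lwlock, MAX_TRANCHES) are invented for illustration only and are not in the attached patch:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct sketch_lwlock
{
	uint32_t	tranche;		/* tranche ID stored in the lock itself */
	/* ... the rest of the lock state (mutex, waiters, etc.) ... */
} sketch_lwlock;

typedef struct lwlock_tranche
{
	const char *name;			/* printable name, e.g. "main" */
	void	   *array_base;		/* address of the first lock in the tranche */
	size_t		array_stride;	/* spacing between locks, in bytes */
} lwlock_tranche;

#define MAX_TRANCHES 16
static lwlock_tranche tranches[MAX_TRANCHES];

/*
 * Name a lock: fetch its tranche ID, look up the tranche metadata, and
 * derive the instance ID from the lock address, the array base, and the
 * stride length.
 */
static void
identify_lwlock(const sketch_lwlock *lock, const char **name,
				uint32_t *instance)
{
	const lwlock_tranche *t = &tranches[lock->tranche];

	*name = t->name;
	*instance = (uint32_t) (((const char *) lock -
							 (const char *) t->array_base) / t->array_stride);
}

int
main(void)
{
	sketch_lwlock main_array[4] = {{0}, {0}, {0}, {0}};
	const char *name;
	uint32_t	instance;

	/* Tranche 0: the main LWLock array. */
	tranches[0].name = "main";
	tranches[0].array_base = main_array;
	tranches[0].array_stride = sizeof(sketch_lwlock);

	identify_lwlock(&main_array[2], &name, &instance);
	printf("lwlock %s %u\n", name, (unsigned) instance);	/* "lwlock main 2" */
	return 0;
}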
I like this system because it's very cheap - we only need a small array of metadata and a couple of memory accesses to name a lock - but it still lets us report data in a way that's actually *more* human-readable than what we have now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 4f88d3f..c1da2a8 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -32,6 +32,10 @@
 #include "storage/proc.h"
 #include "storage/spin.h"
 
+#ifdef LWLOCK_STATS
+#include "utils/hsearch.h"
+#endif
+
 /* We use the ShmemLock spinlock to protect LWLockAssign */
 extern slock_t *ShmemLock;
 
@@ -91,11 +95,17 @@ static int lock_addin_request = 0;
 static bool lock_addin_request_allowed = true;
 
 #ifdef LWLOCK_STATS
+typedef struct lwlock_stats
+{
+	LWLockId	lockid;
+	int			sh_acquire_count;
+	int			ex_acquire_count;
+	int			block_count;
+	int			spin_delay_count;
+}	lwlock_stats;
+
 static int	counts_for_pid = 0;
-static int *sh_acquire_counts;
-static int *ex_acquire_counts;
-static int *block_counts;
-static int *spin_delay_counts;
+static HTAB *lwlock_stats_htab;
 #endif
 
 #ifdef LOCK_DEBUG
@@ -126,17 +136,25 @@ LOG_LWDEBUG(const char *where, LWLockId lockid, const char *msg)
 
 static void init_lwlock_stats(void);
 static void print_lwlock_stats(int code, Datum arg);
+static lwlock_stats *get_lwlock_stats_entry(LWLockId lockid);
 
 static void
 init_lwlock_stats(void)
 {
-	int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	int			numLocks = LWLockCounter[1];
+	HASHCTL		ctl;
+
+	if (lwlock_stats_htab != NULL)
+	{
+		hash_destroy(lwlock_stats_htab);
+		lwlock_stats_htab = NULL;
+	}
 
-	sh_acquire_counts = calloc(numLocks, sizeof(int));
-	ex_acquire_counts = calloc(numLocks, sizeof(int));
-	spin_delay_counts = calloc(numLocks, sizeof(int));
-	block_counts = calloc(numLocks, sizeof(int));
+	MemSet(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(LWLockId);
+	ctl.entrysize = sizeof(lwlock_stats);
+	ctl.hash = tag_hash;
+	lwlock_stats_htab = hash_create("lwlock stats", 16384, &ctl,
+									HASH_ELEM | HASH_FUNCTION);
 	counts_for_pid = MyProcPid;
 	on_shmem_exit(print_lwlock_stats, 0);
 }
@@ -144,23 +162,47 @@ init_lwlock_stats(void)
 static void
 print_lwlock_stats(int code, Datum arg)
 {
-	int			i;
-	int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	int			numLocks = LWLockCounter[1];
+	HASH_SEQ_STATUS scan;
+	lwlock_stats *lwstats;
+
+	hash_seq_init(&scan, lwlock_stats_htab);
 
 	/* Grab an LWLock to keep different backends from mixing reports */
 	LWLockAcquire(0, LW_EXCLUSIVE);
 
-	for (i = 0; i < numLocks; i++)
+	while ((lwstats = (lwlock_stats *) hash_seq_search(&scan)) != NULL)
 	{
-		if (sh_acquire_counts[i] || ex_acquire_counts[i] || block_counts[i] || spin_delay_counts[i])
-			fprintf(stderr, "PID %d lwlock %d: shacq %u exacq %u blk %u spindelay %u\n",
-					MyProcPid, i, sh_acquire_counts[i], ex_acquire_counts[i],
-					block_counts[i], spin_delay_counts[i]);
+		fprintf(stderr,
+				"PID %d lwlock %d: shacq %u exacq %u blk %u spindelay %u\n",
+				MyProcPid, lwstats->lockid, lwstats->sh_acquire_count,
+				lwstats->ex_acquire_count, lwstats->block_count,
+				lwstats->spin_delay_count);
 	}
 
 	LWLockRelease(0);
 }
+
+static lwlock_stats *
+get_lwlock_stats_entry(LWLockId lockid)
+{
+	lwlock_stats *lwstats;
+	bool		found;
+
+	/* Set up local count state first time through in a given process */
+	if (counts_for_pid != MyProcPid)
+		init_lwlock_stats();
+
+	/* Fetch or create the entry. */
+	lwstats = hash_search(lwlock_stats_htab, &lockid, HASH_ENTER, &found);
+	if (!found)
+	{
+		lwstats->sh_acquire_count = 0;
+		lwstats->ex_acquire_count = 0;
+		lwstats->block_count = 0;
+		lwstats->spin_delay_count = 0;
+	}
+	return lwstats;
+}
 #endif   /* LWLOCK_STATS */
 
 
@@ -344,18 +386,20 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 	PGPROC	   *proc = MyProc;
 	bool		retry = false;
 	int			extraWaits = 0;
+#ifdef LWLOCK_STATS
+	lwlock_stats *lwstats;
+#endif
 
 	PRINT_LWDEBUG("LWLockAcquire", lockid, lock);
 
 #ifdef LWLOCK_STATS
-	/* Set up local count state first time through in a given process */
-	if (counts_for_pid != MyProcPid)
-		init_lwlock_stats();
+	lwstats = get_lwlock_stats_entry(lockid);
+
 	/* Count lock acquisition attempts */
 	if (mode == LW_EXCLUSIVE)
-		ex_acquire_counts[lockid]++;
+		lwstats->ex_acquire_count++;
 	else
-		sh_acquire_counts[lockid]++;
+		lwstats->sh_acquire_count++;
 #endif   /* LWLOCK_STATS */
 
 	/*
@@ -398,7 +442,7 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 
 		/* Acquire mutex.  Time spent holding mutex should be short! */
 #ifdef LWLOCK_STATS
-		spin_delay_counts[lockid] += SpinLockAcquire(&lock->mutex);
+		lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
 #else
 		SpinLockAcquire(&lock->mutex);
 #endif
@@ -469,7 +513,7 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 		LOG_LWDEBUG("LWLockAcquire", lockid, "waiting");
 
 #ifdef LWLOCK_STATS
-		block_counts[lockid]++;
+		lwstats->block_count++;
 #endif
 
 		TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);
@@ -598,13 +642,14 @@ LWLockAcquireOrWait(LWLockId lockid, LWLockMode mode)
 	PGPROC	   *proc = MyProc;
 	bool		mustwait;
 	int			extraWaits = 0;
+#ifdef LWLOCK_STATS
+	lwlock_stats *lwstats;
+#endif
 
 	PRINT_LWDEBUG("LWLockAcquireOrWait", lockid, lock);
 
 #ifdef LWLOCK_STATS
-	/* Set up local count state first time through in a given process */
-	if (counts_for_pid != MyProcPid)
-		init_lwlock_stats();
+	lwstats = get_lwlock_stats_entry(lockid);
 #endif
 
 	/* Ensure we will have room to remember the lock */
@@ -674,7 +719,7 @@ LWLockAcquireOrWait(LWLockId lockid, LWLockMode mode)
 		LOG_LWDEBUG("LWLockAcquireOrWait", lockid, "waiting");
 
 #ifdef LWLOCK_STATS
-		block_counts[lockid]++;
+		lwstats->block_count++;
 #endif
 
 		TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);
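The attached patch still keys the table on LWLockId; under the two-integer scheme described above, the key would presumably become a (tranche ID, instance ID) pair instead. A rough, hypothetical sketch of what that might look like - not part of the attached patch - keeping the same dynahash setup:

/*
 * Hypothetical follow-up (illustration only): key the per-backend stats
 * table on the tranche ID plus the instance ID rather than a single
 * LWLockId.
 */
typedef struct lwlock_stats_key
{
	uint32		tranche;		/* tranche ID stored in the lock */
	uint32		instance;		/* (lock addr - array base) / stride */
} lwlock_stats_key;

typedef struct lwlock_stats
{
	lwlock_stats_key key;		/* hash key; must be first field */
	int			sh_acquire_count;
	int			ex_acquire_count;
	int			block_count;
	int			spin_delay_count;
} lwlock_stats;

static void
init_lwlock_stats(void)
{
	HASHCTL		ctl;

	/* Same dynahash setup as the patch, with the wider key. */
	MemSet(&ctl, 0, sizeof(ctl));
	ctl.keysize = sizeof(lwlock_stats_key);
	ctl.entrysize = sizeof(lwlock_stats);
	ctl.hash = tag_hash;
	lwlock_stats_htab = hash_create("lwlock stats", 16384, &ctl,
									HASH_ELEM | HASH_FUNCTION);
}

get_lwlock_stats_entry() would then take that pair instead of an LWLockId, and print_lwlock_stats() could report the tranche name alongside the instance ID rather than a bare lock number.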