On Mon, Jan 6, 2014 at 9:48 AM, Stephen Frost <sfr...@snowman.net> wrote:
>> None of these ideas are a complete solution for LWLOCK_STATS.  In the
>> other three cases noted above, we only need an identifier for the lock
>> "instantaneously", so that we can pass it off to the logger or dtrace
>> or whatever.  But LWLOCK_STATS wants to hold on to data about the
>> locks that were visited until the end of the session, and it does that
>> using an array that is *indexed* by lwlockid.  I guess we could
>> replace that with a hash table.  Ugh.  Any suggestions?
>
> Yeah, that's not fun.  No good suggestions here offhand.

Replacing it with a hash table turns out not to be too bad, either in
terms of code complexity or performance, so I think that's the way to
go.  I did some test runs with pgbench -S, scale factor 300, 32
clients, shared_buffers=8GB, five-minute runs, and got these results:

resultsr.lwlock-stats.32.300.300:tps = 195493.037962 (including connections establishing)
resultsr.lwlock-stats.32.300.300:tps = 189985.964658 (including connections establishing)
resultsr.lwlock-stats.32.300.300:tps = 197641.293892 (including connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 193286.066063 (including connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 191832.100877 (including connections establishing)
resultsr.lwlock-stats-htab.32.300.300:tps = 191899.834211 (including connections establishing)
resultsr.master.32.300.300:tps = 197939.111998 (including connections establishing)
resultsr.master.32.300.300:tps = 198641.337720 (including connections establishing)
resultsr.master.32.300.300:tps = 198675.404349 (including connections establishing)

"master" is the master branch, commit
10a82cda67731941c18256e009edad4a784a2994.  "lwlock-stats" is the same,
but with LWLOCK_STATS defined.  "lwlock-stats-htab" is the same, with
the attached patch and LWLOCK_STATS defined.  The runs were
interleaved, but the results are shown here grouped by configuration.
If we assume that the 189k result is an outlier, then there's probably
some regression associated with the lwlock-stats-htab patch, but not a
lot.  Considering that this code isn't even compiled unless you have
LWLOCK_STATS defined, I think that's OK.

This is only part of the solution, of course: a complete solution will
involve making the hash table key something other than the lock ID.
What I'm thinking we can do is make the lock ID consist of two
unsigned 32-bit integers.  One of these will be stored in the lwlock
itself, which, if my calculations are correct, won't increase the size
of LWLockPadded on any common platform (a 64-bit integer would).
Let's call this the "tranche ID".  The other will be derived from the
LWLock address.  Let's call this the "instance ID".  We'll keep a
table of tranche IDs, which will be assigned consecutively starting
with 0.  We'll keep an array of metadata for tranches, indexed by
tranche ID, and each entry will have three associated pieces of
information: an array base, a stride length, and a printable name.
When we need to identify an lwlock in the log or to dtrace, we'll
fetch the tranche ID from the lwlock itself and use that to index into
the tranche metadata array.  We'll then take the address of the
lwlock, subtract the array base address for the tranche, and divide by
the stride length; the result is the instance ID.  When reporting to
the user, we can report either the tranche ID directly or the
associated name for that tranche; in either case, we'll also report
the instance ID.
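
To make the address arithmetic concrete, here's a rough sketch of what
the tranche metadata and the naming step could look like.  This is
untested, and the names (lwlock_stats_key, lwlock_tranche,
identify_lwlock, MAX_TRANCHES, and the tranche field in LWLock) are
made up for illustration, not part of the attached patch:

/* The stats hash key would presumably become a pair of the two IDs. */
typedef struct lwlock_stats_key
{
	uint32		tranche;		/* tranche ID, stored in the lock itself */
	uint32		instance;		/* instance ID, derived from the address */
} lwlock_stats_key;

/* Per-tranche metadata: array base, stride length, printable name. */
typedef struct lwlock_tranche
{
	void	   *array_base;		/* address of the first lock in the tranche */
	Size		array_stride;	/* distance between consecutive locks */
	const char *name;			/* printable name, e.g. "main" */
} lwlock_tranche;

static lwlock_tranche tranches[MAX_TRANCHES];	/* indexed by tranche ID */

/*
 * Name a lock: fetch the tranche ID from the lock, look up the
 * metadata, and compute the instance ID from the lock's address.
 */
static void
identify_lwlock(LWLock *lock, const char **name, uint32 *instance)
{
	lwlock_tranche *t = &tranches[lock->tranche];

	*name = t->name;
	*instance = ((char *) lock - (char *) t->array_base) / t->array_stride;
}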

So initially we'll probably just have tranche 0: the main LWLock
array.  If we move buffer content and I/O locks to the buffer headers,
we'll define tranche 1 and tranche 2 with the same base address (the
start of the buffer descriptor array) and the same stride length (the
size of a buffer descriptor).  One will have the associated name
"buffer content lock" and the other "buffer I/O lock".  If we want, we
can split the main LWLock array into several tranches so that we can
more easily identify lock manager locks, predicate lock manager locks,
and buffer mapping locks.
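
In code form, setting up those tranches might look something like this
(again just a sketch; register_lwlock_tranche is a made-up helper,
though LWLockArray, LWLockPadded, BufferDescriptors, and BufferDesc
are the existing structures):

/* Tranche 0: the existing main LWLock array. */
register_lwlock_tranche(0, LWLockArray, sizeof(LWLockPadded), "main");

/*
 * Tranches 1 and 2 share a base address and stride - both index into
 * the buffer descriptor array - but carry different names, so the same
 * address arithmetic yields the descriptor's array index as the
 * instance ID for either lock.
 */
register_lwlock_tranche(1, BufferDescriptors, sizeof(BufferDesc),
						"buffer content lock");
register_lwlock_tranche(2, BufferDescriptors, sizeof(BufferDesc),
						"buffer I/O lock");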

I like this system because it's very cheap - we only need a small
array of metadata and a couple of memory accesses to name a lock - but
it still lets us report data in a way that's actually *more*
human-readable than what we have now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 4f88d3f..c1da2a8 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -32,6 +32,10 @@
 #include "storage/proc.h"
 #include "storage/spin.h"
 
+#ifdef LWLOCK_STATS
+#include "utils/hsearch.h"
+#endif
+
 
 /* We use the ShmemLock spinlock to protect LWLockAssign */
 extern slock_t *ShmemLock;
@@ -91,11 +95,17 @@ static int	lock_addin_request = 0;
 static bool lock_addin_request_allowed = true;
 
 #ifdef LWLOCK_STATS
+typedef struct lwlock_stats
+{
+	LWLockId	lockid;
+	int		sh_acquire_count;
+	int		ex_acquire_count;
+	int		block_count;
+	int		spin_delay_count;
+} lwlock_stats;
+
 static int	counts_for_pid = 0;
-static int *sh_acquire_counts;
-static int *ex_acquire_counts;
-static int *block_counts;
-static int *spin_delay_counts;
+static HTAB *lwlock_stats_htab;
 #endif
 
 #ifdef LOCK_DEBUG
@@ -126,17 +136,25 @@ LOG_LWDEBUG(const char *where, LWLockId lockid, const char *msg)
 
 static void init_lwlock_stats(void);
 static void print_lwlock_stats(int code, Datum arg);
+static lwlock_stats *get_lwlock_stats_entry(LWLockId lockid);
 
 static void
 init_lwlock_stats(void)
 {
-	int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	int			numLocks = LWLockCounter[1];
+	HASHCTL		ctl;
+
+	if (lwlock_stats_htab != NULL)
+	{
+		hash_destroy(lwlock_stats_htab);
+		lwlock_stats_htab = NULL;
+	}
 
-	sh_acquire_counts = calloc(numLocks, sizeof(int));
-	ex_acquire_counts = calloc(numLocks, sizeof(int));
-	spin_delay_counts = calloc(numLocks, sizeof(int));
-	block_counts = calloc(numLocks, sizeof(int));
+	MemSet(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(LWLockId);
+	ctl.entrysize = sizeof(lwlock_stats);
+	ctl.hash = tag_hash;
+	lwlock_stats_htab = hash_create("lwlock stats", 16384, &ctl,
+									HASH_ELEM | HASH_FUNCTION);
 	counts_for_pid = MyProcPid;
 	on_shmem_exit(print_lwlock_stats, 0);
 }
@@ -144,23 +162,47 @@ init_lwlock_stats(void)
 static void
 print_lwlock_stats(int code, Datum arg)
 {
-	int			i;
-	int		   *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int));
-	int			numLocks = LWLockCounter[1];
+	HASH_SEQ_STATUS scan;
+	lwlock_stats *lwstats;
+
+	hash_seq_init(&scan, lwlock_stats_htab);
 
 	/* Grab an LWLock to keep different backends from mixing reports */
 	LWLockAcquire(0, LW_EXCLUSIVE);
 
-	for (i = 0; i < numLocks; i++)
+	while ((lwstats = (lwlock_stats *) hash_seq_search(&scan)) != NULL)
 	{
-		if (sh_acquire_counts[i] || ex_acquire_counts[i] || block_counts[i] || spin_delay_counts[i])
-			fprintf(stderr, "PID %d lwlock %d: shacq %u exacq %u blk %u spindelay %u\n",
-					MyProcPid, i, sh_acquire_counts[i], ex_acquire_counts[i],
-					block_counts[i], spin_delay_counts[i]);
+		fprintf(stderr,
+				"PID %d lwlock %d: shacq %u exacq %u blk %u spindelay %u\n",
+				MyProcPid, lwstats->lockid, lwstats->sh_acquire_count,
+				lwstats->ex_acquire_count, lwstats->block_count,
+				lwstats->spin_delay_count);
 	}
 
 	LWLockRelease(0);
 }
+
+static lwlock_stats *
+get_lwlock_stats_entry(LWLockId lockid)
+{
+	lwlock_stats *lwstats;
+	bool	found;
+
+	/* Set up local count state first time through in a given process */
+	if (counts_for_pid != MyProcPid)
+		init_lwlock_stats();
+
+	/* Fetch or create the entry. */
+	lwstats = hash_search(lwlock_stats_htab, &lockid, HASH_ENTER, &found);
+	if (!found)
+	{
+		lwstats->sh_acquire_count = 0;
+		lwstats->ex_acquire_count = 0;
+		lwstats->block_count = 0;
+		lwstats->spin_delay_count = 0;
+	}
+	return lwstats;
+}
 #endif   /* LWLOCK_STATS */
 
 
@@ -344,18 +386,20 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 	PGPROC	   *proc = MyProc;
 	bool		retry = false;
 	int			extraWaits = 0;
+#ifdef LWLOCK_STATS
+	lwlock_stats *lwstats;
+#endif
 
 	PRINT_LWDEBUG("LWLockAcquire", lockid, lock);
 
 #ifdef LWLOCK_STATS
-	/* Set up local count state first time through in a given process */
-	if (counts_for_pid != MyProcPid)
-		init_lwlock_stats();
+	lwstats = get_lwlock_stats_entry(lockid);
+
 	/* Count lock acquisition attempts */
 	if (mode == LW_EXCLUSIVE)
-		ex_acquire_counts[lockid]++;
+		lwstats->ex_acquire_count++;
 	else
-		sh_acquire_counts[lockid]++;
+		lwstats->sh_acquire_count++;
 #endif   /* LWLOCK_STATS */
 
 	/*
@@ -398,7 +442,7 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 
 		/* Acquire mutex.  Time spent holding mutex should be short! */
 #ifdef LWLOCK_STATS
-		spin_delay_counts[lockid] += SpinLockAcquire(&lock->mutex);
+		lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
 #else
 		SpinLockAcquire(&lock->mutex);
 #endif
@@ -469,7 +513,7 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
 		LOG_LWDEBUG("LWLockAcquire", lockid, "waiting");
 
 #ifdef LWLOCK_STATS
-		block_counts[lockid]++;
+		lwstats->block_count++;
 #endif
 
 		TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);
@@ -598,13 +642,14 @@ LWLockAcquireOrWait(LWLockId lockid, LWLockMode mode)
 	PGPROC	   *proc = MyProc;
 	bool		mustwait;
 	int			extraWaits = 0;
+#ifdef LWLOCK_STATS
+	lwlock_stats *lwstats;
+#endif
 
 	PRINT_LWDEBUG("LWLockAcquireOrWait", lockid, lock);
 
 #ifdef LWLOCK_STATS
-	/* Set up local count state first time through in a given process */
-	if (counts_for_pid != MyProcPid)
-		init_lwlock_stats();
+	lwstats = get_lwlock_stats_entry(lockid);
 #endif
 
 	/* Ensure we will have room to remember the lock */
@@ -674,7 +719,7 @@ LWLockAcquireOrWait(LWLockId lockid, LWLockMode mode)
 		LOG_LWDEBUG("LWLockAcquireOrWait", lockid, "waiting");
 
 #ifdef LWLOCK_STATS
-		block_counts[lockid]++;
+		lwstats->block_count++;
 #endif
 
 		TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);