On 24.11.2010 07:07, Robert Haas wrote:
Per previous threats, I spent some time tonight running oprofile
(using the directions Tom Lane was foolish enough to provide me back
in May).  I took testlibpq.c and hacked it up to just connect to the
server and then disconnect in a tight loop without doing anything
useful, hoping to measure the overhead of starting up a new
connection.  Ha, ha, funny about that:

120899   18.0616  postgres                 AtProcExit_Buffers
56891     8.4992  libc-2.11.2.so           memset
30987     4.6293  libc-2.11.2.so           memcpy
26944     4.0253  postgres                 hash_search_with_hash_value
26554     3.9670  postgres                 AllocSetAlloc
20407     3.0487  libc-2.11.2.so           _int_malloc
17269     2.5799  libc-2.11.2.so           fread
13005     1.9429  ld-2.11.2.so             do_lookup_x
11850     1.7703  ld-2.11.2.so             _dl_fixup
10194     1.5229  libc-2.11.2.so           _IO_file_xsgetn

In English: the #1 overhead here is actually something that happens
when processes EXIT, not when they start.  Essentially all the time is
in two lines:

  56920  6.6006 :        for (i = 0; i < NBuffers; i++)
                :        {
  98745 11.4507 :                if (PrivateRefCount[i] != 0)

Oh, that's quite surprising.

Anything we can do about this?  That's a lot of overhead, and it'd be
a lot worse on a big machine with 8GB of shared_buffers.
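
Just to put a number on that (back-of-the-envelope arithmetic, not something from the mail above): with the default 8kB block size, 8GB of shared_buffers means NBuffers is about a million, so the int32 PrivateRefCount array that loop walks is roughly 4MB:

#include <stdio.h>

/*
 * Rough size of the PrivateRefCount array that the exit-time loop scans,
 * assuming the default 8kB block size and one int32 refcount per buffer.
 */
int
main(void)
{
	long long	shared_buffers = 8LL * 1024 * 1024 * 1024;	/* 8GB */
	long long	nbuffers = shared_buffers / 8192;			/* ~1 million */

	printf("NBuffers = %lld, PrivateRefCount = %lld kB\n",
		   nbuffers, nbuffers * 4 / 1024);
	return 0;
}

That prints NBuffers = 1048576 and 4096 kB, all of which has to be read through on every backend exit.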

Micro-optimizing that search for the non-zero value helps a little bit (patch attached); it reduces the percentage shown by oprofile from about 16% to 12% on my laptop.

For bigger gains, I think you need to somehow make PrivateRefCount smaller. Perhaps use only one byte per buffer instead of an int32, with some sort of overflow list for the rare case of a buffer being pinned more than 255 times. Or make it a hash table instead of a simple lookup array. But whatever you do, you have to be very careful not to add overhead to PinBuffer/UnpinBuffer; those can already be quite high in oprofile reports of real applications. It might be worth experimenting a bit: at the moment PrivateRefCount takes up 512kB of memory per 1GB of shared_buffers (one int32 per 8kB buffer). Machines with a high shared_buffers setting have no shortage of memory, but a large array like that can waste a lot of precious CPU cache.
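
To make the one-byte idea concrete, here's a rough sketch (all names are invented for illustration, nothing like this exists in bufmgr.c, and the hot-path cost would need much more scrutiny than shown):

#include <stdint.h>
#include <stdlib.h>

/*
 * Sketch: one byte of refcount per buffer, plus a tiny overflow table
 * for the rare buffer pinned more than 255 times.
 */
#define REFCOUNT_OVERFLOW_SLOTS 8

static uint8_t *PrivateRefCount8;	/* NBuffers entries, zeroed at startup */
static struct
{
	int			buf;			/* buffer index, valid while count > 0 */
	int32_t		count;			/* pins beyond the 255 held in the byte array */
}			RefCountOverflow[REFCOUNT_OVERFLOW_SLOTS];

static void
IncrPrivateRefCount(int buf)
{
	if (PrivateRefCount8[buf] < UINT8_MAX)
	{
		PrivateRefCount8[buf]++;	/* the common, fast path */
		return;
	}
	/* Rare path: spill the extra pins into the overflow table. */
	for (int i = 0; i < REFCOUNT_OVERFLOW_SLOTS; i++)
	{
		if (RefCountOverflow[i].count > 0 && RefCountOverflow[i].buf == buf)
		{
			RefCountOverflow[i].count++;
			return;
		}
	}
	for (int i = 0; i < REFCOUNT_OVERFLOW_SLOTS; i++)
	{
		if (RefCountOverflow[i].count == 0)
		{
			RefCountOverflow[i].buf = buf;
			RefCountOverflow[i].count = 1;
			return;
		}
	}
	abort();					/* table full; a real patch would grow it */
}

static void
DecrPrivateRefCount(int buf)
{
	if (PrivateRefCount8[buf] == UINT8_MAX)
	{
		/* Drain any overflow pins before touching the byte array. */
		for (int i = 0; i < REFCOUNT_OVERFLOW_SLOTS; i++)
		{
			if (RefCountOverflow[i].count > 0 && RefCountOverflow[i].buf == buf)
			{
				RefCountOverflow[i].count--;
				return;
			}
		}
	}
	PrivateRefCount8[buf]--;
}

The exit-time scan would then touch a quarter of the memory, and the common path is still a single array access, but that extra branch is exactly the kind of thing that has to be measured in PinBuffer/UnpinBuffer.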

Now, the other question is whether this really matters. Even if we eliminated that loop in AtProcExit_Buffers altogether, would connect/disconnect still be so slow that you'd need a connection pooler if you do that a lot?
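
(One cheap way to skip that loop in the common case, again just a sketch with invented names: have PinBuffer/UnpinBuffer maintain a per-backend count of buffers with a non-zero PrivateRefCount, and return early from AtProcExit_Buffers when it's zero. The counter update is extra work on the hot path, so it would need the same scrutiny as anything else there.)

#include <stdint.h>

/*
 * Sketch only: NPrivatePins is a hypothetical counter that PinBuffer /
 * UnpinBuffer would bump when an entry goes 0 -> 1 and drop when it goes
 * 1 -> 0.  The first two variables are stand-ins for bufmgr.c's statics,
 * declared here just to keep the sketch self-contained.
 */
static int	NBuffers;
static int32_t *PrivateRefCount;
static int	NPrivatePins = 0;

static void
AtProcExit_Buffers_sketch(void)
{
	if (NPrivatePins == 0)
		return;					/* nothing pinned: skip the O(NBuffers) scan */

	for (int i = 0; i < NBuffers; i++)
	{
		if (PrivateRefCount[i] != 0)
		{
			/* here the real code warns about and releases the leaked pin */
			PrivateRefCount[i] = 0;
		}
	}
}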

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 54c7109..03593fd 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1665,11 +1665,20 @@ static void
 AtProcExit_Buffers(int code, Datum arg)
 {
 	int			i;
+	int		   *ptr;
+	int		   *end;
 
 	AbortBufferIO();
 	UnlockBuffers();
 
-	for (i = 0; i < NBuffers; i++)
+	/* Fast search for the first non-zero entry in PrivateRefCount */
+	end = (int *) &PrivateRefCount[NBuffers - 1];
+	ptr = (int *) PrivateRefCount;
+	while (ptr < end && *ptr == 0)
+		ptr++;
+	i = ((int32 *) ptr) - PrivateRefCount;
+
+	for (; i < NBuffers; i++)
 	{
 		if (PrivateRefCount[i] != 0)
 		{
