[HACKERS] Process wakeups when idle and power consumption

Peter Geoghegan Thu, 05 May 2011 12:50:04 -0700

There is a general need to have Postgres consume fewer CPU cycles and
less power when idle. Until something is done about this, shared
hosting providers, particularly those who want to deploy many VM
instances with databases, will continue to choose MySQL out of hand.


I have quantified the difference in the number of wake-ups when idle
between Postgres and MySQL using Intel's powertop utility on my
laptop, which runs Fedora 14. These figures are for a freshly initdb'd
database from git master, and mysql-server 5.1.56 from my system's
package manager.

*snip*
   2.7% ( 11.5)   [      ] postgres
   1.1% (  4.6)   [  1663] Xorg
   0.9% (  3.7)   [  1463] wpa_supplicant
   0.6% (  2.7)   [      ] [ahci] <interrupt>
   0.5% (  2.2)   [      ] mysqld
*snip*

Postgres consistently has 11.5 wakeups per second, while MySQL
consistently has 2.2 wakeups per second (averaged over the 5 second
period that each cycle of instrumentation lasts).

If I turn on archiving, the figure for Postgres naturally increases:

*snip*
   1.7% ( 12.5)   [      ] postgres
   1.6% ( 12.0)   [   808] phy0
   0.7% (  5.4)   [  1463] wpa_supplicant
   0.6% (  4.3)   [      ] [ahci] <interrupt>
   0.3% (  2.2)   [      ] mysqld
*snip*

It increases by exactly the amount that you'd expect after looking at
pgarch.c - one wakeup per second. This is because there is a loop
within the main event loop for the process that is a prime example of
what unix_latch.c describes as "the common pattern of using
pg_usleep() or select() to wait until a signal arrives, where the
signal handler sets a global variable". The loop naps for one second
per iteration.
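
To make the pattern concrete, here is a minimal standalone sketch of it
in plain POSIX C. It is not the pgarch.c code: on_wakeup_signal() and
poll_until_wakened() are names invented for illustration, and the nap
length is a parameter only so that the sketch can be exercised quickly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>

/* Set by the signal handler, polled by the loop (illustrative name). */
static volatile sig_atomic_t wakened = 0;

/* Signal handler: only records that there is work to be done. */
static void
on_wakeup_signal(int signo)
{
	(void) signo;
	wakened = 1;
}

/*
 * Nap in fixed slices until the flag is set or max_naps naps have been
 * taken.  Every nap that runs to completion costs one timer wakeup, even
 * if the flag never changes.  Returns the number of naps taken.
 */
static int
poll_until_wakened(long nap_usec, int max_naps)
{
	struct timespec nap;
	int			naps = 0;

	nap.tv_sec = nap_usec / 1000000L;
	nap.tv_nsec = (nap_usec % 1000000L) * 1000L;

	while (!wakened && naps < max_naps)
	{
		nanosleep(&nap, NULL);	/* may return early if a signal arrives */
		naps++;
	}
	return naps;
}
```

With nap_usec set to 1000000, an idle process running this loop wakes
once per second purely to re-check the flag, which is the extra wakeup
per second that shows up above once archiving is turned on.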

Attached is the first in what I hope will become a series of patches
for reducing power consumption when idle. It makes the archiver
process wake far less frequently, using a latch primitive,
specifically a non-shared latch. I'm not sure whether I should have
used a shared latch instead, with SetLatch() calls replacing the
SendPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) calls. Would that have
broken some implied notion of encapsulation? In any case, if I apply
the patch and rebuild, the difference is quite apparent:

*snip*
   3.9% ( 21.8)   [  1663] Xorg
   3.2% ( 17.9)   [      ] [ath9k] <interrupt>
   2.1% ( 11.9)   [   808] phy0
   2.1% ( 11.5)   [      ] postgres
   1.0% (  5.4)   [  1463] wpa_supplicant
   0.4% (  2.2)   [      ] mysqld
*snip*

The difference from not running the archiver at all appears to have
been completely eliminated (in fact, we still wake up every
PGARCH_AUTOWAKE_INTERVAL seconds, which is 60 seconds, but that
usually isn't apparent to powertop, which measures wakeups over 5
second periods).
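
For readers unfamiliar with latches, the following is a rough
standalone sketch of how a process-local latch can replace the
1-second naps with a single long wait, using the self-pipe trick. It
is not the code in unix_latch.c: ToyLatch, the toy_latch_*()
functions, latch_signal_handler() and remaining_wait_secs() are all
hypothetical names, and clamping the remaining time at zero is an
addition of this sketch rather than something the attached patch does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Toy process-local latch; all names here are hypothetical. */
typedef struct ToyLatch
{
	volatile sig_atomic_t is_set;
	int			pipe_fds[2];	/* [0] = read end, [1] = write end */
} ToyLatch;

static ToyLatch main_latch;

/* Create the pipe; both ends non-blocking so no call below can hang. */
static int
toy_latch_init(ToyLatch *latch)
{
	latch->is_set = 0;
	if (pipe(latch->pipe_fds) != 0)
		return -1;
	if (fcntl(latch->pipe_fds[0], F_SETFL, O_NONBLOCK) != 0 ||
		fcntl(latch->pipe_fds[1], F_SETFL, O_NONBLOCK) != 0)
		return -1;
	return 0;
}

/* Safe to call from a signal handler: set the flag, then wake poll(). */
static void
toy_latch_set(ToyLatch *latch)
{
	int			saved_errno = errno;
	ssize_t		rc;

	latch->is_set = 1;
	rc = write(latch->pipe_fds[1], "", 1);
	(void) rc;					/* a full pipe already guarantees a wakeup */
	errno = saved_errno;
}

/* Clear the flag and drain any wakeup bytes left in the pipe. */
static void
toy_latch_reset(ToyLatch *latch)
{
	char		buf[16];

	latch->is_set = 0;
	while (read(latch->pipe_fds[0], buf, sizeof(buf)) > 0)
		;
}

/*
 * Sleep until the latch is set or timeout_ms passes.  Returns 1 if set,
 * 0 on timeout or error.  Simplification: after EINTR the full timeout
 * starts over.
 */
static int
toy_latch_wait(ToyLatch *latch, int timeout_ms)
{
	struct pollfd pfd;

	pfd.fd = latch->pipe_fds[0];
	pfd.events = POLLIN;
	pfd.revents = 0;

	while (!latch->is_set)
	{
		int			rc = poll(&pfd, 1, timeout_ms);

		if (rc == 0 || (rc < 0 && errno != EINTR))
			return 0;
	}
	return 1;
}

/* Signal handler in the style of the patched handlers. */
static void
latch_signal_handler(int signo)
{
	(void) signo;
	toy_latch_set(&main_latch);
}

/* Seconds left until the next forced wakeup, clamped at zero. */
static unsigned int
remaining_wait_secs(time_t now, time_t last_copy_time, unsigned int interval)
{
	unsigned int elapsed = (now > last_copy_time) ?
		(unsigned int) (now - last_copy_time) : 0;

	return (elapsed >= interval) ? 0 : interval - elapsed;
}
```

A main loop built on this would call toy_latch_reset() at the top of
each iteration, do its work, and then call toy_latch_wait() with
remaining_wait_secs() converted to milliseconds. An idle process then
wakes once per interval (every 60 seconds here) instead of once per
second, while a signal still ends the wait promptly. The clamp matters
because an unsigned subtraction would wrap around to a very large
value if more than the interval had already elapsed.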

If we could gain similar decreases in idle power consumption across
all Postgres ancillary processes, perhaps we'd see Postgres available
as an option for shared hosting plans more frequently. When these
differences are multiplied by thousands of VM instances, they really
matter. Unfortunately, there doesn't seem to be a way to get powertop
to display its instrumentation per-process to quickly get a detailed
overview of where those wake-ups occur across all pg processes.

I hope to work on reducing wakeups for PG ancillary processes in this
order (order of perceived difficulty), using shared latches to
eliminate "the waiting pattern" in each case:

* WALWriter
* BgWriter
* WALReceiver
* Startup process

I'll need to take a look at statistics, autovacuum and Logger
processes too, to see if they present more subtle opportunities for
reduced idle power consumption.

Do constants like PGARCH_AUTOWAKE_INTERVAL always need to be set at
their current, conservative levels? Perhaps these sorts of values
could be collectively controlled with a single GUC that represents a
trade-off between CPU cycles used when idle and safety/reliability.
It might be an enum that represents various levels of concern and
defaults to something like 'conservative' (i.e. the current values).
On the other hand, there are already GUCs that control this per
process in some cases, such as wal_writer_delay, so that suggestion
could well be a bit woolly.
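
Purely to illustrate the single-GUC idea, here is a standalone sketch
of how named levels might map to wake intervals. Nothing like this
exists in Postgres: the type and function names, the level names other
than 'conservative', and the 300 and 900 second values are all
invented. Only the 60 second figure comes from the current
PGARCH_AUTOWAKE_INTERVAL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical levels; only "conservative" reflects current behaviour. */
typedef enum IdleWakeupPolicy
{
	IDLE_WAKEUPS_CONSERVATIVE,
	IDLE_WAKEUPS_MODERATE,
	IDLE_WAKEUPS_RELAXED
} IdleWakeupPolicy;

typedef struct IdleWakeupEntry
{
	const char *name;			/* value the user would write */
	IdleWakeupPolicy policy;
	int			archiver_autowake_secs;
} IdleWakeupEntry;

static const IdleWakeupEntry idle_wakeup_table[] = {
	{"conservative", IDLE_WAKEUPS_CONSERVATIVE, 60},	/* current value */
	{"moderate", IDLE_WAKEUPS_MODERATE, 300},	/* invented value */
	{"relaxed", IDLE_WAKEUPS_RELAXED, 900}	/* invented value */
};

/* Look a level up by name; returns NULL for an unknown name. */
static const IdleWakeupEntry *
lookup_idle_wakeup_policy(const char *name)
{
	size_t		i;
	size_t		n = sizeof(idle_wakeup_table) / sizeof(idle_wakeup_table[0]);

	for (i = 0; i < n; i++)
	{
		if (strcmp(idle_wakeup_table[i].name, name) == 0)
			return &idle_wakeup_table[i];
	}
	return NULL;
}
```

Each ancillary process would then read its own interval from the
selected entry, and further columns could be added per process. The
per-process GUCs that already exist, such as wal_writer_delay, would
have to take precedence or be reconciled with the level somehow, which
this sketch does not attempt.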

Thoughts?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index b40375a..01e5350 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -44,6 +44,7 @@
 #include "storage/pmsignal.h"
 #include "utils/guc.h"
 #include "utils/ps_status.h"
+#include "storage/latch.h"
 
 
 /* ----------
@@ -87,6 +88,12 @@ static volatile sig_atomic_t got_SIGTERM = false;
 static volatile sig_atomic_t wakened = false;
 static volatile sig_atomic_t ready_to_stop = false;
 
+/*
+ * Latch that archiver loop waits on until it is awakened by 
+ * signals, each of which there is a handler for
+ */
+static volatile Latch mainloop_latch;
+
 /* ----------
  * Local function forward declarations
  * ----------
@@ -228,6 +235,8 @@ PgArchiverMain(int argc, char *argv[])
 
 	MyProcPid = getpid();		/* reset MyProcPid */
 
+	InitLatch(&mainloop_latch); /* initialise latch used in main loop, now that we are a subprocess */
+
 	MyStartTime = time(NULL);	/* record Start Time for logging */
 
 	/*
@@ -282,6 +291,8 @@ ArchSigHupHandler(SIGNAL_ARGS)
 {
 	/* set flag to re-read config file at next convenient time */
 	got_SIGHUP = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGTERM signal handler for archiver process */
@@ -295,6 +306,8 @@ ArchSigTermHandler(SIGNAL_ARGS)
 	 * archive commands.
 	 */
 	got_SIGTERM = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR1 signal handler for archiver process */
@@ -303,6 +316,8 @@ pgarch_waken(SIGNAL_ARGS)
 {
 	/* set flag that there is work to be done */
 	wakened = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR2 signal handler for archiver process */
@@ -311,6 +326,8 @@ pgarch_waken_stop(SIGNAL_ARGS)
 {
 	/* set flag to do a final cycle and shut down afterwards */
 	ready_to_stop = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /*
@@ -334,6 +351,13 @@ pgarch_MainLoop(void)
 
 	do
 	{
+		/*
+		 * There shouldn't be anything for the archiver to do except to wait
+		 * on a latch ... however, the archiver exists to protect our data,
+		 * so she wakes up occasionally to allow herself to be proactive.
+		 */
+		ResetLatch(&mainloop_latch);
+
 		/* When we get SIGUSR2, we do one more archive cycle, then exit */
 		time_to_stop = ready_to_stop;
 
@@ -370,28 +394,28 @@ pgarch_MainLoop(void)
 			last_copy_time = time(NULL);
 		}
 
-		/*
-		 * There shouldn't be anything for the archiver to do except to wait
-		 * for a signal ... however, the archiver exists to protect our data,
-		 * so she wakes up occasionally to allow herself to be proactive.
+		/* 
+		 * Wait on latch, until various signals are received, or 
+		 * until a poll will be forced by PGARCH_AUTOWAKE_INTERVAL
+		 * having passed since last_copy_time
 		 *
-		 * On some platforms, signals won't interrupt the sleep.  To ensure we
-		 * respond reasonably promptly when someone signals us, break down the
-		 * sleep into 1-second increments, and check for interrupts after each
-		 * nap.
+		 * The caveat about signals invalidating the timeout of 
+		 * WaitLatch() on some platforms can be safely disregarded, 
+		 * because we handle all expected signals, and all handlers 
+		 * call SetLatch() where that matters anyway
 		 */
-		while (!(wakened || ready_to_stop || got_SIGHUP ||
-				 !PostmasterIsAlive(true)))
-		{
-			time_t		curtime;
 
-			pg_usleep(1000000L);
+		if (!time_to_stop) /* Don't wait during last iteration */
+		{
+			time_t		 curtime = time(NULL);	
+			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL - 
+					(unsigned int) (curtime - last_copy_time);
+			WaitLatch(&mainloop_latch, timeout_secs * 1000000L);
 			curtime = time(NULL);
 			if ((unsigned int) (curtime - last_copy_time) >=
 				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
-				wakened = true;
+				wakened = true; /* wakened by timeout - this wasn't a SIGHUP, etc */
 		}
-
 		/*
 		 * The archiver quits either when the postmaster dies (not expected)
 		 * or after completing one more archiving cycle after receiving

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
