Attached is a patch that tracks and displays the accumulated cost when autovacuum is running. Code by Noah Misch and me. I hope this idea will bring a formal process to vacuum tuning, which is currently too hard to do. I was about to add "without..." to that, but then realized it needs no disclaimer; it's just too hard, period. Vacuum issues are enemy #1 at all the terabyte-scale customer sites I've been fighting with lately.

The patch updates the command string just before the workers sleep, to show how much work they've done so far. At the end, it adds a new line to the information written to the logs, when the autovacuum is notable enough to be logged at all. The overhead it adds is at most a few integer operations per buffer processed, plus one process title update per sleep. That's trivial compared to both the vacuum itself and to the value this instrumentation has for sites with vacuum issues.

To demonstrate the patch in action, here's a test case using a 6.4GB pgbench_accounts table:

$ createdb pgbench
$ pgbench -i -s 500 pgbench
$ psql -d pgbench -c "select pg_relation_size('pgbench_accounts');"
 pg_relation_size
------------------
       6714761216
$ psql -d pgbench -c "select relname,relpages from pg_class where relname='pgbench_accounts';"
     relname      | relpages
------------------+----------
 pgbench_accounts |   819673
$ psql -d pgbench -c "delete from pgbench_accounts where aid<20000000"

You can see the new information in the command string with ps and grep:

$ while [ 1 ] ; do (ps -eaf | grep "[a]utovacuum worker" && sleep 60) ; done
gsmith 2687 17718 0 15:44 ? 00:00:00 postgres: autovacuum worker process h=19 m=14196 d=14185
...
gsmith 2687 17718 0 15:44 ? 00:00:09 postgres: autovacuum worker process h=182701 m=301515 d=321345
...
gsmith 2687 17718 1 15:44 ? 00:00:23 postgres: autovacuum worker process h=740359 m=679987 d=617559
...

Those are the accumulated hit/miss/dirty buffer counts: the raw numbers, not the weighted costs. When the autovacuum is finished, those totals appear as a new line in the log entry:

LOG: automatic vacuum of table "pgbench.public.pgbench_accounts": index scans: 1
    pages: 0 removed, 819673 remain
    tuples: 19999999 removed, 30000022 remain
    buffer usage: 809537 hits, 749340 misses, 686660 dirtied
    system usage: CPU 5.70s/19.73u sec elapsed 2211.60 sec

To check whether these numbers make sense, we need the cost parameters in effect at the time, which were the defaults (I only tweaked a few basic config settings here, including shared_buffers=400MB, so a bit more was cached than usual):

vacuum_cost_page_hit = 1            # 0-10000 credits
vacuum_cost_page_miss = 10          # 0-10000 credits
vacuum_cost_page_dirty = 20         # 0-10000 credits
vacuum_cost_limit = 200             # 1-10000 credits
autovacuum_vacuum_cost_delay = 20ms

A 20ms delay means the worker sleeps at most 50 times per second, and each sleep only happens once it has accumulated vacuum_cost_limit worth of work. So the cost should accumulate at roughly 200 * 50 = 10,000 per second, or 600K per minute. Given the 2211.60 second runtime, that works out to an estimated total cost of roughly 22.1 million. Now we check that against the totals printed at the end of the vacuum:

 1 * 809537 hits    =    809,537
10 * 749340 misses  =  7,493,400
20 * 686660 dirtied = 13,733,200

That gives a directly computed total of 22,036,137, close enough to the estimate to show this is working as expected. How I computed all that should also give you an idea of how to use these numbers to extract other useful statistics, for example if you want to tune the balance of the various vacuum_cost_page_* parameters. I have no idea how anyone could set those relative to one another without this data; it would take epic guessing skills.
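
If you'd rather script that check than do it by hand, something along these lines will pull the weighted total back out of the log. This is just a sketch: the log file name is a placeholder, and the 1/10/20 weights assume the default cost settings shown above.

$ grep "buffer usage:" postgresql.log | tail -n 1 | \
    awk '{ for (i = 1; i < NF; i++) {
             if ($(i+1) ~ /^hits/)    h = $i
             if ($(i+1) ~ /^misses/)  m = $i
             if ($(i+1) ~ /^dirtied/) d = $i }
           print h * 1 + m * 10 + d * 20 }'
22036137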

What else can you do with this data?

-Figure out whether a VACUUM is still making progress when it appears stuck
-Estimate how long it will take to finish, based on current progress and whatever total cost was logged the last time VACUUM ran against this relation (see the sketch after this list)
-Compute an approximate hit rate on the read side. OS caching and the ring buffer are obviously a problem with that, so this isn't too valuable.
-See how the cost is split when multiple vacuums are running at once. That overlap is why sites can't just use "total time to vacuum" as a proxy for how long the next one will take.
-Easily track the read/write ratio
-Directly measure the write rate
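
Here's a back-of-the-envelope version of the time-to-finish estimate from that list. Only a sketch: the h/m/d numbers are copied from one of the ps samples above, LAST_TOTAL and ELAPSED are example inputs you'd supply yourself (the weighted total from the previous log entry for this table, and how long the current vacuum has been running), and the weights again assume the defaults.

$ H=182701 M=301515 D=321345   # h=/m=/d= from the current ps output
$ LAST_TOTAL=22036137          # weighted buffer total from the last logged vacuum of this table
$ ELAPSED=840                  # seconds the current vacuum has been running (example value)
$ echo "$H $M $D $LAST_TOTAL $ELAPSED" | \
    awk '{ done = ($1 * 1 + $2 * 10 + $3 * 20) / $4
           printf "%.0f%% done, roughly %.0f seconds to go\n", 100 * done, $5 * (1 - done) / done }'
44% done, roughly 1083 seconds to go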

That last one is, I think, the part people are most perplexed by right now, and this patch makes it trivial. How do you turn all these cost figures into real-world read/write rates? Until now that's been hard to do.

Now, you can take a series of samples at one-minute intervals, like my little "ps | grep" example above does. The delta in the "d=" field (the page_dirty column in the CSV) is how much was written per minute, in units of (usually) 8K buffers. Multiply that by 8192/(60*1024*1024) and you get MB/s. I collected that data for a cleanup run of the pgbench_accounts damage done above; a CSV file with all the statistics is attached.
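
For example, taking two consecutive page_dirty samples from the attached CSV (306816 at 19:00:57, 322596 a minute later) and assuming the standard 8K block size:

$ echo "306816 322596" | \
    awk '{ printf "%.2f MB/s\n", ($2 - $1) * 8192 / (60 * 1024 * 1024) }'
2.05 MB/s

That matches the 2.05 in that row's Dirty Mbps column of the CSV.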

I also collected OS-level stats from Linux about the actual read/write rate of the process, converted into MB/s (the "Write Mbps" column headers in the attached CSV are actually MB/s; the sloppy capitalization comes from OpenOffice "autocorrect"). Those numbers are close enough to make me confident the dirty buffer totals tracked here do turn into useful MB/s values. A sample of the most interesting part:

Cost Delta   Dirty MB/s   Write MB/s
   589,890         2.56         2.73
   591,151         2.57         2.73
   589,035         2.56         2.72
   593,775         3.14         0.20
   599,420         2.05         0.00
   598,503         2.05         0.00
   599,421         2.05         0.00
   574,046         0.60         0.01
   574,779         0.64         0.67
   609,140         2.56         2.68
   612,397         2.57         2.69
   611,744         2.57         2.69
   610,008         2.56         2.68

This shows the expected 600K/minute cost accumulation. And using the d= numbers to compute MB/s of write speed closely matches the total write speed measured for this process. Some of the difference may be I/O to things other than the main table here, and some is just OS write caching influencing the write rate. In the spots where the OS value and the dirty-derived value diverge the most, it appears to be because vacuum has filled Linux's write cache; the actual writes accumulated against the process then stall for a while. It's a small difference most of the time.
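
For anyone who wants to reproduce the OS-side column, one way to get comparable per-process numbers on Linux is /proc/<pid>/io (readable as the process owner or root). A minimal sampling loop looks something like this; it assumes a single autovacuum worker and isn't necessarily exactly how the table above was produced:

$ PID=$(pgrep -f "autovacuum worker")
$ while sleep 60; do echo "$(date +%H:%M:%S) $(grep -E '^(read|write)_bytes' /proc/$PID/io | tr '\n' ' ')" ; done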

I'd be willing to accept a "Dirty MB/s" figure computed this way as accurate enough for most purposes. And this patch lets you get that data, currently unavailable without poking into OS statistics (if at all), just by doing a little log file and/or command string scraping. You can grab the totals at the end or monitor in real time, depending on how much work you want to put into it. For a busy site where one or more autovacuum processes are running most of the time, being able to monitor the vacuum portion of the I/O this way will be a huge improvement over the current state of things. I'm already working on a stack of tools built on top of this data, and they're making it much easier to come up with an iterative tuning process for autovacuum.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

#"timestamp","page_hit","hit_bytes","page_miss","miss_bytes","page_dirty","dirty_bytes","Dirty Mbps","Total Cost","Cost Delta","Read Bytes","Write Bytes","Read Mbps","Write Mbps"
08/16/2011-18:46:56,21,"172,032",27088,"221,904,896",27079,"221,831,168",2.56,"812,481","589,680",0,"235,945,984",0.00,2.73
08/16/2011-18:47:56,21,"172,032",46737,"382,869,504",46728,"382,795,776",2.56,"1,401,951","589,470",0,"407,298,048",0.00,2.72
08/16/2011-18:48:56,21,"172,032",66359,"543,612,928",66349,"543,531,008",2.55,"1,990,591","588,640",0,"578,527,232",0.00,2.72
08/16/2011-18:49:56,21,"172,032",86092,"705,265,664",86082,"705,183,744",2.57,"2,582,581","591,990",0,"750,551,040",0.00,2.73
08/16/2011-18:50:56,21,"172,032",105713,"866,000,896",105703,"865,918,976",2.55,"3,171,211","588,630",0,"921,657,344",0.00,2.72
08/16/2011-18:51:57,21,"172,032",125369,"1,027,022,848",125359,"1,026,940,928",2.56,"3,760,891","589,680",0,"1,093,173,248",0.00,2.73
08/16/2011-18:52:57,21,"172,032",145019,"1,187,995,648",145008,"1,187,905,536",2.56,"4,350,371","589,480",0,"1,264,517,120",0.00,2.72
08/16/2011-18:53:57,21,"172,032",164717,"1,349,361,664",164706,"1,349,271,552",2.56,"4,941,311","590,940",0,"1,436,352,512",0.00,2.73
08/16/2011-18:54:57,21,"172,032",184387,"1,510,498,304",184376,"1,510,408,192",2.56,"5,531,411","590,100",0,"1,607,917,568",0.00,2.73
08/16/2011-18:55:57,22,"180,224",204078,"1,671,806,976",204067,"1,671,716,864",2.56,"6,122,142","590,731",0,"1,779,671,040",0.00,2.73
08/16/2011-18:56:57,22,"180,224",223706,"1,832,599,552",223695,"1,832,509,440",2.56,"6,710,982","588,840",0,"1,950,982,144",0.00,2.72
08/16/2011-18:57:57,22,"180,224",243369,"1,993,678,848",243358,"1,993,588,736",2.56,"7,300,872","589,890",0,"2,122,530,816",0.00,2.73
08/16/2011-18:58:57,23,"188,416",263074,"2,155,102,208",263063,"2,155,012,096",2.57,"7,892,023","591,151",0,"2,294,480,896",0.00,2.73
08/16/2011-18:59:57,28,"229,376",282705,"2,315,919,360",282699,"2,315,870,208",2.56,"8,481,058","589,035",0,"2,465,808,384",0.00,2.72
08/16/2011-19:00:57,67293,"551,264,256",287122,"2,352,103,424",306816,"2,513,436,672",3.14,"9,074,833","593,775",0,"2,478,424,064",0.00,0.20
08/16/2011-19:01:57,193313,"1,583,620,096",302902,"2,481,373,184",322596,"2,642,706,432",2.05,"9,674,253","599,420",0,"2,478,424,064",0.00,0.00
08/16/2011-19:02:57,319136,"2,614,362,112",318658,"2,610,446,336",338352,"2,771,779,584",2.05,"10,272,756","598,503",0,"2,478,424,064",0.00,0.00
08/16/2011-19:03:57,445157,"3,646,726,144",334438,"2,739,716,096",354132,"2,901,049,344",2.05,"10,872,177","599,421",0,"2,478,424,064",0.00,0.00
08/16/2011-19:04:57,481663,"3,945,783,296",379050,"3,105,177,600",358703,"2,938,494,976",0.60,"11,446,223","574,046",0,"2,479,005,696",0.00,0.01
08/16/2011-19:05:58,486582,"3,986,079,744",426190,"3,491,348,480",363626,"2,978,824,192",0.64,"12,021,002","574,779",0,"2,520,915,968",0.00,0.67
08/16/2011-19:06:58,506222,"4,146,970,624",445840,"3,652,321,280",383276,"3,139,796,992",2.56,"12,630,142","609,140",0,"2,689,245,184",0.00,2.68
08/16/2011-19:07:58,525969,"4,308,738,048",465595,"3,814,154,240",403031,"3,301,629,952",2.57,"13,242,539","612,397",0,"2,858,622,976",0.00,2.69
08/16/2011-19:08:58,545693,"4,470,317,056",485329,"3,975,815,168",422765,"3,463,290,880",2.57,"13,854,283","611,744",0,"3,027,812,352",0.00,2.69
08/16/2011-19:09:58,565361,"4,631,437,312",505007,"4,137,017,344",442443,"3,624,493,056",2.56,"14,464,291","610,008",0,"3,196,469,248",0.00,2.68
08/16/2011-19:10:58,584994,"4,792,270,848",524650,"4,297,932,800",462086,"3,785,408,512",2.56,"15,073,214","608,923",0,"3,364,765,696",0.00,2.68
08/16/2011-19:11:58,604662,"4,953,391,104",544328,"4,459,134,976",481764,"3,946,610,688",2.56,"15,683,222","610,008",0,"3,533,332,480",0.00,2.68
08/16/2011-19:12:58,624330,"5,114,511,360",564006,"4,620,337,152",501442,"4,107,812,864",2.56,"16,293,230","610,008",0,"3,701,866,496",0.00,2.68
08/16/2011-19:13:58,643999,"5,275,639,808",583683,"4,781,531,136",521119,"4,269,006,848",2.56,"16,903,209","609,979",0,"3,870,490,624",0.00,2.68
08/16/2011-19:14:58,663709,"5,437,104,128",603403,"4,943,077,376",540839,"4,430,553,088",2.57,"17,514,519","611,310",0,"4,040,097,792",0.00,2.70
08/16/2011-19:15:58,683412,"5,598,511,104",623116,"5,104,566,272",560552,"4,592,041,984",2.57,"18,125,612","611,093",0,"4,211,933,184",0.00,2.73
08/16/2011-19:16:58,703094,"5,759,746,048",642808,"5,265,883,136",580244,"4,753,358,848",2.56,"18,736,054","610,442",0,"4,383,735,808",0.00,2.73
08/16/2011-19:17:58,722713,"5,920,464,896",662437,"5,426,683,904",599873,"4,914,159,616",2.56,"19,344,543","608,489",0,"4,554,866,688",0.00,2.72
08/16/2011-19:18:58,742325,"6,081,126,400",682059,"5,587,427,328",619495,"5,074,903,040",2.55,"19,952,815","608,272",0,"4,725,981,184",0.00,2.72
08/16/2011-19:19:59,762016,"6,242,435,072",701758,"5,748,801,536",639194,"5,236,277,248",2.56,"20,563,476","610,661",0,"4,897,742,848",0.00,2.73
08/16/2011-19:20:59,781635,"6,403,153,920",721387,"5,909,602,304",658823,"5,397,078,016",2.56,"21,171,965","608,489",0,"5,068,914,688",0.00,2.72
08/16/2011-19:21:59,801324,"6,564,446,208",741086,"6,070,976,512",678522,"5,558,452,224",2.56,"21,782,624","610,659",0,"5,240,602,624",0.00,2.73
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 889737e..cc2d148 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -43,6 +43,7 @@
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/ps_status.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
@@ -214,6 +215,9 @@ vacuum(VacuumStmt *vacstmt, Oid relid, bool do_toast,
 
 		VacuumCostActive = (VacuumCostDelay > 0);
 		VacuumCostBalance = 0;
+		VacuumPageHit = 0;
+		VacuumPageMiss = 0;
+		VacuumPageDirty = 0;
 
 		/*
 		 * Loop to process each selected relation.
@@ -1155,11 +1159,17 @@ vacuum_delay_point(void)
 		VacuumCostBalance >= VacuumCostLimit)
 	{
 		int			msec;
+		StringInfoData ps;
 
 		msec = VacuumCostDelay * VacuumCostBalance / VacuumCostLimit;
 		if (msec > VacuumCostDelay * 4)
 			msec = VacuumCostDelay * 4;
 
+		initStringInfo(&ps);
+		appendStringInfo(&ps, "h=%d m=%d d=%d",
+						 VacuumPageHit, VacuumPageMiss, VacuumPageDirty);
+		set_ps_display(ps.data, false);
+
 		pg_usleep(msec * 1000L);
 
 		VacuumCostBalance = 0;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c5bf32e..a6094fe 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -226,20 +226,42 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 		if (Log_autovacuum_min_duration == 0 ||
 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
 									   Log_autovacuum_min_duration))
-			ereport(LOG,
-					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
-							"pages: %d removed, %d remain\n"
-							"tuples: %.0f removed, %.0f remain\n"
-							"system usage: %s",
-							get_database_name(MyDatabaseId),
-							get_namespace_name(RelationGetNamespace(onerel)),
-							RelationGetRelationName(onerel),
-							vacrelstats->num_index_scans,
-							vacrelstats->pages_removed,
-							vacrelstats->rel_pages,
-							vacrelstats->tuples_deleted,
-							vacrelstats->new_rel_tuples,
-							pg_rusage_show(&ru0))));
+		{
+			if (VacuumCostActive)
+				ereport(LOG,
+						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
+								"pages: %d removed, %d remain\n"
+								"tuples: %.0f removed, %.0f remain\n"
+								"buffer usage: %d hits, %d misses, %d dirtied\n"
+								"system usage: %s",
+								get_database_name(MyDatabaseId),
+								get_namespace_name(RelationGetNamespace(onerel)),
+								RelationGetRelationName(onerel),
+								vacrelstats->num_index_scans,
+								vacrelstats->pages_removed,
+								vacrelstats->rel_pages,
+								vacrelstats->tuples_deleted,
+								vacrelstats->new_rel_tuples,
+								VacuumPageHit,
+								VacuumPageMiss,
+								VacuumPageDirty,
+								pg_rusage_show(&ru0))));
+			else
+				ereport(LOG,
+						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
+								"pages: %d removed, %d remain\n"
+								"tuples: %.0f removed, %.0f remain\n"
+								"system usage: %s",
+								get_database_name(MyDatabaseId),
+								get_namespace_name(RelationGetNamespace(onerel)),
+								RelationGetRelationName(onerel),
+								vacrelstats->num_index_scans,
+								vacrelstats->pages_removed,
+								vacrelstats->rel_pages,
+								vacrelstats->tuples_deleted,
+								vacrelstats->new_rel_tuples,
+								pg_rusage_show(&ru0))));
+		}
 	}
 }
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4c7cfb0..d62adbc 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -341,7 +341,10 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			*hit = true;
 
 			if (VacuumCostActive)
+			{
+				VacuumPageHit++;
 				VacuumCostBalance += VacuumCostPageHit;
+			}
 
 			TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
 											  smgr->smgr_rnode.node.spcNode,
@@ -471,7 +474,10 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	}
 
 	if (VacuumCostActive)
+	{
+		VacuumPageMiss++;
 		VacuumCostBalance += VacuumCostPageMiss;
+	}
 
 	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
 									  smgr->smgr_rnode.node.spcNode,
@@ -974,7 +980,10 @@ MarkBufferDirty(Buffer buffer)
 	 * If the buffer was not dirty already, do vacuum cost accounting.
 	 */
 	if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
+	{
+		VacuumPageDirty++;
 		VacuumCostBalance += VacuumCostPageDirty;
+	}
 
 	bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
 
@@ -2299,7 +2308,10 @@ SetBufferCommitInfoNeedsSave(Buffer buffer)
 		LockBufHdr(bufHdr);
 		Assert(bufHdr->refcount > 0);
 		if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
+		{
+			VacuumPageDirty++;
 			VacuumCostBalance += VacuumCostPageDirty;
+		}
 		bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
 		UnlockBufHdr(bufHdr);
 	}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index c4c4154..9ce64e6 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -115,6 +115,10 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 int			VacuumCostDelay = 0;
 
+int			VacuumPageHit = 0;
+int			VacuumPageMiss = 0;
+int			VacuumPageDirty = 0;
+
 int			VacuumCostBalance = 0;		/* working state for vacuum */
 bool		VacuumCostActive = false;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9d19417..4ee08fe 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -230,6 +230,10 @@ extern int	VacuumCostPageDirty;
 extern int	VacuumCostLimit;
 extern int	VacuumCostDelay;
 
+extern int	VacuumPageHit;
+extern int	VacuumPageMiss;
+extern int	VacuumPageDirty;
+
 extern int	VacuumCostBalance;
 extern bool VacuumCostActive;
 