[EMAIL PROTECTED] ("Matthew T. O'Connor") writes: > OK, well as we wait on the fix for the stats system, let me submit my > patch for pg_autovacuum. This patch assumes that the stats system will > be fixed so that all inserts, updates and deletes performed on shared > tables reguardless of what database those commands were executed from, > will show up in the stats shown in each database.
I had to make a further change to this to take quotes off the 'last ANALYZE' in order for it to not overquote the relation name, so there's a _little_ work left to get it to play well. I have deployed it onto several boxes that should be doing some vacuuming over the weekend, and it is now certainly hitting pg_ tables. I would like to present a CVS-oriented patch; unfortunately, I had to change the indentation patterns when editing some of it :-(. The following _may_ be good; not sure... ? pg_autovacuum Index: README.pg_autovacuum =================================================================== RCS file: /projects/cvsroot/pgsql-server/contrib/pg_autovacuum/README.pg_autovacuum,v retrieving revision 1.2 diff -c -b -r1.2 README.pg_autovacuum *** README.pg_autovacuum 12 Jun 2003 01:36:44 -0000 1.2 --- README.pg_autovacuum 12 Sep 2003 22:45:37 -0000 *************** *** 2,67 **** -------------------- pg_autovacuum is a libpq client program that monitors all the ! databases associated with a postgresql server. It uses the stats collector to monitor insert, update and delete activity. ! When a table exceeds its insert or delete threshold (more detail ! on thresholds below) then that table will be vacuumed or analyzed. ! ! This allows postgresql to keep the fsm and table statistics up to ! date, and eliminates the need to schedule periodic vacuums. The primary benefit of pg_autovacuum is that the FSM and table ! statistic information are updated as needed. When a table is actively ! changing, pg_autovacuum will perform the necessary vacuums and ! analyzes, whereas if a table remains static, no cycles will be wasted ! performing unnecessary vacuums/analyzes. A secondary benefit of pg_autovacuum is that it ensures that a ! database wide vacuum is performed prior to xid wraparound. This is an important, if rare, problem, as failing to do so can result in major ! data loss. ! KNOWN ISSUES: ------------- - pg_autovacuum has been tested under Redhat Linux (by me) and Solaris (by - Christopher B. Browne) and all known bugs have been resolved. Please report - any problems to the hackers list. - - pg_autovacuum does not get started automatically by either the postmaster or - by pg_ctl. Along the sames lines, when the postmaster exits no one tells - pg_autovacuum. The result is that at the start of the next loop, - pg_autovacuum fails to connect to the server and exits. Any time it fails - to connect pg_autovacuum exits. - - pg_autovacuum requires that the stats system be enabled and reporting row - level stats. The overhead of the stats system has been shown to be - significant under certain workloads. For instance a tight loop of queries - performing "select 1" was nearly 30% slower with stats enabled. However, - in practice with more realistic workloads, the stats system overhead is - usually nominal. INSTALL: -------- ! As of postgresql v7.4 pg_autovacuum is included in the main source tree ! under contrib. Therefore you just make && make install (similar to most other ! contrib modules) and it will be installed for you. ! ! If you are using an earlier version of postgresql just uncompress the tar.gz ! into the contrib directory and modify the contrib/Makefile to include the pg_autovacuum ! directory. pg_autovacuum will then be made as part of the standard ! postgresql install. ! make sure that the folowing are set in postgresql.conf stats_start_collector = true stats_row_level = true ! start up the postmaster, then execute the pg_autovacuum executable. Command line arguments: ----------------------- --- 2,98 ---- -------------------- pg_autovacuum is a libpq client program that monitors all the ! databases associated with a PostgreSQL server. It uses the statistics collector to monitor insert, update and delete activity. ! When a table exceeds a insert or delete threshold (for more detail on ! thresholds, see "Vacuum and Analyze" below) then that table will be ! vacuumed and/or analyzed. ! ! This allows PostgreSQL to keep the FSM (Free Space Map) and table ! statistics up to date, and eliminates the need to schedule periodic ! vacuums. The primary benefit of pg_autovacuum is that the FSM and table ! statistic information are updated more nearly as frequently as needed. ! When a table is actively changing, pg_autovacuum will perform the ! VACUUMs and ANALYZEs that such a table needs, whereas if a table ! remains static, no cycles will be wasted performing this ! unnecessarily. A secondary benefit of pg_autovacuum is that it ensures that a ! database wide vacuum is performed prior to XID wraparound. This is an important, if rare, problem, as failing to do so can result in major ! data loss. (See the section in the _Administrator's Guide_ entitled ! "Preventing transaction ID wraparound failures" for more details.) KNOWN ISSUES: ------------- + pg_autovacuum has been tested under Redhat Linux (by me) and Debian + GNU/Linux and Solaris (by Christopher B. Browne) and all known bugs + have been resolved. Please report any problems to the hackers list. + + pg_autovacuum requires that the statistics system be enabled and + reporting row level stats. The overhead of the stats system has been + shown to be significant costly under certain workloads. For instance, + a tight loop of queries performing "select 1" was found to run nearly + 30% slower when stats were enabled. However, in practice, with more + realistic workloads, the stats system overhead is usually nominal. + + pg_autovacuum does not get started automatically by either the + postmaster or by pg_ctl. Similarly, when the postmaster exits, no one + tells pg_autovacuum. The result of that is that at the start of the + next loop, pg_autovacuum will fail to connect to the server and + exit(). Any time it fails to connect pg_autovacuum exit()s. + + While pg_autovacuum can manage vacuums for as many databases as you + may have tied to a particular PostgreSQL postmaster, it can only + connect to a single PostgreSQL postmaster. Thus, if you have multiple + postmasters on a particular host, you will need multiple pg_autovacuum + instances, and they have no way, at present, to coordinate between one + another to ensure that they do not concurrently vacuum big tables. + + TODO: + ----- + + At present, there are no sample scripts to automatically start up + pg_autovacuum along with the database. It would be desirable to have + a SysV script to start up pg_autovacuum after PostgreSQL has been + started. + + Some users have expressed interest in making pg_autovacuum more + configurable so that certain tables known to be inactive could be + excluded from being vacuumed. It would probably make sense to + introduce this sort of functionality by providing arguments to specify + the database and schema in which to find a configuration table. INSTALL: -------- ! As of postgresql v7.4 pg_autovacuum is included in the main source ! tree under contrib. Therefore you merely need to "make && make ! install" (similar to most other contrib modules) and it will be ! installed for you. ! ! If you are using an earlier version of PostgreSQL, uncompress the ! tar.gz file into the contrib directory and modify the contrib/Makefile ! to include the pg_autovacuum directory. pg_autovacuum will then be ! built as part of the standard postgresql install. ! make sure that the following are set in postgresql.conf: stats_start_collector = true stats_row_level = true ! Start up the postmaster, then execute the pg_autovacuum executable. ! ! If you have a script that automatically starts up the PostgreSQL ! instance, you might add in, after that, something similar to the ! following: + sleep 10 # To give the database some time to start up + $PGBINS/pg_autovacuum -D -s $SBASE -S $SSCALE ... [other arguments] Command line arguments: ----------------------- *************** *** 69,75 **** pg_autovacuum has the following optional arguments: -d debug: 0 silent, 1 basic info, 2 more debug info, etc... ! -D dameonize: Detach from tty and run in background. -s sleep base value: see "Sleeping" below. -S sleep scaling factor: see "Sleeping" below. -v vacuum base threshold: see Vacuum and Analyze. --- 100,106 ---- pg_autovacuum has the following optional arguments: -d debug: 0 silent, 1 basic info, 2 more debug info, etc... ! -D daemonize: Detach from tty and run in background. -s sleep base value: see "Sleeping" below. -S sleep scaling factor: see "Sleeping" below. -v vacuum base threshold: see Vacuum and Analyze. *************** *** 80,97 **** -U username: Username pg_autovacuum will use to connect with, if not specified the current username is used. -P password: Password pg_autovacuum will use to connect with. ! -H host: host name or IP to connect too. -p port: port used for connection. -h help: list of command line options. ! All arguments have default values defined in pg_autovacuum.h. At the ! time of writing they are: -d 1 -v 1000 -V 2 ! -a 500 (half of -v is not specified) ! -A 1 (half of -v is not specified) -s 300 (5 minutes) -S 2 --- 111,128 ---- -U username: Username pg_autovacuum will use to connect with, if not specified the current username is used. -P password: Password pg_autovacuum will use to connect with. ! -H host: host name or IP to connect to. -p port: port used for connection. -h help: list of command line options. ! Numerous arguments have default values defined in pg_autovacuum.h. At ! the time of writing they are: -d 1 -v 1000 -V 2 ! -a 500 (half of -v if not specified) ! -A 1 (half of -v if not specified) -s 300 (5 minutes) -S 2 *************** *** 99,111 **** Vacuum and Analyze: ------------------- ! pg_autovacuum performs either a vacuum analyze or just analyze depending ! on the quantity and type of table activity (insert, update, or delete): - If the number of (inserts + updates + deletes) > AnalyzeThreshold, then only an analyze is performed. ! - If the number of (deletes + updates ) > VacuumThreshold, then a vacuum analyze is performed. deleteThreshold is equal to: --- 130,143 ---- Vacuum and Analyze: ------------------- ! pg_autovacuum performs either a VACUUM ANALYZE or just ANALYZE ! depending on the mixture of table activity (insert, update, or ! delete): - If the number of (inserts + updates + deletes) > AnalyzeThreshold, then only an analyze is performed. ! - If the number of (deletes + updates) > VacuumThreshold, then a vacuum analyze is performed. deleteThreshold is equal to: *************** *** 115,140 **** analyze_base_value + (analyze_scaling_factor * "number of tuples in the table") The AnalyzeThreshold defaults to half of the VacuumThreshold since it ! represents a much less expensive operation (approx 5%-10% of vacuum), and ! running it more often should not substantially degrade system performance. Sleeping: --------- pg_autovacuum sleeps for a while after it is done checking all the databases. It does this in order to limit the amount of system ! resources it consumes. This also allows the system administrator to configure pg_autovacuum to be more or less aggressive. Reducing the sleep time will cause pg_autovacuum to respond more quickly to changes, whether they be database addition/removal, table addition/removal, or just normal table activity. ! On the other hand, setting pg_autovaccum to sleep values to agressivly ! (for too short a period of time) can have a negative effect on server ! performance. If a table gets vacuumed 5 times during the course of a ! large update, this is likely to take much longer than if the table was ! vacuumed only once, at the end. The total time it sleeps is equal to: --- 147,173 ---- analyze_base_value + (analyze_scaling_factor * "number of tuples in the table") The AnalyzeThreshold defaults to half of the VacuumThreshold since it ! represents a much less expensive operation (approx 5%-10% of vacuum), ! and running ANALYZE more often should not substantially degrade system ! performance. Sleeping: --------- pg_autovacuum sleeps for a while after it is done checking all the databases. It does this in order to limit the amount of system ! resources it consumes. This allows the system administrator to configure pg_autovacuum to be more or less aggressive. Reducing the sleep time will cause pg_autovacuum to respond more quickly to changes, whether they be database addition/removal, table addition/removal, or just normal table activity. ! On the other hand, setting pg_autovacuum to sleep values too ! aggressively (to too short periods of time) can have a negative effect ! on server performance. For instance, if a table gets vacuumed 5 times ! during the course of a large set of updates, this is likely to take a ! lot more work than if the table was vacuumed just once, at the end. The total time it sleeps is equal to: *************** *** 142,156 **** loop" Note that timing measurements are made in seconds; specifying ! "pg_vacuum -s 1" means pg_autovacuum could poll the database upto 60 times ! minute. In a system with large tables where vacuums may run for several ! minutes, longer times between vacuums are likely to be appropriate. What pg_autovacuum monitors: ---------------------------- ! pg_autovacuum dynamically generates a list of all databases and tables that ! exist on the server. It will dynamically add and remove databases and ! tables that are removed from the database server while pg_autovacuum is ! running. Overhead is fairly small per object. For example: 10 databases ! with 10 tables each appears to less than 10k of memory on my Linux box. --- 175,191 ---- loop" Note that timing measurements are made in seconds; specifying ! "pg_vacuum -s 1" means pg_autovacuum could poll the database up to 60 ! times minute. In a system with large tables where vacuums may run for ! several minutes, rather longer times between vacuums are likely to be ! appropriate. What pg_autovacuum monitors: ---------------------------- ! pg_autovacuum dynamically generates a list of all databases and tables ! that exist on the server. It will dynamically add and remove ! databases and tables that are removed from the database server while ! pg_autovacuum is running. Overhead is fairly small per object. For ! example: 10 databases with 10 tables each appears to less than 10k of ! memory on my Linux box. Index: pg_autovacuum.c =================================================================== RCS file: /projects/cvsroot/pgsql-server/contrib/pg_autovacuum/pg_autovacuum.c,v retrieving revision 1.5 diff -c -b -r1.5 pg_autovacuum.c *** pg_autovacuum.c 11 Sep 2003 19:01:18 -0000 1.5 --- pg_autovacuum.c 12 Sep 2003 22:45:37 -0000 *************** *** 118,123 **** --- 118,129 ---- new_tbl->reltuples = atoi(PQgetvalue(res, row, PQfnumber(res, "reltuples"))); new_tbl->relpages = atoi(PQgetvalue(res, row, PQfnumber(res, "relpages"))); + log_entry(PQgetvalue(res, row, PQfnumber(res, "relisshared"))); + if (strcmp("t", PQgetvalue(res, row, PQfnumber(res, "relisshared")))) + new_tbl->relisshared = 0; + else + new_tbl->relisshared = 1; + new_tbl->analyze_threshold = args->analyze_base_threshold + args->analyze_scaling_factor * new_tbl->reltuples; new_tbl->vacuum_threshold = *************** *** 213,219 **** * both remove tables from the list that no longer exist and add * tables to the list that are new */ ! res = send_query(query_table_stats(dbi), dbi); t = PQntuples(res); /* --- 219,225 ---- * both remove tables from the list that no longer exist and add * tables to the list that are new */ ! res = send_query((char *) TABLE_STATS_QUERY, dbi); t = PQntuples(res); /* *************** *** 353,359 **** { sprintf(logbuffer, " table name: %s.%s", tbl->dbi->dbname, tbl->table_name); log_entry(logbuffer); ! sprintf(logbuffer, " relfilenode: %i", tbl->relfilenode); log_entry(logbuffer); sprintf(logbuffer, " reltuples: %i; relpages: %i", tbl->reltuples, tbl->relpages); log_entry(logbuffer); --- 359,365 ---- { sprintf(logbuffer, " table name: %s.%s", tbl->dbi->dbname, tbl->table_name); log_entry(logbuffer); ! sprintf(logbuffer, " relfilenode: %i; relisshared: %i", tbl->relfilenode, tbl->relisshared); log_entry(logbuffer); sprintf(logbuffer, " reltuples: %i; relpages: %i", tbl->reltuples, tbl->relpages); log_entry(logbuffer); *************** *** 688,706 **** /* End of DB List Management Function */ ! /* Begninning of misc Functions */ ! ! ! char * ! query_table_stats(db_info * dbi) ! { ! if (!strcmp(dbi->dbname, "template1")) /* Use template1 to ! * monitor the system ! * tables */ ! return (char *) TABLE_STATS_ALL; ! else ! return (char *) TABLE_STATS_USER; ! } /* Perhaps add some test to this function to make sure that the stats we need are available */ PGconn * --- 694,700 ---- /* End of DB List Management Function */ ! /* Beginning of misc Functions */ /* Perhaps add some test to this function to make sure that the stats we need are available */ PGconn * *************** *** 753,758 **** --- 747,755 ---- if (NULL == dbi->conn) return NULL; + if (args->debug >= 4) + log_entry(query); + res = PQexec(dbi->conn, query); if (!res) *************** *** 964,970 **** int j = 0, loops = 0; ! /* int numInserts, numDeletes, */ int sleep_secs; Dllist *db_list; Dlelem *db_elem, --- 961,967 ---- int j = 0, loops = 0; ! /* int numInserts, numDeletes, */ int sleep_secs; Dllist *db_list; Dlelem *db_elem, *************** *** 1055,1061 **** if (0 == xid_wraparound_check(dbs)); { ! res = send_query(query_table_stats(dbs), dbs); /* Get an updated * snapshot of this dbs * table stats */ for (j = 0; j < PQntuples(res); j++) --- 1052,1058 ---- if (0 == xid_wraparound_check(dbs)); { ! res = send_query(TABLE_STATS_QUERY, dbs); /* Get an updated * snapshot of this dbs * table stats */ for (j = 0; j < PQntuples(res); j++) *************** *** 1087,1093 **** */ if ((tbl->curr_vacuum_count - tbl->CountAtLastVacuum) >= tbl->vacuum_threshold) { ! snprintf(buf, sizeof(buf), "VACUUM ANALYZE \"%s\"", tbl->table_name); if (args->debug >= 1) { sprintf(logbuffer, "Performing: %s", buf); --- 1084,1094 ---- */ if ((tbl->curr_vacuum_count - tbl->CountAtLastVacuum) >= tbl->vacuum_threshold) { ! /* if relisshared = t and database != template1 then only do an analyze */ ! if((tbl->relisshared > 0) && (strcmp("template1",dbs->dbname))) ! snprintf(buf, sizeof(buf), "ANALYZE %s", tbl->table_name); ! else ! snprintf(buf, sizeof(buf), "VACUUM ANALYZE %s", tbl->table_name); if (args->debug >= 1) { sprintf(logbuffer, "Performing: %s", buf); *************** *** 1101,1107 **** } else if ((tbl->curr_analyze_count - tbl->CountAtLastAnalyze) >= tbl->analyze_threshold) { ! snprintf(buf, sizeof(buf), "ANALYZE \"%s\"", tbl->table_name); if (args->debug >= 1) { sprintf(logbuffer, "Performing: %s", buf); --- 1102,1108 ---- } else if ((tbl->curr_analyze_count - tbl->CountAtLastAnalyze) >= tbl->analyze_threshold) { ! snprintf(buf, sizeof(buf), "ANALYZE %s", tbl->table_name); if (args->debug >= 1) { sprintf(logbuffer, "Performing: %s", buf); Index: pg_autovacuum.h =================================================================== RCS file: /projects/cvsroot/pgsql-server/contrib/pg_autovacuum/pg_autovacuum.h,v retrieving revision 1.6 diff -c -b -r1.6 pg_autovacuum.h *** pg_autovacuum.h 7 Aug 2003 21:11:57 -0000 1.6 --- pg_autovacuum.h 12 Sep 2003 22:45:37 -0000 *************** *** 34,41 **** #define VACUUM_ANALYZE 0 #define ANALYZE_ONLY 1 ! #define TABLE_STATS_ALL "select a.relfilenode,a.relname,a.relnamespace,a.relpages,a.reltuples,b.schemaname,b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_all_tables b where a.relfilenode=b.relid" ! #define TABLE_STATS_USER "select a.relfilenode,a.relname,a.relnamespace,a.relpages,a.reltuples,b.schemaname,b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_user_tables b where a.relfilenode=b.relid" #define FRONTEND #define PAGES_QUERY "select relfilenode,reltuples,relpages from pg_class where relfilenode=%i" #define FROZENOID_QUERY "select oid,age(datfrozenxid) from pg_database where datname = 'template1'" --- 34,41 ---- #define VACUUM_ANALYZE 0 #define ANALYZE_ONLY 1 ! #define TABLE_STATS_QUERY "select a.relfilenode,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples,b.schemaname,b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_all_tables b where a.relfilenode=b.relid and a.relkind = 'r'" ! #define FRONTEND #define PAGES_QUERY "select relfilenode,reltuples,relpages from pg_class where relfilenode=%i" #define FROZENOID_QUERY "select oid,age(datfrozenxid) from pg_database where datname = 'template1'" *************** *** 86,91 **** --- 86,92 ---- *table_name; int relfilenode, reltuples, + relisshared, relpages; long analyze_threshold, vacuum_threshold; *************** *** 132,138 **** static PGconn *db_connect(db_info * dbi); static void db_disconnect(db_info * dbi); static PGresult *send_query(const char *query, db_info * dbi); - static char *query_table_stats(db_info * dbi); /* Other Generally needed Functions */ static void daemonize(void); --- 133,138 ---- -- let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];; <http://dev6.int.libertyrms.com/> Christopher Browne (416) 646 3304 x124 (land) ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly