Hi,

attached is a version of the patch that I believe is ready for the
commitfest. As the patch was discussed over a large number of messages,
I've prepared a brief summary for those who did not follow the thread.


Issue
=====

The patch aims to improve the situation in deployments with many tables
in many databases (think for example 1000 tables in 1000 databases).
Currently all the stats for all the objects (dbs, tables and functions)
are written in a single file (pgstat.stat), which may get quite large
and that consequently leads to various issues:

1) I/O load - the file is frequently written / read, which may use a
significant part of the I/O bandwidth. For example we'have to deal with
cases when the pgstat.stat size is >150MB, and it's written (and read)
continuously (once it's written, a new write starts) and utilizes 100%
bandwidth on that device.

2) CPU load - a common solution to the previous issue is moving the file
into RAM, using a tmpfs filesystem. That "fixes" the I/O bottleneck but
causes high CPU load because the system is serializing and deserializing
large amounts of data. We often see ~1 CPU core "lost" due to this (and
causing higher power consumption, but that's Amazon's problem ;-)

3) disk space utilization - the pgstat.stat file is updated in two
steps, i.e. a new version is written to another file (pgstat.tmp) and
then it's renamed to pgstat.stat, which means the device (amount of RAM,
if using tmpfs device) needs to be >2x the actual size of the file.
(Actually more, because there may be descriptors open to multiple
versions of the file.)

This patch does not attempt to fix a "single DB with multiple schemas"
scenario, although it should not have a negative impact on it.


What the patch does
===================

1) split into global and per-db files
-------------------------------------

The patch "splits" the huge pgstat.stat file into smaller pieces - one
"global" one (global.stat) with database stats, and one file for each of
the databases (oid.stat) with table.

This makes it possible to write/read much smaller amounts of data, because

a) autovacuum launcher does not need to read the whole file - it needs
just the list of databases (and not the table/func stats)

b) autovacuum workers do request a fresh copy of a single database, so
the stats collector may write just the global.stat + one of the per-db files

and that consequently leads to much lower I/O and CPU load. During our
tests we've seen the I/O to drop from ~150MB/s to less than 4MB/s, and
much lower CPU utilization.


2) a new global/stat directory
------------------------------

The pgstat.stat file was originally saved into the "global" directory,
but with so many files that would get rather messy so I've created a new
global/stat directory and all the files are stored there.

This also means we can do a simple "delete files in the dir" when
pgstat_reset_all is called.


3) pgstat_(read|write)_statsfile split
--------------------------------------

These two functions were moved into a global and per-db functions, so
now there's

  pgstat_write_statsfile    -- global.stat
  pgstat_write_db_statsfile -- oid.stat

  pgstat_read_statsfile    -- global.stat
  pgstat_read_db_statsfile -- oid.stat

There's a logic to read/write only those files that are actually needed.


4) list of (OID, timestamp) inquiries, last db-write
----------------------------------------------------

Originally there was a single pair of request/write timestamps for the
whole file, updated whenever a worker requested a fresh file or when the
file was written.

With the split, this had to be replaced by two lists - a timestamp of
the last inquiry (per DB), and a timestamp when each database file was
written for the last time.

The timestamp of the last DB write was added to the PgStat_StatDBEntry
and the list of inquiries is kept in last_statrequests. The fields are
used at several places, so it's probably best to see the code.

Handling the timestamps is a rather complex stuff because of the clock
skews. One of those checks is not needed as the list of inquiries is
freed right after writing all the databases. But I wouldn't be surprised
if there was something I missed, as splitting the file into multiple
pieces made this part more complex.

So please, if you're going to review this patch this is one of the
tricky places.


5) dummy file
-------------

A special handling is necessary when an inquiry arrives for a database
without a PgStat_StatDBEntry - this happens for example right after
initdb, when there are no stats for template0 and template1, yet the
autovacuum workers do send inqiries for them.

The backend_read_statsfile now uses the timestamp stored in the header
of the per-db file (not in the global one), and the easies way to handle
this for new databases is writing an empty 'dummy file' (just a header
with timestamp). Without this, this would result in 'pgstat wait
timeout' errors.

This is what pgstat_write_db_dummyfile (used in pgstat_write_statsfile)
is for.


6) format ID
------------

I've bumped PGSTAT_FILE_FORMAT_ID to a new random value, although the
filenames changed to so we could live with the old ID just fine.



We've done a fair amount of testing so far, and if everything goes fine
we plan to deploy a back-ported version of this patch (to 9.1) on a
production in ~2 weeks.

Then I'll be able to provide some numbers from a real-world workload
(although our deployment and workload is not quite usual I guess).

regards
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index be3adf1..37b85e6 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,10 +64,14 @@
 
 /* ----------
  * Paths for the statistics files (relative to installation's $PGDATA).
+ * Permanent and temprorary, global and per-database files.
  * ----------
  */
-#define PGSTAT_STAT_PERMANENT_FILENAME         "global/pgstat.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE          "global/pgstat.tmp"
+#define PGSTAT_STAT_PERMANENT_DIRECTORY                "global/stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME         "global/stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE          "global/stat/global.tmp"
+#define PGSTAT_STAT_PERMANENT_DB_FILENAME      "global/stat/%d.stat"
+#define PGSTAT_STAT_PERMANENT_DB_TMPFILE       "global/stat/%d.tmp"
 
 /* ----------
  * Timer definitions.
@@ -115,8 +119,11 @@ int                        
pgstat_track_activity_query_size = 1024;
  * Built from GUC parameter
  * ----------
  */
+char      *pgstat_stat_directory = NULL;
 char      *pgstat_stat_filename = NULL;
 char      *pgstat_stat_tmpname = NULL;
+char      *pgstat_stat_db_filename = NULL;
+char      *pgstat_stat_db_tmpname = NULL;
 
 /*
  * BgWriter global statistics counters (unused in other processes).
@@ -219,11 +226,16 @@ static int        localNumBackends = 0;
  */
 static PgStat_GlobalStats globalStats;
 
-/* Last time the collector successfully wrote the stats file */
-static TimestampTz last_statwrite;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+       Oid                     databaseid;             /* OID of the database 
to write */
+       TimestampTz request_time;       /* timestamp of the last write request 
*/
+} DBWriteRequest;
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Latest statistics request time from backends for each DB */
+static DBWriteRequest * last_statrequests = NULL;
+static int num_statrequests = 0;
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,11 +264,17 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
                                         Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfile(bool permanent, bool force);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool 
permanent);
+static void pgstat_write_db_dummyfile(Oid databaseid);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB 
*funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
 
+static bool pgstat_write_statsfile_needed();
+static bool pgstat_db_requested(Oid databaseid);
+
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static HTAB *pgstat_collect_oids(Oid catalogid);
@@ -285,7 +303,6 @@ static void 
pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
-
 /* ------------------------------------------------------------
  * Public functions called from postmaster follow
  * ------------------------------------------------------------
@@ -549,8 +566,34 @@ startup_failed:
 void
 pgstat_reset_all(void)
 {
-       unlink(pgstat_stat_filename);
-       unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+       DIR * dir;
+       struct dirent * entry;
+
+       dir = AllocateDir(pgstat_stat_directory);
+       while ((entry = ReadDir(dir, pgstat_stat_directory)) != NULL)
+       {
+               char fname[strlen(pgstat_stat_directory) + 
strlen(entry->d_name) + 1];
+
+               if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, 
"..") == 0)
+                       continue;
+
+               sprintf(fname, "%s/%s", pgstat_stat_directory, entry->d_name);
+               unlink(fname);
+       }
+       FreeDir(dir);
+
+       dir = AllocateDir(PGSTAT_STAT_PERMANENT_DIRECTORY);
+       while ((entry = ReadDir(dir, PGSTAT_STAT_PERMANENT_DIRECTORY)) != NULL)
+       {
+               char fname[strlen(PGSTAT_STAT_PERMANENT_FILENAME) + 
strlen(entry->d_name) + 1];
+
+               if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, 
"..") == 0)
+                       continue;
+
+               sprintf(fname, "%s/%s", PGSTAT_STAT_PERMANENT_FILENAME, 
entry->d_name);
+               unlink(fname);
+       }
+       FreeDir(dir);
 }
 
 #ifdef EXEC_BACKEND
@@ -1408,13 +1451,14 @@ pgstat_ping(void)
  * ----------
  */
 static void
-pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time)
+pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid 
databaseid)
 {
        PgStat_MsgInquiry msg;
 
        pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
        msg.clock_time = clock_time;
        msg.cutoff_time = cutoff_time;
+       msg.databaseid = databaseid;
        pgstat_send(&msg, sizeof(msg));
 }
 
@@ -3004,6 +3048,7 @@ PgstatCollectorMain(int argc, char *argv[])
        int                     len;
        PgStat_Msg      msg;
        int                     wr;
+       bool            first_write = true;
 
        IsUnderPostmaster = true;       /* we are a postmaster subprocess now */
 
@@ -3053,17 +3098,11 @@ PgstatCollectorMain(int argc, char *argv[])
        init_ps_display("stats collector process", "", "", "");
 
        /*
-        * Arrange to write the initial status file right away
-        */
-       last_statrequest = GetCurrentTimestamp();
-       last_statwrite = last_statrequest - 1;
-
-       /*
         * Read in an existing statistics stats file or initialize the stats to
-        * zero.
+        * zero (read data for all databases, including table/func stats).
         */
        pgStatRunningInCollector = true;
-       pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
+       pgStatDBHash = pgstat_read_statsfile(InvalidOid, true, false);
 
        /*
         * Loop to process messages until we get SIGQUIT or detect ungraceful
@@ -3107,10 +3146,14 @@ PgstatCollectorMain(int argc, char *argv[])
 
                        /*
                         * Write the stats file if a new request has arrived 
that is not
-                        * satisfied by existing file.
+                        * satisfied by existing file (force writing all files 
if it's
+                        * the first write after startup).
                         */
-                       if (last_statwrite < last_statrequest)
-                               pgstat_write_statsfile(false);
+                       if (first_write || pgstat_write_statsfile_needed())
+                       {
+                               pgstat_write_statsfile(false, first_write);
+                               first_write = false;
+                       }
 
                        /*
                         * Try to receive and process a message.  This will not 
block,
@@ -3269,7 +3312,7 @@ PgstatCollectorMain(int argc, char *argv[])
        /*
         * Save the final stats to reuse at next startup.
         */
-       pgstat_write_statsfile(true);
+       pgstat_write_statsfile(true, true);
 
        exit(0);
 }
@@ -3429,23 +3472,25 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid 
tableoid, bool create)
  *     shutting down only), remove the temporary file so that backends
  *     starting up under a new postmaster can't read the old data before
  *     the new collector is ready.
+ * 
+ *     When the 'force' is false, only the requested databases (listed in
+ *     last_statrequests) will be written. If 'force' is true, all databases
+ *     will be written (this is used e.g. at shutdown).
  * ----------
  */
 static void
-pgstat_write_statsfile(bool permanent)
+pgstat_write_statsfile(bool permanent, bool force)
 {
        HASH_SEQ_STATUS hstat;
-       HASH_SEQ_STATUS tstat;
-       HASH_SEQ_STATUS fstat;
        PgStat_StatDBEntry *dbentry;
-       PgStat_StatTabEntry *tabentry;
-       PgStat_StatFuncEntry *funcentry;
        FILE       *fpout;
        int32           format_id;
        const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : 
pgstat_stat_tmpname;
        const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : 
pgstat_stat_filename;
        int                     rc;
 
+       elog(DEBUG1, "writing statsfile '%s'", statfile);
+       
        /*
         * Open the statistics temp file to write out the current values.
         */
@@ -3484,6 +3529,20 @@ pgstat_write_statsfile(bool permanent)
        while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != 
NULL)
        {
                /*
+                * Write our the tables and functions into a separate file, but 
only
+                * if the database is in the requests or if it's a forced write 
(then
+                * all the DBs need to be written - e.g. at the shutdown).
+                * 
+                * We need to do this before the dbentry write to write the 
proper
+                * timestamp to the global file.
+                */
+               if (force || pgstat_db_requested(dbentry->databaseid)) {
+                       elog(DEBUG1, "writing statsfile for DB %d", 
dbentry->databaseid);
+                       dbentry->stats_timestamp = globalStats.stats_timestamp;
+                       pgstat_write_db_statsfile(dbentry, permanent);
+               }
+
+               /*
                 * Write out the DB entry including the number of live 
backends. We
                 * don't write the tables or functions pointers, since they're 
of no
                 * use to any other process.
@@ -3493,29 +3552,10 @@ pgstat_write_statsfile(bool permanent)
                (void) rc;                              /* we'll check for 
error with ferror */
 
                /*
-                * Walk through the database's access stats per table.
-                */
-               hash_seq_init(&tstat, dbentry->tables);
-               while ((tabentry = (PgStat_StatTabEntry *) 
hash_seq_search(&tstat)) != NULL)
-               {
-                       fputc('T', fpout);
-                       rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, 
fpout);
-                       (void) rc;                      /* we'll check for 
error with ferror */
-               }
-
-               /*
-                * Walk through the database's function stats table.
-                */
-               hash_seq_init(&fstat, dbentry->functions);
-               while ((funcentry = (PgStat_StatFuncEntry *) 
hash_seq_search(&fstat)) != NULL)
-               {
-                       fputc('F', fpout);
-                       rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, 
fpout);
-                       (void) rc;                      /* we'll check for 
error with ferror */
-               }
-
-               /*
                 * Mark the end of this DB
+                * 
+                * TODO Does using these chars still make sense, when the 
tables/func
+                * stats are moved to a separate file?
                 */
                fputc('d', fpout);
        }
@@ -3527,6 +3567,28 @@ pgstat_write_statsfile(bool permanent)
         */
        fputc('E', fpout);
 
+       /* In any case, we can just throw away all the db requests, but we need 
to
+        * write dummy files for databases without a stat entry (it would cause
+        * issues in pgstat_read_db_statsfile_timestamp and pgstat wait 
timeouts).
+        * This may happend e.g. for shared DB (oid = 0) right after initdb.
+        */
+       if (last_statrequests != NULL)
+       {
+               int i = 0;
+               for (i = 0; i < num_statrequests; i++)
+               {
+                       /* Create dummy files for requested databases without a 
proper
+                        * dbentry. It's much easier this way than dealing with 
multiple
+                        * timestamps, possibly existing but not yet written 
DBs etc. */
+                       if (! 
pgstat_get_db_entry(last_statrequests[i].databaseid, false))
+                               
pgstat_write_db_dummyfile(last_statrequests[i].databaseid);
+               }
+
+               pfree(last_statrequests);
+               last_statrequests = NULL;
+               num_statrequests = 0;
+       }
+
        if (ferror(fpout))
        {
                ereport(LOG,
@@ -3552,57 +3614,247 @@ pgstat_write_statsfile(bool permanent)
                                                tmpfile, statfile)));
                unlink(tmpfile);
        }
-       else
+
+       if (permanent)
+               unlink(pgstat_stat_filename);
+}
+
+
+/* ----------
+ * pgstat_write_db_statsfile() -
+ *
+ *     Tell the news. This writes stats file for a single database.
+ *
+ *     If writing to the permanent file (happens when the collector is
+ *     shutting down only), remove the temporary file so that backends
+ *     starting up under a new postmaster can't read the old data before
+ *     the new collector is ready.
+ * ----------
+ */
+static void
+pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent)
+{
+       HASH_SEQ_STATUS tstat;
+       HASH_SEQ_STATUS fstat;
+       PgStat_StatTabEntry *tabentry;
+       PgStat_StatFuncEntry *funcentry;
+       FILE       *fpout;
+       int32           format_id;
+       const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_DB_TMPFILE : 
pgstat_stat_db_tmpname;
+       const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : 
pgstat_stat_db_filename;
+       int                     rc;
+
+       /*
+        * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 
byte
+        */
+       char db_tmpfile[strlen(tmpfile) + 11];
+       char db_statfile[strlen(statfile) + 11];
+
+       /*
+        * Append database OID at the end of the basic filename (both for tmp 
and target file).
+        */
+       snprintf(db_tmpfile, strlen(tmpfile) + 11, tmpfile, 
dbentry->databaseid);
+       snprintf(db_statfile, strlen(statfile) + 11, statfile, 
dbentry->databaseid);
+
+       elog(DEBUG1, "writing statsfile '%s'", db_statfile);
+
+       /*
+        * Open the statistics temp file to write out the current values.
+        */
+       fpout = AllocateFile(db_tmpfile, PG_BINARY_W);
+       if (fpout == NULL)
        {
-               /*
-                * Successful write, so update last_statwrite.
-                */
-               last_statwrite = globalStats.stats_timestamp;
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                                errmsg("could not open temporary statistics 
file \"%s\": %m",
+                                               db_tmpfile)));
+               return;
+       }
 
-               /*
-                * If there is clock skew between backends and the collector, 
we could
-                * receive a stats request time that's in the future.  If so, 
complain
-                * and reset last_statrequest.  Resetting ensures that no 
inquiry
-                * message can cause more than one stats file write to occur.
-                */
-               if (last_statrequest > last_statwrite)
-               {
-                       char       *reqtime;
-                       char       *mytime;
-
-                       /* Copy because timestamptz_to_str returns a static 
buffer */
-                       reqtime = pstrdup(timestamptz_to_str(last_statrequest));
-                       mytime = pstrdup(timestamptz_to_str(last_statwrite));
-                       elog(LOG, "last_statrequest %s is later than 
collector's time %s",
-                                reqtime, mytime);
-                       pfree(reqtime);
-                       pfree(mytime);
-
-                       last_statrequest = last_statwrite;
-               }
+       /*
+        * Write the file header --- currently just a format ID.
+        */
+       format_id = PGSTAT_FILE_FORMAT_ID;
+       rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+       (void) rc;                                      /* we'll check for 
error with ferror */
+
+       /*
+        * Write the timestamp.
+        */
+       rc = fwrite(&(globalStats.stats_timestamp), 
sizeof(globalStats.stats_timestamp), 1, fpout);
+       (void) rc;                                      /* we'll check for 
error with ferror */
+
+       /*
+        * Walk through the database's access stats per table.
+        */
+       hash_seq_init(&tstat, dbentry->tables);
+       while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != 
NULL)
+       {
+               fputc('T', fpout);
+               rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
+               (void) rc;                      /* we'll check for error with 
ferror */
        }
 
+       /*
+        * Walk through the database's function stats table.
+        */
+       hash_seq_init(&fstat, dbentry->functions);
+       while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) 
!= NULL)
+       {
+               fputc('F', fpout);
+               rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+               (void) rc;                      /* we'll check for error with 
ferror */
+       }
+
+       /*
+        * No more output to be done. Close the temp file and replace the old
+        * pgstat.stat with it.  The ferror() check replaces testing for error
+        * after each individual fputc or fwrite above.
+        */
+       fputc('E', fpout);
+
+       if (ferror(fpout))
+       {
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                          errmsg("could not write temporary statistics file 
\"%s\": %m",
+                                         db_tmpfile)));
+               FreeFile(fpout);
+               unlink(db_tmpfile);
+       }
+       else if (FreeFile(fpout) < 0)
+       {
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                          errmsg("could not close temporary statistics file 
\"%s\": %m",
+                                         db_tmpfile)));
+               unlink(db_tmpfile);
+       }
+       else if (rename(db_tmpfile, db_statfile) < 0)
+       {
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                                errmsg("could not rename temporary statistics 
file \"%s\" to \"%s\": %m",
+                                               db_tmpfile, db_statfile)));
+               unlink(db_tmpfile);
+       }
+       
        if (permanent)
-               unlink(pgstat_stat_filename);
+       {
+               char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+               snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
+                                pgstat_stat_db_filename, dbentry->databaseid);
+               elog(DEBUG1, "removing temporary stat file '%s'", db_statfile);
+               unlink(db_statfile);
+       }
 }
 
 
 /* ----------
+ * pgstat_write_db_dummyfile() -
+ *
+ *     All this does is writing a dummy stat file for databases without dbentry
+ *     yet. It basically writes just a file header - format ID and a timestamp.
+ * ----------
+ */
+static void
+pgstat_write_db_dummyfile(Oid databaseid)
+{
+       FILE       *fpout;
+       int32           format_id;
+       int                     rc;
+
+       /*
+        * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 
byte
+        */
+       char db_tmpfile[strlen(pgstat_stat_db_tmpname) + 11];
+       char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+
+       /*
+        * Append database OID at the end of the basic filename (both for tmp 
and target file).
+        */
+       snprintf(db_tmpfile, strlen(pgstat_stat_db_tmpname) + 11, 
pgstat_stat_db_tmpname, databaseid);
+       snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11, 
pgstat_stat_db_filename, databaseid);
+
+       elog(DEBUG1, "writing statsfile '%s'", db_statfile);
+
+       /*
+        * Open the statistics temp file to write out the current values.
+        */
+       fpout = AllocateFile(db_tmpfile, PG_BINARY_W);
+       if (fpout == NULL)
+       {
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                                errmsg("could not open temporary statistics 
file \"%s\": %m",
+                                               db_tmpfile)));
+               return;
+       }
+
+       /*
+        * Write the file header --- currently just a format ID.
+        */
+       format_id = PGSTAT_FILE_FORMAT_ID;
+       rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+       (void) rc;                                      /* we'll check for 
error with ferror */
+
+       /*
+        * Write the timestamp.
+        */
+       rc = fwrite(&(globalStats.stats_timestamp), 
sizeof(globalStats.stats_timestamp), 1, fpout);
+       (void) rc;                                      /* we'll check for 
error with ferror */
+
+       /*
+        * No more output to be done. Close the temp file and replace the old
+        * pgstat.stat with it.  The ferror() check replaces testing for error
+        * after each individual fputc or fwrite above.
+        */
+       fputc('E', fpout);
+
+       if (ferror(fpout))
+       {
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                          errmsg("could not write temporary dummy statistics 
file \"%s\": %m",
+                                         db_tmpfile)));
+               FreeFile(fpout);
+               unlink(db_tmpfile);
+       }
+       else if (FreeFile(fpout) < 0)
+       {
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                          errmsg("could not close temporary dummy statistics 
file \"%s\": %m",
+                                         db_tmpfile)));
+               unlink(db_tmpfile);
+       }
+       else if (rename(db_tmpfile, db_statfile) < 0)
+       {
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                                errmsg("could not rename temporary dummy 
statistics file \"%s\" to \"%s\": %m",
+                                               db_tmpfile, db_statfile)));
+               unlink(db_tmpfile);
+       }
+
+}
+
+/* ----------
  * pgstat_read_statsfile() -
  *
  *     Reads in an existing statistics collector file and initializes the
  *     databases' hash table (whose entries point to the tables' hash tables).
+ * 
+ *     Allows reading only the global stats (at database level), which is just
+ *     enough for many purposes (e.g. autovacuum launcher etc.). If this is
+ *     sufficient for you, use onlydbs=true.
  * ----------
  */
 static HTAB *
-pgstat_read_statsfile(Oid onlydb, bool permanent)
+pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs)
 {
        PgStat_StatDBEntry *dbentry;
        PgStat_StatDBEntry dbbuf;
-       PgStat_StatTabEntry *tabentry;
-       PgStat_StatTabEntry tabbuf;
-       PgStat_StatFuncEntry funcbuf;
-       PgStat_StatFuncEntry *funcentry;
        HASHCTL         hash_ctl;
        HTAB       *dbhash;
        HTAB       *tabhash = NULL;
@@ -3613,6 +3865,11 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
        const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : 
pgstat_stat_filename;
 
        /*
+        * If we want a db-level stats only, we don't want a particular db.
+        */
+       Assert(!((onlydb != InvalidOid) && onlydbs));
+
+       /*
         * The tables will live in pgStatLocalContext.
         */
        pgstat_setup_memcxt();
@@ -3758,6 +4015,16 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                 */
                                tabhash = dbentry->tables;
                                funchash = dbentry->functions;
+
+                               /*
+                                * Read the data from the file for this 
database. If there was
+                                * onlydb specified (!= InvalidOid), we would 
not get here because
+                                * of a break above. So we don't need to 
recheck.
+                                */
+                               if (! onlydbs)
+                                       
pgstat_read_db_statsfile(dbentry->databaseid, tabhash, funchash,
+                                                                               
        permanent);
+
                                break;
 
                                /*
@@ -3768,6 +4035,105 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                funchash = NULL;
                                break;
 
+                       case 'E':
+                               goto done;
+
+                       default:
+                               ereport(pgStatRunningInCollector ? LOG : 
WARNING,
+                                               (errmsg("corrupted statistics 
file \"%s\"",
+                                                               statfile)));
+                               goto done;
+               }
+       }
+
+done:
+       FreeFile(fpin);
+
+       if (permanent)
+               unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+
+       return dbhash;
+}
+
+
+/* ----------
+ * pgstat_read_db_statsfile() -
+ *
+ *     Reads in an existing statistics collector db file and initializes the
+ *     tables and functions hash tables (for the database identified by Oid).
+ * ----------
+ */
+static void
+pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool 
permanent)
+{
+       PgStat_StatTabEntry *tabentry;
+       PgStat_StatTabEntry tabbuf;
+       PgStat_StatFuncEntry funcbuf;
+       PgStat_StatFuncEntry *funcentry;
+       FILE       *fpin;
+       int32           format_id;
+       TimestampTz timestamp;
+       bool            found;
+       const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : 
pgstat_stat_db_filename;
+
+       /*
+        * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 
byte
+        */
+       char db_statfile[strlen(statfile) + 11];
+
+       /*
+        * Append database OID at the end of the basic filename (both for tmp 
and target file).
+        */
+       snprintf(db_statfile, strlen(statfile) + 11, statfile, databaseid);
+
+       /*
+        * Try to open the status file. If it doesn't exist, the backends simply
+        * return zero for anything and the collector simply starts from scratch
+        * with empty counters.
+        *
+        * ENOENT is a possibility if the stats collector is not running or has
+        * not yet written the stats file the first time.  Any other failure
+        * condition is suspicious.
+        */
+       if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
+       {
+               if (errno != ENOENT)
+                       ereport(pgStatRunningInCollector ? LOG : WARNING,
+                                       (errcode_for_file_access(),
+                                        errmsg("could not open statistics file 
\"%s\": %m",
+                                                       db_statfile)));
+               return;
+       }
+
+       /*
+        * Verify it's of the expected format.
+        */
+       if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
+               || format_id != PGSTAT_FILE_FORMAT_ID)
+       {
+               ereport(pgStatRunningInCollector ? LOG : WARNING,
+                               (errmsg("corrupted statistics file \"%s\"", 
db_statfile)));
+               goto done;
+       }
+
+       /*
+        * Read global stats struct
+        */
+       if (fread(&timestamp, 1, sizeof(timestamp), fpin) != sizeof(timestamp))
+       {
+               ereport(pgStatRunningInCollector ? LOG : WARNING,
+                               (errmsg("corrupted statistics file \"%s\"", 
db_statfile)));
+               goto done;
+       }
+
+       /*
+        * We found an existing collector stats file. Read it and put all the
+        * hashtable entries into place.
+        */
+       for (;;)
+       {
+               switch (fgetc(fpin))
+               {
                                /*
                                 * 'T'  A PgStat_StatTabEntry follows.
                                 */
@@ -3777,7 +4143,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                {
                                        ereport(pgStatRunningInCollector ? LOG 
: WARNING,
                                                        (errmsg("corrupted 
statistics file \"%s\"",
-                                                                       
statfile)));
+                                                                       
db_statfile)));
                                        goto done;
                                }
 
@@ -3795,7 +4161,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                {
                                        ereport(pgStatRunningInCollector ? LOG 
: WARNING,
                                                        (errmsg("corrupted 
statistics file \"%s\"",
-                                                                       
statfile)));
+                                                                       
db_statfile)));
                                        goto done;
                                }
 
@@ -3811,7 +4177,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                {
                                        ereport(pgStatRunningInCollector ? LOG 
: WARNING,
                                                        (errmsg("corrupted 
statistics file \"%s\"",
-                                                                       
statfile)));
+                                                                       
db_statfile)));
                                        goto done;
                                }
 
@@ -3829,7 +4195,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                {
                                        ereport(pgStatRunningInCollector ? LOG 
: WARNING,
                                                        (errmsg("corrupted 
statistics file \"%s\"",
-                                                                       
statfile)));
+                                                                       
db_statfile)));
                                        goto done;
                                }
 
@@ -3845,7 +4211,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                        default:
                                ereport(pgStatRunningInCollector ? LOG : 
WARNING,
                                                (errmsg("corrupted statistics 
file \"%s\"",
-                                                               statfile)));
+                                                               db_statfile)));
                                goto done;
                }
        }
@@ -3854,37 +4220,47 @@ done:
        FreeFile(fpin);
 
        if (permanent)
-               unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+       {
+               char db_statfile[strlen(PGSTAT_STAT_PERMANENT_DB_FILENAME) + 
11];
+               snprintf(db_statfile, strlen(PGSTAT_STAT_PERMANENT_DB_FILENAME) 
+ 11,
+                                PGSTAT_STAT_PERMANENT_DB_FILENAME, databaseid);
+               elog(DEBUG1, "removing permanent stats file '%s'", db_statfile);
+               unlink(db_statfile);
+       }
 
-       return dbhash;
+       return;
 }
 
 /* ----------
- * pgstat_read_statsfile_timestamp() -
+ * pgstat_read_db_statsfile_timestamp() -
  *
- *     Attempt to fetch the timestamp of an existing stats file.
+ *     Attempt to fetch the timestamp of an existing stats file (for a DB).
  *     Returns TRUE if successful (timestamp is stored at *ts).
  * ----------
  */
 static bool
-pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
+pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent, TimestampTz 
*ts)
 {
-       PgStat_GlobalStats myGlobalStats;
+       TimestampTz timestamp;
        FILE       *fpin;
        int32           format_id;
-       const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : 
pgstat_stat_filename;
+       const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : 
pgstat_stat_db_filename;
+       char db_statfile[strlen(statfile) + 11];
+
+       /* format the db statfile filename */
+       snprintf(db_statfile, strlen(statfile) + 11, statfile, databaseid);
 
        /*
         * Try to open the status file.  As above, anything but ENOENT is worthy
         * of complaining about.
         */
-       if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+       if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
        {
                if (errno != ENOENT)
                        ereport(pgStatRunningInCollector ? LOG : WARNING,
                                        (errcode_for_file_access(),
                                         errmsg("could not open statistics file 
\"%s\": %m",
-                                                       statfile)));
+                                                       db_statfile)));
                return false;
        }
 
@@ -3895,7 +4271,7 @@ pgstat_read_statsfile_timestamp(bool permanent, 
TimestampTz *ts)
                || format_id != PGSTAT_FILE_FORMAT_ID)
        {
                ereport(pgStatRunningInCollector ? LOG : WARNING,
-                               (errmsg("corrupted statistics file \"%s\"", 
statfile)));
+                               (errmsg("corrupted statistics file \"%s\"", 
db_statfile)));
                FreeFile(fpin);
                return false;
        }
@@ -3903,15 +4279,15 @@ pgstat_read_statsfile_timestamp(bool permanent, 
TimestampTz *ts)
        /*
         * Read global stats struct
         */
-       if (fread(&myGlobalStats, 1, sizeof(myGlobalStats), fpin) != 
sizeof(myGlobalStats))
+       if (fread(&timestamp, 1, sizeof(TimestampTz), fpin) != 
sizeof(TimestampTz))
        {
                ereport(pgStatRunningInCollector ? LOG : WARNING,
-                               (errmsg("corrupted statistics file \"%s\"", 
statfile)));
+                               (errmsg("corrupted statistics file \"%s\"", 
db_statfile)));
                FreeFile(fpin);
                return false;
        }
 
-       *ts = myGlobalStats.stats_timestamp;
+       *ts = timestamp;
 
        FreeFile(fpin);
        return true;
@@ -3947,7 +4323,7 @@ backend_read_statsfile(void)
 
                CHECK_FOR_INTERRUPTS();
 
-               ok = pgstat_read_statsfile_timestamp(false, &file_ts);
+               ok = pgstat_read_db_statsfile_timestamp(MyDatabaseId, false, 
&file_ts);
 
                cur_ts = GetCurrentTimestamp();
                /* Calculate min acceptable timestamp, if we didn't already */
@@ -4006,7 +4382,7 @@ backend_read_statsfile(void)
                                pfree(mytime);
                        }
 
-                       pgstat_send_inquiry(cur_ts, min_ts);
+                       pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
                        break;
                }
 
@@ -4016,7 +4392,7 @@ backend_read_statsfile(void)
 
                /* Not there or too old, so kick the collector and wait a bit */
                if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
-                       pgstat_send_inquiry(cur_ts, min_ts);
+                       pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 
                pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
        }
@@ -4026,9 +4402,16 @@ backend_read_statsfile(void)
 
        /* Autovacuum launcher wants stats about all databases */
        if (IsAutoVacuumLauncherProcess())
-               pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
+               /* 
+                * FIXME Does it really need info including tables/functions? 
Or is it enough to read
+                * database-level stats? It seems to me the launcher needs 
PgStat_StatDBEntry only
+                * (at least that's how I understand the 
rebuild_database_list() in autovacuum.c),
+                * because pgstat_stattabentries are used in do_autovacuum() 
only, that that's what's
+                * executed in workers ... So maybe we'd be just fine by 
reading in the dbentries?
+                */
+               pgStatDBHash = pgstat_read_statsfile(InvalidOid, false, true);
        else
-               pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
+               pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false, 
false);
 }
 
 
@@ -4084,44 +4467,84 @@ pgstat_clear_snapshot(void)
 static void
 pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 {
-       /*
-        * Advance last_statrequest if this requestor has a newer cutoff time
-        * than any previous request.
-        */
-       if (msg->cutoff_time > last_statrequest)
-               last_statrequest = msg->cutoff_time;
+       int i = 0;
+       bool found = false;
+       PgStat_StatDBEntry *dbentry;
+
+       elog(DEBUG1, "received inquiry for %d", msg->databaseid);
 
        /*
-        * If the requestor's local clock time is older than last_statwrite, we
-        * should suspect a clock glitch, ie system time going backwards; though
-        * the more likely explanation is just delayed message receipt.  It is
-        * worth expending a GetCurrentTimestamp call to be sure, since a large
-        * retreat in the system clock reading could otherwise cause us to 
neglect
-        * to update the stats file for a long time.
+        * Find the last write request for this DB (found=true in that case). 
Plain
+        * linear search, not really worth doing any magic here (probably).
         */
-       if (msg->clock_time < last_statwrite)
+       for (i = 0; i < num_statrequests; i++)
+       {
+               if (last_statrequests[i].databaseid == msg->databaseid)
+               {
+                       found = true;
+                       break;
+               }
+       }
+       
+       if (found)
+       {
+               /*
+                * There already is a request for this DB, so lets advance the
+                * request time  if this requestor has a newer cutoff time
+                * than any previous request.
+                */
+               if (msg->cutoff_time > last_statrequests[i].request_time)
+                       last_statrequests[i].request_time = msg->cutoff_time;
+       }
+       else
        {
-               TimestampTz cur_ts = GetCurrentTimestamp();
+               /*
+                * There's no request for this DB yet, so lets create it 
(allocate a
+                * space for it, set the values).
+                */
+               if (last_statrequests == NULL)
+                       last_statrequests = palloc(sizeof(DBWriteRequest));
+               else
+                       last_statrequests = repalloc(last_statrequests,
+                                                               
(num_statrequests + 1)*sizeof(DBWriteRequest));
+               
+               last_statrequests[num_statrequests].databaseid = 
msg->databaseid;
+               last_statrequests[num_statrequests].request_time = 
msg->clock_time;
+               num_statrequests += 1;
 
-               if (cur_ts < last_statwrite)
+               /*
+               * If the requestor's local clock time is older than 
last_statwrite, we
+               * should suspect a clock glitch, ie system time going 
backwards; though
+               * the more likely explanation is just delayed message receipt.  
It is
+               * worth expending a GetCurrentTimestamp call to be sure, since 
a large
+               * retreat in the system clock reading could otherwise cause us 
to neglect
+               * to update the stats file for a long time.
+               */
+               dbentry = pgstat_get_db_entry(msg->databaseid, false);
+               if ((dbentry != NULL) && (msg->clock_time < 
dbentry->stats_timestamp))
                {
-                       /*
-                        * Sure enough, time went backwards.  Force a new stats 
file write
-                        * to get back in sync; but first, log a complaint.
-                        */
-                       char       *writetime;
-                       char       *mytime;
-
-                       /* Copy because timestamptz_to_str returns a static 
buffer */
-                       writetime = pstrdup(timestamptz_to_str(last_statwrite));
-                       mytime = pstrdup(timestamptz_to_str(cur_ts));
-                       elog(LOG, "last_statwrite %s is later than collector's 
time %s",
-                                writetime, mytime);
-                       pfree(writetime);
-                       pfree(mytime);
-
-                       last_statrequest = cur_ts;
-                       last_statwrite = last_statrequest - 1;
+                       TimestampTz cur_ts = GetCurrentTimestamp();
+
+                       if (cur_ts < dbentry->stats_timestamp)
+                       {
+                               /*
+                               * Sure enough, time went backwards.  Force a 
new stats file write
+                               * to get back in sync; but first, log a 
complaint.
+                               */
+                               char       *writetime;
+                               char       *mytime;
+
+                               /* Copy because timestamptz_to_str returns a 
static buffer */
+                               writetime = 
pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
+                               mytime = pstrdup(timestamptz_to_str(cur_ts));
+                               elog(LOG, "last_statwrite %s is later than 
collector's time %s for "
+                                       "db %d", writetime, mytime, 
dbentry->databaseid);
+                               pfree(writetime);
+                               pfree(mytime);
+
+                               
last_statrequests[num_statrequests].request_time = cur_ts;
+                               dbentry->stats_timestamp = cur_ts - 1;
+                       }
                }
        }
 }
@@ -4278,10 +4701,17 @@ pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
        dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
 
        /*
-        * If found, remove it.
+        * If found, remove it (along with the db statfile).
         */
        if (dbentry)
        {
+               char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+               snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
+                                pgstat_stat_filename, dbentry->databaseid);
+               
+               elog(DEBUG1, "removing %s", db_statfile);
+               unlink(db_statfile);
+               
                if (dbentry->tables != NULL)
                        hash_destroy(dbentry->tables);
                if (dbentry->functions != NULL)
@@ -4687,3 +5117,58 @@ pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
                                                   HASH_REMOVE, NULL);
        }
 }
+
+/* ----------
+ * pgstat_write_statsfile_needed() -
+ *
+ *     Checks whether there's a db stats request, requiring a file write.
+ * 
+ *     TODO Seems that thanks the way we handle last_statrequests (erase after
+ *     a write), this is unnecessary. Just check that there's at least one
+ *     request and you're done. Although there might be delayed requests ...
+ * ----------
+ */
+
+static bool pgstat_write_statsfile_needed()
+{
+       int i = 0;
+       PgStat_StatDBEntry *dbentry;
+       
+       /* Check the databases if they need to refresh the stats. */
+       for (i = 0; i < num_statrequests; i++)
+       {
+               dbentry = pgstat_get_db_entry(last_statrequests[i].databaseid, 
false);
+               
+               /* No dbentry yet or too old. */
+               if ((! dbentry) ||
+                       (dbentry->stats_timestamp < 
last_statrequests[i].request_time)) {
+                       return true;
+               }
+               
+       }
+       
+       /* Well, everything was written recently ... */
+       return false;
+}
+
+/* ----------
+ * pgstat_write_statsfile_needed() -
+ *
+ *     Checks whether stats for a particular DB need to be written to a file).
+ * ----------
+ */
+
+static bool
+pgstat_db_requested(Oid databaseid)
+{
+       int i = 0;
+       
+       /* Check the databases if they need to refresh the stats. */
+       for (i = 0; i < num_statrequests; i++)
+       {
+               if (last_statrequests[i].databaseid == databaseid)
+                       return true;
+       }
+       
+       return false;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2cf34ce..e3e432b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8730,20 +8730,43 @@ static void
 assign_pgstat_temp_directory(const char *newval, void *extra)
 {
        /* check_canonical_path already canonicalized newval for us */
+       char       *dname;
        char       *tname;
        char       *fname;
-
-       tname = guc_malloc(ERROR, strlen(newval) + 12);         /* /pgstat.tmp 
*/
-       sprintf(tname, "%s/pgstat.tmp", newval);
-       fname = guc_malloc(ERROR, strlen(newval) + 13);         /* /pgstat.stat 
*/
-       sprintf(fname, "%s/pgstat.stat", newval);
-
+       char       *tname_db;
+       char       *fname_db;
+
+       /* directory */
+       dname = guc_malloc(ERROR, strlen(newval) + 1);          /* runtime dir 
*/
+       sprintf(dname, "%s", newval);
+
+       /* global stats */
+       tname = guc_malloc(ERROR, strlen(newval) + 12);         /* /global.tmp 
*/
+       sprintf(tname, "%s/global.tmp", newval);
+       fname = guc_malloc(ERROR, strlen(newval) + 13);         /* /global.stat 
*/
+       sprintf(fname, "%s/global.stat", newval);
+
+       /* per-db stats */
+       tname_db = guc_malloc(ERROR, strlen(newval) + 8);               /* 
/%d.tmp */
+       sprintf(tname_db, "%s/%%d.tmp", newval);
+       fname_db = guc_malloc(ERROR, strlen(newval) + 9);               /* 
/%d.stat */
+       sprintf(fname_db, "%s/%%d.stat", newval);
+
+       if (pgstat_stat_directory)
+               free(pgstat_stat_directory);
+       pgstat_stat_directory = dname;
        if (pgstat_stat_tmpname)
                free(pgstat_stat_tmpname);
        pgstat_stat_tmpname = tname;
        if (pgstat_stat_filename)
                free(pgstat_stat_filename);
        pgstat_stat_filename = fname;
+       if (pgstat_stat_db_tmpname)
+               free(pgstat_stat_db_tmpname);
+       pgstat_stat_db_tmpname = tname_db;
+       if (pgstat_stat_db_filename)
+               free(pgstat_stat_db_filename);
+       pgstat_stat_db_filename = fname_db;
 }
 
 static bool
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3e05ac3..8c86301 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -179,6 +179,7 @@ char           *restrict_env;
 #endif
 const char *subdirs[] = {
        "global",
+       "global/stat",
        "pg_xlog",
        "pg_xlog/archive_status",
        "pg_clog",
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 613c1c2..b3467d2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
        PgStat_MsgHdr m_hdr;
        TimestampTz clock_time;         /* observed local clock time */
        TimestampTz cutoff_time;        /* minimum acceptable file timestamp */
+       Oid                     databaseid;             /* requested DB 
(InvalidOid => all DBs) */
 } PgStat_MsgInquiry;
 
 
@@ -514,7 +515,7 @@ typedef union PgStat_Msg
  * ------------------------------------------------------------
  */
 
-#define PGSTAT_FILE_FORMAT_ID  0x01A5BC9A
+#define PGSTAT_FILE_FORMAT_ID  0xA240CA47
 
 /* ----------
  * PgStat_StatDBEntry                  The collector's data per database
@@ -545,6 +546,7 @@ typedef struct PgStat_StatDBEntry
        PgStat_Counter n_block_write_time;
 
        TimestampTz stat_reset_timestamp;
+       TimestampTz stats_timestamp;            /* time of db stats file update 
*/
 
        /*
         * tables and functions must be last in the struct, because we don't 
write
@@ -722,8 +724,11 @@ extern bool pgstat_track_activities;
 extern bool pgstat_track_counts;
 extern int     pgstat_track_functions;
 extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern char *pgstat_stat_directory;
 extern char *pgstat_stat_tmpname;
 extern char *pgstat_stat_filename;
+extern char *pgstat_stat_db_tmpname;
+extern char *pgstat_stat_db_filename;
 
 /*
  * BgWriter statistics counters are updated directly by bgwriter and bufmgr
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to