On 21.11.2012 19:02, Robert Haas wrote:
> On Sun, Nov 18, 2012 at 5:49 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
>> The two main changes are these:
>>
>> (1) The stats file is split into a common "db" file, containing all the
>>     DB Entries, and per-database files with tables/functions. The common
>>     file is still called "pgstat.stat", the per-db files have the
>>     database OID appended, so for example "pgstat.stat.12345" etc.
>>
>>     This was a trivial hack pgstat_read_statsfile/pgstat_write_statsfile
>>     functions, introducing two new functions:
>>
>>        pgstat_read_db_statsfile
>>        pgstat_write_db_statsfile
>>
>>     that do the trick of reading/writing stat file for one database.
>>
>> (2) The pgstat_read_statsfile has an additional parameter "onlydbs" that
>>     says that you don't need table/func stats - just the list of db
>>     entries. This is used for autovacuum launcher, which does not need
>>     to read the table/stats (if I'm reading the code in autovacuum.c
>>     correctly - it seems to be working as expected).
> 
> I'm not an expert on the stats system, but this seems like a promising
> approach to me.
> 
>> (a) It does not solve the "many-schema" scenario at all - that'll need
>>     a completely new approach I guess :-(
> 
> We don't need to solve every problem in the first patch.  I've got no
> problem kicking this one down the road.
> 
>> (b) It does not solve the writing part at all - the current code uses a
>>     single timestamp (last_statwrite) to decide if a new file needs to
>>     be written.
>>
>>     That clearly is not enough for multiple files - there should be one
>>     timestamp for each database/file. I'm thinking about how to solve
>>     this and how to integrate it with pgstat_send_inquiry etc.
> 
> Presumably you need a last_statwrite for each file, in a hash table or
> something, and requests need to specify which file is needed.
> 
>>     And yet another one I'm thinking about is using a fixed-length
>>     array of timestamps (e.g. 256), indexed by mod(dboid,256). That
>>     would mean stats for all databases with the same mod(oid,256) would
>>     be written at the same time. Seems like an over-engineering though.
> 
> That seems like an unnecessary kludge.
> 
>> (c) I'm a bit worried about the number of files - right now there's one
>>     for each database and I'm thinking about splitting them by type
>>     (one for tables, one for functions) which might make it even faster
>>     for some apps with a lot of stored procedures etc.
>>
>>     But is the large number of files actually a problem? After all,
>>     we're using one file per relation fork in the "base" directory, so
>>     this seems like a minor issue.
> 
> I don't see why one file per database would be a problem.  After all,
> we already have on directory per database inside base/.  If the user
> has so many databases that dirent lookups in a directory of that size
> are a problem, they're already hosed, and this will probably still
> work out to a net win.

Attached is a v2 of the patch, fixing some of the issues and unclear
points from the initial version.

The main improvement is that it implements writing only stats for the
requested database (set when sending inquiry). There's a dynamic array
of request - for each DB only the last request is kept.

I've done a number of changes - most importantly:

- added a stats_timestamp field to PgStat_StatDBEntry, keeping the last
  write of the database (i.e. a per-database last_statwrite), which is
  used to decide whether the file is stale or not

- handling of the 'permanent' flag correctly (used when starting or
  stopping the cluster) for per-db files

- added a very simple header to the per-db files (basically just a
  format ID and a timestamp) - this is needed for checking of the
  timestamp of the last write from workers (although maybe we could
  just read the pgstat.stat, which is now rather small)

- a 'force' parameter (true - write all databases, even if they weren't
  specifically requested)


So with the exception of 'multi-schema' case (which was not the aim of
this effort), it should solve all the issues of the initial version.

There are two blocks of code dealing with clock glitches. I haven't
fixed those yet, but that can wait I guess. I've also left there some
logging I've used during development (printing inquiries and which file
is written and when).

The main unsolved problem I'm struggling with is what to do when a
database is dropped? Right now, the statfile remains in pg_stat_tmp
forewer (or until the restart) - is there a good way to remove the
file? I'm thinking about adding a message to be sent to the collector
from the code that handles DROP TABLE.


I've done some very simple performance testing - I've created 1000
databases with 1000 tables each, done ANALYZE on all of them. With only
autovacum running, I've seen this:


Without the patch
-----------------

%CPU  %MEM   TIME+   COMMAND
18    3.0    0:10.10 postgres: autovacuum launcher process
17    2.6    0:11.44 postgres: stats collector process

The I/O was seriously bogged down, doing ~150 MB/s (basically what the
drive can handle) - with less dbs, or when the statfiles are placed on
tmpfs filesystem, we usually see ~70% of one core doing just this.


With the patch
--------------

Then, the typical "top" for PostgreSQL processes looked like this:

%CPU  %MEM   TIME+   COMMAND
2     0.3    1:16.57 postgres: autovacuum launcher process
2     3.1    0:25.34 postgres: stats collector process

and the average write speed from the stats collector was ~3.5MB/s
(measured using iotop), and even when running the ANALYZE etc. I was
getting rather light IO usage (like ~15 MB/s or something).

With both cases, the total size was ~150MB, but without the space
requirements are actually 2x that (because of writing a copy and then
renaming).


I'd like to put this into 2013-01 commit fest, but if we can do some
prior testing / comments, that'd be great.


regards
Tomas
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index be3adf1..63b9e14 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -222,8 +222,16 @@ static PgStat_GlobalStats globalStats;
 /* Last time the collector successfully wrote the stats file */
 static TimestampTz last_statwrite;
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+       Oid                     databaseid;             /* OID of the database 
to write */
+       TimestampTz request_time;       /* timestamp of the last write request 
*/
+} DBWriteRequest;
+
+/* Latest statistics request time from backends for each DB */
+static DBWriteRequest * last_statrequests = NULL;
+static int num_statrequests = 0;
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,8 +260,10 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
                                         Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfile(bool permanent, bool force);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool 
permanent);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB 
*funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
 
@@ -285,6 +295,8 @@ static void 
pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
+static bool pgstat_write_statsfile_needed();
+static bool pgstat_db_requested(Oid databaseid);
 
 /* ------------------------------------------------------------
  * Public functions called from postmaster follow
@@ -1408,13 +1420,14 @@ pgstat_ping(void)
  * ----------
  */
 static void
-pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time)
+pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid 
databaseid)
 {
        PgStat_MsgInquiry msg;
 
        pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
        msg.clock_time = clock_time;
        msg.cutoff_time = cutoff_time;
+       msg.databaseid = databaseid;
        pgstat_send(&msg, sizeof(msg));
 }
 
@@ -3004,6 +3017,7 @@ PgstatCollectorMain(int argc, char *argv[])
        int                     len;
        PgStat_Msg      msg;
        int                     wr;
+       bool            first_write = true;
 
        IsUnderPostmaster = true;       /* we are a postmaster subprocess now */
 
@@ -3055,15 +3069,15 @@ PgstatCollectorMain(int argc, char *argv[])
        /*
         * Arrange to write the initial status file right away
         */
-       last_statrequest = GetCurrentTimestamp();
-       last_statwrite = last_statrequest - 1;
-
+       // last_statrequest = GetCurrentTimestamp();
+       // last_statwrite = GetCurrentTimestamp() - 1;
+       
        /*
         * Read in an existing statistics stats file or initialize the stats to
         * zero.
         */
        pgStatRunningInCollector = true;
-       pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
+       pgStatDBHash = pgstat_read_statsfile(InvalidOid, true, false);
 
        /*
         * Loop to process messages until we get SIGQUIT or detect ungraceful
@@ -3109,8 +3123,11 @@ PgstatCollectorMain(int argc, char *argv[])
                         * Write the stats file if a new request has arrived 
that is not
                         * satisfied by existing file.
                         */
-                       if (last_statwrite < last_statrequest)
-                               pgstat_write_statsfile(false);
+                       if (first_write || pgstat_write_statsfile_needed())
+                       {
+                               pgstat_write_statsfile(false, first_write);
+                               first_write = false;
+                       }
 
                        /*
                         * Try to receive and process a message.  This will not 
block,
@@ -3269,7 +3286,7 @@ PgstatCollectorMain(int argc, char *argv[])
        /*
         * Save the final stats to reuse at next startup.
         */
-       pgstat_write_statsfile(true);
+       pgstat_write_statsfile(true, true);
 
        exit(0);
 }
@@ -3432,20 +3449,18 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid 
tableoid, bool create)
  * ----------
  */
 static void
-pgstat_write_statsfile(bool permanent)
+pgstat_write_statsfile(bool permanent, bool force)
 {
        HASH_SEQ_STATUS hstat;
-       HASH_SEQ_STATUS tstat;
-       HASH_SEQ_STATUS fstat;
        PgStat_StatDBEntry *dbentry;
-       PgStat_StatTabEntry *tabentry;
-       PgStat_StatFuncEntry *funcentry;
        FILE       *fpout;
        int32           format_id;
        const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : 
pgstat_stat_tmpname;
        const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : 
pgstat_stat_filename;
        int                     rc;
 
+       elog(WARNING, "writing statsfile '%s'", statfile);
+       
        /*
         * Open the statistics temp file to write out the current values.
         */
@@ -3489,36 +3504,36 @@ pgstat_write_statsfile(bool permanent)
                 * use to any other process.
                 */
                fputc('D', fpout);
+               dbentry->stats_timestamp = globalStats.stats_timestamp;
                rc = fwrite(dbentry, offsetof(PgStat_StatDBEntry, tables), 1, 
fpout);
                (void) rc;                              /* we'll check for 
error with ferror */
 
                /*
-                * Walk through the database's access stats per table.
+                * Write our the tables and functions into a separate file, but 
only
+                * if the database is in the requests or if it's a forced write 
(then
+                * all the DBs need to be written - e.g. at the shutdown).
                 */
-               hash_seq_init(&tstat, dbentry->tables);
-               while ((tabentry = (PgStat_StatTabEntry *) 
hash_seq_search(&tstat)) != NULL)
-               {
-                       fputc('T', fpout);
-                       rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, 
fpout);
-                       (void) rc;                      /* we'll check for 
error with ferror */
-               }
-
-               /*
-                * Walk through the database's function stats table.
-                */
-               hash_seq_init(&fstat, dbentry->functions);
-               while ((funcentry = (PgStat_StatFuncEntry *) 
hash_seq_search(&fstat)) != NULL)
-               {
-                       fputc('F', fpout);
-                       rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, 
fpout);
-                       (void) rc;                      /* we'll check for 
error with ferror */
+               if (force || pgstat_db_requested(dbentry->databaseid)) {
+                       elog(WARNING, "writing statsfile for DB %d", 
dbentry->databaseid);
+                       pgstat_write_db_statsfile(dbentry, permanent);
                }
 
                /*
                 * Mark the end of this DB
+                * 
+                * FIXME does it really make much sense, when the 
tables/functions
+                * are moved to a separate file (using those chars?)
                 */
                fputc('d', fpout);
        }
+       
+       /* In any case, we can just throw away all the db requests. */
+       if (last_statrequests != NULL)
+       {
+               pfree(last_statrequests);
+               last_statrequests = NULL;
+               num_statrequests = 0;
+       }
 
        /*
         * No more output to be done. Close the temp file and replace the old
@@ -3559,27 +3574,28 @@ pgstat_write_statsfile(bool permanent)
                 */
                last_statwrite = globalStats.stats_timestamp;
 
+               /* FIXME Update to the per-db request times. */
                /*
                 * If there is clock skew between backends and the collector, 
we could
                 * receive a stats request time that's in the future.  If so, 
complain
                 * and reset last_statrequest.  Resetting ensures that no 
inquiry
                 * message can cause more than one stats file write to occur.
                 */
-               if (last_statrequest > last_statwrite)
-               {
-                       char       *reqtime;
-                       char       *mytime;
-
-                       /* Copy because timestamptz_to_str returns a static 
buffer */
-                       reqtime = pstrdup(timestamptz_to_str(last_statrequest));
-                       mytime = pstrdup(timestamptz_to_str(last_statwrite));
-                       elog(LOG, "last_statrequest %s is later than 
collector's time %s",
-                                reqtime, mytime);
-                       pfree(reqtime);
-                       pfree(mytime);
-
-                       last_statrequest = last_statwrite;
-               }
+//             if (last_statrequest > last_statwrite)
+//             {
+//                     char       *reqtime;
+//                     char       *mytime;
+// 
+//                     /* Copy because timestamptz_to_str returns a static 
buffer */
+//                     reqtime = pstrdup(timestamptz_to_str(last_statrequest));
+//                     mytime = pstrdup(timestamptz_to_str(last_statwrite));
+//                     elog(LOG, "last_statrequest %s is later than 
collector's time %s",
+//                              reqtime, mytime);
+//                     pfree(reqtime);
+//                     pfree(mytime);
+// 
+//                     last_statrequest = last_statwrite;
+//             }
        }
 
        if (permanent)
@@ -3587,6 +3603,137 @@ pgstat_write_statsfile(bool permanent)
 }
 
 
+
+/* ----------
+ * pgstat_write_db_statsfile() -
+ *
+ *     Tell the news.
+ *     If writing to the permanent file (happens when the collector is
+ *     shutting down only), remove the temporary file so that backends
+ *     starting up under a new postmaster can't read the old data before
+ *     the new collector is ready.
+ * ----------
+ */
+static void
+pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent)
+{
+       HASH_SEQ_STATUS tstat;
+       HASH_SEQ_STATUS fstat;
+       PgStat_StatTabEntry *tabentry;
+       PgStat_StatFuncEntry *funcentry;
+       FILE       *fpout;
+       int32           format_id;
+       const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : 
pgstat_stat_tmpname;
+       const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : 
pgstat_stat_filename;
+       int                     rc;
+
+       /*
+        * OIDs are 32-bit values, so 10 chars should be safe, +2 for the dot 
and \0 byte
+        */
+       char db_tmpfile[strlen(tmpfile) + 12];
+       char db_statfile[strlen(statfile) + 12];
+
+       /*
+        * Append database OID at the end of the basic filename (both for tmp 
and target file).
+        */
+       snprintf(db_tmpfile, strlen(tmpfile) + 12, "%s.%d", tmpfile, 
dbentry->databaseid);
+       snprintf(db_statfile, strlen(statfile) + 12, "%s.%d", statfile, 
dbentry->databaseid);
+
+       elog(WARNING, "writing statsfile '%s'", db_statfile);
+
+       /*
+        * Open the statistics temp file to write out the current values.
+        */
+       fpout = AllocateFile(db_tmpfile, PG_BINARY_W);
+       if (fpout == NULL)
+       {
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                                errmsg("could not open temporary statistics 
file \"%s\": %m",
+                                               db_tmpfile)));
+               return;
+       }
+
+       /*
+        * Write the file header --- currently just a format ID.
+        */
+       format_id = PGSTAT_FILE_FORMAT_ID;
+       rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+       (void) rc;                                      /* we'll check for 
error with ferror */
+
+       /*
+        * Write the timestamp.
+        */
+       rc = fwrite(&(globalStats.stats_timestamp), 
sizeof(globalStats.stats_timestamp), 1, fpout);
+       (void) rc;                                      /* we'll check for 
error with ferror */
+
+       /*
+        * Walk through the database's access stats per table.
+        */
+       hash_seq_init(&tstat, dbentry->tables);
+       while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != 
NULL)
+       {
+               fputc('T', fpout);
+               rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
+               (void) rc;                      /* we'll check for error with 
ferror */
+       }
+
+       /*
+        * Walk through the database's function stats table.
+        */
+       hash_seq_init(&fstat, dbentry->functions);
+       while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) 
!= NULL)
+       {
+               fputc('F', fpout);
+               rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+               (void) rc;                      /* we'll check for error with 
ferror */
+       }
+
+       /*
+        * No more output to be done. Close the temp file and replace the old
+        * pgstat.stat with it.  The ferror() check replaces testing for error
+        * after each individual fputc or fwrite above.
+        */
+       fputc('E', fpout);
+
+       if (ferror(fpout))
+       {
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                          errmsg("could not write temporary statistics file 
\"%s\": %m",
+                                         db_tmpfile)));
+               FreeFile(fpout);
+               unlink(db_tmpfile);
+       }
+       else if (FreeFile(fpout) < 0)
+       {
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                          errmsg("could not close temporary statistics file 
\"%s\": %m",
+                                         db_tmpfile)));
+               unlink(db_tmpfile);
+       }
+       else if (rename(db_tmpfile, db_statfile) < 0)
+       {
+               ereport(LOG,
+                               (errcode_for_file_access(),
+                                errmsg("could not rename temporary statistics 
file \"%s\" to \"%s\": %m",
+                                               db_tmpfile, db_statfile)));
+               unlink(db_tmpfile);
+       }
+       
+       if (permanent)
+       {
+               /* FIXME This aliases the existing db_statfile variable (might 
have different
+                * length). */
+               char db_statfile[strlen(pgstat_stat_filename) + 12];
+               snprintf(db_statfile, strlen(pgstat_stat_filename) + 12, 
"%s.%d",
+                                pgstat_stat_filename, dbentry->databaseid);
+               elog(DEBUG1, "removing stat file '%s'", db_statfile);
+               unlink(db_statfile);
+       }
+}
+
 /* ----------
  * pgstat_read_statsfile() -
  *
@@ -3595,14 +3742,10 @@ pgstat_write_statsfile(bool permanent)
  * ----------
  */
 static HTAB *
-pgstat_read_statsfile(Oid onlydb, bool permanent)
+pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs)
 {
        PgStat_StatDBEntry *dbentry;
        PgStat_StatDBEntry dbbuf;
-       PgStat_StatTabEntry *tabentry;
-       PgStat_StatTabEntry tabbuf;
-       PgStat_StatFuncEntry funcbuf;
-       PgStat_StatFuncEntry *funcentry;
        HASHCTL         hash_ctl;
        HTAB       *dbhash;
        HTAB       *tabhash = NULL;
@@ -3758,6 +3901,16 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                 */
                                tabhash = dbentry->tables;
                                funchash = dbentry->functions;
+
+                               /*
+                                * Read the data from the file for this 
database. If there was
+                                * onlydb specified (!= InvalidOid), we would 
not get here because
+                                * of a break above. So we don't need to 
recheck.
+                                */
+                               if (! onlydbs)
+                                       
pgstat_read_db_statsfile(dbentry->databaseid, tabhash, funchash,
+                                                                               
        permanent);
+
                                break;
 
                                /*
@@ -3768,6 +3921,105 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                funchash = NULL;
                                break;
 
+                       case 'E':
+                               goto done;
+
+                       default:
+                               ereport(pgStatRunningInCollector ? LOG : 
WARNING,
+                                               (errmsg("corrupted statistics 
file \"%s\"",
+                                                               statfile)));
+                               goto done;
+               }
+       }
+
+done:
+       FreeFile(fpin);
+
+       if (permanent)
+               unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+
+       return dbhash;
+}
+
+
+/* ----------
+ * pgstat_read_db_statsfile() -
+ *
+ *     Reads in an existing statistics collector db file and initializes the
+ *     tables and functions hash tables (for the database identified by Oid).
+ * ----------
+ */
+static void
+pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool 
permanent)
+{
+       PgStat_StatTabEntry *tabentry;
+       PgStat_StatTabEntry tabbuf;
+       PgStat_StatFuncEntry funcbuf;
+       PgStat_StatFuncEntry *funcentry;
+       FILE       *fpin;
+       int32           format_id;
+       TimestampTz timestamp;
+       bool            found;
+       const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : 
pgstat_stat_filename;
+
+       /*
+        * OIDs are 32-bit values, so 10 chars should be safe, +2 for the dot 
and \0 byte
+        */
+       char db_statfile[strlen(statfile) + 12];
+
+       /*
+        * Append database OID at the end of the basic filename (both for tmp 
and target file).
+        */
+       snprintf(db_statfile, strlen(statfile) + 12, "%s.%d", statfile, 
databaseid);
+
+       /*
+        * Try to open the status file. If it doesn't exist, the backends simply
+        * return zero for anything and the collector simply starts from scratch
+        * with empty counters.
+        *
+        * ENOENT is a possibility if the stats collector is not running or has
+        * not yet written the stats file the first time.  Any other failure
+        * condition is suspicious.
+        */
+       if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
+       {
+               if (errno != ENOENT)
+                       ereport(pgStatRunningInCollector ? LOG : WARNING,
+                                       (errcode_for_file_access(),
+                                        errmsg("could not open statistics file 
\"%s\": %m",
+                                                       db_statfile)));
+               return;
+       }
+
+       /*
+        * Verify it's of the expected format.
+        */
+       if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
+               || format_id != PGSTAT_FILE_FORMAT_ID)
+       {
+               ereport(pgStatRunningInCollector ? LOG : WARNING,
+                               (errmsg("corrupted statistics file \"%s\"", 
db_statfile)));
+               goto done;
+       }
+
+       /*
+        * Read global stats struct
+        */
+       if (fread(&timestamp, 1, sizeof(timestamp), fpin) != sizeof(timestamp))
+       {
+               ereport(pgStatRunningInCollector ? LOG : WARNING,
+                               (errmsg("corrupted statistics file \"%s\"", 
db_statfile)));
+               goto done;
+       }
+
+       /*
+        * We found an existing collector stats file. Read it and put all the
+        * hashtable entries into place.
+        */
+       for (;;)
+       {
+               switch (fgetc(fpin))
+               {
                                /*
                                 * 'T'  A PgStat_StatTabEntry follows.
                                 */
@@ -3777,7 +4029,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                {
                                        ereport(pgStatRunningInCollector ? LOG 
: WARNING,
                                                        (errmsg("corrupted 
statistics file \"%s\"",
-                                                                       
statfile)));
+                                                                       
db_statfile)));
                                        goto done;
                                }
 
@@ -3795,7 +4047,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                {
                                        ereport(pgStatRunningInCollector ? LOG 
: WARNING,
                                                        (errmsg("corrupted 
statistics file \"%s\"",
-                                                                       
statfile)));
+                                                                       
db_statfile)));
                                        goto done;
                                }
 
@@ -3811,7 +4063,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                {
                                        ereport(pgStatRunningInCollector ? LOG 
: WARNING,
                                                        (errmsg("corrupted 
statistics file \"%s\"",
-                                                                       
statfile)));
+                                                                       
db_statfile)));
                                        goto done;
                                }
 
@@ -3829,7 +4081,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                                {
                                        ereport(pgStatRunningInCollector ? LOG 
: WARNING,
                                                        (errmsg("corrupted 
statistics file \"%s\"",
-                                                                       
statfile)));
+                                                                       
db_statfile)));
                                        goto done;
                                }
 
@@ -3845,7 +4097,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
                        default:
                                ereport(pgStatRunningInCollector ? LOG : 
WARNING,
                                                (errmsg("corrupted statistics 
file \"%s\"",
-                                                               statfile)));
+                                                               db_statfile)));
                                goto done;
                }
        }
@@ -3854,37 +4106,49 @@ done:
        FreeFile(fpin);
 
        if (permanent)
-               unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+       {
+               /* FIXME This aliases the existing db_statfile variable (might 
have different
+                * length). */
+               char db_statfile[strlen(PGSTAT_STAT_PERMANENT_FILENAME) + 12];
+               snprintf(db_statfile, strlen(PGSTAT_STAT_PERMANENT_FILENAME) + 
12, "%s.%d",
+                                PGSTAT_STAT_PERMANENT_FILENAME, databaseid);
+               elog(DEBUG1, "removing permanent stats file '%s'", db_statfile);
+               unlink(db_statfile);
+       }
 
-       return dbhash;
+       return;
 }
 
 /* ----------
- * pgstat_read_statsfile_timestamp() -
+ * pgstat_read_db_statsfile_timestamp() -
  *
- *     Attempt to fetch the timestamp of an existing stats file.
+ *     Attempt to fetch the timestamp of an existing stats file (for a DB).
  *     Returns TRUE if successful (timestamp is stored at *ts).
  * ----------
  */
 static bool
-pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
+pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent, TimestampTz 
*ts)
 {
-       PgStat_GlobalStats myGlobalStats;
+       TimestampTz timestamp;
        FILE       *fpin;
        int32           format_id;
        const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : 
pgstat_stat_filename;
+       char db_statfile[strlen(statfile) + 12];
+
+       /* format the db statfile filename */
+       snprintf(db_statfile, strlen(statfile) + 12, "%s.%d", statfile, 
databaseid);
 
        /*
         * Try to open the status file.  As above, anything but ENOENT is worthy
         * of complaining about.
         */
-       if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+       if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
        {
                if (errno != ENOENT)
                        ereport(pgStatRunningInCollector ? LOG : WARNING,
                                        (errcode_for_file_access(),
                                         errmsg("could not open statistics file 
\"%s\": %m",
-                                                       statfile)));
+                                                       db_statfile)));
                return false;
        }
 
@@ -3895,7 +4159,7 @@ pgstat_read_statsfile_timestamp(bool permanent, 
TimestampTz *ts)
                || format_id != PGSTAT_FILE_FORMAT_ID)
        {
                ereport(pgStatRunningInCollector ? LOG : WARNING,
-                               (errmsg("corrupted statistics file \"%s\"", 
statfile)));
+                               (errmsg("corrupted statistics file \"%s\"", 
db_statfile)));
                FreeFile(fpin);
                return false;
        }
@@ -3903,15 +4167,15 @@ pgstat_read_statsfile_timestamp(bool permanent, 
TimestampTz *ts)
        /*
         * Read global stats struct
         */
-       if (fread(&myGlobalStats, 1, sizeof(myGlobalStats), fpin) != 
sizeof(myGlobalStats))
+       if (fread(&timestamp, 1, sizeof(TimestampTz), fpin) != 
sizeof(TimestampTz))
        {
                ereport(pgStatRunningInCollector ? LOG : WARNING,
-                               (errmsg("corrupted statistics file \"%s\"", 
statfile)));
+                               (errmsg("corrupted statistics file \"%s\"", 
db_statfile)));
                FreeFile(fpin);
                return false;
        }
 
-       *ts = myGlobalStats.stats_timestamp;
+       *ts = timestamp;
 
        FreeFile(fpin);
        return true;
@@ -3947,7 +4211,7 @@ backend_read_statsfile(void)
 
                CHECK_FOR_INTERRUPTS();
 
-               ok = pgstat_read_statsfile_timestamp(false, &file_ts);
+               ok = pgstat_read_db_statsfile_timestamp(MyDatabaseId, false, 
&file_ts);
 
                cur_ts = GetCurrentTimestamp();
                /* Calculate min acceptable timestamp, if we didn't already */
@@ -4006,7 +4270,7 @@ backend_read_statsfile(void)
                                pfree(mytime);
                        }
 
-                       pgstat_send_inquiry(cur_ts, min_ts);
+                       pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
                        break;
                }
 
@@ -4016,7 +4280,7 @@ backend_read_statsfile(void)
 
                /* Not there or too old, so kick the collector and wait a bit */
                if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
-                       pgstat_send_inquiry(cur_ts, min_ts);
+                       pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 
                pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
        }
@@ -4026,9 +4290,16 @@ backend_read_statsfile(void)
 
        /* Autovacuum launcher wants stats about all databases */
        if (IsAutoVacuumLauncherProcess())
-               pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
+               /* 
+                * FIXME Does it really need info including tables/functions? 
Or is it enough to read
+                * database-level stats? It seems to me the launcher needs 
PgStat_StatDBEntry only
+                * (at least that's how I understand the 
rebuild_database_list() in autovacuum.c),
+                * because pgstat_stattabentries are used in do_autovacuum() 
only, that that's what's
+                * executed in workers ... So maybe we'd be just fine by 
reading in the dbentries?
+                */
+               pgStatDBHash = pgstat_read_statsfile(InvalidOid, false, true);
        else
-               pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
+               pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false, 
false);
 }
 
 
@@ -4084,13 +4355,53 @@ pgstat_clear_snapshot(void)
 static void
 pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 {
+       int i = 0;
+       bool found = false;
+
+       elog(WARNING, "received inquiry for %d", msg->databaseid);
+
        /*
-        * Advance last_statrequest if this requestor has a newer cutoff time
-        * than any previous request.
+        * Find the last write request for this DB (found=true in that case). 
Plain
+        * linear search, not really worth doing any magic here (probably).
         */
-       if (msg->cutoff_time > last_statrequest)
-               last_statrequest = msg->cutoff_time;
+       for (i = 0; i < num_statrequests; i++)
+       {
+               if (last_statrequests[i].databaseid == msg->databaseid)
+               {
+                       found = true;
+                       break;
+               }
+       }
+       
+       if (found)
+       {
+               /*
+                * There already is a request for this DB, so lets advance the
+                * request time  if this requestor has a newer cutoff time
+                * than any previous request.
+                */
+               if (msg->cutoff_time > last_statrequests[i].request_time)
+                       last_statrequests[i].request_time = msg->cutoff_time;
+       }
+       else
+       {
+               /*
+                * There's no request for this DB yet, so lets create it 
(allocate a
+                * space for it, set the values).
+                */
+               if (last_statrequests == NULL)
+                       last_statrequests = palloc(sizeof(DBWriteRequest));
+               else
+                       last_statrequests = repalloc(last_statrequests,
+                                                               
(num_statrequests + 1)*sizeof(DBWriteRequest));
+               
+               last_statrequests[num_statrequests].databaseid = 
msg->databaseid;
+               last_statrequests[num_statrequests].request_time = 
msg->clock_time;
+               num_statrequests += 1;
+       }
 
+       /* FIXME Do we need to update this to work with per-db stats? This 
should
+        * be moved to the "else" branch I guess. */
        /*
         * If the requestor's local clock time is older than last_statwrite, we
         * should suspect a clock glitch, ie system time going backwards; though
@@ -4099,31 +4410,31 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
         * retreat in the system clock reading could otherwise cause us to 
neglect
         * to update the stats file for a long time.
         */
-       if (msg->clock_time < last_statwrite)
-       {
-               TimestampTz cur_ts = GetCurrentTimestamp();
-
-               if (cur_ts < last_statwrite)
-               {
-                       /*
-                        * Sure enough, time went backwards.  Force a new stats 
file write
-                        * to get back in sync; but first, log a complaint.
-                        */
-                       char       *writetime;
-                       char       *mytime;
-
-                       /* Copy because timestamptz_to_str returns a static 
buffer */
-                       writetime = pstrdup(timestamptz_to_str(last_statwrite));
-                       mytime = pstrdup(timestamptz_to_str(cur_ts));
-                       elog(LOG, "last_statwrite %s is later than collector's 
time %s",
-                                writetime, mytime);
-                       pfree(writetime);
-                       pfree(mytime);
-
-                       last_statrequest = cur_ts;
-                       last_statwrite = last_statrequest - 1;
-               }
-       }
+//     if (msg->clock_time < last_statwrite)
+//     {
+//             TimestampTz cur_ts = GetCurrentTimestamp();
+// 
+//             if (cur_ts < last_statwrite)
+//             {
+//                     /*
+//                      * Sure enough, time went backwards.  Force a new stats 
file write
+//                      * to get back in sync; but first, log a complaint.
+//                      */
+//                     char       *writetime;
+//                     char       *mytime;
+// 
+//                     /* Copy because timestamptz_to_str returns a static 
buffer */
+//                     writetime = pstrdup(timestamptz_to_str(last_statwrite));
+//                     mytime = pstrdup(timestamptz_to_str(cur_ts));
+//                     elog(LOG, "last_statwrite %s is later than collector's 
time %s",
+//                              writetime, mytime);
+//                     pfree(writetime);
+//                     pfree(mytime);
+// 
+//                     last_statrequest = cur_ts;
+//                     last_statwrite = last_statrequest - 1;
+//             }
+//     }
 }
 
 
@@ -4687,3 +4998,54 @@ pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
                                                   HASH_REMOVE, NULL);
        }
 }
+
+/* ----------
+ * pgstat_write_statsfile_needed() -
+ *
+ *     Checks whether there's a db stats request, requiring a file write.
+ * ----------
+ */
+
+static bool pgstat_write_statsfile_needed()
+{
+       int i = 0;
+       PgStat_StatDBEntry *dbentry;
+       
+       /* Check the databases if they need to refresh the stats. */
+       for (i = 0; i < num_statrequests; i++)
+       {
+               dbentry = pgstat_get_db_entry(last_statrequests[i].databaseid, 
false);
+               
+               /* No dbentry yet or too old. */
+               if ((! dbentry) ||
+                       (dbentry->stats_timestamp < 
last_statrequests[i].request_time)) {
+                       return true;
+               }
+               
+       }
+       
+       /* Well, everything was written recently ... */
+       return false;
+}
+
+/* ----------
+ * pgstat_write_statsfile_needed() -
+ *
+ *     Checks whether stats for a particular DB need to be written to a file).
+ * ----------
+ */
+
+static bool
+pgstat_db_requested(Oid databaseid)
+{
+       int i = 0;
+       
+       /* Check the databases if they need to refresh the stats. */
+       for (i = 0; i < num_statrequests; i++)
+       {
+               if (last_statrequests[i].databaseid == databaseid)
+                       return true;
+       }
+       
+       return false;
+}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 613c1c2..bdb1bbc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
        PgStat_MsgHdr m_hdr;
        TimestampTz clock_time;         /* observed local clock time */
        TimestampTz cutoff_time;        /* minimum acceptable file timestamp */
+       Oid                     databaseid;             /* requested DB 
(InvalidOid => all DBs) */
 } PgStat_MsgInquiry;
 
 
@@ -545,6 +546,7 @@ typedef struct PgStat_StatDBEntry
        PgStat_Counter n_block_write_time;
 
        TimestampTz stat_reset_timestamp;
+       TimestampTz stats_timestamp;            /* time of db stats file update 
*/
 
        /*
         * tables and functions must be last in the struct, because we don't 
write
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to