One of the most useful bits of feedback on how well checkpoint I/O is going is the amount of time taken to sync files to disk. Right now the only way to get that is to parse the logs. The attached patch publishes the most useful three bits of data you could only get from log_checkpoints before out to pg_stat_bgwriter. Easiest to just show an example:

$ createdb pgbench
$ psql -c "select pg_stat_reset_shared('bgwriter')"
$ pgbench -i -s 10 pgbench
$ psql -c "checkpoint"

$ tail $PGLOG
LOG: checkpoint complete: wrote 590 buffers (14.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.117 s, total=1.746 s; sync files=31, longest=0.104 s, average=0.003 s

$ psql -c "select write_ms,sync_ms,sync_files from pg_stat_bgwriter"
 write_ms | sync_ms | sync_files
----------+---------+------------
        2 |     117 |         31

Like the other checkpoint statistics, these are updated in one lump when the checkpoint finishes.

This includes rough documentation and the code isn't very complicated; a cat version bump is required. Some open concerns I still have about this:

-The total amount of time in the sync phase is not the same as the total of all sync wait times. I have published the total elapsed sync time. The total of the times spent waiting on individual sync calls is also available at the point when the above data is collected, and it could be published as well. I don't think that's necessary yet, since they are almost identical right now. But any feature that staggers sync calls around might make that distinction important.

-The total checkpoint time is larger than the total of write+sync. You can see that here: 2ms write + 117ms sync = 119ms; total checkpoint time is actually 1746ms though. While I don't normally find the total useful from a tuning perspective, it may be sensible to publish it just to further reduce any perceived need for log scraping here.

-None of the debug1 level information about individual sync calls can be published this way usefully. I'd like that to come out somewhere eventually, but really what's needed there is a performance event logging history collector, not pg_stats. I'm content that this patch gets the big picture data out; it's sufficient to let you rule out checkpoint problems if they aren't happening. If you only have to drill into the logs when long times start showing up in this data, that's not so bad to live with.

-I can't tell for sure if this is working properly when log_checkpoints is off. This now collects checkpoint end time data in all cases, whereas before it ignored that work if log_checkpoints was off. I think I will have to fake the logging in this area, make it log some things even when log_checkpoints is off, to confirm it gives the right results. I haven't done that yet.

I would like to be able to tell people they need never turn on log_checkpoints if they monitor pg_stat_bgwriter instead, because that sets a good precedent for what direction the database is going. It would be nice for pg_stat_bgwriter's format to settle down for a long period too. In addition to my small list of concerns here, there seem to be an increasing number of write path feature patches floating around here lately. That all works against this being stable. The main consumer of this data I use regularly is the pg_stat_bgwriter graphs that Munin produces, and my expectation is that myself and/or Magnus will make sure it's compatible with any changes made here for 9.2.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a12a9a2..674091c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1182,6 +1182,34 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
      </row>
 
      <row>
+      <entry><literal><function>pg_stat_get_bgwriter_write_ms()</function></literal></entry>
+      <entry><type>bigint</type></entry>
+      <entry>
+       Total amount of time that has been spent in the part of checkpoint
+       processing where files are written to disk, in milliseconds.
+      </entry>
+     </row>
+
+
+     <row>
+      <entry><literal><function>pg_stat_get_bgwriter_sync_ms()</function></literal></entry>
+      <entry><type>bigint</type></entry>
+      <entry>
+       Total amount of time that has been spent in the part of checkpoint
+       processing where files are synchronized to disk, in milliseconds.
+      </entry>
+     </row>
+
+     <row>
+      <entry><literal><function>pg_stat_get_bgwriter_sync_files()</function></literal></entry>
+      <entry><type>bigint</type></entry>
+      <entry>
+       Total number of files that have been synchronized to disk during
+       checkpoint processing.
+      </entry>
+     </row>
+
+     <row>
       <entry><literal><function>pg_stat_get_wal_senders()</function></literal></entry>
       <entry><type>setof record</type></entry>
       <entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ce659ec..4ff7b54 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7413,10 +7413,10 @@ LogCheckpointStart(int flags, bool restartpoint)
 }
 
 /*
- * Log end of a checkpoint.
+ * Time and potentially log the end of a checkpoint.
  */
 static void
-LogCheckpointEnd(bool restartpoint)
+TimeCheckpointEnd(bool restartpoint)
 {
 	long		write_secs,
 				sync_secs,
@@ -7444,6 +7444,16 @@ LogCheckpointEnd(bool restartpoint)
 						CheckpointStats.ckpt_sync_end_t,
 						&sync_secs, &sync_usecs);
 
+	BgWriterStats.m_write_ms=(uint64) write_secs * 1000 + write_usecs / 1000;
+	BgWriterStats.m_sync_ms=(uint64) sync_secs * 1000 + sync_usecs / 1000;
+	BgWriterStats.m_sync_files=CheckpointStats.ckpt_sync_rels;
+
+	/*
+	 * All of the published timing statistics are accounted for.  Only
+	 * continue if a log message is being written.
+	 */
+	if (!log_checkpoints) return;
+
 	/*
 	 * Timing values returned from CheckpointStats are in microseconds.
 	 * Convert to the second plus microsecond form that TimestampDifference
@@ -7892,8 +7902,7 @@ CreateCheckPoint(int flags)
 		TruncateSUBTRANS(GetOldestXmin(true, false));
 
 	/* All real work is done, but log before releasing lock. */
-	if (log_checkpoints)
-		LogCheckpointEnd(false);
+	TimeCheckpointEnd(false);
 
 	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
 									 NBuffers,
@@ -8158,8 +8167,7 @@ CreateRestartPoint(int flags)
 		TruncateSUBTRANS(GetOldestXmin(true, false));
 
 	/* All real work is done, but log before releasing lock. */
-	if (log_checkpoints)
-		LogCheckpointEnd(true);
+	TimeCheckpointEnd(true);
 
 	xtime = GetLatestXTime();
 	ereport((log_checkpoints ? LOG : DEBUG2),
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 50ba20c..280f563 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -622,7 +622,11 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_written_backend() AS buffers_backend,
         pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
         pg_stat_get_buf_alloc() AS buffers_alloc,
-        pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+        pg_stat_get_bgwriter_stat_reset_time() AS stats_reset,
+        pg_stat_get_bgwriter_write_ms() AS write_ms,
+        pg_stat_get_bgwriter_sync_ms() AS sync_ms,
+        pg_stat_get_bgwriter_sync_files() AS sync_files
+        ;
 
 CREATE VIEW pg_user_mappings AS
     SELECT
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 323d42b..29c7fb4 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4360,6 +4360,9 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_written_backend += msg->m_buf_written_backend;
 	globalStats.buf_fsync_backend += msg->m_buf_fsync_backend;
 	globalStats.buf_alloc += msg->m_buf_alloc;
+	globalStats.write_ms += msg->m_write_ms;
+	globalStats.sync_ms += msg->m_sync_ms;
+	globalStats.sync_files += msg->m_sync_files;
 }
 
 /* ----------
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index b4986d8..69d16d0 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -86,6 +86,9 @@ extern Datum pg_stat_get_bgwriter_buf_written_checkpoints(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_bgwriter_buf_written_clean(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_bgwriter_write_ms(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_bgwriter_sync_ms(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_buf_fsync_backend(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_buf_alloc(PG_FUNCTION_ARGS);
@@ -1305,6 +1308,22 @@ pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS)
 }
 
 Datum
+pg_stat_get_bgwriter_write_ms(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_global()->write_ms);
+}
+Datum
+pg_stat_get_bgwriter_sync_ms(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_global()->sync_ms);
+}
+Datum
+pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_global()->sync_files);
+}
+
+Datum
 pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS)
 {
 	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stat_reset_timestamp);
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 1ddf4bf..7f8de4d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -15,6 +15,7 @@
 #include "access/xlogdefs.h"
 #include "datatype/timestamp.h"
 #include "lib/stringinfo.h"
+#include "pgstat.h"
 #include "storage/buf.h"
 #include "storage/latch.h"
 #include "utils/pg_crc.h"
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 355c61a..22c2392 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2645,13 +2645,18 @@ DATA(insert OID = 2773 ( pg_stat_get_bgwriter_maxwritten_clean PGNSP PGUID 12 1
 DESCR("statistics: number of times the bgwriter stopped processing when it had written too many buffers while cleaning");
 DATA(insert OID = 3075 ( pg_stat_get_bgwriter_stat_reset_time PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 1184 "" _null_ _null_ _null_ _null_	pg_stat_get_bgwriter_stat_reset_time _null_ _null_ _null_ ));
 DESCR("statistics: last reset for the bgwriter");
+DATA(insert OID = 3144 ( pg_stat_get_bgwriter_write_ms PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_write_ms _null_ _null_ _null_ ));
+DESCR("statistics: total amount of time the bgwriter has spent writing buffers to disk");
+DATA(insert OID = 3145 ( pg_stat_get_bgwriter_sync_ms PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_ms _null_ _null_ _null_ ));
+DESCR("statistics: total amount of time the bgwriter has spent syncing buffers to disk");
+DATA(insert OID = 3146 ( pg_stat_get_bgwriter_sync_files PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_files _null_ _null_ _null_ ));
+DESCR("statistics: total number of files the bgwriter has synced to disk");
 DATA(insert OID = 2775 ( pg_stat_get_buf_written_backend PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_written_backend _null_ _null_ _null_ ));
 DESCR("statistics: number of buffers written by backends");
 DATA(insert OID = 3063 ( pg_stat_get_buf_fsync_backend PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_fsync_backend _null_ _null_ _null_ ));
 DESCR("statistics: number of backend buffer writes that did their own fsync");
 DATA(insert OID = 2859 ( pg_stat_get_buf_alloc			PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_alloc _null_ _null_ _null_ ));
 DESCR("statistics: number of buffer allocations");
-
 DATA(insert OID = 2978 (  pg_stat_get_function_calls		PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_function_calls _null_ _null_ _null_ ));
 DESCR("statistics: number of function calls");
 DATA(insert OID = 2979 (  pg_stat_get_function_time			PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_function_time _null_ _null_ _null_ ));
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index b8c6d82..311a95a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -362,6 +362,9 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_buf_written_backend;
 	PgStat_Counter m_buf_fsync_backend;
 	PgStat_Counter m_buf_alloc;
+	PgStat_Counter m_write_ms;
+	PgStat_Counter m_sync_ms;
+	PgStat_Counter m_sync_files;
 } PgStat_MsgBgWriter;
 
 /* ----------
@@ -584,6 +587,9 @@ typedef struct PgStat_GlobalStats
 	PgStat_Counter buf_written_backend;
 	PgStat_Counter buf_fsync_backend;
 	PgStat_Counter buf_alloc;
+	PgStat_Counter write_ms;
+	PgStat_Counter sync_ms;
+	PgStat_Counter sync_files;
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 454e1f9..d4bf9ce 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1295,7 +1295,7 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
  pg_stat_activity                | SELECT s.datid, d.datname, s.procpid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, s.xact_start, s.query_start, s.waiting, s.current_query FROM pg_database d, pg_stat_get_activity(NULL::integer) s(datid, procpid, usesysid, application_name, current_query, waiting, xact_start, query_start, backend_start, client_addr, client_hostname, client_port), pg_authid u WHERE ((s.datid = d.oid) AND (s.usesysid = u.oid));
  pg_stat_all_indexes             | SELECT c.oid AS relid, i.oid AS indexrelid, n.nspname AS schemaname, c.relname, i.relname AS indexrelname, pg_stat_get_numscans(i.oid) AS idx_scan, pg_stat_get_tuples_returned(i.oid) AS idx_tup_read, pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch FROM (((pg_class c JOIN pg_index x ON ((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"]));
  pg_stat_all_tables              | SELECT c.oid AS relid, n.nspname AS schemaname, c.relname, pg_stat_get_numscans(c.oid) AS seq_scan, pg_stat_get_tuples_returned(c.oid) AS seq_tup_read, (sum(pg_stat_get_numscans(i.indexrelid)))::bigint AS idx_scan, ((sum(pg_stat_get_tuples_fetched(i.indexrelid)))::bigint + pg_stat_get_tuples_fetched(c.oid)) AS idx_tup_fetch, pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins, pg_stat_get_tuples_updated(c.oid) AS n_tup_upd, pg_stat_get_tuples_deleted(c.oid) AS n_tup_del, pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd, pg_stat_get_live_tuples(c.oid) AS n_live_tup, pg_stat_get_dead_tuples(c.oid) AS n_dead_tup, pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum, pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum, pg_stat_get_last_analyze_time(c.oid) AS last_analyze, pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze, pg_stat_get_vacuum_count(c.oid) AS vacuum_count, pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count, pg_stat_get_analyze_count(c.oid) AS analyze_count, pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count FROM ((pg_class c LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) GROUP BY c.oid, n.nspname, c.relname;
- pg_stat_bgwriter                | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+ pg_stat_bgwriter                | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset, pg_stat_get_bgwriter_write_ms() AS write_ms, pg_stat_get_bgwriter_sync_ms() AS sync_ms, pg_stat_get_bgwriter_sync_files() AS sync_files;
  pg_stat_database                | SELECT d.oid AS datid, d.datname, pg_stat_get_db_numbackends(d.oid) AS numbackends, pg_stat_get_db_xact_commit(d.oid) AS xact_commit, pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback, (pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read, pg_stat_get_db_blocks_hit(d.oid) AS blks_hit, pg_stat_get_db_tuples_returned(d.oid) AS tup_returned, pg_stat_get_db_tuples_fetched(d.oid) AS tup_fetched, pg_stat_get_db_tuples_inserted(d.oid) AS tup_inserted, pg_stat_get_db_tuples_updated(d.oid) AS tup_updated, pg_stat_get_db_tuples_deleted(d.oid) AS tup_deleted, pg_stat_get_db_conflict_all(d.oid) AS conflicts, pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset FROM pg_database d;
  pg_stat_database_conflicts      | SELECT d.oid AS datid, d.datname, pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace, pg_stat_get_db_conflict_lock(d.oid) AS confl_lock, pg_stat_get_db_conflict_snapshot(d.oid) AS confl_snapshot, pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock FROM pg_database d;
  pg_stat_replication             | SELECT s.procpid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, w.state, w.sent_location, w.write_location, w.flush_location, w.replay_location, w.sync_priority, w.sync_state FROM pg_stat_get_activity(NULL::integer) s(datid, procpid, usesysid, application_name, current_query, waiting, xact_start, query_start, backend_start, client_addr, client_hostname, client_port), pg_authid u, pg_stat_get_wal_senders() w(procpid, state, sent_location, write_location, flush_location, replay_location, sync_priority, sync_state) WHERE ((s.usesysid = u.oid) AND (s.procpid = w.procpid));
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to