One of the most useful bits of feedback on how well checkpoint I/O is
going is the amount of time taken to sync files to disk. Right now the
only way to get that is to parse the logs. The attached patch publishes
the most useful three bits of data you could only get from
log_checkpoints before out to pg_stat_bgwriter. Easiest to just show an
example:
$ createdb pgbench
$ psql -c "select pg_stat_reset_shared('bgwriter')"
$ pgbench -i -s 10 pgbench
$ psql -c "checkpoint"
$ tail $PGLOG
LOG: checkpoint complete: wrote 590 buffers (14.4%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.117 s,
total=1.746 s; sync files=31, longest=0.104 s, average=0.003 s
$ psql -c "select write_ms,sync_ms,sync_files from pg_stat_bgwriter"
write_ms | sync_ms | sync_files
----------+---------+------------
2 | 117 | 31
Like the other checkpoint statistics, these are updated in one lump when
the checkpoint finishes.
This includes rough documentation and the code isn't very complicated; a
cat version bump is required. Some open concerns I still have about this:
-The total amount of time in the sync phase is not the same as the total
of all sync wait times. I have published the total elapsed sync time.
The total of the times spent waiting on individual sync calls is also
available at the point when the above data is collected, and it could be
published as well. I don't think that's necessary yet, since they are
almost identical right now. But any feature that staggers sync calls
around might make that distinction important.
-The total checkpoint time is larger than the total of write+sync. You
can see that here: 2ms write + 117ms sync = 119ms; total checkpoint
time is actually 1746ms though. While I don't normally find the total
useful from a tuning perspective, it may be sensible to publish it just
to further reduce any perceived need for log scraping here.
-None of the debug1 level information about individual sync calls can be
published this way usefully. I'd like that to come out somewhere
eventually, but really what's needed there is a performance event
logging history collector, not pg_stats. I'm content that this patch
gets the big picture data out; it's sufficient to let you rule out
checkpoint problems if they aren't happening. If you only have to drill
into the logs when long times start showing up in this data, that's not
so bad to live with.
-I can't tell for sure if this is working properly when log_checkpoints
is off. This now collects checkpoint end time data in all cases,
whereas before it ignored that work if log_checkpoints was off. I think
I will have to fake the logging in this area, make it log some things
even when log_checkpoints is off, to confirm it gives the right
results. I haven't done that yet.
I would like to be able to tell people they need never turn on
log_checkpoints if they monitor pg_stat_bgwriter instead, because that
sets a good precedent for what direction the database is going. It
would be nice for pg_stat_bgwriter's format to settle down for a long
period too. In addition to my small list of concerns here, there seem
to be an increasing number of write path feature patches floating around
here lately. That all works against this being stable. The main
consumer of this data I use regularly is the pg_stat_bgwriter graphs
that Munin produces, and my expectation is that myself and/or Magnus
will make sure it's compatible with any changes made here for 9.2.
--
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a12a9a2..674091c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1182,6 +1182,34 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
</row>
<row>
+ <entry><literal><function>pg_stat_get_bgwriter_write_ms()</function></literal></entry>
+ <entry><type>bigint</type></entry>
+ <entry>
+ Total amount of time that has been spent in the part of checkpoint
+ processing where files are written to disk, in milliseconds.
+ </entry>
+ </row>
+
+
+ <row>
+ <entry><literal><function>pg_stat_get_bgwriter_sync_ms()</function></literal></entry>
+ <entry><type>bigint</type></entry>
+ <entry>
+ Total amount of time that has been spent in the part of checkpoint
+ processing where files are synchronized to disk, in milliseconds.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal><function>pg_stat_get_bgwriter_sync_files()</function></literal></entry>
+ <entry><type>bigint</type></entry>
+ <entry>
+ Total number of files that have been synchronized to disk during
+ checkpoint processing.
+ </entry>
+ </row>
+
+ <row>
<entry><literal><function>pg_stat_get_wal_senders()</function></literal></entry>
<entry><type>setof record</type></entry>
<entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ce659ec..4ff7b54 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7413,10 +7413,10 @@ LogCheckpointStart(int flags, bool restartpoint)
}
/*
- * Log end of a checkpoint.
+ * Time and potentially log the end of a checkpoint.
*/
static void
-LogCheckpointEnd(bool restartpoint)
+TimeCheckpointEnd(bool restartpoint)
{
long write_secs,
sync_secs,
@@ -7444,6 +7444,16 @@ LogCheckpointEnd(bool restartpoint)
CheckpointStats.ckpt_sync_end_t,
&sync_secs, &sync_usecs);
+ BgWriterStats.m_write_ms=(uint64) write_secs * 1000 + write_usecs / 1000;
+ BgWriterStats.m_sync_ms=(uint64) sync_secs * 1000 + sync_usecs / 1000;
+ BgWriterStats.m_sync_files=CheckpointStats.ckpt_sync_rels;
+
+ /*
+ * All of the published timing statistics are accounted for. Only
+ * continue if a log message is being written.
+ */
+ if (!log_checkpoints) return;
+
/*
* Timing values returned from CheckpointStats are in microseconds.
* Convert to the second plus microsecond form that TimestampDifference
@@ -7892,8 +7902,7 @@ CreateCheckPoint(int flags)
TruncateSUBTRANS(GetOldestXmin(true, false));
/* All real work is done, but log before releasing lock. */
- if (log_checkpoints)
- LogCheckpointEnd(false);
+ TimeCheckpointEnd(false);
TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
NBuffers,
@@ -8158,8 +8167,7 @@ CreateRestartPoint(int flags)
TruncateSUBTRANS(GetOldestXmin(true, false));
/* All real work is done, but log before releasing lock. */
- if (log_checkpoints)
- LogCheckpointEnd(true);
+ TimeCheckpointEnd(true);
xtime = GetLatestXTime();
ereport((log_checkpoints ? LOG : DEBUG2),
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 50ba20c..280f563 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -622,7 +622,11 @@ CREATE VIEW pg_stat_bgwriter AS
pg_stat_get_buf_written_backend() AS buffers_backend,
pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
pg_stat_get_buf_alloc() AS buffers_alloc,
- pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+ pg_stat_get_bgwriter_stat_reset_time() AS stats_reset,
+ pg_stat_get_bgwriter_write_ms() AS write_ms,
+ pg_stat_get_bgwriter_sync_ms() AS sync_ms,
+ pg_stat_get_bgwriter_sync_files() AS sync_files
+ ;
CREATE VIEW pg_user_mappings AS
SELECT
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 323d42b..29c7fb4 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4360,6 +4360,9 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
globalStats.buf_written_backend += msg->m_buf_written_backend;
globalStats.buf_fsync_backend += msg->m_buf_fsync_backend;
globalStats.buf_alloc += msg->m_buf_alloc;
+ globalStats.write_ms += msg->m_write_ms;
+ globalStats.sync_ms += msg->m_sync_ms;
+ globalStats.sync_files += msg->m_sync_files;
}
/* ----------
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index b4986d8..69d16d0 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -86,6 +86,9 @@ extern Datum pg_stat_get_bgwriter_buf_written_checkpoints(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_bgwriter_buf_written_clean(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_bgwriter_write_ms(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_bgwriter_sync_ms(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_buf_fsync_backend(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_buf_alloc(PG_FUNCTION_ARGS);
@@ -1305,6 +1308,22 @@ pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS)
}
Datum
+pg_stat_get_bgwriter_write_ms(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_global()->write_ms);
+}
+Datum
+pg_stat_get_bgwriter_sync_ms(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_global()->sync_ms);
+}
+Datum
+pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_global()->sync_files);
+}
+
+Datum
pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS)
{
PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stat_reset_timestamp);
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 1ddf4bf..7f8de4d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -15,6 +15,7 @@
#include "access/xlogdefs.h"
#include "datatype/timestamp.h"
#include "lib/stringinfo.h"
+#include "pgstat.h"
#include "storage/buf.h"
#include "storage/latch.h"
#include "utils/pg_crc.h"
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 355c61a..22c2392 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2645,13 +2645,18 @@ DATA(insert OID = 2773 ( pg_stat_get_bgwriter_maxwritten_clean PGNSP PGUID 12 1
DESCR("statistics: number of times the bgwriter stopped processing when it had written too many buffers while cleaning");
DATA(insert OID = 3075 ( pg_stat_get_bgwriter_stat_reset_time PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 1184 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_stat_reset_time _null_ _null_ _null_ ));
DESCR("statistics: last reset for the bgwriter");
+DATA(insert OID = 3144 ( pg_stat_get_bgwriter_write_ms PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_write_ms _null_ _null_ _null_ ));
+DESCR("statistics: total amount of time the bgwriter has spent writing buffers to disk");
+DATA(insert OID = 3145 ( pg_stat_get_bgwriter_sync_ms PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_ms _null_ _null_ _null_ ));
+DESCR("statistics: total amount of time the bgwriter has spent syncing buffers to disk");
+DATA(insert OID = 3146 ( pg_stat_get_bgwriter_sync_files PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_files _null_ _null_ _null_ ));
+DESCR("statistics: total number of files the bgwriter has synced to disk");
DATA(insert OID = 2775 ( pg_stat_get_buf_written_backend PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_written_backend _null_ _null_ _null_ ));
DESCR("statistics: number of buffers written by backends");
DATA(insert OID = 3063 ( pg_stat_get_buf_fsync_backend PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_fsync_backend _null_ _null_ _null_ ));
DESCR("statistics: number of backend buffer writes that did their own fsync");
DATA(insert OID = 2859 ( pg_stat_get_buf_alloc PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_alloc _null_ _null_ _null_ ));
DESCR("statistics: number of buffer allocations");
-
DATA(insert OID = 2978 ( pg_stat_get_function_calls PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_function_calls _null_ _null_ _null_ ));
DESCR("statistics: number of function calls");
DATA(insert OID = 2979 ( pg_stat_get_function_time PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_function_time _null_ _null_ _null_ ));
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index b8c6d82..311a95a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -362,6 +362,9 @@ typedef struct PgStat_MsgBgWriter
PgStat_Counter m_buf_written_backend;
PgStat_Counter m_buf_fsync_backend;
PgStat_Counter m_buf_alloc;
+ PgStat_Counter m_write_ms;
+ PgStat_Counter m_sync_ms;
+ PgStat_Counter m_sync_files;
} PgStat_MsgBgWriter;
/* ----------
@@ -584,6 +587,9 @@ typedef struct PgStat_GlobalStats
PgStat_Counter buf_written_backend;
PgStat_Counter buf_fsync_backend;
PgStat_Counter buf_alloc;
+ PgStat_Counter write_ms;
+ PgStat_Counter sync_ms;
+ PgStat_Counter sync_files;
TimestampTz stat_reset_timestamp;
} PgStat_GlobalStats;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 454e1f9..d4bf9ce 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1295,7 +1295,7 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
pg_stat_activity | SELECT s.datid, d.datname, s.procpid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, s.xact_start, s.query_start, s.waiting, s.current_query FROM pg_database d, pg_stat_get_activity(NULL::integer) s(datid, procpid, usesysid, application_name, current_query, waiting, xact_start, query_start, backend_start, client_addr, client_hostname, client_port), pg_authid u WHERE ((s.datid = d.oid) AND (s.usesysid = u.oid));
pg_stat_all_indexes | SELECT c.oid AS relid, i.oid AS indexrelid, n.nspname AS schemaname, c.relname, i.relname AS indexrelname, pg_stat_get_numscans(i.oid) AS idx_scan, pg_stat_get_tuples_returned(i.oid) AS idx_tup_read, pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch FROM (((pg_class c JOIN pg_index x ON ((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"]));
pg_stat_all_tables | SELECT c.oid AS relid, n.nspname AS schemaname, c.relname, pg_stat_get_numscans(c.oid) AS seq_scan, pg_stat_get_tuples_returned(c.oid) AS seq_tup_read, (sum(pg_stat_get_numscans(i.indexrelid)))::bigint AS idx_scan, ((sum(pg_stat_get_tuples_fetched(i.indexrelid)))::bigint + pg_stat_get_tuples_fetched(c.oid)) AS idx_tup_fetch, pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins, pg_stat_get_tuples_updated(c.oid) AS n_tup_upd, pg_stat_get_tuples_deleted(c.oid) AS n_tup_del, pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd, pg_stat_get_live_tuples(c.oid) AS n_live_tup, pg_stat_get_dead_tuples(c.oid) AS n_dead_tup, pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum, pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum, pg_stat_get_last_analyze_time(c.oid) AS last_analyze, pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze, pg_stat_get_vacuum_count(c.oid) AS vacuum_count, pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count, pg_stat_get_analyze_count(c.oid) AS analyze_count, pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count FROM ((pg_class c LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) GROUP BY c.oid, n.nspname, c.relname;
- pg_stat_bgwriter | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+ pg_stat_bgwriter | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset, pg_stat_get_bgwriter_write_ms() AS write_ms, pg_stat_get_bgwriter_sync_ms() AS sync_ms, pg_stat_get_bgwriter_sync_files() AS sync_files;
pg_stat_database | SELECT d.oid AS datid, d.datname, pg_stat_get_db_numbackends(d.oid) AS numbackends, pg_stat_get_db_xact_commit(d.oid) AS xact_commit, pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback, (pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read, pg_stat_get_db_blocks_hit(d.oid) AS blks_hit, pg_stat_get_db_tuples_returned(d.oid) AS tup_returned, pg_stat_get_db_tuples_fetched(d.oid) AS tup_fetched, pg_stat_get_db_tuples_inserted(d.oid) AS tup_inserted, pg_stat_get_db_tuples_updated(d.oid) AS tup_updated, pg_stat_get_db_tuples_deleted(d.oid) AS tup_deleted, pg_stat_get_db_conflict_all(d.oid) AS conflicts, pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset FROM pg_database d;
pg_stat_database_conflicts | SELECT d.oid AS datid, d.datname, pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace, pg_stat_get_db_conflict_lock(d.oid) AS confl_lock, pg_stat_get_db_conflict_snapshot(d.oid) AS confl_snapshot, pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock FROM pg_database d;
pg_stat_replication | SELECT s.procpid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, w.state, w.sent_location, w.write_location, w.flush_location, w.replay_location, w.sync_priority, w.sync_state FROM pg_stat_get_activity(NULL::integer) s(datid, procpid, usesysid, application_name, current_query, waiting, xact_start, query_start, backend_start, client_addr, client_hostname, client_port), pg_authid u, pg_stat_get_wal_senders() w(procpid, state, sent_location, write_location, flush_location, replay_location, sync_priority, sync_state) WHERE ((s.usesysid = u.oid) AND (s.procpid = w.procpid));
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers