On 2016/03/15 3:41, Robert Haas wrote:
> On Sat, Mar 12, 2016 at 7:49 AM, Amit Langote <amitlangot...@gmail.com> wrote:
>> Instead, the attached patch adds a IndexBulkDeleteProgressCallback
>> which AMs should call for every block that's read (say, right before a
>> call to ReadBufferExtended) as part of a given vacuum run. The
>> callback with help of some bookkeeping state can count each block and
>> report to pgstat_progress API. Now, I am not sure if all AMs read 1..N
>> blocks for every vacuum or if it's possible that some blocks are read
>> more than once in single vacuum, etc.  IOW, some AM's processing may
>> be non-linear and counting blocks 1..N (where N is reported total
>> index blocks) may not be possible.  However, this is the best I could
>> think of as doing what we are trying to do here. Maybe index AM
>> experts can chime in on that.
>>
>> Thoughts?
> 
> Well, I think you need to study the index AMs and figure this out.

OK.  I tried to put calls to the callback in appropriate places, but
couldn't get the resulting progress numbers to look sane.  So I ended up
concluding that any attempt to do so is futile unless I analyze each AM's
vacuum code carefully to be able to determine in advance the max bound on
the count of blocks that the callback will report.  Anyway, as you
suggest, we can improve it later.

> But I think for starters you should write a patch that reports the following:
> 
> 1. phase
> 2. number of heap blocks scanned
> 3. number of heap blocks vacuumed
> 4. number of completed index vac cycles
> 5. number of dead tuples collected since the last index vac cycle
> 6. number of dead tuples that we can store before needing to perform
> an index vac cycle
> 
> All of that should be pretty straightforward, and then we'd have
> something we can ship.  We can add the detailed index reporting later,
> when we get to it, perhaps for 9.7.

OK, I agree with this plan.  Attached updated patch implements this.

Thanks,
Amit
>From 0f830273cb2afb528b8b9ed2dcbaf136d4c3e64f Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Tue, 15 Mar 2016 10:14:33 +0900
Subject: [PATCH] WIP: Implement progress reporting for VACUUM command.

This basically utilizes the pgstat_progress* API to report a handful of
paramters to indicate its progress.  lazy_vacuum_rel() has already been
taught in b6fb6471 to report command start and end so that it's listed
in pg_stat_get_progress_info('VACUUM'). This commit makes lazy_scan_heap
to report following parameters: vacuum phase, total number of heap blocks,
number of heap blocks vacuumed, dead tuples found since last index vacuum
cycle, dead tuple slots left to fill until the next index vacuum run.
Following phases are identified and reported whenever one changes to
another: 'scanning heap', 'vacuuming indexes', 'vacuuming heap', and
finally 'cleanup'.

A view named pg_stat_progress_vacuum has been added with the following
columns: pid (int), datid (oid), datname (name), relid (oid),
vacuum_phase (text), heap_blks_total, heap_blks_vacuumed,
index_vacuum_count, dead_tup_found, dead_tup_slots (all bigint),
percent_done (numeric).
---
 doc/src/sgml/monitoring.sgml         |  114 ++++++++++++++++++++++++++++++++++
 src/backend/catalog/system_views.sql |   25 ++++++++
 src/backend/commands/vacuumlazy.c    |   68 ++++++++++++++++++++
 src/test/regress/expected/rules.out  |   22 +++++++
 4 files changed, 229 insertions(+), 0 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ec5328e..e5ffa10 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -507,6 +507,12 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       yet included in <structname>pg_stat_user_functions</>).</entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_vacuum</><indexterm><primary>pg_stat_progress_vacuum</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+      <command>VACUUM</>, showing current progress in terms of heap pages it
+      has finished processing.</entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
@@ -2204,6 +2210,114 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    controls exactly which functions are tracked.
   </para>
 
+  <table id="pg-stat-progress-vacuum" xreflabel="pg_stat_progress_vacuum">
+   <title><structname>pg_stat_progress_vacuum</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</></entry>
+     <entry><type>integer</></entry>
+     <entry>Process ID of backend</entry>
+    </row>
+    <row>
+     <entry><structfield>datid</></entry>
+     <entry><type>oid</></entry>
+     <entry>OID of the database this backend is connected to</entry>
+    </row>
+    <row>
+     <entry><structfield>datname</></entry>
+     <entry><type>name</></entry>
+     <entry>Name of the database this backend is connected to</entry>
+    </row>
+    <row>
+     <entry><structfield>relid</></entry>
+     <entry><type>oid</></entry>
+     <entry>OID of the table being vacuumed</entry>
+    </row>
+    <row>
+     <entry><structfield>vacuum_phase</></entry>
+     <entry><type>text</></entry>
+     <entry>Current processing phase of vacuum.
+       Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>scanning heap</>
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>vacuuming indexes</>
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>vacuuming heap</>
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>cleanup</>
+         </para>
+        </listitem>
+       </itemizedlist>
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>heap_blks_total</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Total number of heap blocks in the table</entry>
+    </row>
+    <row>
+     <entry><structfield>heap_blks_vacuumed</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Number of heap blocks vacuumed</entry>
+    </row>
+    <row>
+     <entry><structfield>index_vacuum_count</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Number of completed index vacuums cycles</entry>
+    </row>
+    <row>
+     <entry><structfield>dead_tup_found</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Number of dead tuples collected since the last index vacuum cycle</entry>
+    </row>
+    <row>
+     <entry><structfield>dead_tup_slots</></entry>
+     <entry><type>bigint</></entry>
+     <entry>
+      Number of dead tuples that we can store before needing to perform
+      an index vacuum cycle
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>percent_done</></entry>
+     <entry><type>numeric</></entry>
+     <entry>
+      Amount of work finished in percent in terms of table blocks vacuumed
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_progress_vacuum</structname> view will contain
+   one row for each backend (including autovacuum worker processes), showing
+   parameters that can help determine the progress of <command>VACUUM</command>
+   command running in it. Note that the backends running
+   <command>VACUUM FULL</command> are not shown.
+  </para>
+
  </sect2>
 
  <sect2 id="monitoring-stats-functions">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 84aa061..28ffc37 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -972,3 +972,28 @@ RETURNS jsonb
 LANGUAGE INTERNAL
 STRICT IMMUTABLE
 AS 'jsonb_set';
+
+CREATE VIEW pg_stat_progress_vacuum AS
+    SELECT
+            S.pid AS pid,
+            S.datid AS datid,
+            D.datname AS datname,
+            S.relid AS relid,
+            CASE S.param1
+                WHEN 1 THEN 'scanning heap'
+                WHEN 2 THEN 'vacuuming indexes'
+                WHEN 3 THEN 'vacuuming heap'
+                WHEN 4 THEN 'cleanup'
+                ELSE 'unknown phase'
+            END AS vacuum_phase,
+            S.param2 AS heap_blks_total,
+            S.param3 AS heap_blks_vacuumed,
+            S.param4 AS index_vacuum_count,
+            S.param5 AS dead_tup_found,
+            S.param6 AS dead_tup_slots,
+            CASE S.param2
+                WHEN 0 THEN round(100.0, 2)
+			    ELSE round((S.param3 + 1) * 100.0 / S.param2, 2)
+            END AS percent_done
+    FROM pg_stat_get_progress_info('VACUUM') AS S
+         JOIN pg_database D ON S.datid = D.oid;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index fe87243..be30861 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -97,6 +97,29 @@
  */
 #define SKIP_PAGES_THRESHOLD	((BlockNumber) 32)
 
+/*
+ * Progress parameters of (lazy) vacuum reported to pgstat progress tracking
+ * facility
+ */
+#define PROG_PARAM_VAC_PHASE					0
+#define PROG_PARAM_VAC_TOTAL_HEAP_BLKS			1
+#define PROG_PARAM_VAC_HEAP_BLKS_VACUUMED		2
+#define PROG_PARAM_VAC_NUM_INDEX_VACUUMS		3
+#define PROG_PARAM_VAC_NUM_DEAD_TUP				4
+#define PROG_PARAM_VAC_NUM_DEAD_TUP_SLOTS		5
+
+/*
+ * Following distinct phases of lazy vacuum are identified.  #1, #2 and #3
+ * run in a cyclical manner due to possibly limited memory to work with,
+ * whereby #1 is periodically interrupted to run #2, followed by #3, and
+ * back to #1.  Cycle repeats until all blocks of the relation have been
+ * covered by #1.
+ */
+#define LV_PHASE_SCAN_HEAP			1
+#define LV_PHASE_VACUUM_INDEX		2
+#define LV_PHASE_VACUUM_HEAP		3
+#define LV_PHASE_CLEANUP			4
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -481,6 +504,12 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 	lazy_space_alloc(vacrelstats, nblocks);
 	frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
 
+	/* About to begin the main heap scan loop */
+	pgstat_progress_update_param(PROG_PARAM_VAC_PHASE, LV_PHASE_SCAN_HEAP);
+
+	/* total_heap_blocks */
+	pgstat_progress_update_param(PROG_PARAM_VAC_TOTAL_HEAP_BLKS, nblocks);
+
 	/*
 	 * Except when aggressive is set, we want to skip pages that are
 	 * all-visible according to the visibility map, but only when we can skip
@@ -572,6 +601,9 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
 
+		/* heap_blocks_vacuumed */
+		pgstat_progress_update_param(PROG_PARAM_VAC_HEAP_BLKS_VACUUMED, blkno);
+
 		if (blkno == next_unskippable_block)
 		{
 			/* Time to advance next_unskippable_block */
@@ -668,11 +700,20 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 			vacuum_log_cleanup_info(onerel, vacrelstats);
 
 			/* Remove index entries */
+			pgstat_progress_update_param(PROG_PARAM_VAC_PHASE,
+										 LV_PHASE_VACUUM_INDEX);
 			for (i = 0; i < nindexes; i++)
 				lazy_vacuum_index(Irel[i],
 								  &indstats[i],
 								  vacrelstats);
+
+			/* index_vacuum_count */
+			pgstat_progress_update_param(PROG_PARAM_VAC_NUM_INDEX_VACUUMS,
+										 vacrelstats->num_index_scans + 1);
+
 			/* Remove tuples from heap */
+			pgstat_progress_update_param(PROG_PARAM_VAC_PHASE,
+										 LV_PHASE_VACUUM_HEAP);
 			lazy_vacuum_heap(onerel, vacrelstats);
 
 			/*
@@ -682,6 +723,10 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 			 */
 			vacrelstats->num_dead_tuples = 0;
 			vacrelstats->num_index_scans++;
+
+			/* Report going back to scanning heap */
+			pgstat_progress_update_param(PROG_PARAM_VAC_PHASE,
+										 LV_PHASE_SCAN_HEAP);
 		}
 
 		/*
@@ -1180,8 +1225,21 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		 */
 		if (vacrelstats->num_dead_tuples == prev_dead_count)
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
+
+		/* dead_tup_found and dead_tup_slots, if changed */
+		if (vacrelstats->num_dead_tuples != prev_dead_count)
+		{
+			pgstat_progress_update_param(PROG_PARAM_VAC_NUM_DEAD_TUP,
+										 vacrelstats->num_dead_tuples);
+			pgstat_progress_update_param(PROG_PARAM_VAC_NUM_DEAD_TUP_SLOTS,
+										 vacrelstats->max_dead_tuples -
+										 vacrelstats->num_dead_tuples);
+		}
 	}
 
+	/* heap_blocks_vacuumed, one last time */
+	pgstat_progress_update_param(PROG_PARAM_VAC_HEAP_BLKS_VACUUMED, blkno);
+
 	pfree(frozen);
 
 	/* save stats for use later */
@@ -1212,16 +1270,26 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		vacuum_log_cleanup_info(onerel, vacrelstats);
 
 		/* Remove index entries */
+		pgstat_progress_update_param(PROG_PARAM_VAC_PHASE,
+									 LV_PHASE_VACUUM_INDEX);
 		for (i = 0; i < nindexes; i++)
 			lazy_vacuum_index(Irel[i],
 							  &indstats[i],
 							  vacrelstats);
+
+		/* index_vacuum_count */
+		pgstat_progress_update_param(PROG_PARAM_VAC_NUM_INDEX_VACUUMS,
+									 vacrelstats->num_index_scans + 1);
+
 		/* Remove tuples from heap */
+		pgstat_progress_update_param(PROG_PARAM_VAC_PHASE,
+									 LV_PHASE_VACUUM_HEAP);
 		lazy_vacuum_heap(onerel, vacrelstats);
 		vacrelstats->num_index_scans++;
 	}
 
 	/* Do post-vacuum cleanup and statistics update for each index */
+	pgstat_progress_update_param(PROG_PARAM_VAC_PHASE, LV_PHASE_CLEANUP);
 	for (i = 0; i < nindexes; i++)
 		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 22ea06c..a8c3a33 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1747,6 +1747,28 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
     pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
     pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
    FROM pg_database d;
+pg_stat_progress_vacuum| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 1 THEN 'scanning heap'::text
+            WHEN 2 THEN 'vacuuming indexes'::text
+            WHEN 3 THEN 'vacuuming heap'::text
+            WHEN 4 THEN 'cleanup'::text
+            ELSE 'unknown phase'::text
+        END AS vacuum_phase,
+    s.param2 AS heap_blks_total,
+    s.param3 AS heap_blks_vacuumed,
+    s.param4 AS index_vacuum_count,
+    s.param5 AS dead_tups_found,
+    s.param6 AS dead_tup_slots,
+        CASE s.param2
+            WHEN 0 THEN round(100.0, 2)
+            ELSE round(((((s.param3 + 1))::numeric * 100.0) / (s.param2)::numeric), 2)
+        END AS percent_done
+   FROM (pg_stat_get_progress_info('VACUUM'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10)
+     JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_replication| SELECT s.pid,
     s.usesysid,
     u.rolname AS usename,
-- 
1.7.1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to