doc review for parallel vacuum

Justin Pryzby Sat, 21 Mar 2020 19:18:46 -0700

Original, long thread
https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f


diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index ab1b8c2398..140637983a 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -237,15 +237,15 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ 
<replaceable class="paramet
     <term><literal>PARALLEL</literal></term>
     <listitem>
      <para>
-      Perform vacuum index and cleanup index phases of 
<command>VACUUM</command>
+      Perform vacuum index and index cleanup phases of 
<command>VACUUM</command>
       in parallel using <replaceable class="parameter">integer</replaceable>
-      background workers (for the detail of each vacuum phases, please
+      background workers (for the detail of each vacuum phase, please
       refer to <xref linkend="vacuum-phases"/>).  If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref 
linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
       of the index is more than <xref 
linkend="guc-min-parallel-index-scan-size"/>.
       Please note that it is not guaranteed that the number of parallel workers
       specified in <replaceable class="parameter">integer</replaceable> will
@@ -253,7 +253,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ 
<replaceable class="paramet
       workers than specified, or even with no workers at all.  Only one worker
       can be used per index.  So parallel workers are launched only when there
       are at least <literal>2</literal> indexes in the table.  Workers for
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and terminate at the 
end of
       the phase.  These behaviors might change in a future release.  This
       option can't be used with the <literal>FULL</literal> option.
      </para>
@@ -372,7 +372,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ 
<replaceable class="paramet
     <command>VACUUM</command> causes a substantial increase in I/O traffic,
     which might cause poor performance for other active sessions.  Therefore,
     it is sometimes advisable to use the cost-based vacuum delay feature.  For
-    parallel vacuum, each worker sleeps proportional to the work done by that
+    parallel vacuum, each worker sleeps proportionally to the work done by that
     worker.  See <xref linkend="runtime-config-resource-vacuum-cost"/> for
     details.
    </para>

diff --git a/src/backend/access/heap/vacuumlazy.c 
b/src/backend/access/heap/vacuumlazy.c
index e017db4446..0ab0bea312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -194,7 +194,7 @@ typedef struct LVShared
         * live tuples in the index vacuum case or the new live tuples in the
         * index cleanup case.
         *
-        * estimated_count is true if the reltuples is an estimated value.
+        * estimated_count is true if reltuples is an estimated value.
         */
        double          reltuples;
        bool            estimated_count;
@@ -757,7 +757,7 @@ skip_blocks(Relation onerel, VacuumParams *params, 
BlockNumber *next_unskippable
  *             to reclaim dead line pointers.
  *
  *             If the table has at least two indexes, we execute both index 
vacuum
- *             and index cleanup with parallel workers unless the parallel 
vacuum is
+ *             and index cleanup with parallel workers unless parallel vacuum 
is
  *             disabled.  In a parallel vacuum, we enter parallel mode and then
  *             create both the parallel context and the DSM segment before 
starting
  *             heap scan so that we can record dead tuples to the DSM segment. 
 All
@@ -836,7 +836,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, 
LVRelStats *vacrelstats,
        vacrelstats->latestRemovedXid = InvalidTransactionId;
 
        /*
-        * Initialize the state for a parallel vacuum.  As of now, only one 
worker
+        * Initialize state for a parallel vacuum.  As of now, only one worker
         * can be used for an index, so we invoke parallelism only if there are 
at
         * least two indexes on a table.
         */
@@ -864,7 +864,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, 
LVRelStats *vacrelstats,
        }
 
        /*
-        * Allocate the space for dead tuples in case the parallel vacuum is not
+        * Allocate the space for dead tuples in case parallel vacuum is not
         * initialized.
         */
        if (!ParallelVacuumIsActive(lps))
@@ -2111,7 +2111,7 @@ parallel_vacuum_index(Relation *Irel, 
IndexBulkDeleteResult **stats,
                shared_indstats = get_indstats(lvshared, idx);
 
                /*
-                * Skip processing indexes that doesn't participate in parallel
+                * Skip processing indexes that don't participate in parallel
                 * operation
                 */
                if (shared_indstats == NULL ||
@@ -2223,7 +2223,7 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult 
**stats,
                shared_indstats->updated = true;
 
                /*
-                * Now that the stats[idx] points to the DSM segment, we don't 
need
+                * Now that stats[idx] points to the DSM segment, we don't need
                 * the locally allocated results.
                 */
                pfree(*stats);
@@ -2329,7 +2329,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult 
**stats,
  *     lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  *
  *             reltuples is the number of heap tuples and estimated_count is 
true
- *             if the reltuples is an estimated value.
+ *             if reltuples is an estimated value.
  */
 static void
 lazy_cleanup_index(Relation indrel,
@@ -2916,9 +2916,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 /*
  * Compute the number of parallel worker processes to request.  Both index
  * vacuum and index cleanup can be executed with parallel workers.  The index
- * is eligible for parallel vacuum iff it's size is greater than
+ * is eligible for parallel vacuum iff its size is greater than
  * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt the performance.
+ * can hurt performance.
  *
  * nrequested is the number of parallel workers that user requested.  If
  * nrequested is 0, we compute the parallel degree based on nindexes, that is
@@ -2937,7 +2937,7 @@ compute_parallel_vacuum_workers(Relation *Irel, int 
nindexes, int nrequested,
        int                     i;
 
        /*
-        * We don't allow to perform parallel operation in standalone backend or
+        * We don't allow performing parallel operation in standalone backend or
         * when parallelism is disabled.
         */
        if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
@@ -3010,7 +3010,7 @@ prepare_index_statistics(LVShared *lvshared, bool 
*can_parallel_vacuum,
 }
 
 /*
- * Update index statistics in pg_class if the statistics is accurate.
+ * Update index statistics in pg_class if the statistics are accurate.
  */
 static void
 update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
@@ -3181,7 +3181,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, 
LVRelStats *vacrelstats,
 /*
  * Destroy the parallel context, and end parallel mode.
  *
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the
  * updated index statistics from DSM in local memory and then later use that
  * to update the index statistics.  One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
@@ -3288,7 +3288,7 @@ skip_parallel_vacuum_index(Relation indrel, LVShared 
*lvshared)
  * Perform work within a launched parallel process.
  *
  * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report the progress information.
+ * we don't need to report progress information.
  */
 void
 parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
diff --git a/src/backend/access/transam/parallel.c 
b/src/backend/access/transam/parallel.c
index df06e7d174..ac348b312c 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -493,7 +493,7 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 
 /*
  * Reinitialize parallel workers for a parallel context such that we could
- * launch the different number of workers.  This is required for cases where
+ * launch a different number of workers.  This is required for cases where
  * we need to reuse the same DSM segment, but the number of workers can
  * vary from run-to-run.
  */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d625d17bf4..76d33b1ba2 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2034,23 +2034,23 @@ vacuum_delay_point(void)
 /*
  * Computes the vacuum delay for parallel workers.
  *
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to force
+ * each worker to sleep in proportion to the share of work it's done.  We 
achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and
  * then based on that decide whether it needs to sleep.  We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
- * that amount.  This avoids letting the workers sleep who have done less or
- * no I/O as compared to other workers and therefore can ensure that workers
- * who are doing more I/O got throttled more.
+ * that amount.  This avoids putting to sleep those workers which have done 
less
+ * I/O than other workers and therefore ensure that workers
+ * which are doing more I/O got throttled more.
  *
- * We allow any worker to sleep only if it has performed the I/O above a
+ * We force a worker to sleep only if it has performed I/O above a
  * certain threshold, which is calculated based on the number of active
  * workers (VacuumActiveNWorkers), and the overall cost balance is more than
- * VacuumCostLimit set by the system.  The testing reveals that we achieve
- * the required throttling if we allow a worker that has done more than 50%
+ * VacuumCostLimit set by the system.  Testing reveals that we achieve
+ * the required throttling if we force a worker that has done more than 50%
  * of its share of work to sleep.
  */
 static double
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2779bea5c9..a4cd721400 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -225,7 +225,7 @@ typedef struct VacuumParams
 
        /*
         * The number of parallel vacuum workers.  0 by default which means 
choose
-        * based on the number of indexes.  -1 indicates a parallel vacuum is
+        * based on the number of indexes.  -1 indicates parallel vacuum is
         * disabled.
         */
        int                     nworkers;
-- 
2.17.0

>From aec387f1c5e405d504ade077a20db7b6ff6c3835 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <[email protected]>
Date: Sun, 19 Jan 2020 22:35:01 -0600
Subject: [PATCH v1 1/2] docs review for parallel vacuum

---
 doc/src/sgml/ref/vacuum.sgml | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index ab1b8c2398..140637983a 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -237,15 +237,15 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <term><literal>PARALLEL</literal></term>
     <listitem>
      <para>
-      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+      Perform vacuum index and index cleanup phases of <command>VACUUM</command>
       in parallel using <replaceable class="parameter">integer</replaceable>
-      background workers (for the detail of each vacuum phases, please
+      background workers (for the detail of each vacuum phase, please
       refer to <xref linkend="vacuum-phases"/>).  If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
       of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
       Please note that it is not guaranteed that the number of parallel workers
       specified in <replaceable class="parameter">integer</replaceable> will
@@ -253,7 +253,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
       workers than specified, or even with no workers at all.  Only one worker
       can be used per index.  So parallel workers are launched only when there
       are at least <literal>2</literal> indexes in the table.  Workers for
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and terminate at the end of
       the phase.  These behaviors might change in a future release.  This
       option can't be used with the <literal>FULL</literal> option.
      </para>
@@ -372,7 +372,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <command>VACUUM</command> causes a substantial increase in I/O traffic,
     which might cause poor performance for other active sessions.  Therefore,
     it is sometimes advisable to use the cost-based vacuum delay feature.  For
-    parallel vacuum, each worker sleeps proportional to the work done by that
+    parallel vacuum, each worker sleeps proportionally to the work done by that
     worker.  See <xref linkend="runtime-config-resource-vacuum-cost"/> for
     details.
    </para>
-- 
2.17.0

>From 6ee0aca17afdd365a4b3d51a9c0ad152761b7576 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <[email protected]>
Date: Mon, 20 Jan 2020 00:18:44 -0600
Subject: [PATCH v1 2/2] Review comments for parallel vacuum

---
 src/backend/access/heap/vacuumlazy.c  | 26 +++++++++++++-------------
 src/backend/access/transam/parallel.c |  2 +-
 src/backend/commands/vacuum.c         | 18 +++++++++---------
 src/include/commands/vacuum.h         |  2 +-
 4 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e017db4446..0ab0bea312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -194,7 +194,7 @@ typedef struct LVShared
 	 * live tuples in the index vacuum case or the new live tuples in the
 	 * index cleanup case.
 	 *
-	 * estimated_count is true if the reltuples is an estimated value.
+	 * estimated_count is true if reltuples is an estimated value.
 	 */
 	double		reltuples;
 	bool		estimated_count;
@@ -757,7 +757,7 @@ skip_blocks(Relation onerel, VacuumParams *params, BlockNumber *next_unskippable
  *		to reclaim dead line pointers.
  *
  *		If the table has at least two indexes, we execute both index vacuum
- *		and index cleanup with parallel workers unless the parallel vacuum is
+ *		and index cleanup with parallel workers unless parallel vacuum is
  *		disabled.  In a parallel vacuum, we enter parallel mode and then
  *		create both the parallel context and the DSM segment before starting
  *		heap scan so that we can record dead tuples to the DSM segment.  All
@@ -836,7 +836,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
 	/*
-	 * Initialize the state for a parallel vacuum.  As of now, only one worker
+	 * Initialize state for a parallel vacuum.  As of now, only one worker
 	 * can be used for an index, so we invoke parallelism only if there are at
 	 * least two indexes on a table.
 	 */
@@ -864,7 +864,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	}
 
 	/*
-	 * Allocate the space for dead tuples in case the parallel vacuum is not
+	 * Allocate the space for dead tuples in case parallel vacuum is not
 	 * initialized.
 	 */
 	if (!ParallelVacuumIsActive(lps))
@@ -2111,7 +2111,7 @@ parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
 		shared_indstats = get_indstats(lvshared, idx);
 
 		/*
-		 * Skip processing indexes that doesn't participate in parallel
+		 * Skip processing indexes that don't participate in parallel
 		 * operation
 		 */
 		if (shared_indstats == NULL ||
@@ -2223,7 +2223,7 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 		shared_indstats->updated = true;
 
 		/*
-		 * Now that the stats[idx] points to the DSM segment, we don't need
+		 * Now that stats[idx] points to the DSM segment, we don't need
 		 * the locally allocated results.
 		 */
 		pfree(*stats);
@@ -2329,7 +2329,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  *
  *		reltuples is the number of heap tuples and estimated_count is true
- *		if the reltuples is an estimated value.
+ *		if reltuples is an estimated value.
  */
 static void
 lazy_cleanup_index(Relation indrel,
@@ -2916,9 +2916,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 /*
  * Compute the number of parallel worker processes to request.  Both index
  * vacuum and index cleanup can be executed with parallel workers.  The index
- * is eligible for parallel vacuum iff it's size is greater than
+ * is eligible for parallel vacuum iff its size is greater than
  * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt the performance.
+ * can hurt performance.
  *
  * nrequested is the number of parallel workers that user requested.  If
  * nrequested is 0, we compute the parallel degree based on nindexes, that is
@@ -2937,7 +2937,7 @@ compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
 	int			i;
 
 	/*
-	 * We don't allow to perform parallel operation in standalone backend or
+	 * We don't allow performing parallel operation in standalone backend or
 	 * when parallelism is disabled.
 	 */
 	if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
@@ -3010,7 +3010,7 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
 }
 
 /*
- * Update index statistics in pg_class if the statistics is accurate.
+ * Update index statistics in pg_class if the statistics are accurate.
  */
 static void
 update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
@@ -3181,7 +3181,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 /*
  * Destroy the parallel context, and end parallel mode.
  *
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the
  * updated index statistics from DSM in local memory and then later use that
  * to update the index statistics.  One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
@@ -3288,7 +3288,7 @@ skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared)
  * Perform work within a launched parallel process.
  *
  * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report the progress information.
+ * we don't need to report progress information.
  */
 void
 parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index df06e7d174..ac348b312c 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -493,7 +493,7 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 
 /*
  * Reinitialize parallel workers for a parallel context such that we could
- * launch the different number of workers.  This is required for cases where
+ * launch a different number of workers.  This is required for cases where
  * we need to reuse the same DSM segment, but the number of workers can
  * vary from run-to-run.
  */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d625d17bf4..76d33b1ba2 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2034,23 +2034,23 @@ vacuum_delay_point(void)
 /*
  * Computes the vacuum delay for parallel workers.
  *
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to force
+ * each worker to sleep in proportion to the share of work it's done.  We achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and
  * then based on that decide whether it needs to sleep.  We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
- * that amount.  This avoids letting the workers sleep who have done less or
- * no I/O as compared to other workers and therefore can ensure that workers
- * who are doing more I/O got throttled more.
+ * that amount.  This avoids putting to sleep those workers which have done less
+ * I/O than other workers and therefore ensure that workers
+ * which are doing more I/O got throttled more.
  *
- * We allow any worker to sleep only if it has performed the I/O above a
+ * We force a worker to sleep only if it has performed I/O above a
  * certain threshold, which is calculated based on the number of active
  * workers (VacuumActiveNWorkers), and the overall cost balance is more than
- * VacuumCostLimit set by the system.  The testing reveals that we achieve
- * the required throttling if we allow a worker that has done more than 50%
+ * VacuumCostLimit set by the system.  Testing reveals that we achieve
+ * the required throttling if we force a worker that has done more than 50%
  * of its share of work to sleep.
  */
 static double
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2779bea5c9..a4cd721400 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -225,7 +225,7 @@ typedef struct VacuumParams
 
 	/*
 	 * The number of parallel vacuum workers.  0 by default which means choose
-	 * based on the number of indexes.  -1 indicates a parallel vacuum is
+	 * based on the number of indexes.  -1 indicates parallel vacuum is
 	 * disabled.
 	 */
 	int			nworkers;
-- 
2.17.0

doc review for parallel vacuum

Reply via email to