On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:
> > The only change from the current patch is (3).  (1) still calls
> > count_leaf_partitions(), but only once.  I'd prefer that to rearranging
> > the progress reporting to set the TOTAL in ProcessUtilitySlow().
>
> As for reusing TOTAL calculated outside of DefineIndex, as I can see, 
> ProcessUtilitySlow is not the only call site for DefineIndex (although, I 
> don’t know whether all of them need progress tracking), for instance, there 
> is ALTER TABLE that calls DefineIndex to create index for constraints. So I 
> feel like rearranging progress reporting will result in unnecessary code 
> duplication in those call sites, so passing in an optional parameter seems to 
> be easier here, if we are going to optimize it, after all. Especially if 
> back-patching is a non-issue.

Yeah.  See attached.  I don't like duplicating the loop.  Is this really
the right direction to go ?

I haven't verified if the child tables are locked in all the paths which
would call count_leaf_partitions().  But why is it important to lock
them for this?  If they weren't locked before, that'd be a pre-existing
problem...
>From 1e05b055d1bd21265b68265ddef5e9654629087c Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladys...@gmail.com>
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, unless the attached index is
partitioned, in which case progress report is not updated.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml                  | 10 ++-
 src/backend/bootstrap/bootparse.y             |  2 +
 src/backend/commands/indexcmds.c              | 70 +++++++++++++++++--
 src/backend/commands/tablecmds.c              |  4 +-
 src/backend/tcop/utility.c                    |  8 ++-
 src/backend/utils/activity/backend_progress.c | 28 ++++++++
 src/include/commands/defrem.h                 |  1 +
 src/include/utils/backend_progress.h          |  1 +
 8 files changed, 115 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6249bb50d02..3f891b75541 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6901,7 +6901,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of partitions on which the index is to be created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6912,7 +6915,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of partitions on which the index has been created.
+       the number of partitions on which the index has been created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 86804bb598e..81a1b7bfec3 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -306,6 +306,7 @@ Boot_DeclareIndexStmt:
 								$4,
 								InvalidOid,
 								InvalidOid,
+								-1,
 								false,
 								false,
 								false,
@@ -358,6 +359,7 @@ Boot_DeclareUniqueIndexStmt:
 								$5,
 								InvalidOid,
 								InvalidOid,
+								-1,
 								false,
 								false,
 								false,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..62a8f0d6aa2 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,32 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions.  Note that this
+ * also excludes foreign tables.  This should be consistent with the loop in
+ * ProcessUtilitySlow().
+ * XXX: are the partitions already locked when this is called by other code paths ?
+ */
+static int
+count_leaf_partitions(Oid relid)
+{
+	int			nleaves = 0;
+	List	   *childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell   *lc;
+
+	foreach(lc, childs)
+	{
+		Oid			partrelid = lfirst_oid(lc);
+
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -518,6 +544,7 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
  *		case the caller had better have checked it earlier.
  * 'skip_build': make the catalog entries but don't create the index files
  * 'quiet': suppress the NOTICE chatter ordinarily provided for constraints.
+ * 'total_leaves': total number of leaf partitions (excluding foreign tables)
  *
  * Returns the object address of the created index.
  */
@@ -527,6 +554,7 @@ DefineIndex(Oid relationId,
 			Oid indexRelationId,
 			Oid parentIndexId,
 			Oid parentConstraintId,
+			int total_leaves,
 			bool is_alter_table,
 			bool check_rights,
 			bool check_not_in_use,
@@ -1219,8 +1247,22 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			/*
+			 * Set the total number of partitions at the start of the command;
+			 * don't update it when being called recursively.
+			 */
+			if (!OidIsValid(parentIndexId))
+			{
+				 /*
+				  * When called by ProcessUtilitySlow() the number of leaves is
+				  * passed in as an optimization.
+				  */
+				 if (total_leaves < 0)
+					 total_leaves = count_leaf_partitions(relationId);
+
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_leaves);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1250,6 +1292,7 @@ DefineIndex(Oid relationId,
 			{
 				Oid			childRelid = part_oids[i];
 				Relation	childrel;
+				char		child_relkind;
 				Oid			child_save_userid;
 				int			child_save_sec_context;
 				int			child_save_nestlevel;
@@ -1259,6 +1302,7 @@ DefineIndex(Oid relationId,
 				bool		found = false;
 
 				childrel = table_open(childRelid, lockmode);
+				child_relkind = RelationGetForm(childrel)->relkind;
 
 				GetUserIdAndSecContext(&child_save_userid,
 									   &child_save_sec_context);
@@ -1426,14 +1470,24 @@ DefineIndex(Oid relationId,
 								InvalidOid, /* no predefined OID */
 								indexRelationId,	/* this is our child */
 								createdConstraintId,
+								-1, /* The progress isn't updated during recursion */
 								is_alter_table, check_rights, check_not_in_use,
 								skip_build, quiet);
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
+				else
+				{
+					/*
+					 * If a pre-existing index was attached, the progress
+					 * report is updated here.  But if the index is
+					 * partitioned, we don't count its partitions, since
+					 * that's expensive, and the ATTACH is fast anyway.
+					 */
+					if (!RELKIND_HAS_PARTITIONS(child_relkind))
+						pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
+				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1484,9 +1538,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done. When called recursively
+		 * for child tables, the done partition counter is incremented now,
+		 * rather than in the caller.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
 
 		return address;
 	}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3e2c5f797cd..2ef4491ec8e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1216,6 +1216,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 						InvalidOid,
 						RelationGetRelid(idxRel),
 						constraintOid,
+						-1,
 						false, false, false, false, false);
 
 			index_close(idxRel, AccessShareLock);
@@ -8640,6 +8641,7 @@ ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 						  InvalidOid,	/* no predefined OID */
 						  InvalidOid,	/* no parent index */
 						  InvalidOid,	/* no parent constraint */
+						  -1,
 						  true, /* is_alter_table */
 						  check_rights,
 						  false,	/* check_not_in_use - we did it already */
@@ -18105,7 +18107,7 @@ AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
 										   &conOid);
 			DefineIndex(RelationGetRelid(attachrel), stmt, InvalidOid,
 						RelationGetRelid(idxRel),
-						conOid,
+						conOid, -1,
 						true, false, false, false, false);
 		}
 
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index c7d9d96b45d..c8317822a64 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1462,6 +1462,7 @@ ProcessUtilitySlow(ParseState *pstate,
 					Oid			relid;
 					LOCKMODE	lockmode;
 					bool		is_alter_table;
+					int			total_leaves = 0;
 
 					if (stmt->concurrent)
 						PreventInTransactionBlock(isTopLevel,
@@ -1502,7 +1503,8 @@ ProcessUtilitySlow(ParseState *pstate,
 						inheritors = find_all_inheritors(relid, lockmode, NULL);
 						foreach(lc, inheritors)
 						{
-							char		relkind = get_rel_relkind(lfirst_oid(lc));
+							Oid			partrelid = lfirst_oid(lc);
+							char		relkind = get_rel_relkind(partrelid);
 
 							if (relkind != RELKIND_RELATION &&
 								relkind != RELKIND_MATVIEW &&
@@ -1519,6 +1521,9 @@ ProcessUtilitySlow(ParseState *pstate,
 												stmt->relation->relname),
 										 errdetail("Table \"%s\" contains partitions that are foreign tables.",
 												   stmt->relation->relname)));
+
+							if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+								total_leaves++;
 						}
 						list_free(inheritors);
 					}
@@ -1545,6 +1550,7 @@ ProcessUtilitySlow(ParseState *pstate,
 									InvalidOid, /* no predefined OID */
 									InvalidOid, /* no parent index */
 									InvalidOid, /* no parent constraint */
+									total_leaves,
 									is_alter_table,
 									true,	/* check_rights */
 									true,	/* check_not_in_use */
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index d96af812b19..2a9994b98fd 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -58,6 +58,34 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*-----------
+ * pgstat_progress_incr_param() -
+ *
+ * Increment index'th member in st_progress_param[] of the current backend.
+ *-----------
+ */
+void
+pgstat_progress_incr_param(int index, int64 incr)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	int64		val;
+
+	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	/*
+	 * Because no other process should write to this backend's own status, we
+	 * can read its value from shared memory without needing to loop to ensure
+	 * its consistency.
+	 */
+	val = beentry->st_progress_param[index];
+	val += incr;
+
+	pgstat_progress_update_param(index, val);
+}
+
 /*-----------
  * pgstat_progress_update_multi_param() -
  *
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 4f7f87fc62c..b774b3028ce 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -29,6 +29,7 @@ extern ObjectAddress DefineIndex(Oid relationId,
 								 Oid indexRelationId,
 								 Oid parentIndexId,
 								 Oid parentConstraintId,
+								 int total_leaves,
 								 bool is_alter_table,
 								 bool check_rights,
 								 bool check_not_in_use,
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index 005e5d75ab6..a84752ade99 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -36,6 +36,7 @@ typedef enum ProgressCommandType
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
 extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_incr_param(int index, int64 incr);
 extern void pgstat_progress_update_multi_param(int nparam, const int *index,
 											   const int64 *val);
 extern void pgstat_progress_end_command(void);
-- 
2.34.1

Reply via email to