On Tue, Nov 15, 2022 at 09:57:26AM +0900, Michael Paquier wrote:
> Anyway, multi-inserts are going to be solution better than
> CatalogTupleInsertWithInfo() in some cases, because we would just
> generate one WAL record of N inserts rather than N records with one
> INSERT each.

Something that you did not consider in the initial patch is that we
may finish by opening catalog indexes even in cases where this would
not have happened on HEAD, as we may finish by doing nothing when
copying the stats or updating them during an analyze, and that's not
fine IMO.  However it is easy enough to minimize the cost: just do a
CatalogOpenIndexes() when absolutely required, and close things only
if the indexes have been opened.

Then, there are the cases where it is worth switching to a
multi-insert logic as these are going to manipulate more than 2
entries all the time: enum list addition and two code paths of
tsearchcmds.c (where up to 16 entries can be lined up).  This is a
case-by-case analysis.  For example, in the case of the enums, the
number of elements is known in advance so it is possible to know the
number of slots that would be used and initialize them.  But that's
not something you would do for the first tsearch bits where the data
is built upon a scan so the slot init should be delayed.  The second
tsearch one can use a predictible approach, like the enums based on
the number of known elements to insert.

So I've given a try at all that, and finished with the attached.  This
patch finishes with a list of bullet points, so this had better be
split into different commits, I guess.
Thoughts?
--
Michael
From 4dd7d8a7670343132d8ba6fb032eaf86ec724378 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Tue, 15 Nov 2022 15:44:45 +0900
Subject: [PATCH v2] Avoid some overhead with some catalog inserts and updates

This commit makes lighter a couple of code paths that do catalog updates
and inserts.

First, CatalogTupleInsert() is costly when using it for the insertion or the
update of multiple tuples compared to CatalogTupleInsertWithInfo(), as
it would need to open and close the indexes of the catalog worked on
multiple times.  It happens that some code paths use
CatalogTupleInsert() while CatalogTupleInsertWithInfo() would be a
better choice, and the following ones are changed in this commit:
- REINDEX CONCURRENTLY when copying statistics from one index relation
to the other.  Multi-INSERTs are avoided here, as this would begin to
show benefits on indexes with multiple expressions, for example, which
may not be the most common pattern.  This happens to be noticeable on
profiles with indexes having many expressions in a worst case scenario.
- Update of statistics on ANALYZE, that mixes inserts and updates.
In each case, the catalog indexes are opened only if at least one
insertion and/or update is required, to minimize the cost of the
operation.

Second, a few code paths are switched from CatalogTupleInsert() to
a multi-insert approach through tuple slots:
- ALTER TEXT SEARCH CONFIGURATION ADD/ALTER MAPPING when inserting new
entries.  The replacement logic switches to CatalogTupleInsertWithInfo()
for the updates.
- CREATE TEXT SEARCH CONFIGURATION.
- CREATE/ALTER TYPE AS ENUM when adding a list of enum values.
These use a case-by-case logic, so as the initialization of tuple slots
is done only when necessary to minimize the cost of the operation.

Author: Ranier Vilela, Michael Paquier
Reviewed-by: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/CAEudQAqh0F9y6Di_Wc8xW4zkWm_5SDd-nRfVsCn=h0nm1c_...@mail.gmail.com
---
 src/backend/catalog/heap.c         |  10 ++-
 src/backend/catalog/pg_enum.c      |  55 +++++++++----
 src/backend/commands/analyze.c     |  11 ++-
 src/backend/commands/tsearchcmds.c | 120 +++++++++++++++++++++++------
 4 files changed, 157 insertions(+), 39 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 5b49cc5a09..bdd413f01b 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2856,6 +2856,7 @@ CopyStatistics(Oid fromrelid, Oid torelid)
 	SysScanDesc scan;
 	ScanKeyData key[1];
 	Relation	statrel;
+	CatalogIndexState indstate = NULL;
 
 	statrel = table_open(StatisticRelationId, RowExclusiveLock);
 
@@ -2878,13 +2879,20 @@ CopyStatistics(Oid fromrelid, Oid torelid)
 
 		/* update the copy of the tuple and insert it */
 		statform->starelid = torelid;
-		CatalogTupleInsert(statrel, tup);
+
+		/* fetch index information when we know we need it */
+		if (indstate == NULL)
+			indstate = CatalogOpenIndexes(statrel);
+
+		CatalogTupleInsertWithInfo(statrel, tup, indstate);
 
 		heap_freetuple(tup);
 	}
 
 	systable_endscan(scan);
 
+	if (indstate != NULL)
+		CatalogCloseIndexes(indstate);
 	table_close(statrel, RowExclusiveLock);
 }
 
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index 114715498d..ee11f5f6f1 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -61,14 +61,14 @@ void
 EnumValuesCreate(Oid enumTypeOid, List *vals)
 {
 	Relation	pg_enum;
-	NameData	enumlabel;
 	Oid		   *oids;
 	int			elemno,
 				num_elems;
-	Datum		values[Natts_pg_enum];
-	bool		nulls[Natts_pg_enum];
 	ListCell   *lc;
-	HeapTuple	tup;
+	int			slotCount = 0;
+	int			nslots;
+	CatalogIndexState indstate;
+	TupleTableSlot **slot;
 
 	num_elems = list_length(vals);
 
@@ -111,12 +111,21 @@ EnumValuesCreate(Oid enumTypeOid, List *vals)
 	qsort(oids, num_elems, sizeof(Oid), oid_cmp);
 
 	/* and make the entries */
-	memset(nulls, false, sizeof(nulls));
+	indstate = CatalogOpenIndexes(pg_enum);
+
+	/* allocate the slots to use and initialize them */
+	nslots = Min(num_elems,
+				 MAX_CATALOG_MULTI_INSERT_BYTES / sizeof(FormData_pg_enum));
+	slot = palloc(sizeof(TupleTableSlot *) * nslots);
+	for (int i = 0; i < nslots; i++)
+		slot[i] = MakeSingleTupleTableSlot(RelationGetDescr(pg_enum),
+										   &TTSOpsHeapTuple);
 
 	elemno = 0;
 	foreach(lc, vals)
 	{
 		char	   *lab = strVal(lfirst(lc));
+		NameData   *enumlabel = palloc0(NAMEDATALEN);
 
 		/*
 		 * labels are stored in a name field, for easier syscache lookup, so
@@ -129,22 +138,42 @@ EnumValuesCreate(Oid enumTypeOid, List *vals)
 					 errdetail("Labels must be %d bytes or less.",
 							   NAMEDATALEN - 1)));
 
-		values[Anum_pg_enum_oid - 1] = ObjectIdGetDatum(oids[elemno]);
-		values[Anum_pg_enum_enumtypid - 1] = ObjectIdGetDatum(enumTypeOid);
-		values[Anum_pg_enum_enumsortorder - 1] = Float4GetDatum(elemno + 1);
-		namestrcpy(&enumlabel, lab);
-		values[Anum_pg_enum_enumlabel - 1] = NameGetDatum(&enumlabel);
+		ExecClearTuple(slot[slotCount]);
 
-		tup = heap_form_tuple(RelationGetDescr(pg_enum), values, nulls);
+		memset(slot[slotCount]->tts_isnull, false,
+			   slot[slotCount]->tts_tupleDescriptor->natts * sizeof(bool));
 
-		CatalogTupleInsert(pg_enum, tup);
-		heap_freetuple(tup);
+		slot[slotCount]->tts_values[Anum_pg_enum_oid - 1] = ObjectIdGetDatum(oids[elemno]);
+		slot[slotCount]->tts_values[Anum_pg_enum_enumtypid - 1] = ObjectIdGetDatum(enumTypeOid);
+		slot[slotCount]->tts_values[Anum_pg_enum_enumsortorder - 1] = Float4GetDatum(elemno + 1);
+
+		namestrcpy(enumlabel, lab);
+		slot[slotCount]->tts_values[Anum_pg_enum_enumlabel - 1] = NameGetDatum(enumlabel);
+
+		ExecStoreVirtualTuple(slot[slotCount]);
+		slotCount++;
+
+		/* if slots are full, insert a batch of tuples */
+		if (slotCount == nslots)
+		{
+			CatalogTuplesMultiInsertWithInfo(pg_enum, slot, slotCount,
+											 indstate);
+			slotCount = 0;
+		}
 
 		elemno++;
 	}
 
+	/* Insert any tuples left in the buffer */
+	if (slotCount > 0)
+		CatalogTuplesMultiInsertWithInfo(pg_enum, slot, slotCount,
+										 indstate);
+
 	/* clean up */
 	pfree(oids);
+	for (int i = 0; i < nslots; i++)
+		ExecDropSingleTupleTableSlot(slot[i]);
+	CatalogCloseIndexes(indstate);
 	table_close(pg_enum, RowExclusiveLock);
 }
 
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index ff1354812b..bf0ec8b374 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1624,6 +1624,7 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
 {
 	Relation	sd;
 	int			attno;
+	CatalogIndexState indstate = NULL;
 
 	if (natts <= 0)
 		return;					/* nothing to do */
@@ -1725,6 +1726,10 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
 								 Int16GetDatum(stats->attr->attnum),
 								 BoolGetDatum(inh));
 
+		/* Open index information when we know we need it */
+		if (indstate == NULL)
+			indstate = CatalogOpenIndexes(sd);
+
 		if (HeapTupleIsValid(oldtup))
 		{
 			/* Yes, replace it */
@@ -1734,18 +1739,20 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
 									 nulls,
 									 replaces);
 			ReleaseSysCache(oldtup);
-			CatalogTupleUpdate(sd, &stup->t_self, stup);
+			CatalogTupleUpdateWithInfo(sd, &stup->t_self, stup, indstate);
 		}
 		else
 		{
 			/* No, insert new tuple */
 			stup = heap_form_tuple(RelationGetDescr(sd), values, nulls);
-			CatalogTupleInsert(sd, stup);
+			CatalogTupleInsertWithInfo(sd, stup, indstate);
 		}
 
 		heap_freetuple(stup);
 	}
 
+	if (indstate != NULL)
+		CatalogCloseIndexes(indstate);
 	table_close(sd, RowExclusiveLock);
 }
 
diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
index 9304c53d4b..67f723e908 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -1004,8 +1004,24 @@ DefineTSConfiguration(List *names, List *parameters, ObjectAddress *copied)
 		ScanKeyData skey;
 		SysScanDesc scan;
 		HeapTuple	maptup;
+		TupleDesc   mapDesc;
+		TupleTableSlot **slot;
+		CatalogIndexState indstate;
+		int         max_slots,
+			slot_init_count,
+			slot_stored_count;
 
 		mapRel = table_open(TSConfigMapRelationId, RowExclusiveLock);
+		mapDesc = RelationGetDescr(mapRel);
+
+		indstate = CatalogOpenIndexes(mapRel);
+
+		/*
+		 * Allocate the slots to use, but delay costly initialization until we
+		 * know that they will be used.
+		 */
+		max_slots = MAX_CATALOG_MULTI_INSERT_BYTES / sizeof(FormData_pg_ts_config_map);
+		slot = palloc(sizeof(TupleTableSlot *) * max_slots);
 
 		ScanKeyInit(&skey,
 					Anum_pg_ts_config_map_mapcfg,
@@ -1015,29 +1031,54 @@ DefineTSConfiguration(List *names, List *parameters, ObjectAddress *copied)
 		scan = systable_beginscan(mapRel, TSConfigMapIndexId, true,
 								  NULL, 1, &skey);
 
+		/* number of slots currently storing tuples */
+		slot_stored_count = 0;
+		/* number of slots currently initialized */
+		slot_init_count = 0;
+
 		while (HeapTupleIsValid((maptup = systable_getnext(scan))))
 		{
 			Form_pg_ts_config_map cfgmap = (Form_pg_ts_config_map) GETSTRUCT(maptup);
-			HeapTuple	newmaptup;
-			Datum		mapvalues[Natts_pg_ts_config_map];
-			bool		mapnulls[Natts_pg_ts_config_map];
 
-			memset(mapvalues, 0, sizeof(mapvalues));
-			memset(mapnulls, false, sizeof(mapnulls));
+			if (slot_init_count < max_slots)
+			{
+				slot[slot_stored_count] = MakeSingleTupleTableSlot(mapDesc,
+																   &TTSOpsHeapTuple);
+				slot_init_count++;
+			}
 
-			mapvalues[Anum_pg_ts_config_map_mapcfg - 1] = cfgOid;
-			mapvalues[Anum_pg_ts_config_map_maptokentype - 1] = cfgmap->maptokentype;
-			mapvalues[Anum_pg_ts_config_map_mapseqno - 1] = cfgmap->mapseqno;
-			mapvalues[Anum_pg_ts_config_map_mapdict - 1] = cfgmap->mapdict;
+			ExecClearTuple(slot[slot_stored_count]);
 
-			newmaptup = heap_form_tuple(mapRel->rd_att, mapvalues, mapnulls);
+			memset(slot[slot_stored_count]->tts_isnull, false,
+				   slot[slot_stored_count]->tts_tupleDescriptor->natts * sizeof(bool));
 
-			CatalogTupleInsert(mapRel, newmaptup);
+			slot[slot_stored_count]->tts_values[Anum_pg_ts_config_map_mapcfg - 1] = cfgOid;
+			slot[slot_stored_count]->tts_values[Anum_pg_ts_config_map_maptokentype - 1] = cfgmap->maptokentype;
+			slot[slot_stored_count]->tts_values[Anum_pg_ts_config_map_mapseqno - 1] = cfgmap->mapseqno;
+			slot[slot_stored_count]->tts_values[Anum_pg_ts_config_map_mapdict - 1] = cfgmap->mapdict;
 
-			heap_freetuple(newmaptup);
+			ExecStoreVirtualTuple(slot[slot_stored_count]);
+			slot_stored_count++;
+
+			/* If slots are full, insert a batch of tuples */
+			if (slot_stored_count == max_slots)
+			{
+				CatalogTuplesMultiInsertWithInfo(mapRel, slot, slot_stored_count,
+												 indstate);
+				slot_stored_count = 0;
+			}
 		}
 
+		/* Insert any tuples left in the buffer */
+		if (slot_stored_count > 0)
+			CatalogTuplesMultiInsertWithInfo(mapRel, slot, slot_stored_count,
+											 indstate);
+
+		for (int i = 0; i < slot_init_count; i++)
+			ExecDropSingleTupleTableSlot(slot[i]);
+
 		systable_endscan(scan);
+		CatalogCloseIndexes(indstate);
 	}
 
 	address = makeConfigurationDependencies(tup, false, mapRel);
@@ -1225,6 +1266,7 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
 	Oid		   *dictIds;
 	int			ndict;
 	ListCell   *c;
+	CatalogIndexState indstate;
 
 	tsform = (Form_pg_ts_config) GETSTRUCT(tup);
 	cfgId = tsform->oid;
@@ -1275,6 +1317,8 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
 		i++;
 	}
 
+	indstate = CatalogOpenIndexes(relMap);
+
 	if (stmt->replace)
 	{
 		/*
@@ -1334,7 +1378,7 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
 				newtup = heap_modify_tuple(maptup,
 										   RelationGetDescr(relMap),
 										   repl_val, repl_null, repl_repl);
-				CatalogTupleUpdate(relMap, &newtup->t_self, newtup);
+				CatalogTupleUpdateWithInfo(relMap, &newtup->t_self, newtup, indstate);
 			}
 		}
 
@@ -1342,6 +1386,18 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
 	}
 	else
 	{
+		TupleTableSlot **slot;
+		int			slotCount = 0;
+		int			nslots;
+
+		/* Allocate the slots to use and initialize them */
+		nslots = Min(ntoken * ndict,
+					 MAX_CATALOG_MULTI_INSERT_BYTES / sizeof(FormData_pg_ts_config_map));
+		slot = palloc(sizeof(TupleTableSlot *) * nslots);
+		for (i = 0; i < nslots; i++)
+			slot[i] = MakeSingleTupleTableSlot(RelationGetDescr(relMap),
+											   &TTSOpsHeapTuple);
+
 		/*
 		 * Insertion of new entries
 		 */
@@ -1349,23 +1405,41 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt,
 		{
 			for (j = 0; j < ndict; j++)
 			{
-				Datum		values[Natts_pg_ts_config_map];
-				bool		nulls[Natts_pg_ts_config_map];
+				ExecClearTuple(slot[slotCount]);
 
-				memset(nulls, false, sizeof(nulls));
-				values[Anum_pg_ts_config_map_mapcfg - 1] = ObjectIdGetDatum(cfgId);
-				values[Anum_pg_ts_config_map_maptokentype - 1] = Int32GetDatum(tokens[i]);
-				values[Anum_pg_ts_config_map_mapseqno - 1] = Int32GetDatum(j + 1);
-				values[Anum_pg_ts_config_map_mapdict - 1] = ObjectIdGetDatum(dictIds[j]);
+				memset(slot[slotCount]->tts_isnull, false,
+					   slot[slotCount]->tts_tupleDescriptor->natts * sizeof(bool));
 
-				tup = heap_form_tuple(relMap->rd_att, values, nulls);
-				CatalogTupleInsert(relMap, tup);
+				slot[slotCount]->tts_values[Anum_pg_ts_config_map_mapcfg - 1] = ObjectIdGetDatum(cfgId);
+				slot[slotCount]->tts_values[Anum_pg_ts_config_map_maptokentype - 1] = Int32GetDatum(tokens[i]);
+				slot[slotCount]->tts_values[Anum_pg_ts_config_map_mapseqno - 1] = Int32GetDatum(j + 1);
+				slot[slotCount]->tts_values[Anum_pg_ts_config_map_mapdict - 1] = ObjectIdGetDatum(dictIds[j]);
 
-				heap_freetuple(tup);
+				ExecStoreVirtualTuple(slot[slotCount]);
+				slotCount++;
+
+				/* If slots are full, insert a batch of tuples */
+				if (slotCount == nslots)
+				{
+					CatalogTuplesMultiInsertWithInfo(relMap, slot, slotCount,
+													 indstate);
+					slotCount = 0;
+				}
 			}
 		}
+
+		/* Insert any tuples left in the buffer */
+		if (slotCount > 0)
+			CatalogTuplesMultiInsertWithInfo(relMap, slot, slotCount,
+											 indstate);
+
+		for (i = 0; i < nslots; i++)
+			ExecDropSingleTupleTableSlot(slot[i]);
 	}
 
+	/* clean up */
+	CatalogCloseIndexes(indstate);
+
 	EventTriggerCollectAlterTSConfig(stmt, cfgId, dictIds, ndict);
 }
 
-- 
2.38.1

Attachment: signature.asc
Description: PGP signature

Reply via email to