On Tue, Nov 15, 2022 at 09:57:26AM +0900, Michael Paquier wrote: > Anyway, multi-inserts are going to be solution better than > CatalogTupleInsertWithInfo() in some cases, because we would just > generate one WAL record of N inserts rather than N records with one > INSERT each.
Something that you did not consider in the initial patch is that we may finish by opening catalog indexes even in cases where this would not have happened on HEAD, as we may finish by doing nothing when copying the stats or updating them during an analyze, and that's not fine IMO. However it is easy enough to minimize the cost: just do a CatalogOpenIndexes() when absolutely required, and close things only if the indexes have been opened. Then, there are the cases where it is worth switching to a multi-insert logic as these are going to manipulate more than 2 entries all the time: enum list addition and two code paths of tsearchcmds.c (where up to 16 entries can be lined up). This is a case-by-case analysis. For example, in the case of the enums, the number of elements is known in advance so it is possible to know the number of slots that would be used and initialize them. But that's not something you would do for the first tsearch bits where the data is built upon a scan so the slot init should be delayed. The second tsearch one can use a predictible approach, like the enums based on the number of known elements to insert. So I've given a try at all that, and finished with the attached. This patch finishes with a list of bullet points, so this had better be split into different commits, I guess. Thoughts? -- Michael
From 4dd7d8a7670343132d8ba6fb032eaf86ec724378 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Tue, 15 Nov 2022 15:44:45 +0900 Subject: [PATCH v2] Avoid some overhead with some catalog inserts and updates This commit makes lighter a couple of code paths that do catalog updates and inserts. First, CatalogTupleInsert() is costly when using it for the insertion or the update of multiple tuples compared to CatalogTupleInsertWithInfo(), as it would need to open and close the indexes of the catalog worked on multiple times. It happens that some code paths use CatalogTupleInsert() while CatalogTupleInsertWithInfo() would be a better choice, and the following ones are changed in this commit: - REINDEX CONCURRENTLY when copying statistics from one index relation to the other. Multi-INSERTs are avoided here, as this would begin to show benefits on indexes with multiple expressions, for example, which may not be the most common pattern. This happens to be noticeable on profiles with indexes having many expressions in a worst case scenario. - Update of statistics on ANALYZE, that mixes inserts and updates. In each case, the catalog indexes are opened only if at least one insertion and/or update is required, to minimize the cost of the operation. Second, a few code paths are switched from CatalogTupleInsert() to a multi-insert approach through tuple slots: - ALTER TEXT SEARCH CONFIGURATION ADD/ALTER MAPPING when inserting new entries. The replacement logic switches to CatalogTupleInsertWithInfo() for the updates. - CREATE TEXT SEARCH CONFIGURATION. - CREATE/ALTER TYPE AS ENUM when adding a list of enum values. These use a case-by-case logic, so as the initialization of tuple slots is done only when necessary to minimize the cost of the operation. Author: Ranier Vilela, Michael Paquier Reviewed-by: Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/CAEudQAqh0F9y6Di_Wc8xW4zkWm_5SDd-nRfVsCn=h0nm1c_...@mail.gmail.com --- src/backend/catalog/heap.c | 10 ++- src/backend/catalog/pg_enum.c | 55 +++++++++---- src/backend/commands/analyze.c | 11 ++- src/backend/commands/tsearchcmds.c | 120 +++++++++++++++++++++++------ 4 files changed, 157 insertions(+), 39 deletions(-) diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 5b49cc5a09..bdd413f01b 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -2856,6 +2856,7 @@ CopyStatistics(Oid fromrelid, Oid torelid) SysScanDesc scan; ScanKeyData key[1]; Relation statrel; + CatalogIndexState indstate = NULL; statrel = table_open(StatisticRelationId, RowExclusiveLock); @@ -2878,13 +2879,20 @@ CopyStatistics(Oid fromrelid, Oid torelid) /* update the copy of the tuple and insert it */ statform->starelid = torelid; - CatalogTupleInsert(statrel, tup); + + /* fetch index information when we know we need it */ + if (indstate == NULL) + indstate = CatalogOpenIndexes(statrel); + + CatalogTupleInsertWithInfo(statrel, tup, indstate); heap_freetuple(tup); } systable_endscan(scan); + if (indstate != NULL) + CatalogCloseIndexes(indstate); table_close(statrel, RowExclusiveLock); } diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c index 114715498d..ee11f5f6f1 100644 --- a/src/backend/catalog/pg_enum.c +++ b/src/backend/catalog/pg_enum.c @@ -61,14 +61,14 @@ void EnumValuesCreate(Oid enumTypeOid, List *vals) { Relation pg_enum; - NameData enumlabel; Oid *oids; int elemno, num_elems; - Datum values[Natts_pg_enum]; - bool nulls[Natts_pg_enum]; ListCell *lc; - HeapTuple tup; + int slotCount = 0; + int nslots; + CatalogIndexState indstate; + TupleTableSlot **slot; num_elems = list_length(vals); @@ -111,12 +111,21 @@ EnumValuesCreate(Oid enumTypeOid, List *vals) qsort(oids, num_elems, sizeof(Oid), oid_cmp); /* and make the entries */ - memset(nulls, false, sizeof(nulls)); + indstate = CatalogOpenIndexes(pg_enum); + + /* allocate the slots to use and initialize them */ + nslots = Min(num_elems, + MAX_CATALOG_MULTI_INSERT_BYTES / sizeof(FormData_pg_enum)); + slot = palloc(sizeof(TupleTableSlot *) * nslots); + for (int i = 0; i < nslots; i++) + slot[i] = MakeSingleTupleTableSlot(RelationGetDescr(pg_enum), + &TTSOpsHeapTuple); elemno = 0; foreach(lc, vals) { char *lab = strVal(lfirst(lc)); + NameData *enumlabel = palloc0(NAMEDATALEN); /* * labels are stored in a name field, for easier syscache lookup, so @@ -129,22 +138,42 @@ EnumValuesCreate(Oid enumTypeOid, List *vals) errdetail("Labels must be %d bytes or less.", NAMEDATALEN - 1))); - values[Anum_pg_enum_oid - 1] = ObjectIdGetDatum(oids[elemno]); - values[Anum_pg_enum_enumtypid - 1] = ObjectIdGetDatum(enumTypeOid); - values[Anum_pg_enum_enumsortorder - 1] = Float4GetDatum(elemno + 1); - namestrcpy(&enumlabel, lab); - values[Anum_pg_enum_enumlabel - 1] = NameGetDatum(&enumlabel); + ExecClearTuple(slot[slotCount]); - tup = heap_form_tuple(RelationGetDescr(pg_enum), values, nulls); + memset(slot[slotCount]->tts_isnull, false, + slot[slotCount]->tts_tupleDescriptor->natts * sizeof(bool)); - CatalogTupleInsert(pg_enum, tup); - heap_freetuple(tup); + slot[slotCount]->tts_values[Anum_pg_enum_oid - 1] = ObjectIdGetDatum(oids[elemno]); + slot[slotCount]->tts_values[Anum_pg_enum_enumtypid - 1] = ObjectIdGetDatum(enumTypeOid); + slot[slotCount]->tts_values[Anum_pg_enum_enumsortorder - 1] = Float4GetDatum(elemno + 1); + + namestrcpy(enumlabel, lab); + slot[slotCount]->tts_values[Anum_pg_enum_enumlabel - 1] = NameGetDatum(enumlabel); + + ExecStoreVirtualTuple(slot[slotCount]); + slotCount++; + + /* if slots are full, insert a batch of tuples */ + if (slotCount == nslots) + { + CatalogTuplesMultiInsertWithInfo(pg_enum, slot, slotCount, + indstate); + slotCount = 0; + } elemno++; } + /* Insert any tuples left in the buffer */ + if (slotCount > 0) + CatalogTuplesMultiInsertWithInfo(pg_enum, slot, slotCount, + indstate); + /* clean up */ pfree(oids); + for (int i = 0; i < nslots; i++) + ExecDropSingleTupleTableSlot(slot[i]); + CatalogCloseIndexes(indstate); table_close(pg_enum, RowExclusiveLock); } diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index ff1354812b..bf0ec8b374 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -1624,6 +1624,7 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats) { Relation sd; int attno; + CatalogIndexState indstate = NULL; if (natts <= 0) return; /* nothing to do */ @@ -1725,6 +1726,10 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats) Int16GetDatum(stats->attr->attnum), BoolGetDatum(inh)); + /* Open index information when we know we need it */ + if (indstate == NULL) + indstate = CatalogOpenIndexes(sd); + if (HeapTupleIsValid(oldtup)) { /* Yes, replace it */ @@ -1734,18 +1739,20 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats) nulls, replaces); ReleaseSysCache(oldtup); - CatalogTupleUpdate(sd, &stup->t_self, stup); + CatalogTupleUpdateWithInfo(sd, &stup->t_self, stup, indstate); } else { /* No, insert new tuple */ stup = heap_form_tuple(RelationGetDescr(sd), values, nulls); - CatalogTupleInsert(sd, stup); + CatalogTupleInsertWithInfo(sd, stup, indstate); } heap_freetuple(stup); } + if (indstate != NULL) + CatalogCloseIndexes(indstate); table_close(sd, RowExclusiveLock); } diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c index 9304c53d4b..67f723e908 100644 --- a/src/backend/commands/tsearchcmds.c +++ b/src/backend/commands/tsearchcmds.c @@ -1004,8 +1004,24 @@ DefineTSConfiguration(List *names, List *parameters, ObjectAddress *copied) ScanKeyData skey; SysScanDesc scan; HeapTuple maptup; + TupleDesc mapDesc; + TupleTableSlot **slot; + CatalogIndexState indstate; + int max_slots, + slot_init_count, + slot_stored_count; mapRel = table_open(TSConfigMapRelationId, RowExclusiveLock); + mapDesc = RelationGetDescr(mapRel); + + indstate = CatalogOpenIndexes(mapRel); + + /* + * Allocate the slots to use, but delay costly initialization until we + * know that they will be used. + */ + max_slots = MAX_CATALOG_MULTI_INSERT_BYTES / sizeof(FormData_pg_ts_config_map); + slot = palloc(sizeof(TupleTableSlot *) * max_slots); ScanKeyInit(&skey, Anum_pg_ts_config_map_mapcfg, @@ -1015,29 +1031,54 @@ DefineTSConfiguration(List *names, List *parameters, ObjectAddress *copied) scan = systable_beginscan(mapRel, TSConfigMapIndexId, true, NULL, 1, &skey); + /* number of slots currently storing tuples */ + slot_stored_count = 0; + /* number of slots currently initialized */ + slot_init_count = 0; + while (HeapTupleIsValid((maptup = systable_getnext(scan)))) { Form_pg_ts_config_map cfgmap = (Form_pg_ts_config_map) GETSTRUCT(maptup); - HeapTuple newmaptup; - Datum mapvalues[Natts_pg_ts_config_map]; - bool mapnulls[Natts_pg_ts_config_map]; - memset(mapvalues, 0, sizeof(mapvalues)); - memset(mapnulls, false, sizeof(mapnulls)); + if (slot_init_count < max_slots) + { + slot[slot_stored_count] = MakeSingleTupleTableSlot(mapDesc, + &TTSOpsHeapTuple); + slot_init_count++; + } - mapvalues[Anum_pg_ts_config_map_mapcfg - 1] = cfgOid; - mapvalues[Anum_pg_ts_config_map_maptokentype - 1] = cfgmap->maptokentype; - mapvalues[Anum_pg_ts_config_map_mapseqno - 1] = cfgmap->mapseqno; - mapvalues[Anum_pg_ts_config_map_mapdict - 1] = cfgmap->mapdict; + ExecClearTuple(slot[slot_stored_count]); - newmaptup = heap_form_tuple(mapRel->rd_att, mapvalues, mapnulls); + memset(slot[slot_stored_count]->tts_isnull, false, + slot[slot_stored_count]->tts_tupleDescriptor->natts * sizeof(bool)); - CatalogTupleInsert(mapRel, newmaptup); + slot[slot_stored_count]->tts_values[Anum_pg_ts_config_map_mapcfg - 1] = cfgOid; + slot[slot_stored_count]->tts_values[Anum_pg_ts_config_map_maptokentype - 1] = cfgmap->maptokentype; + slot[slot_stored_count]->tts_values[Anum_pg_ts_config_map_mapseqno - 1] = cfgmap->mapseqno; + slot[slot_stored_count]->tts_values[Anum_pg_ts_config_map_mapdict - 1] = cfgmap->mapdict; - heap_freetuple(newmaptup); + ExecStoreVirtualTuple(slot[slot_stored_count]); + slot_stored_count++; + + /* If slots are full, insert a batch of tuples */ + if (slot_stored_count == max_slots) + { + CatalogTuplesMultiInsertWithInfo(mapRel, slot, slot_stored_count, + indstate); + slot_stored_count = 0; + } } + /* Insert any tuples left in the buffer */ + if (slot_stored_count > 0) + CatalogTuplesMultiInsertWithInfo(mapRel, slot, slot_stored_count, + indstate); + + for (int i = 0; i < slot_init_count; i++) + ExecDropSingleTupleTableSlot(slot[i]); + systable_endscan(scan); + CatalogCloseIndexes(indstate); } address = makeConfigurationDependencies(tup, false, mapRel); @@ -1225,6 +1266,7 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt, Oid *dictIds; int ndict; ListCell *c; + CatalogIndexState indstate; tsform = (Form_pg_ts_config) GETSTRUCT(tup); cfgId = tsform->oid; @@ -1275,6 +1317,8 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt, i++; } + indstate = CatalogOpenIndexes(relMap); + if (stmt->replace) { /* @@ -1334,7 +1378,7 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt, newtup = heap_modify_tuple(maptup, RelationGetDescr(relMap), repl_val, repl_null, repl_repl); - CatalogTupleUpdate(relMap, &newtup->t_self, newtup); + CatalogTupleUpdateWithInfo(relMap, &newtup->t_self, newtup, indstate); } } @@ -1342,6 +1386,18 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt, } else { + TupleTableSlot **slot; + int slotCount = 0; + int nslots; + + /* Allocate the slots to use and initialize them */ + nslots = Min(ntoken * ndict, + MAX_CATALOG_MULTI_INSERT_BYTES / sizeof(FormData_pg_ts_config_map)); + slot = palloc(sizeof(TupleTableSlot *) * nslots); + for (i = 0; i < nslots; i++) + slot[i] = MakeSingleTupleTableSlot(RelationGetDescr(relMap), + &TTSOpsHeapTuple); + /* * Insertion of new entries */ @@ -1349,23 +1405,41 @@ MakeConfigurationMapping(AlterTSConfigurationStmt *stmt, { for (j = 0; j < ndict; j++) { - Datum values[Natts_pg_ts_config_map]; - bool nulls[Natts_pg_ts_config_map]; + ExecClearTuple(slot[slotCount]); - memset(nulls, false, sizeof(nulls)); - values[Anum_pg_ts_config_map_mapcfg - 1] = ObjectIdGetDatum(cfgId); - values[Anum_pg_ts_config_map_maptokentype - 1] = Int32GetDatum(tokens[i]); - values[Anum_pg_ts_config_map_mapseqno - 1] = Int32GetDatum(j + 1); - values[Anum_pg_ts_config_map_mapdict - 1] = ObjectIdGetDatum(dictIds[j]); + memset(slot[slotCount]->tts_isnull, false, + slot[slotCount]->tts_tupleDescriptor->natts * sizeof(bool)); - tup = heap_form_tuple(relMap->rd_att, values, nulls); - CatalogTupleInsert(relMap, tup); + slot[slotCount]->tts_values[Anum_pg_ts_config_map_mapcfg - 1] = ObjectIdGetDatum(cfgId); + slot[slotCount]->tts_values[Anum_pg_ts_config_map_maptokentype - 1] = Int32GetDatum(tokens[i]); + slot[slotCount]->tts_values[Anum_pg_ts_config_map_mapseqno - 1] = Int32GetDatum(j + 1); + slot[slotCount]->tts_values[Anum_pg_ts_config_map_mapdict - 1] = ObjectIdGetDatum(dictIds[j]); - heap_freetuple(tup); + ExecStoreVirtualTuple(slot[slotCount]); + slotCount++; + + /* If slots are full, insert a batch of tuples */ + if (slotCount == nslots) + { + CatalogTuplesMultiInsertWithInfo(relMap, slot, slotCount, + indstate); + slotCount = 0; + } } } + + /* Insert any tuples left in the buffer */ + if (slotCount > 0) + CatalogTuplesMultiInsertWithInfo(relMap, slot, slotCount, + indstate); + + for (i = 0; i < nslots; i++) + ExecDropSingleTupleTableSlot(slot[i]); } + /* clean up */ + CatalogCloseIndexes(indstate); + EventTriggerCollectAlterTSConfig(stmt, cfgId, dictIds, ndict); } -- 2.38.1
signature.asc
Description: PGP signature