On Mon, 4 Mar 2019 at 07:29, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andres Freund <and...@anarazel.de> writes:
> > I still regularly see list overhead matter in production workloads. A
> > lot of it being memory allocator overhead, which is why I'm concerned
> > with a rewrite that doesn't reduce the number of memory allocations.
>
> Well, I did that in the v3 patch, and it still hasn't moved the needle
> noticeably in any test case I've tried. At this point I'm really
> struggling to see a reason why we shouldn't just mark this patch rejected
> and move on. If you have test cases that suggest differently, please
> show them, don't just handwave.
I think we discussed this before, but... if this patch is not a win by
itself (and we've already seen it's not really causing much in the way of
regression, if any), then we need to judge it on what else we can do to
exploit the new performance characteristics of List. For example,
list_nth() is now deadly fast.

My primary interest here is getting rid of a few places where we build an
array version of some List so that we can access the Nth element more
quickly. What goes on in ExecInitRangeTable() is not particularly great
for queries to partitioned tables with a large number of partitions where
only one survives run-time pruning.

I've hacked together a patch to show you what wins we can have with the
new list implementation. Using the attached (renamed to .txt to not upset
CFbot), I get:

setup:
create table hashp (a int, b int) partition by hash (a);
select 'create table hashp'||x||' partition of hashp for values with (modulus 10000, remainder '||x||');' from generate_series(0,9999) x;
\gexec
alter table hashp add constraint hashp_pkey PRIMARY KEY (a);

postgresql.conf:
plan_cache_mode = force_generic_plan
max_parallel_workers_per_gather = 0
max_locks_per_transaction = 256

bench.sql:
\set p random(1,10000)
select * from hashp where a = :p;

master:
tps = 189.499654 (excluding connections establishing)
tps = 195.102743 (excluding connections establishing)
tps = 194.338813 (excluding connections establishing)

your List reimplementation v3 + attached:
tps = 12852.003735 (excluding connections establishing)
tps = 12791.834617 (excluding connections establishing)
tps = 12691.515641 (excluding connections establishing)

The attached does include [1], but even with just that, the performance
is not as good as with the arraylist plus the follow-on exploits I added.
Now that we have a much faster bms_next_member(), some form of what's in
there might be okay.

A profile shows that in this workload, at around 12k TPS, we're still
spending 42% of the time in hash_seq_search().
That's due to LockReleaseAll() having a hard time with the bloated lock
table that results from having to build the generic plan with 10k
partitions. [2] aims to fix that, so likely we'll be closer to 18k TPS,
or about 100x faster. In fact, I should test that...

tps = 18763.977940 (excluding connections establishing)
tps = 18589.531558 (excluding connections establishing)
tps = 19011.295770 (excluding connections establishing)

Yip, about 100x. I think these are worthy goals to aspire to.

[1] https://commitfest.postgresql.org/22/1897/
[2] https://commitfest.postgresql.org/22/1993/

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 2048d71535..63554b3057 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1610,6 +1610,7 @@ recordDependencyOnSingleRelExpr(const ObjectAddress *depender,
 	rte.rtekind = RTE_RELATION;
 	rte.relid = relId;
 	rte.relkind = RELKIND_RELATION; /* no need for exactness here */
+	rte.delaylock = false;
 	rte.rellockmode = AccessShareLock;
 
 	context.rtables = list_make1(list_make1(&rte));
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0208388af3..0ac564dd77 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -516,6 +516,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	rte->rtekind = RTE_RELATION;
 	rte->relid = intoRelationAddr.objectId;
 	rte->relkind = relkind;
+	rte->delaylock = false;
 	rte->rellockmode = RowExclusiveLock;
 	rte->requiredPerms = ACL_INSERT;
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index fe99096efc..9b93c76aca 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -96,13 +96,13 @@ execCurrentOf(CurrentOfExpr *cexpr,
 	{
 		ExecRowMark *erm;
 		Index		i;
-
+		int			len = list_length(queryDesc->estate->es_range_table);
 
 		/*
 		 * Here, the query must have exactly one FOR UPDATE/SHARE reference to
 		 * the target table, and we dig the ctid info out of that.
 		 */
 		erm = NULL;
-		for (i = 0; i < queryDesc->estate->es_range_table_size; i++)
+		for (i = 0; i < len; i++)
 		{
 			ExecRowMark *thiserm = queryDesc->estate->es_rowmarks[i];
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index a018925d4e..0cdd9ed44e 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3962,7 +3962,7 @@ ExecEvalWholeRowVar(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
 	 * perhaps other places.)
 	 */
 	if (econtext->ecxt_estate &&
-		variable->varno <= econtext->ecxt_estate->es_range_table_size)
+		variable->varno <= list_length(econtext->ecxt_estate->es_range_table))
 	{
 		RangeTblEntry *rte = exec_rt_fetch(variable->varno,
 										   econtext->ecxt_estate);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 61be56fe0b..c119197bc5 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -592,6 +592,40 @@ ExecCheckRTPerms(List *rangeTable, bool ereport_on_violation)
 	return result;
 }
 
+/*
+ * ExecCheckRTPermsFast
+ *		As above, but only checks rtable entries that appear in checkEntries.
+ */
+bool
+ExecCheckRTPermsFast(List *rangeTable, Bitmapset *checkEntries,
+					 bool ereport_on_violation)
+{
+	int			i;
+	bool		result = true;
+
+	i = -1;
+	while ((i = bms_next_member(checkEntries, i)) >= 0)
+	{
+		RangeTblEntry *rte = (RangeTblEntry *) list_nth(rangeTable, i);
+
+		result = ExecCheckRTEPerms(rte);
+		if (!result)
+		{
+			Assert(rte->rtekind == RTE_RELATION);
+			if (ereport_on_violation)
+				aclcheck_error(ACLCHECK_NO_PRIV, get_relkind_objtype(get_rel_relkind(rte->relid)),
+							   get_rel_name(rte->relid));
+			return false;
+		}
+	}
+
+	if (ExecutorCheckPerms_hook)
+		result = (*ExecutorCheckPerms_hook) (rangeTable,
+											 ereport_on_violation);
+	return result;
+}
+
+
 /*
  * ExecCheckRTEPerms
  *		Check access permissions for a single RTE.
@@ -816,7 +850,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	/*
 	 * Do permissions checks
 	 */
-	ExecCheckRTPerms(rangeTable, true);
+	ExecCheckRTPermsFast(rangeTable, plannedstmt->rtablereqperm, true);
 
 	/*
 	 * initialize the node's execution state
@@ -912,7 +946,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	if (plannedstmt->rowMarks)
 	{
 		estate->es_rowmarks = (ExecRowMark **)
-			palloc0(estate->es_range_table_size * sizeof(ExecRowMark *));
+			palloc0(list_length(estate->es_range_table) * sizeof(ExecRowMark *));
 		foreach(l, plannedstmt->rowMarks)
 		{
 			PlanRowMark *rc = (PlanRowMark *) lfirst(l);
@@ -964,7 +998,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		ItemPointerSetInvalid(&(erm->curCtid));
 		erm->ermExtra = NULL;
 
-		Assert(erm->rti > 0 && erm->rti <= estate->es_range_table_size &&
+		Assert(erm->rti > 0 && erm->rti <= list_length(estate->es_range_table) &&
 			   estate->es_rowmarks[erm->rti - 1] == NULL);
 
 		estate->es_rowmarks[erm->rti - 1] = erm;
@@ -1571,7 +1605,7 @@ ExecEndPlan(PlanState *planstate, EState *estate)
 	 * close whatever rangetable Relations have been opened.  We do not
 	 * release any locks we might hold on those rels.
 	 */
-	num_relations = estate->es_range_table_size;
+	num_relations = list_length(estate->es_range_table);
 	for (i = 0; i < num_relations; i++)
 	{
 		if (estate->es_relations[i])
@@ -2346,7 +2380,7 @@ ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo)
 ExecRowMark *
 ExecFindRowMark(EState *estate, Index rti, bool missing_ok)
 {
-	if (rti > 0 && rti <= estate->es_range_table_size &&
+	if (rti > 0 && rti <= list_length(estate->es_range_table) &&
 		estate->es_rowmarks != NULL)
 	{
 		ExecRowMark *erm = estate->es_rowmarks[rti - 1];
@@ -2792,7 +2826,7 @@ EvalPlanQualSlot(EPQState *epqstate,
 {
 	TupleTableSlot **slot;
 
-	Assert(rti > 0 && rti <= epqstate->estate->es_range_table_size);
+	Assert(rti > 0 && rti <= list_length(epqstate->estate->es_range_table));
 
 	slot = &epqstate->estate->es_epqTupleSlot[rti - 1];
 	if (*slot == NULL)
@@ -2973,7 +3007,7 @@ EvalPlanQualBegin(EPQState *epqstate, EState *parentestate)
 		/*
 		 * We already have a suitable child EPQ tree, so just reset it.
 		 */
-		Index		rtsize = parentestate->es_range_table_size;
+		Index		rtsize = list_length(parentestate->es_range_table);
 		PlanState  *planstate = epqstate->planstate;
 
 		MemSet(estate->es_epqScanDone, 0, rtsize * sizeof(bool));
@@ -3026,7 +3060,7 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree)
 	MemoryContext oldcontext;
 	ListCell   *l;
 
-	rtsize = parentestate->es_range_table_size;
+	rtsize = list_length(parentestate->es_range_table);
 
 	epqstate->estate = estate = CreateExecutorState();
 
@@ -3050,8 +3084,6 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree)
 	estate->es_snapshot = parentestate->es_snapshot;
 	estate->es_crosscheck_snapshot = parentestate->es_crosscheck_snapshot;
 	estate->es_range_table = parentestate->es_range_table;
-	estate->es_range_table_array = parentestate->es_range_table_array;
-	estate->es_range_table_size = parentestate->es_range_table_size;
 	estate->es_relations = parentestate->es_relations;
 	estate->es_rowmarks = parentestate->es_rowmarks;
 	estate->es_plannedstmt = parentestate->es_plannedstmt;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index ef06b74a30..dfbfa294c3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -111,8 +111,6 @@ CreateExecutorState(void)
 	estate->es_snapshot = InvalidSnapshot;	/* caller must initialize this */
 	estate->es_crosscheck_snapshot = InvalidSnapshot;	/* no crosscheck */
 	estate->es_range_table = NIL;
-	estate->es_range_table_array = NULL;
-	estate->es_range_table_size = 0;
 	estate->es_relations = NULL;
 	estate->es_rowmarks = NULL;
 	estate->es_plannedstmt = NULL;
@@ -707,37 +705,21 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
  * ExecInitRangeTable
  *		Set up executor's range-table-related data
  *
- * We build an array from the range table list to allow faster lookup by RTI.
- * (The es_range_table field is now somewhat redundant, but we keep it to
- * avoid breaking external code unnecessarily.)
- * This is also a convenient place to set up the parallel es_relations array.
+ * Allocate the es_relations array to store opened rels.
  */
 void
 ExecInitRangeTable(EState *estate, List *rangeTable)
 {
-	Index		rti;
-	ListCell   *lc;
-
 	/* Remember the range table List as-is */
 	estate->es_range_table = rangeTable;
 
-	/* Set up the equivalent array representation */
-	estate->es_range_table_size = list_length(rangeTable);
-	estate->es_range_table_array = (RangeTblEntry **)
-		palloc(estate->es_range_table_size * sizeof(RangeTblEntry *));
-	rti = 0;
-	foreach(lc, rangeTable)
-	{
-		estate->es_range_table_array[rti++] = lfirst_node(RangeTblEntry, lc);
-	}
-
 	/*
 	 * Allocate an array to store an open Relation corresponding to each
 	 * rangetable entry, and initialize entries to NULL.  Relations are opened
 	 * and stored here as needed.
 	 */
 	estate->es_relations = (Relation *)
-		palloc0(estate->es_range_table_size * sizeof(Relation));
+		palloc0(list_length(rangeTable) * sizeof(Relation));
 
 	/*
 	 * es_rowmarks is also parallel to the es_range_table_array, but it's
@@ -757,7 +739,7 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
 {
 	Relation	rel;
 
-	Assert(rti > 0 && rti <= estate->es_range_table_size);
+	Assert(rti > 0 && rti <= list_length(estate->es_range_table));
 
 	rel = estate->es_relations[rti - 1];
 	if (rel == NULL)
@@ -767,14 +749,15 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
 
 		Assert(rte->rtekind == RTE_RELATION);
 
-		if (!IsParallelWorker())
+		if (!rte->delaylock && !IsParallelWorker())
 		{
 			/*
-			 * In a normal query, we should already have the appropriate lock,
-			 * but verify that through an Assert.  Since there's already an
-			 * Assert inside table_open that insists on holding some lock, it
-			 * seems sufficient to check this only when rellockmode is higher
-			 * than the minimum.
+			 * In a normal query, unless the planner set the delaylock flag,
+			 * we should already have the appropriate lock, but verify that
+			 * through an Assert.  Since there's already an Assert inside
+			 * heap_open that insists on holding some lock, it seems
+			 * sufficient to check this only when rellockmode is higher than
+			 * the minimum.
			 */
			rel = table_open(rte->relid, NoLock);
			Assert(rte->rellockmode == AccessShareLock ||
@@ -783,9 +766,10 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
		else
		{
			/*
-			 * If we are a parallel worker, we need to obtain our own local
-			 * lock on the relation.  This ensures sane behavior in case the
-			 * parent process exits before we do.
+			 * If we are a parallel worker or delaylock is set, we need to
+			 * obtain a lock on the relation.  For parallel workers, this
+			 * ensures sane behavior in case the parent process exits before
+			 * we do.
			 */
			rel = table_open(rte->relid, rte->rellockmode);
		}
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index f3be2429db..3b9b169960 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -107,7 +107,6 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 	int			firstvalid;
 	int			i,
 				j;
-	ListCell   *lc;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & EXEC_FLAG_MARK));
@@ -211,24 +210,21 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 	 *
 	 * While at it, find out the first valid partial plan.
 	 */
-	j = i = 0;
+	j = 0;
 	firstvalid = nplans;
-	foreach(lc, node->appendplans)
+	i = -1;
+	while ((i = bms_next_member(validsubplans, i)) >= 0)
 	{
-		if (bms_is_member(i, validsubplans))
-		{
-			Plan	   *initNode = (Plan *) lfirst(lc);
+		Plan	   *initNode = (Plan *) list_nth(node->appendplans, i);
 
-			/*
-			 * Record the lowest appendplans index which is a valid partial
-			 * plan.
-			 */
-			if (i >= node->first_partial_plan && j < firstvalid)
-				firstvalid = j;
+		/*
+		 * Record the lowest appendplans index which is a valid partial
+		 * plan.
+		 */
+		if (i >= node->first_partial_plan && j < firstvalid)
+			firstvalid = j;
 
-			appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
-		}
-		i++;
+		appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
 	}
 
 	appendstate->as_first_partial_plan = firstvalid;
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 7ba53ba185..18d13377dc 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -70,7 +70,6 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 	int			nplans;
 	int			i,
 				j;
-	ListCell   *lc;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -177,16 +176,13 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the valid plans to be executed and save
 	 * the results into the mergeplanstates array.
 	 */
-	j = i = 0;
-	foreach(lc, node->mergeplans)
+	j = 0;
+	i = -1;
+	while ((i = bms_next_member(validsubplans, i)) >= 0)
 	{
-		if (bms_is_member(i, validsubplans))
-		{
-			Plan	   *initNode = (Plan *) lfirst(lc);
+		Plan	   *initNode = (Plan *) list_nth(node->mergeplans, i);
 
-			mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
-		}
-		i++;
+		mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
 	}
 
 	mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index eba5866b1a..8462f58669 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -90,6 +90,8 @@ _copyPlannedStmt(const PlannedStmt *from)
 	COPY_SCALAR_FIELD(jitFlags);
 	COPY_NODE_FIELD(planTree);
 	COPY_NODE_FIELD(rtable);
+	COPY_BITMAPSET_FIELD(rtablereqlock);
+	COPY_BITMAPSET_FIELD(rtablereqperm);
 	COPY_NODE_FIELD(resultRelations);
 	COPY_NODE_FIELD(rootResultRelations);
 	COPY_NODE_FIELD(subplans);
@@ -2353,6 +2355,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
 	COPY_SCALAR_FIELD(rtekind);
 	COPY_SCALAR_FIELD(relid);
 	COPY_SCALAR_FIELD(relkind);
+	COPY_SCALAR_FIELD(delaylock);
 	COPY_SCALAR_FIELD(rellockmode);
 	COPY_NODE_FIELD(tablesample);
 	COPY_NODE_FIELD(subquery);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 31499eb798..3140e6834b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2630,6 +2630,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
 	COMPARE_SCALAR_FIELD(rtekind);
 	COMPARE_SCALAR_FIELD(relid);
 	COMPARE_SCALAR_FIELD(relkind);
+	COMPARE_SCALAR_FIELD(delaylock);
 	COMPARE_SCALAR_FIELD(rellockmode);
 	COMPARE_NODE_FIELD(tablesample);
 	COMPARE_NODE_FIELD(subquery);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9f5edf487e..341560d5c8 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -308,6 +308,8 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
 	WRITE_INT_FIELD(jitFlags);
 	WRITE_NODE_FIELD(planTree);
 	WRITE_NODE_FIELD(rtable);
+	WRITE_BITMAPSET_FIELD(rtablereqlock);
+	WRITE_BITMAPSET_FIELD(rtablereqperm);
 	WRITE_NODE_FIELD(resultRelations);
 	WRITE_NODE_FIELD(rootResultRelations);
 	WRITE_NODE_FIELD(subplans);
@@ -3030,6 +3032,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
 		case RTE_RELATION:
 			WRITE_OID_FIELD(relid);
 			WRITE_CHAR_FIELD(relkind);
+			WRITE_BOOL_FIELD(delaylock);
 			WRITE_INT_FIELD(rellockmode);
 			WRITE_NODE_FIELD(tablesample);
 			break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 5aa42242a9..11dcc08564 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1363,6 +1363,7 @@ _readRangeTblEntry(void)
 		case RTE_RELATION:
 			READ_OID_FIELD(relid);
 			READ_CHAR_FIELD(relkind);
+			READ_BOOL_FIELD(delaylock);
 			READ_INT_FIELD(rellockmode);
 			READ_NODE_FIELD(tablesample);
 			break;
@@ -1508,6 +1509,8 @@ _readPlannedStmt(void)
 	READ_INT_FIELD(jitFlags);
 	READ_NODE_FIELD(planTree);
 	READ_NODE_FIELD(rtable);
+	READ_BITMAPSET_FIELD(rtablereqlock);
+	READ_BITMAPSET_FIELD(rtablereqperm);
 	READ_NODE_FIELD(resultRelations);
 	READ_NODE_FIELD(rootResultRelations);
 	READ_NODE_FIELD(subplans);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f5b5fc5f0c..e31334e365 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -303,6 +303,8 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	glob->subroots = NIL;
 	glob->rewindPlanIDs = NULL;
 	glob->finalrtable = NIL;
+	glob->rtablereqlock = NULL;
+	glob->rtablereqperm = NULL;
 	glob->finalrowmarks = NIL;
 	glob->resultRelations = NIL;
 	glob->rootResultRelations = NIL;
@@ -529,6 +531,8 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	result->parallelModeNeeded = glob->parallelModeNeeded;
 	result->planTree = top_plan;
 	result->rtable = glob->finalrtable;
+	result->rtablereqlock = glob->rtablereqlock;
+	result->rtablereqperm = glob->rtablereqperm;
 	result->resultRelations = glob->resultRelations;
 	result->rootResultRelations = glob->rootResultRelations;
 	result->subplans = glob->subplans;
@@ -6068,6 +6072,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 	rte->rtekind = RTE_RELATION;
 	rte->relid = tableOid;
 	rte->relkind = RELKIND_RELATION;	/* Don't be too picky. */
+	rte->delaylock = false;
 	rte->rellockmode = AccessShareLock;
 	rte->lateral = false;
 	rte->inh = false;
@@ -6191,6 +6196,7 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
 	rte->rtekind = RTE_RELATION;
 	rte->relid = tableOid;
 	rte->relkind = RELKIND_RELATION;	/* Don't be too picky. */
+	rte->delaylock = false;
 	rte->rellockmode = AccessShareLock;
 	rte->lateral = false;
 	rte->inh = true;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 0213a37670..2b1c9d5471 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -401,6 +401,20 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
 	newrte->colcollations = NIL;
 	newrte->securityQuals = NIL;
 
+	/*
+	 * Build bitmapsets to allow us to more easily skip entries which don't
+	 * require any initial locking during execution or permission checks.
+	 * This is primarily an optimization aimed at table partitioning.
+	 */
+	if (!newrte->delaylock && rte->rtekind == RTE_RELATION)
+		glob->rtablereqlock = bms_add_member(glob->rtablereqlock,
+											 list_length(glob->finalrtable));
+
+	if (newrte->requiredPerms != 0)
+		glob->rtablereqperm = bms_add_member(glob->rtablereqperm,
+											 list_length(glob->finalrtable));
+
 	glob->finalrtable = lappend(glob->finalrtable, newrte);
 
 	/*
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index cdfdc31ca5..047a9f9b0b 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -361,6 +361,17 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	*childrte_p = childrte;
 	childrte->relid = childOID;
 	childrte->relkind = childrel->rd_rel->relkind;
+
+	/*
+	 * For leaf partitions, we've no need to obtain the lock on the relation
+	 * during query execution until the partition is first required.  This
+	 * can drastically reduce the number of partitions we must lock when many
+	 * partitions are run-time pruned.
+	 */
+	childrte->delaylock = (childOID != parentOID &&
+						   parentrte->relkind == RELKIND_PARTITIONED_TABLE &&
+						   childrte->relkind != RELKIND_PARTITIONED_TABLE);
+
 	/* A partitioned child will need to be expanded further. */
 	if (childOID != parentOID &&
 		childrte->relkind == RELKIND_PARTITIONED_TABLE)
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index b36e1b203a..15c2078bea 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -1229,6 +1229,7 @@ addRangeTableEntry(ParseState *pstate,
 	rel = parserOpenTable(pstate, relation, lockmode);
 	rte->relid = RelationGetRelid(rel);
 	rte->relkind = rel->rd_rel->relkind;
+	rte->delaylock = false;
 	rte->rellockmode = lockmode;
 
 	/*
@@ -1307,6 +1308,7 @@ addRangeTableEntryForRelation(ParseState *pstate,
 	rte->alias = alias;
 	rte->relid = RelationGetRelid(rel);
 	rte->relkind = rel->rd_rel->relkind;
+	rte->delaylock = false;
 	rte->rellockmode = lockmode;
 
 	/*
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index a5e5007e81..3521bfe048 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -184,6 +184,7 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	rte->rtekind = RTE_RELATION;
 	rte->relid = RelationGetRelid(rel->localrel);
 	rte->relkind = rel->localrel->rd_rel->relkind;
+	rte->delaylock = false;
 	rte->rellockmode = AccessShareLock;
 	ExecInitRangeTable(estate, list_make1(rte));
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 7eb41ff026..3e47302833 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1683,6 +1683,7 @@ ApplyRetrieveRule(Query *parsetree,
 	/* Clear fields that should not be set in a subquery RTE */
 	rte->relid = InvalidOid;
 	rte->relkind = 0;
+	rte->delaylock = false;
 	rte->rellockmode = 0;
 	rte->tablesample = NULL;
 	rte->inh = false;			/* must not be set for a subquery */
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index ef04fa5009..25e34cd9ed 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -1311,6 +1311,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	pkrte->rtekind = RTE_RELATION;
 	pkrte->relid = RelationGetRelid(pk_rel);
 	pkrte->relkind = pk_rel->rd_rel->relkind;
+	pkrte->delaylock = false;
 	pkrte->rellockmode = AccessShareLock;
 	pkrte->requiredPerms = ACL_SELECT;
 
@@ -1318,6 +1319,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	fkrte->rtekind = RTE_RELATION;
 	fkrte->relid = RelationGetRelid(fk_rel);
 	fkrte->relkind = fk_rel->rd_rel->relkind;
+	fkrte->delaylock = false;
 	fkrte->rellockmode = AccessShareLock;
 	fkrte->requiredPerms = ACL_SELECT;
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 8e0b5a7150..3dd4dfb4f6 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -1003,6 +1003,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
 	oldrte->rtekind = RTE_RELATION;
 	oldrte->relid = trigrec->tgrelid;
 	oldrte->relkind = relkind;
+	oldrte->delaylock = false;
 	oldrte->rellockmode = AccessShareLock;
 	oldrte->alias = makeAlias("old", NIL);
 	oldrte->eref = oldrte->alias;
@@ -1014,6 +1015,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
 	newrte->rtekind = RTE_RELATION;
 	newrte->relid = trigrec->tgrelid;
 	newrte->relkind = relkind;
+	newrte->delaylock = false;
 	newrte->rellockmode = AccessShareLock;
 	newrte->alias = makeAlias("new", NIL);
 	newrte->eref = newrte->alias;
@@ -3225,6 +3227,7 @@ deparse_context_for(const char *aliasname, Oid relid)
 	rte->rtekind = RTE_RELATION;
 	rte->relid = relid;
 	rte->relkind = RELKIND_RELATION;	/* no need for exactness here */
+	rte->delaylock = false;
 	rte->rellockmode = AccessShareLock;
 	rte->alias = makeAlias(aliasname, NIL);
 	rte->eref = rte->alias;
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 9851bd43d5..81cee7fa4b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1564,7 +1564,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
 	foreach(lc1, stmt_list)
 	{
 		PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
-		ListCell   *lc2;
+		int			i;
 
 		if (plannedstmt->commandType == CMD_UTILITY)
 		{
@@ -1582,18 +1582,26 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
 			continue;
 		}
 
-		foreach(lc2, plannedstmt->rtable)
-		{
-			RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+		i = -1;
+		while ((i = bms_next_member(plannedstmt->rtablereqlock, i)) >= 0)
+		{
+			RangeTblEntry *rte = (RangeTblEntry *) list_nth(plannedstmt->rtable, i);
 
 			if (rte->rtekind != RTE_RELATION)
 				continue;
 
 			/*
-			 * Acquire the appropriate type of lock on each relation OID. Note
-			 * that we don't actually try to open the rel, and hence will not
-			 * fail if it's been dropped entirely --- we'll just transiently
-			 * acquire a non-conflicting lock.
+			 * delaylock relations will be locked only when they are going
+			 * to be accessed for the first time.
+			 */
+			if (rte->delaylock)
+				continue;
+
+			/*
+			 * Otherwise, acquire the appropriate type of lock on the
+			 * relation's OID.  Note that we don't actually try to open the
+			 * rel, and hence will not fail if it's been dropped entirely ---
+			 * we'll just transiently acquire a non-conflicting lock.
 			 */
 			if (acquire)
 				LockRelationOid(rte->relid, rte->rellockmode);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9003f2ce58..ee844e392b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -175,6 +175,8 @@ extern void ExecutorEnd(QueryDesc *queryDesc);
 extern void standard_ExecutorEnd(QueryDesc *queryDesc);
 extern void ExecutorRewind(QueryDesc *queryDesc);
 extern bool ExecCheckRTPerms(List *rangeTable, bool ereport_on_violation);
+extern bool ExecCheckRTPermsFast(List *rangeTable, Bitmapset *checkEntries,
+								 bool ereport_on_violation);
 extern void CheckValidResultRel(ResultRelInfo *resultRelInfo, CmdType operation);
 extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Relation resultRelationDesc,
@@ -538,7 +540,7 @@ static inline RangeTblEntry *
 exec_rt_fetch(Index rti, EState *estate)
 {
 	Assert(rti > 0 && rti <= estate->es_range_table_size);
-	return estate->es_range_table_array[rti - 1];
+	return list_nth(estate->es_range_table, rti - 1);
 }
 
 extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 996d872c56..2dd87ec023 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -493,8 +493,6 @@ typedef struct EState
 	Snapshot	es_snapshot;	/* time qual to use */
 	Snapshot	es_crosscheck_snapshot; /* crosscheck time qual for RI */
 	List	   *es_range_table; /* List of RangeTblEntry */
-	struct RangeTblEntry **es_range_table_array;	/* equivalent array */
-	Index		es_range_table_size;	/* size of the range table arrays */
 	Relation   *es_relations;	/* Array of per-range-table-entry Relation
								 * pointers, or NULL if not yet opened */
 	struct ExecRowMark **es_rowmarks;	/* Array of per-range-table-entry
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index a7e859dc90..3569c54534 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -989,6 +989,8 @@ typedef struct RangeTblEntry
 	 */
 	Oid			relid;			/* OID of the relation */
 	char		relkind;		/* relation kind (see pg_class.relkind) */
+	bool		delaylock;		/* delay locking until executor needs to
+								 * access this relation */
 	int			rellockmode;	/* lock level that query requires on the rel */
 	struct TableSampleClause *tablesample;	/* sampling info, or NULL */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a008ae07da..9006ec1175 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -117,6 +117,12 @@ typedef struct PlannerGlobal
 
 	List	   *finalrtable;	/* "flat" rangetable for executor */
 
+	Bitmapset  *rtablereqlock;	/* Bitmap of finalrtable indexes which require
+								 * a lock during executor startup */
+
+	Bitmapset  *rtablereqperm;	/* Bitmap of finalrtable indexes which require
+								 * permission checks at executor startup */
+
 	List	   *finalrowmarks;	/* "flat" list of PlanRowMarks */
 
 	List	   *resultRelations;	/* "flat" list of integer RT indexes */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 6d087c268f..76681384cd 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -65,6 +65,12 @@ typedef struct PlannedStmt
 
 	List	   *rtable;			/* list of RangeTblEntry nodes */
 
+	Bitmapset  *rtablereqlock;	/* Bitmap of rtable indexes which require
+								 * a lock during executor startup */
+
+	Bitmapset  *rtablereqperm;	/* Bitmap of rtable indexes which require
+								 * permission checks at executor startup */
+
 	/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
 	List	   *resultRelations;	/* integer list of RT indexes, or NIL */