On Mon, 4 Mar 2019 at 07:29, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Andres Freund <and...@anarazel.de> writes:
> > I still regularly see list overhead matter in production workloads. A
> > lot of it being memory allocator overhead, which is why I'm concerned
> > with a rewrite that doesn't reduce the number of memory allocations.
>
> Well, I did that in the v3 patch, and it still hasn't moved the needle
> noticeably in any test case I've tried.  At this point I'm really
> struggling to see a reason why we shouldn't just mark this patch rejected
> and move on.  If you have test cases that suggest differently, please
> show them don't just handwave.

I think we discussed this before, but... if this patch is not a win by
itself (and we've already seen it's not really causing much in the way
of regression, if any), then we need to judge it on what else we can
do to exploit the new performance characteristics of List.  For
example, list_nth() is now deadly fast: O(1) array indexing instead of
a linear walk of the list.
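
To illustrate, here's a rough sketch of why (not the exact v3 code):
with the cells kept in a contiguous array, fetching the Nth element is
plain array indexing rather than a walk over N cons cells:

static void *
list_nth(const List *list, int n)
{
    /* sketch only: v3 stores the cells in list->elements */
    Assert(list != NIL);
    Assert(n >= 0 && n < list->length);
    return list->elements[n].ptr_value; /* O(1), no cell chasing */
}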

My primary interest here is getting rid of a few places where we build
an array version of some List so that we can access the Nth element
more quickly. What goes on in ExecInitRangeTable() is not particularly
great for queries to partitioned tables with a large number of
partitions where only one survives run-time pruning.  I've hacked
together a patch to show you what wins we can have with the new list
implementation.

Using the attached (renamed to .txt so as not to upset the CFbot), I get:

setup:

create table hashp (a int, b int) partition by hash (a);
select 'create table hashp'||x||' partition of hashp for values with
(modulus 10000, remainder '||x||');' from generate_Series(0,9999) x;
\gexec
alter table hashp add constraint hashp_pkey PRIMARY KEY (a);

postgresql.conf
plan_cache_mode = force_generic_plan
max_parallel_workers_per_gather=0
max_locks_per_transaction=256

bench.sql

\set p random(1,10000)
select * from hashp where a = :p;
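
(Each tps number below is a separate pgbench run of that script
against the same setup, i.e. something like "pgbench -n -T 60 -f
bench.sql"; the exact run length doesn't matter much here.)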

master:

tps = 189.499654 (excluding connections establishing)
tps = 195.102743 (excluding connections establishing)
tps = 194.338813 (excluding connections establishing)

your List reimplementation v3 + attached:

tps = 12852.003735 (excluding connections establishing)
tps = 12791.834617 (excluding connections establishing)
tps = 12691.515641 (excluding connections establishing)

The attached does include [1], but even with just that, the
performance is not as good as with the arraylist plus the follow-on
exploits I added.  Now that we have a much faster bms_next_member(),
some form of what's in there might be okay.
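
(The pattern that combination enables, used throughout the attached,
is to visit only the members which survived pruning and fetch each
subplan by index for free, e.g. as in the nodeAppend.c hunk below:

i = -1;
while ((i = bms_next_member(validsubplans, i)) >= 0)
{
    Plan       *initNode = (Plan *) list_nth(node->appendplans, i);

    appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
}

With the old cons-cell List, each of those list_nth() calls would have
been O(N), which is why the code previously had to walk the whole list
with foreach().)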

A profile shows that in this workload we're still spending 42% of the
time in hash_seq_search().  That's LockReleaseAll() having a hard time
of it, thanks to the lock table being bloated from building the
generic plan with 10k partitions.  [2] aims to fix that, so we'd
likely be closer to 18k TPS, or about 100x faster than master.

In fact, I should test that...

tps = 18763.977940 (excluding connections establishing)
tps = 18589.531558 (excluding connections establishing)
tps = 19011.295770 (excluding connections establishing)

Yip, about 100x.

I think these are worthy goals to aspire to.

[1] https://commitfest.postgresql.org/22/1897/
[2] https://commitfest.postgresql.org/22/1993/

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 2048d71535..63554b3057 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1610,6 +1610,7 @@ recordDependencyOnSingleRelExpr(const ObjectAddress *depender,
        rte.rtekind = RTE_RELATION;
        rte.relid = relId;
        rte.relkind = RELKIND_RELATION; /* no need for exactness here */
+       rte.delaylock = false;
        rte.rellockmode = AccessShareLock;
 
        context.rtables = list_make1(list_make1(&rte));
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0208388af3..0ac564dd77 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -516,6 +516,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
        rte->rtekind = RTE_RELATION;
        rte->relid = intoRelationAddr.objectId;
        rte->relkind = relkind;
+       rte->delaylock = false;
        rte->rellockmode = RowExclusiveLock;
        rte->requiredPerms = ACL_INSERT;
 
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index fe99096efc..9b93c76aca 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -96,13 +96,13 @@ execCurrentOf(CurrentOfExpr *cexpr,
        {
                ExecRowMark *erm;
                Index           i;
-
+               int                     len = list_length(queryDesc->estate->es_range_table);
                /*
                 * Here, the query must have exactly one FOR UPDATE/SHARE reference to
                 * the target table, and we dig the ctid info out of that.
                 */
                erm = NULL;
-               for (i = 0; i < queryDesc->estate->es_range_table_size; i++)
+               for (i = 0; i < len; i++)
                {
                        ExecRowMark *thiserm = queryDesc->estate->es_rowmarks[i];
 
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index a018925d4e..0cdd9ed44e 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3962,7 +3962,7 @@ ExecEvalWholeRowVar(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
                 * perhaps other places.)
                 */
                if (econtext->ecxt_estate &&
-                       variable->varno <= econtext->ecxt_estate->es_range_table_size)
+                       variable->varno <= list_length(econtext->ecxt_estate->es_range_table))
                {
                        RangeTblEntry *rte = exec_rt_fetch(variable->varno,
                                                                                           econtext->ecxt_estate);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 61be56fe0b..c119197bc5 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -592,6 +592,40 @@ ExecCheckRTPerms(List *rangeTable, bool ereport_on_violation)
        return result;
 }
 
+/*
+ * ExecCheckRTPermsFast
+ *             As above, but only checks rtable entries that appear in checkEntries.
+ */
+bool
+ExecCheckRTPermsFast(List *rangeTable, Bitmapset *checkEntries,
+                                        bool ereport_on_violation)
+{
+       int                     i;
+       bool            result = true;
+
+       i = -1;
+       while ((i = bms_next_member(checkEntries, i)) >= 0)
+       {
+               RangeTblEntry *rte = (RangeTblEntry *) list_nth(rangeTable, i);
+
+               result = ExecCheckRTEPerms(rte);
+               if (!result)
+               {
+                       Assert(rte->rtekind == RTE_RELATION);
+                       if (ereport_on_violation)
+                               aclcheck_error(ACLCHECK_NO_PRIV, get_relkind_objtype(get_rel_relkind(rte->relid)),
+                                                          get_rel_name(rte->relid));
+                       return false;
+               }
+       }
+
+       if (ExecutorCheckPerms_hook)
+               result = (*ExecutorCheckPerms_hook) (rangeTable,
+                                                                                        ereport_on_violation);
+       return result;
+}
+
+
 /*
  * ExecCheckRTEPerms
  *             Check access permissions for a single RTE.
@@ -816,7 +850,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
        /*
         * Do permissions checks
         */
-       ExecCheckRTPerms(rangeTable, true);
+       ExecCheckRTPermsFast(rangeTable, plannedstmt->rtablereqperm, true);
 
        /*
         * initialize the node's execution state
@@ -912,7 +946,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
        if (plannedstmt->rowMarks)
        {
                estate->es_rowmarks = (ExecRowMark **)
-                       palloc0(estate->es_range_table_size * sizeof(ExecRowMark *));
+                       palloc0(list_length(estate->es_range_table) * sizeof(ExecRowMark *));
                foreach(l, plannedstmt->rowMarks)
                {
                        PlanRowMark *rc = (PlanRowMark *) lfirst(l);
@@ -964,7 +998,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
                        ItemPointerSetInvalid(&(erm->curCtid));
                        erm->ermExtra = NULL;
 
-                       Assert(erm->rti > 0 && erm->rti <= estate->es_range_table_size &&
+                       Assert(erm->rti > 0 && erm->rti <= list_length(estate->es_range_table) &&
                                   estate->es_rowmarks[erm->rti - 1] == NULL);
 
                        estate->es_rowmarks[erm->rti - 1] = erm;
@@ -1571,7 +1605,7 @@ ExecEndPlan(PlanState *planstate, EState *estate)
         * close whatever rangetable Relations have been opened.  We do not
         * release any locks we might hold on those rels.
         */
-       num_relations = estate->es_range_table_size;
+       num_relations = list_length(estate->es_range_table);
        for (i = 0; i < num_relations; i++)
        {
                if (estate->es_relations[i])
@@ -2346,7 +2380,7 @@ ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo)
 ExecRowMark *
 ExecFindRowMark(EState *estate, Index rti, bool missing_ok)
 {
-       if (rti > 0 && rti <= estate->es_range_table_size &&
+       if (rti > 0 && rti <= list_length(estate->es_range_table) &&
                estate->es_rowmarks != NULL)
        {
                ExecRowMark *erm = estate->es_rowmarks[rti - 1];
@@ -2792,7 +2826,7 @@ EvalPlanQualSlot(EPQState *epqstate,
 {
        TupleTableSlot **slot;
 
-       Assert(rti > 0 && rti <= epqstate->estate->es_range_table_size);
+       Assert(rti > 0 && rti <= list_length(epqstate->estate->es_range_table));
        slot = &epqstate->estate->es_epqTupleSlot[rti - 1];
 
        if (*slot == NULL)
@@ -2973,7 +3007,7 @@ EvalPlanQualBegin(EPQState *epqstate, EState *parentestate)
                /*
                 * We already have a suitable child EPQ tree, so just reset it.
                 */
-               Index           rtsize = parentestate->es_range_table_size;
+               Index           rtsize = list_length(parentestate->es_range_table);
                PlanState  *planstate = epqstate->planstate;
 
                MemSet(estate->es_epqScanDone, 0, rtsize * sizeof(bool));
@@ -3026,7 +3060,7 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree)
        MemoryContext oldcontext;
        ListCell   *l;
 
-       rtsize = parentestate->es_range_table_size;
+       rtsize = list_length(parentestate->es_range_table);
 
        epqstate->estate = estate = CreateExecutorState();
 
@@ -3050,8 +3084,6 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree)
        estate->es_snapshot = parentestate->es_snapshot;
        estate->es_crosscheck_snapshot = parentestate->es_crosscheck_snapshot;
        estate->es_range_table = parentestate->es_range_table;
-       estate->es_range_table_array = parentestate->es_range_table_array;
-       estate->es_range_table_size = parentestate->es_range_table_size;
        estate->es_relations = parentestate->es_relations;
        estate->es_rowmarks = parentestate->es_rowmarks;
        estate->es_plannedstmt = parentestate->es_plannedstmt;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index ef06b74a30..dfbfa294c3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -111,8 +111,6 @@ CreateExecutorState(void)
        estate->es_snapshot = InvalidSnapshot;  /* caller must initialize this */
        estate->es_crosscheck_snapshot = InvalidSnapshot;       /* no crosscheck */
        estate->es_range_table = NIL;
-       estate->es_range_table_array = NULL;
-       estate->es_range_table_size = 0;
        estate->es_relations = NULL;
        estate->es_rowmarks = NULL;
        estate->es_plannedstmt = NULL;
@@ -707,37 +705,21 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
  * ExecInitRangeTable
  *             Set up executor's range-table-related data
  *
- * We build an array from the range table list to allow faster lookup by RTI.
- * (The es_range_table field is now somewhat redundant, but we keep it to
- * avoid breaking external code unnecessarily.)
- * This is also a convenient place to set up the parallel es_relations array.
+ * Allocate the es_relations array to store opened rels.
  */
 void
 ExecInitRangeTable(EState *estate, List *rangeTable)
 {
-       Index           rti;
-       ListCell   *lc;
-
        /* Remember the range table List as-is */
        estate->es_range_table = rangeTable;
 
-       /* Set up the equivalent array representation */
-       estate->es_range_table_size = list_length(rangeTable);
-       estate->es_range_table_array = (RangeTblEntry **)
-               palloc(estate->es_range_table_size * sizeof(RangeTblEntry *));
-       rti = 0;
-       foreach(lc, rangeTable)
-       {
-               estate->es_range_table_array[rti++] = lfirst_node(RangeTblEntry, lc);
-       }
-
        /*
         * Allocate an array to store an open Relation corresponding to each
         * rangetable entry, and initialize entries to NULL.  Relations are opened
         * and stored here as needed.
         */
        estate->es_relations = (Relation *)
-               palloc0(estate->es_range_table_size * sizeof(Relation));
+               palloc0(list_length(rangeTable) * sizeof(Relation));
 
        /*
         * es_rowmarks is also parallel to the es_range_table_array, but it's
@@ -757,7 +739,7 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
 {
        Relation        rel;
 
-       Assert(rti > 0 && rti <= estate->es_range_table_size);
+       Assert(rti > 0 && rti <= list_length(estate->es_range_table));
 
        rel = estate->es_relations[rti - 1];
        if (rel == NULL)
@@ -767,14 +749,15 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
 
                Assert(rte->rtekind == RTE_RELATION);
 
-               if (!IsParallelWorker())
+               if (!rte->delaylock && !IsParallelWorker())
                {
                        /*
-                        * In a normal query, we should already have the appropriate lock,
-                        * but verify that through an Assert.  Since there's already an
-                        * Assert inside table_open that insists on holding some lock, it
-                        * seems sufficient to check this only when rellockmode is higher
-                        * than the minimum.
+                        * In a normal query, unless the planner set the delaylock flag,
+                        * we should already have the appropriate lock, but verify that
+                        * through an Assert.  Since there's already an Assert inside
+                        * table_open that insists on holding some lock, it seems
+                        * sufficient to check this only when rellockmode is higher than
+                        * the minimum.
                         */
                        rel = table_open(rte->relid, NoLock);
                        Assert(rte->rellockmode == AccessShareLock ||
@@ -783,9 +766,10 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
                else
                {
                        /*
-                        * If we are a parallel worker, we need to obtain our own local
-                        * lock on the relation.  This ensures sane behavior in case the
-                        * parent process exits before we do.
+                        * If we are a parallel worker or delaylock is set, we need to
+                        * obtain a lock on the relation.  For parallel workers, this
+                        * ensures sane behavior in case the parent process exits before
+                        * we do.
                         */
                        rel = table_open(rte->relid, rte->rellockmode);
                }
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index f3be2429db..3b9b169960 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -107,7 +107,6 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
        int                     firstvalid;
        int                     i,
                                j;
-       ListCell   *lc;
 
        /* check for unsupported flags */
        Assert(!(eflags & EXEC_FLAG_MARK));
@@ -211,24 +210,21 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
         *
         * While at it, find out the first valid partial plan.
         */
-       j = i = 0;
+       j = 0;
        firstvalid = nplans;
-       foreach(lc, node->appendplans)
+       i = -1;
+       while ((i = bms_next_member(validsubplans, i)) >= 0)
        {
-               if (bms_is_member(i, validsubplans))
-               {
-                       Plan       *initNode = (Plan *) lfirst(lc);
+               Plan       *initNode = (Plan *) list_nth(node->appendplans, i);
 
-                       /*
-                        * Record the lowest appendplans index which is a valid partial
-                        * plan.
-                        */
-                       if (i >= node->first_partial_plan && j < firstvalid)
-                               firstvalid = j;
+               /*
+                * Record the lowest appendplans index which is a valid partial
+                * plan.
+                */
+               if (i >= node->first_partial_plan && j < firstvalid)
+                       firstvalid = j;
 
-                       appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
-               }
-               i++;
+               appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
        }
 
        appendstate->as_first_partial_plan = firstvalid;
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 7ba53ba185..18d13377dc 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -70,7 +70,6 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
        int                     nplans;
        int                     i,
                                j;
-       ListCell   *lc;
 
        /* check for unsupported flags */
        Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -177,16 +176,13 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
         * call ExecInitNode on each of the valid plans to be executed and save
         * the results into the mergeplanstates array.
         */
-       j = i = 0;
-       foreach(lc, node->mergeplans)
+       j = 0;
+       i = -1;
+       while ((i = bms_next_member(validsubplans, i)) >= 0)
        {
-               if (bms_is_member(i, validsubplans))
-               {
-                       Plan       *initNode = (Plan *) lfirst(lc);
+               Plan       *initNode = (Plan *) list_nth(node->mergeplans, i);
 
-                       mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
-               }
-               i++;
+               mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
        }
 
        mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index eba5866b1a..8462f58669 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -90,6 +90,8 @@ _copyPlannedStmt(const PlannedStmt *from)
        COPY_SCALAR_FIELD(jitFlags);
        COPY_NODE_FIELD(planTree);
        COPY_NODE_FIELD(rtable);
+       COPY_BITMAPSET_FIELD(rtablereqlock);
+       COPY_BITMAPSET_FIELD(rtablereqperm);
        COPY_NODE_FIELD(resultRelations);
        COPY_NODE_FIELD(rootResultRelations);
        COPY_NODE_FIELD(subplans);
@@ -2353,6 +2355,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
        COPY_SCALAR_FIELD(rtekind);
        COPY_SCALAR_FIELD(relid);
        COPY_SCALAR_FIELD(relkind);
+       COPY_SCALAR_FIELD(delaylock);
        COPY_SCALAR_FIELD(rellockmode);
        COPY_NODE_FIELD(tablesample);
        COPY_NODE_FIELD(subquery);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 31499eb798..3140e6834b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2630,6 +2630,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
        COMPARE_SCALAR_FIELD(rtekind);
        COMPARE_SCALAR_FIELD(relid);
        COMPARE_SCALAR_FIELD(relkind);
+       COMPARE_SCALAR_FIELD(delaylock);
        COMPARE_SCALAR_FIELD(rellockmode);
        COMPARE_NODE_FIELD(tablesample);
        COMPARE_NODE_FIELD(subquery);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9f5edf487e..341560d5c8 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -308,6 +308,8 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
        WRITE_INT_FIELD(jitFlags);
        WRITE_NODE_FIELD(planTree);
        WRITE_NODE_FIELD(rtable);
+       WRITE_BITMAPSET_FIELD(rtablereqlock);
+       WRITE_BITMAPSET_FIELD(rtablereqperm);
        WRITE_NODE_FIELD(resultRelations);
        WRITE_NODE_FIELD(rootResultRelations);
        WRITE_NODE_FIELD(subplans);
@@ -3030,6 +3032,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
                case RTE_RELATION:
                        WRITE_OID_FIELD(relid);
                        WRITE_CHAR_FIELD(relkind);
+                       WRITE_BOOL_FIELD(delaylock);
                        WRITE_INT_FIELD(rellockmode);
                        WRITE_NODE_FIELD(tablesample);
                        break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 5aa42242a9..11dcc08564 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1363,6 +1363,7 @@ _readRangeTblEntry(void)
                case RTE_RELATION:
                        READ_OID_FIELD(relid);
                        READ_CHAR_FIELD(relkind);
+                       READ_BOOL_FIELD(delaylock);
                        READ_INT_FIELD(rellockmode);
                        READ_NODE_FIELD(tablesample);
                        break;
@@ -1508,6 +1509,8 @@ _readPlannedStmt(void)
        READ_INT_FIELD(jitFlags);
        READ_NODE_FIELD(planTree);
        READ_NODE_FIELD(rtable);
+       READ_BITMAPSET_FIELD(rtablereqlock);
+       READ_BITMAPSET_FIELD(rtablereqperm);
        READ_NODE_FIELD(resultRelations);
        READ_NODE_FIELD(rootResultRelations);
        READ_NODE_FIELD(subplans);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f5b5fc5f0c..e31334e365 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -303,6 +303,8 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
        glob->subroots = NIL;
        glob->rewindPlanIDs = NULL;
        glob->finalrtable = NIL;
+       glob->rtablereqlock = NULL;
+       glob->rtablereqperm = NULL;
        glob->finalrowmarks = NIL;
        glob->resultRelations = NIL;
        glob->rootResultRelations = NIL;
@@ -529,6 +531,8 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
        result->parallelModeNeeded = glob->parallelModeNeeded;
        result->planTree = top_plan;
        result->rtable = glob->finalrtable;
+       result->rtablereqlock = glob->rtablereqlock;
+       result->rtablereqperm = glob->rtablereqperm;
        result->resultRelations = glob->resultRelations;
        result->rootResultRelations = glob->rootResultRelations;
        result->subplans = glob->subplans;
@@ -6068,6 +6072,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
        rte->rtekind = RTE_RELATION;
        rte->relid = tableOid;
        rte->relkind = RELKIND_RELATION;        /* Don't be too picky. */
+       rte->delaylock = false;
        rte->rellockmode = AccessShareLock;
        rte->lateral = false;
        rte->inh = false;
@@ -6191,6 +6196,7 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
        rte->rtekind = RTE_RELATION;
        rte->relid = tableOid;
        rte->relkind = RELKIND_RELATION;        /* Don't be too picky. */
+       rte->delaylock = false;
        rte->rellockmode = AccessShareLock;
        rte->lateral = false;
        rte->inh = true;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 0213a37670..2b1c9d5471 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -401,6 +401,20 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
        newrte->colcollations = NIL;
        newrte->securityQuals = NIL;
 
+       /*
+        * Build bitmapsets to allow us to more easily skip entries which don't
+        * require any initial locking during execution or permission checks. This
+        * is primarily an optimization aimed at table partitioning.
+        */
+       if (!newrte->delaylock && rte->rtekind == RTE_RELATION)
+               glob->rtablereqlock = bms_add_member(glob->rtablereqlock,
+                                                                                        list_length(glob->finalrtable));
+
+       if (newrte->requiredPerms != 0)
+               glob->rtablereqperm = bms_add_member(glob->rtablereqperm,
+                       list_length(glob->finalrtable));
+
+
        glob->finalrtable = lappend(glob->finalrtable, newrte);
 
        /*
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index cdfdc31ca5..047a9f9b0b 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -361,6 +361,17 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
        *childrte_p = childrte;
        childrte->relid = childOID;
        childrte->relkind = childrel->rd_rel->relkind;
+
+       /*
+        * For leaf partitions, we've no need to obtain the lock on the relation
+        * during query execution until the partition is first required.  This can
+        * drastically reduce the number of partitions we must lock when many
+        * partitions are run-time pruned.
+        */
+       childrte->delaylock = (childOID != parentOID &&
+               parentrte->relkind == RELKIND_PARTITIONED_TABLE &&
+               childrte->relkind != RELKIND_PARTITIONED_TABLE);
+
        /* A partitioned child will need to be expanded further. */
        if (childOID != parentOID &&
                childrte->relkind == RELKIND_PARTITIONED_TABLE)
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index b36e1b203a..15c2078bea 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -1229,6 +1229,7 @@ addRangeTableEntry(ParseState *pstate,
        rel = parserOpenTable(pstate, relation, lockmode);
        rte->relid = RelationGetRelid(rel);
        rte->relkind = rel->rd_rel->relkind;
+       rte->delaylock = false;
        rte->rellockmode = lockmode;
 
        /*
@@ -1307,6 +1308,7 @@ addRangeTableEntryForRelation(ParseState *pstate,
        rte->alias = alias;
        rte->relid = RelationGetRelid(rel);
        rte->relkind = rel->rd_rel->relkind;
+       rte->delaylock = false;
        rte->rellockmode = lockmode;
 
        /*
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index a5e5007e81..3521bfe048 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -184,6 +184,7 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
        rte->rtekind = RTE_RELATION;
        rte->relid = RelationGetRelid(rel->localrel);
        rte->relkind = rel->localrel->rd_rel->relkind;
+       rte->delaylock = false;
        rte->rellockmode = AccessShareLock;
        ExecInitRangeTable(estate, list_make1(rte));
 
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 7eb41ff026..3e47302833 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1683,6 +1683,7 @@ ApplyRetrieveRule(Query *parsetree,
        /* Clear fields that should not be set in a subquery RTE */
        rte->relid = InvalidOid;
        rte->relkind = 0;
+       rte->delaylock = false;
        rte->rellockmode = 0;
        rte->tablesample = NULL;
        rte->inh = false;                       /* must not be set for a subquery */
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index ef04fa5009..25e34cd9ed 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -1311,6 +1311,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
        pkrte->rtekind = RTE_RELATION;
        pkrte->relid = RelationGetRelid(pk_rel);
        pkrte->relkind = pk_rel->rd_rel->relkind;
+       pkrte->delaylock = false;
        pkrte->rellockmode = AccessShareLock;
        pkrte->requiredPerms = ACL_SELECT;
 
@@ -1318,6 +1319,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
        fkrte->rtekind = RTE_RELATION;
        fkrte->relid = RelationGetRelid(fk_rel);
        fkrte->relkind = fk_rel->rd_rel->relkind;
+       fkrte->delaylock = false;
        fkrte->rellockmode = AccessShareLock;
        fkrte->requiredPerms = ACL_SELECT;
 
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 8e0b5a7150..3dd4dfb4f6 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -1003,6 +1003,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
                oldrte->rtekind = RTE_RELATION;
                oldrte->relid = trigrec->tgrelid;
                oldrte->relkind = relkind;
+               oldrte->delaylock = false;
                oldrte->rellockmode = AccessShareLock;
                oldrte->alias = makeAlias("old", NIL);
                oldrte->eref = oldrte->alias;
@@ -1014,6 +1015,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
                newrte->rtekind = RTE_RELATION;
                newrte->relid = trigrec->tgrelid;
                newrte->relkind = relkind;
+               newrte->delaylock = false;
                newrte->rellockmode = AccessShareLock;
                newrte->alias = makeAlias("new", NIL);
                newrte->eref = newrte->alias;
@@ -3225,6 +3227,7 @@ deparse_context_for(const char *aliasname, Oid relid)
        rte->rtekind = RTE_RELATION;
        rte->relid = relid;
        rte->relkind = RELKIND_RELATION;        /* no need for exactness here */
+       rte->delaylock = false;
        rte->rellockmode = AccessShareLock;
        rte->alias = makeAlias(aliasname, NIL);
        rte->eref = rte->alias;
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 9851bd43d5..81cee7fa4b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1564,7 +1564,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
        foreach(lc1, stmt_list)
        {
                PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
-               ListCell   *lc2;
+               int                     i;
 
                if (plannedstmt->commandType == CMD_UTILITY)
                {
@@ -1582,18 +1582,26 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
                        continue;
                }
 
-               foreach(lc2, plannedstmt->rtable)
-               {
-                       RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+               i = -1;
+               while ((i = bms_next_member(plannedstmt->rtablereqlock, i)) >= 0)
+               {
+                       RangeTblEntry *rte = (RangeTblEntry *) list_nth(plannedstmt->rtable, i);
 
                        if (rte->rtekind != RTE_RELATION)
                                continue;
 
                        /*
-                        * Acquire the appropriate type of lock on each relation OID. Note
-                        * that we don't actually try to open the rel, and hence will not
-                        * fail if it's been dropped entirely --- we'll just transiently
-                        * acquire a non-conflicting lock.
+                        * delaylock relations will be locked only when they are going
+                        * to be accessed for the first time.
+                        */
+                       if (rte->delaylock)
+                               continue;
+
+                       /*
+                        * Otherwise, acquire the appropriate type of lock on the
+                        * relation's OID.  Note that we don't actually try to open the
+                        * rel, and hence will not fail if it's been dropped entirely ---
+                        * we'll just transiently acquire a non-conflicting lock.
                         */
                        if (acquire)
                                LockRelationOid(rte->relid, rte->rellockmode);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9003f2ce58..ee844e392b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -175,6 +175,8 @@ extern void ExecutorEnd(QueryDesc *queryDesc);
 extern void standard_ExecutorEnd(QueryDesc *queryDesc);
 extern void ExecutorRewind(QueryDesc *queryDesc);
 extern bool ExecCheckRTPerms(List *rangeTable, bool ereport_on_violation);
+extern bool ExecCheckRTPermsFast(List *rangeTable, Bitmapset *checkEntries,
+                                        bool ereport_on_violation);
extern void CheckValidResultRel(ResultRelInfo *resultRelInfo, CmdType operation);
 extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
                                  Relation resultRelationDesc,
@@ -538,7 +540,7 @@ static inline RangeTblEntry *
 exec_rt_fetch(Index rti, EState *estate)
 {
-       Assert(rti > 0 && rti <= estate->es_range_table_size);
-       return estate->es_range_table_array[rti - 1];
+       Assert(rti > 0 && rti <= list_length(estate->es_range_table));
+       return (RangeTblEntry *) list_nth(estate->es_range_table, rti - 1);
 }
 
 extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 996d872c56..2dd87ec023 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -493,8 +493,6 @@ typedef struct EState
        Snapshot        es_snapshot;    /* time qual to use */
        Snapshot        es_crosscheck_snapshot; /* crosscheck time qual for RI */
        List       *es_range_table; /* List of RangeTblEntry */
-       struct RangeTblEntry **es_range_table_array;    /* equivalent array */
-       Index           es_range_table_size;    /* size of the range table arrays */
        Relation   *es_relations;       /* Array of per-range-table-entry Relation
                                                                 * pointers, or NULL if not yet opened */
        struct ExecRowMark **es_rowmarks;       /* Array of per-range-table-entry
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index a7e859dc90..3569c54534 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -989,6 +989,8 @@ typedef struct RangeTblEntry
         */
        Oid                     relid;                  /* OID of the relation */
        char            relkind;                /* relation kind (see pg_class.relkind) */
+       bool            delaylock;              /* delay locking until executor needs to
+                                                                * access this relation */
        int                     rellockmode;    /* lock level that query requires on the rel */
        struct TableSampleClause *tablesample;  /* sampling info, or NULL */
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a008ae07da..9006ec1175 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -117,6 +117,12 @@ typedef struct PlannerGlobal
 
        List       *finalrtable;        /* "flat" rangetable for executor */
 
+       Bitmapset  *rtablereqlock;      /* Bitmap of finalrtable indexes which require
+                                                                * a lock during executor startup */
+
+       Bitmapset  *rtablereqperm;      /* Bitmap of finalrtable indexes which require
+                                                                * permission checks at executor startup */
+
        List       *finalrowmarks;      /* "flat" list of PlanRowMarks */
 
        List       *resultRelations;    /* "flat" list of integer RT indexes */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 6d087c268f..76681384cd 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -65,6 +65,12 @@ typedef struct PlannedStmt
 
        List       *rtable;                     /* list of RangeTblEntry nodes */
 
+       Bitmapset  *rtablereqlock;      /* Bitmap of rtable indexes which require
+                                                                * a lock during executor startup */
+
+       Bitmapset  *rtablereqperm;      /* Bitmap of rtable indexes which require
+                                                                * permission checks at executor startup */
+
        /* rtable indexes of target relations for INSERT/UPDATE/DELETE */
        List       *resultRelations;    /* integer list of RT indexes, or NIL */
 
