On 2018/06/19 2:05, Tom Lane wrote:
> Amit Langote <[email protected]> writes:
>> [ 0001-Open-partitioned-tables-during-Append-initialization.patch ]
>
> I took a look at this. While I'm in agreement with the general idea of
> holding open the partitioned relations' relcache entries throughout the
> query, I do not like anything about this patch:
Thanks for taking a look at it and sorry about the delay in replying.
> * It seems to be outright broken for the case that any of the partitioned
> relations are listed in nonleafResultRelations. If we're going to do it
> like this, we have to open every one of the partrels regardless of that.
Yes, that was indeed wrong.
> (I wonder whether we couldn't lose PlannedStmt.nonleafResultRelations
> altogether, in favor of merging the related code in InitPlan with this.
Hmm, PlannedStmt.nonleafResultRelations exists for the same reason that
PlannedStmt.resultRelations does, that is:

    /*
     * initialize result relation stuff, and open/lock the result rels.
     *
     * We must do this before initializing the plan tree, else we might try to
     * do a lock upgrade if a result rel is also a source rel.
     */

nonleafResultRelations contains the members of the partitioned_rels lists of
all ModifyTable nodes contained in a plan.
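To make the ordering concern in that comment concrete, here is a rough
sketch (simplified, not the exact code) of what InitPlan does before the
plan tree is initialized:

    /* Simplified sketch of InitPlan's ordering; details elided */
    foreach(l, plannedstmt->resultRelations)
    {
        Index       rti = lfirst_int(l);
        Oid         relid = getrelid(rti, rangeTable);

        /* result rels are opened with RowExclusiveLock up front ... */
        Relation    rel = heap_open(relid, RowExclusiveLock);

        /* ... ResultRelInfo initialization elided ... */
    }

    /*
     * ... and only then is the plan tree initialized, so a scan node that
     * reads the same rel requests the weaker AccessShareLock while the
     * stronger lock is already held; the reverse order would be a lock
     * upgrade.
     */
    planstate = ExecInitNode(plan, estate, eflags);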
> That existing code is already a mighty ugly wart, and this patch makes
> it worse by adding new, related warts elsewhere.)
I just realized that there is a thinko in the following piece of code in
ExecLockNonLeafAppendTables:

    /* If this is a result relation, already locked in InitPlan */
    foreach(l, stmt->nonleafResultRelations)
    {
        if (rti == lfirst_int(l))
        {
            is_result_rel = true;
            break;
        }
    }
It should actually be:

    /* If this is a result relation, already locked in InitPlan */
    foreach(l, stmt->nonleafResultRelations)
    {
        Index   nonleaf_rti = lfirst_int(l);
        Oid     nonleaf_relid = getrelid(nonleaf_rti, estate->es_range_table);

        if (relid == nonleaf_relid)
        {
            is_result_rel = true;
            break;
        }
    }
RT indexes in, say, Append.partitioned_rels, are distinct from those in
PlannedStmt.nonleafResultRelations, so the existing test never succeeds, as
is also evident from the coverage report:
https://coverage.postgresql.org/src/backend/executor/execUtils.c.gcov.html#864
I'm wondering if we couldn't just get rid of this code. If an input
partitioned table is indeed also a result relation, then we would've already
locked it in InitPlan with RowExclusiveLock, and heap_open'ing it with a
weaker lock (RowShareLock/AccessShareLock) wouldn't hurt.
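For instance, here is a minimal sketch of the lock requests for a partitioned
table that is both a result relation and an Append input, assuming the
existing check is simply dropped:

    /* Sketch only: order of lock requests on the same partitioned table */
    Relation rel1 = heap_open(relid, RowExclusiveLock);  /* InitPlan, result rel */
    Relation rel2 = heap_open(relid, AccessShareLock);   /* Append init, source rel */

    /*
     * The second request is weaker than the lock already held, so it is not
     * a lock upgrade (the hazard the InitPlan comment warns about), and
     * closing with heap_close(rel, NoLock) keeps both locks until end of
     * transaction as usual.
     */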
> * You've got *far* too much intimate knowledge of the possible callers
> in ExecOpenAppendPartitionedTables.
>
> Personally, what I would have this function do is return a List of
> the opened Relation pointers, and add a matching function to run through
> such a List and close the entries again, and make the callers responsible
> for stashing the List pointer in an appropriate field in their planstate.
OK, I rewrote it to work that way.
> Or maybe what we should do is drop ExecLockNonLeafAppendTables/
> ExecOpenAppendPartitionedTables entirely and teach InitPlan to do it.
Hmm, for InitPlan to do what ExecOpenAppendPartitionedTables does, we'd need
all the partitioned tables (those contained in the partitioned_rels fields of
all Append/MergeAppend/ModifyTable nodes of a plan) to also be listed in a
global list of PlannedStmt, like rootResultRelations and
nonleafResultRelations are.
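To make that concrete, something along these lines would be needed; the field
names below (appendPartitionedRels, es_append_partitioned_rels) are made up
for illustration and are not part of the attached patch:

    /*
     * Hypothetical sketch only: InitPlan opening all append partitioned
     * tables up front, assuming PlannedStmt carried a flattened list of
     * their RT indexes (appendPartitionedRels below is a made-up field).
     */
    foreach(l, plannedstmt->appendPartitionedRels)
    {
        Index   rti = lfirst_int(l);
        Oid     relid = getrelid(rti, estate->es_range_table);

        /* es_append_partitioned_rels is likewise a made-up EState field */
        estate->es_append_partitioned_rels =
            lappend(estate->es_append_partitioned_rels,
                    heap_open(relid, AccessShareLock));
    }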
Attached is the updated patch.
Thanks,
Amit
From 0fd93b0b2108d4d12f483b96aab2c25c120173cb Mon Sep 17 00:00:00 2001
From: amit <[email protected]>
Date: Thu, 14 Jun 2018 14:50:22 +0900
Subject: [PATCH v2] Fix opening/closing of partitioned tables in Append plans
---
src/backend/executor/execPartition.c | 49 ++++----------
src/backend/executor/execUtils.c | 114 +++++++++++++++++----------------
src/backend/executor/nodeAppend.c | 26 ++++----
src/backend/executor/nodeMergeAppend.c | 11 ++--
src/include/executor/execPartition.h | 4 +-
src/include/executor/executor.h | 4 +-
src/include/nodes/execnodes.h | 4 ++
7 files changed, 100 insertions(+), 112 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7a4665cc4e..b9bc157bfa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1399,13 +1399,19 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
* PartitionPruningData for each item in that List. This data can be re-used
* each time we re-evaluate which partitions match the pruning steps provided
* in each PartitionPruneInfo.
+ *
+ * 'partitioned_rels' is a List with the same number of elements as
+ * 'partitionpruneinfo', containing the Relation pointers of the
+ * corresponding partitioned tables.
*/
PartitionPruneState *
-ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
+ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo,
+                               List *partitioned_rels)
{
PartitionPruneState *prunestate;
PartitionPruningData *prunedata;
- ListCell *lc;
+ ListCell *lc1,
+ *lc2;
int i;
Assert(partitionpruneinfo != NIL);
@@ -1435,9 +1441,10 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
ALLOCSET_DEFAULT_SIZES);
i = 0;
- foreach(lc, partitionpruneinfo)
+ forboth(lc1, partitionpruneinfo, lc2, partitioned_rels)
{
- PartitionPruneInfo *pinfo = castNode(PartitionPruneInfo, lfirst(lc));
+ PartitionPruneInfo *pinfo = castNode(PartitionPruneInfo, lfirst(lc1));
+ Relation rel = lfirst(lc2);
PartitionPruningData *pprune = &prunedata[i];
PartitionPruneContext *context = &pprune->context;
PartitionDesc partdesc;
@@ -1460,17 +1467,9 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
/* present_parts is also subject to later modification */
pprune->present_parts = bms_copy(pinfo->present_parts);
- /*
- * We need to hold a pin on the partitioned table's relcache entry so
- * that we can rely on its copies of the table's partition key and
- * partition descriptor. We need not get a lock though; one should
- * have been acquired already by InitPlan or
- * ExecLockNonLeafAppendTables.
- */
- context->partrel = relation_open(pinfo->reloid, NoLock);
-
- partkey = RelationGetPartitionKey(context->partrel);
- partdesc = RelationGetPartitionDesc(context->partrel);
+ Assert(RelationGetRelid(rel) == pinfo->reloid);
+ partkey = RelationGetPartitionKey(rel);
+ partdesc = RelationGetPartitionDesc(rel);
n_steps = list_length(pinfo->pruning_steps);
context->strategy = partkey->strategy;
@@ -1546,26 +1545,6 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
}
/*
- * ExecDestroyPartitionPruneState
- * Release resources at plan shutdown.
- *
- * We don't bother to free any memory here, since the whole executor context
- * will be going away shortly. We do need to release our relcache pins.
- */
-void
-ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
-{
- int i;
-
- for (i = 0; i < prunestate->num_partprunedata; i++)
- {
- PartitionPruningData *pprune = &prunestate->partprunedata[i];
-
- relation_close(pprune->context.partrel, NoLock);
- }
-}
-
-/*
* ExecFindInitialMatchingSubPlans
* Identify the set of subplans that cannot be eliminated by initial
* pruning (disregarding any pruning constraints involving PARAM_EXEC
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b963cae730..f936f8c3da 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -704,6 +704,65 @@ ExecCloseScanRelation(Relation scanrel)
}
/*
+ * ExecOpenAppendPartitionedTables
+ *
+ * Opens and/or locks, if necessary, the tables indicated by the RT indexes
+ * contained in the partitioned_rels list of a given Append or MergeAppend
+ * node. These are the non-leaf tables in the partition tree controlled by
+ * the node.
+ */
+List *
+ExecOpenAppendPartitionedTables(List *partitioned_rels, EState *estate)
+{
+ PlannedStmt *stmt = estate->es_plannedstmt;
+ ListCell *lc;
+ List *result = NIL;
+
+ Assert(partitioned_rels != NIL);
+
+ foreach(lc, partitioned_rels)
+ {
+ ListCell *l;
+ Index rti = lfirst_int(lc);
+ Oid relid = getrelid(rti,
estate->es_range_table);
+ int lockmode = AccessShareLock;
+ PlanRowMark *rc = NULL;
+
+ /* Check if there is a RowMark that requires taking a RowShareLock. */
+
+ foreach(l, stmt->rowMarks)
+ {
+ if (((PlanRowMark *) lfirst(l))->rti == rti)
+ {
+ rc = lfirst(l);
+ break;
+ }
+ }
+
+ if (rc && RowMarkRequiresRowShareLock(rc->markType))
+ lockmode = RowShareLock;
+
+ result = lappend(result, heap_open(relid, lockmode));
+ }
+
+ return result;
+}
+
+/* Close each Relation in the input list. */
+void
+ExecCloseAppendPartitionedTables(List *partitioned_rels)
+{
+ ListCell *lc;
+
+ /*
+ * Close partitioned rels that we may have opened for partition
+ * pruning.
+ */
+ foreach(lc, partitioned_rels)
+ heap_close(lfirst(lc), NoLock);
+}
+
+/*
* UpdateChangedParamSet
* Add changed parameters to a plan node's chgParam set
*/
@@ -856,61 +915,6 @@ ShutdownExprContext(ExprContext *econtext, bool isCommit)
}
/*
- * ExecLockNonLeafAppendTables
- *
- * Locks, if necessary, the tables indicated by the RT indexes contained in
- * the partitioned_rels list. These are the non-leaf tables in the partition
- * tree controlled by a given Append or MergeAppend node.
- */
-void
-ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
-{
- PlannedStmt *stmt = estate->es_plannedstmt;
- ListCell *lc;
-
- foreach(lc, partitioned_rels)
- {
- ListCell *l;
- Index rti = lfirst_int(lc);
- bool is_result_rel = false;
- Oid relid = getrelid(rti, estate->es_range_table);
-
- /* If this is a result relation, already locked in InitPlan */
- foreach(l, stmt->nonleafResultRelations)
- {
- if (rti == lfirst_int(l))
- {
- is_result_rel = true;
- break;
- }
- }
-
- /*
- * Not a result relation; check if there is a RowMark that requires
- * taking a RowShareLock on this rel.
- */
- if (!is_result_rel)
- {
- PlanRowMark *rc = NULL;
-
- foreach(l, stmt->rowMarks)
- {
- if (((PlanRowMark *) lfirst(l))->rti == rti)
- {
- rc = lfirst(l);
- break;
- }
- }
-
- if (rc && RowMarkRequiresRowShareLock(rc->markType))
- LockRelationOid(relid, RowShareLock);
- else
- LockRelationOid(relid, AccessShareLock);
- }
- }
-}
-
-/*
* GetAttributeByName
* GetAttributeByNum
*
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5ce4fb43e1..77285af194 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -113,18 +113,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
Assert(!(eflags & EXEC_FLAG_MARK));
/*
- * Lock the non-leaf tables in the partition tree controlled by this node.
- * It's a no-op for non-partitioned parent tables.
- */
- ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
-
- /*
* create new AppendState for our append node
*/
appendstate->ps.plan = (Plan *) node;
appendstate->ps.state = estate;
appendstate->ps.ExecProcNode = ExecAppend;
+ /* Lock and open (if needed) the partitioned tables. */
+ if (node->partitioned_rels != NIL)
+ appendstate->as_partitioned_rels =
+ ExecOpenAppendPartitionedTables(node->partitioned_rels, estate);
+
/* Let choose_next_subplan_* function handle setting the first subplan */
appendstate->as_whichplan = INVALID_SUBPLAN_INDEX;
@@ -137,8 +136,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
ExecAssignExprContext(estate, &appendstate->ps);
/* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
-                                              node->part_prune_infos);
+
+ prunestate =
+     ExecCreatePartitionPruneState(&appendstate->ps,
+                                   node->part_prune_infos,
+                                   appendstate->as_partitioned_rels);
appendstate->as_prune_state = prunestate;
/* Perform an initial partition prune, if required. */
@@ -325,17 +327,13 @@ ExecEndAppend(AppendState *node)
appendplans = node->appendplans;
nplans = node->as_nplans;
+ ExecCloseAppendPartitionedTables(node->as_partitioned_rels);
+
/*
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
ExecEndNode(appendplans[i]);
-
- /*
- * release any resources associated with run-time pruning
- */
- if (node->as_prune_state)
- ExecDestroyPartitionPruneState(node->as_prune_state);
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 118f4ef07d..8de956b129 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -72,11 +72,10 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
- /*
- * Lock the non-leaf tables in the partition tree controlled by this node.
- * It's a no-op for non-partitioned parent tables.
- */
- ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
+ /* Lock and open (if needed) the partitioned tables. */
+ if (node->partitioned_rels != NIL)
+ mergestate->ms_partitioned_rels =
+ ExecOpenAppendPartitionedTables(node->partitioned_rels, estate);
/*
* Set up empty vector of subplan states
@@ -283,6 +282,8 @@ ExecEndMergeAppend(MergeAppendState *node)
mergeplans = node->mergeplans;
nplans = node->ms_nplans;
+ ExecCloseAppendPartitionedTables(node->ms_partitioned_rels);
+
/*
* shut down each of the subscans
*/
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 862bf65060..7b0744a2c9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -209,8 +209,8 @@ extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
-                                                           List *partitionpruneinfo);
-extern void ExecDestroyPartitionPruneState(PartitionPruneState *prunestate);
+                                                           List *partitionpruneinfo,
+                                                           List *partitioned_rels);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
                                                  int nsubplans);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index f82b51667f..301e4b4059 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -524,7 +524,9 @@ extern void UnregisterExprContextCallback(ExprContext *econtext,
ExprContextCallbackFunction function,
Datum arg);
-extern void ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate);
+extern List *ExecOpenAppendPartitionedTables(List *partitioned_rels,
+ EState *estate);
+extern void ExecCloseAppendPartitionedTables(List *partitioned_rels);
extern Datum GetAttributeByName(HeapTupleHeader tuple, const char *attname,
bool *isNull);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index da7f52cab0..55996b80d3 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1073,6 +1073,7 @@ typedef struct ModifyTableState
* eliminated from the scan, or NULL if not possible.
* valid_subplans for runtime pruning, valid appendplans indexes to
* scan.
+ * partitioned_rels Relation pointers of partitioned tables
* ----------------
*/
@@ -1095,6 +1096,7 @@ struct AppendState
struct PartitionPruneState *as_prune_state;
Bitmapset *as_valid_subplans;
bool (*choose_next_subplan) (AppendState *);
+ List *as_partitioned_rels; /* List of Relation pointers */
};
/* ----------------
@@ -1106,6 +1108,7 @@ struct AppendState
* slots current output tuple of each subplan
* heap heap of active tuples
* initialized true if we have fetched first tuple from each subplan
+ * partitioned_rels Relation pointers of partitioned tables
* ----------------
*/
typedef struct MergeAppendState
@@ -1118,6 +1121,7 @@ typedef struct MergeAppendState
TupleTableSlot **ms_slots; /* array of length ms_nplans */
struct binaryheap *ms_heap; /* binary heap of slot indices */
bool ms_initialized; /* are subplans started? */
+ List *ms_partitioned_rels; /* List of Relation pointers */
} MergeAppendState;
/* ----------------
--
2.11.0