Hi,

One of the problems which the remaining parallel query patches need to
solve is to correlate Plan or PlanState nodes across multiple
backends.  This need arises in a couple of cases.  First, if a group
of workers are all executing the same plan nodes, and es_instrument !=
0, then we'll accumulate instrumentation data in each worker, but we
need to collect all of the instrumentation data from each individual
worker and add the totals from each counter in each worker to the
master's totals, so that what finally gets reported includes the
effort spent in all workers.  And of course the instrumentation data
for any given node in the worker needs to be added to the
*corresponding node* in the master, not any random node.

Second, for nodes that are actually parallel aware, like the proposed
Partial Seq Scan node, there's bound to be some shared state.  In the
case of a Partial Seq Scan, for example, we need to keep track of
which blocks are yet to be scanned.  Amit's patch is currently silly
about this: it assumes there will be only one Partial Seq Scan in a
plan, which makes it easy to find the relevant state.  But that's not
an assumption we want to be stuck with.  Here again, if there are two
or more Partial Seq Scans in the plan passed to every worker, perhaps
with an Append node on top of them, then we need a way to match them
up across all the backends, so that each group of corresponding nodes
is looking at the same bit of shared state.

To meet these needs, what I propose to do is have
set_plan_references() assign an integer ID to every plan node that is
unique across the entire plan tree.  Then, when we ship part of the
plan tree off to a worker to be executed, the integer IDs will also
get copied (by copyfuncs.c support yet to be added, but it should be
pretty boilerplate ... I think), so that whatever plan node #42 is in
the parallel group leader will also be plan node #42 in the worker.
When I started out, I had the idea of trying to number the PlanState
nodes rather than the Plan nodes, and count on the fact the master and
the worker would traverse the query tree in the same order; since the
worker had only a subtree of the whole plan, it's numbering would be
offset from the master's, but the idea was that you could fix this by
adding an offset.  This turns out to not to work, at least not the way
I tried to do it, because ExecInitNode() visits all initPlans
everywhere in the tree before recursing over the main tree, so you
only get identical numbering everywhere if you start with the same
exact tree.  In any case, relying on two traversals in separate
processes to visit everything in the same order seems a bit fragile;
putting the ID into the plan node - which is what will actually be
passed down to the worker - seems much more robust.

It would be nice if we could assign the integer IDs with no gaps, and
with the IDs of a node's children immediately following that of their
parent.  The advantage of this is that if you want to have a data
structure for every node in the tree passed to some worker - like a
struct Instrumentation in dynamic shared memory - you can just create
an array and index it by (node ID) - (node ID of uppermost child
passed to worker), and every slot will be in use, so no memory is
wasted and no lookup table is needed.  Assigning the IDs in
set_plan_references() ... almost succeeds at this.  The removal of
SubqueryScan nodes as part of that process could create gaps in the
numbering sequence, but probably few enough that we could just ignore
the problem and waste an entry in whatever arrays we create whenever
that optimization applies.

My main concern with this design is how future-proof it is.  I know
Tom is working on ripping the planner apart and putting it back
together again to allow the upper levels of the planner to use paths,
and the discussion of the SS_finalize_plan() changes made some
reference to possibly wanting to mess with set_plan_references().  I'm
not sure if any of those changes would disturb what I'm proposing
here.

Thoughts?  Better ideas?  Concept patch attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 62355aa..4b4ddec 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -112,6 +112,7 @@ CopyPlanFields(const Plan *from, Plan *newnode)
 	COPY_SCALAR_FIELD(total_cost);
 	COPY_SCALAR_FIELD(plan_rows);
 	COPY_SCALAR_FIELD(plan_width);
+	COPY_SCALAR_FIELD(plan_node_id);
 	COPY_NODE_FIELD(targetlist);
 	COPY_NODE_FIELD(qual);
 	COPY_NODE_FIELD(lefttree);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e1b49d5..d7d00c0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -269,6 +269,7 @@ _outPlanInfo(StringInfo str, const Plan *node)
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_FLOAT_FIELD(plan_rows, "%.0f");
 	WRITE_INT_FIELD(plan_width);
+	WRITE_INT_FIELD(plan_node_id);
 	WRITE_NODE_FIELD(targetlist);
 	WRITE_NODE_FIELD(qual);
 	WRITE_NODE_FIELD(lefttree);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 06be922..e1ee67c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -196,6 +196,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	glob->nParamExec = 0;
 	glob->lastPHId = 0;
 	glob->lastRowMarkId = 0;
+	glob->lastPlanNodeId = 0;
 	glob->transientPlan = false;
 	glob->hasRowSecurity = false;
 
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index daeb584..e3e1a1d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -174,6 +174,8 @@ static bool extract_query_dependencies_walker(Node *node,
  * Currently, relations and user-defined functions are the only types of
  * objects that are explicitly tracked this way.
  *
+ * 7. We assign every plan node in the tree a unique ID.
+ *
  * We also perform one final optimization step, which is to delete
  * SubqueryScan plan nodes that aren't doing anything useful (ie, have
  * no qual and a no-op targetlist).  The reason for doing this last is that
@@ -437,6 +439,12 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 		return NULL;
 
 	/*
+	 * Assign this node a unique ID.  We want to do this before traversing
+	 * subplans so that every Plan has an ID greater than that of its parent.
+	 */
+	plan->plan_node_id = root->glob->lastPlanNodeId++;
+
+	/*
 	 * Plan-type-specific fixes
 	 */
 	switch (nodeTag(plan))
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index cc259f1..1e2d2bb 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -111,6 +111,7 @@ typedef struct Plan
 	/*
 	 * Common structural data for all Plan types.
 	 */
+	int			plan_node_id;	/* unique across entire final plan tree */
 	List	   *targetlist;		/* target list to be computed at this node */
 	List	   *qual;			/* implicitly-ANDed qual conditions */
 	struct Plan *lefttree;		/* input plan tree(s) */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 79bed33..961b5d1 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -99,6 +99,8 @@ typedef struct PlannerGlobal
 
 	Index		lastRowMarkId;	/* highest PlanRowMark ID assigned */
 
+	int			lastPlanNodeId;	/* highest plan node ID assigned */
+
 	bool		transientPlan;	/* redo plan when TransactionXmin changes? */
 
 	bool		hasRowSecurity; /* row security applied? */
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to