On Fri, Dec 18, 2015 at 3:54 AM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Fri, Dec 18, 2015 at 7:59 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> Uh oh.  That's not supposed to happen.  A GatherPath is supposed to
>> have parallel_safe = false, which should prevent the planner from
>> using it to form new partial paths.  Is this with the latest version
>> of the patch?  The plan output suggests that we're somehow reaching
>> try_partial_hashjoin_path() with inner_path being a GatherPath, but I
>> don't immediately see how that's possible, because
>> create_gather_path() sets parallel_safe to false unconditionally, and
>> hash_inner_and_outer() never sets cheapest_safe_inner to a path unless
>> that path is parallel_safe.
>
> Yes, you are right that create_gather_path() sets parallel_safe to false
> unconditionally, but whenever we build a non-partial path we should carry
> the parallel_safe state forward to its parent, and it seems like that part
> is missing here.

Ah, right.  Woops.  I can't exactly replicate your results, but I've
attempted to fix this in a systematic way in the new version attached
here (parallel-join-v3.patch).
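
To make the rule concrete, here is a minimal standalone sketch of what the
new version does when building join paths - the Path struct and function
below are invented stand-ins, not code from the patch: a path built on top
of other paths is parallel-safe only if the relation permits parallelism
and every input path is itself parallel-safe, so a GatherPath (which is
never parallel-safe) can't end up underneath another Gather.

#include <stdbool.h>
#include <stdio.h>

typedef struct Path
{
	bool		parallel_safe;
} Path;

/* The propagation rule: both inputs must be safe, or the result isn't. */
static bool
join_parallel_safe(bool consider_parallel, Path *outer, Path *inner)
{
	return consider_parallel && outer->parallel_safe && inner->parallel_safe;
}

int
main(void)
{
	Path		partial_scan = {true};	/* e.g. a parallel seqscan */
	Path		gather = {false};		/* Gather is never parallel-safe */

	/* Joining two safe paths stays safe... */
	printf("scan join scan:   %d\n",
		   join_parallel_safe(true, &partial_scan, &partial_scan));
	/* ...but anything built atop a Gather is unsafe. */
	printf("scan join gather: %d\n",
		   join_parallel_safe(true, &partial_scan, &gather));
	return 0;
}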

>> Do you have a self-contained test case that reproduces this, or any
>> insight as to how it's happening here?
>
> This is a TPC-H benchmark case; we can set it up like this:
> 1. git clone https://tkej...@bitbucket.org/tkejser/tpch-dbgen.git
> 2. compile using make
> 3. ./dbgen -v -s 5
> 4. ./qgen

Thanks.  After a bit of fiddling I was able to get this to work.  I'm
attaching two other patches that seem to help this case quite
considerably.  The first (parallel-reader-order-v1) causes Gather to
read from the same worker repeatedly until it can't get another tuple
from that worker without blocking, and only then to move on to the next
worker.  With 4 workers, this seems to be drastically more efficient
than what's currently in master - I saw the time for Q5 drop from over
17 seconds to about 6 (this was an assert-enabled build running with
EXPLAIN ANALYZE, though, so take those numbers with a grain of salt).
The second (gather-disuse-physical-tlist.patch) causes Gather to force
underlying scan nodes to project, which is a good idea here for
reasons very similar to why it's a good idea for the existing node
types that use disuse_physical_tlist: forcing extra data through the
Gather node is bad.  That shaved another half second off this query.
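
In case it helps to visualize the reader-order change, here is a toy,
compilable model of the idea - the queue struct and try_read() are made-up
stand-ins for the real shm_mq tuple queues: the leader stays on the
current worker's queue until a read would block, and only then advances,
instead of rotating to the next worker after every tuple as master does
today.

#include <stdbool.h>
#include <stdio.h>

#define NWORKERS 4

typedef struct WorkerQueue
{
	int			remaining;	/* tuples this worker still has buffered */
} WorkerQueue;

/* Returns false when reading would block (here: queue is empty). */
static bool
try_read(WorkerQueue *q, int *tuple)
{
	if (q->remaining <= 0)
		return false;
	*tuple = q->remaining--;
	return true;
}

int
main(void)
{
	WorkerQueue queues[NWORKERS] = {{3}, {5}, {2}, {4}};
	int			current = 0;
	int			visited = 0;
	int			tuple;

	/*
	 * Drain the current queue until a read would block, then move on.  In
	 * this toy model "would block" doubles as "exhausted", so each queue
	 * is visited exactly once.
	 */
	while (visited < NWORKERS)
	{
		if (try_read(&queues[current], &tuple))
			printf("read tuple %d from worker %d\n", tuple, current);
		else
		{
			visited++;
			current = (current + 1) % NWORKERS;
		}
	}
	return 0;
}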

The exact query I was using for testing was:

explain (analyze, verbose)
select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue
from customer, orders, lineitem, supplier, nation, region
where c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and l_suppkey = s_suppkey
  and c_nationkey = s_nationkey
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'EUROPE'
  and o_orderdate >= date '1995-01-01'
  and o_orderdate < date '1995-01-01' + interval '1' year
group by n_name
order by revenue desc;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 088a96363231b441fe6aab744b04a522d01fbc17 Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Thu, 19 Nov 2015 20:28:34 -0500
Subject: [PATCH 1/3] Gather pushdown for child tables, hash joins, nested
 loops.

Cost model fix for parallel seq scan.

Fixed this so it propagates the parallel_safe flag up the plan tree.
---
 src/backend/executor/execParallel.c     |  66 +++---
 src/backend/nodes/outfuncs.c            |   4 +-
 src/backend/optimizer/README            |  55 ++++-
 src/backend/optimizer/path/allpaths.c   | 164 +++++++++++----
 src/backend/optimizer/path/costsize.c   |  32 +--
 src/backend/optimizer/path/joinpath.c   | 253 +++++++++++++++++++++-
 src/backend/optimizer/path/joinrels.c   |   3 +-
 src/backend/optimizer/plan/createplan.c |   2 +-
 src/backend/optimizer/plan/planmain.c   |   3 +-
 src/backend/optimizer/util/pathnode.c   | 361 +++++++++++++++++++++++++++++---
 src/backend/optimizer/util/relnode.c    |   2 +
 src/include/nodes/relation.h            |   4 +-
 src/include/optimizer/cost.h            |   2 +-
 src/include/optimizer/pathnode.h        |  12 +-
 src/include/optimizer/paths.h           |   2 +
 15 files changed, 845 insertions(+), 120 deletions(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 30e6b3d..5bc8eef 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -167,25 +167,25 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 	e->nnodes++;
 
 	/* Call estimators for parallel-aware nodes. */
-	switch (nodeTag(planstate))
+	if (planstate->plan->parallel_aware)
 	{
-		case T_SeqScanState:
-			ExecSeqScanEstimate((SeqScanState *) planstate,
-								e->pcxt);
-			break;
-		default:
-			break;
+		switch (nodeTag(planstate))
+		{
+			case T_SeqScanState:
+				ExecSeqScanEstimate((SeqScanState *) planstate,
+									e->pcxt);
+				break;
+			default:
+				break;
+		}
 	}
 
 	return planstate_tree_walker(planstate, ExecParallelEstimate, e);
 }
 
 /*
- * Ordinary plan nodes won't do anything here, but parallel-aware plan nodes
- * may need to initialize shared state in the DSM before parallel workers
- * are available.  They can allocate the space they previous estimated using
- * shm_toc_allocate, and add the keys they previously estimated using
- * shm_toc_insert, in each case targeting pcxt->toc.
+ * Initialize the dynamic shared memory segment that will be used to control
+ * parallel execution.
  */
 static bool
 ExecParallelInitializeDSM(PlanState *planstate,
@@ -202,15 +202,26 @@ ExecParallelInitializeDSM(PlanState *planstate,
 	/* Count this node. */
 	d->nnodes++;
 
-	/* Call initializers for parallel-aware plan nodes. */
-	switch (nodeTag(planstate))
+	/*
+	 * Call initializers for parallel-aware plan nodes.
+	 *
+	 * Ordinary plan nodes won't do anything here, but parallel-aware plan
+	 * nodes may need to initialize shared state in the DSM before parallel
+	 * workers are available.  They can allocate the space they previously
+	 * estimated using shm_toc_allocate, and add the keys they previously
+	 * estimated using shm_toc_insert, in each case targeting pcxt->toc.
+	 */
+	if (planstate->plan->parallel_aware)
 	{
-		case T_SeqScanState:
-			ExecSeqScanInitializeDSM((SeqScanState *) planstate,
-									 d->pcxt);
-			break;
-		default:
-			break;
+		switch (nodeTag(planstate))
+		{
+			case T_SeqScanState:
+				ExecSeqScanInitializeDSM((SeqScanState *) planstate,
+										 d->pcxt);
+				break;
+			default:
+				break;
+		}
 	}
 
 	return planstate_tree_walker(planstate, ExecParallelInitializeDSM, d);
@@ -623,13 +634,16 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc)
 		return false;
 
 	/* Call initializers for parallel-aware plan nodes. */
-	switch (nodeTag(planstate))
+	if (planstate->plan->parallel_aware)
 	{
-		case T_SeqScanState:
-			ExecSeqScanInitializeWorker((SeqScanState *) planstate, toc);
-			break;
-		default:
-			break;
+		switch (nodeTag(planstate))
+		{
+			case T_SeqScanState:
+				ExecSeqScanInitializeWorker((SeqScanState *) planstate, toc);
+				break;
+			default:
+				break;
+		}
 	}
 
 	return planstate_tree_walker(planstate, ExecParallelInitializeWorker, toc);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 63fae82..6658a76 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1588,6 +1588,8 @@ _outPathInfo(StringInfo str, const Path *node)
 	else
 		_outBitmapset(str, NULL);
 	WRITE_BOOL_FIELD(parallel_aware);
+	WRITE_BOOL_FIELD(parallel_safe);
+	WRITE_INT_FIELD(parallel_degree);
 	WRITE_FLOAT_FIELD(rows, "%.0f");
 	WRITE_FLOAT_FIELD(startup_cost, "%.2f");
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
@@ -1765,7 +1767,6 @@ _outGatherPath(StringInfo str, const GatherPath *node)
 	_outPathInfo(str, (const Path *) node);
 
 	WRITE_NODE_FIELD(subpath);
-	WRITE_INT_FIELD(num_workers);
 	WRITE_BOOL_FIELD(single_copy);
 }
 
@@ -1887,6 +1888,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_NODE_FIELD(reltargetlist);
 	WRITE_NODE_FIELD(pathlist);
 	WRITE_NODE_FIELD(ppilist);
+	WRITE_NODE_FIELD(partial_pathlist);
 	WRITE_NODE_FIELD(cheapest_startup_path);
 	WRITE_NODE_FIELD(cheapest_total_path);
 	WRITE_NODE_FIELD(cheapest_unique_path);
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 916a518..5019804 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -851,4 +851,57 @@ lateral reference.  (Perhaps now that that stuff works, we could relax the
 pullup restriction?)
 
 
--- bjm & tgl
+Parallel Query and Partial Paths
+--------------------------------
+
+Parallel query involves dividing up the work that needs to be performed
+either by an entire query or some portion of the query in such a way that
+some of that work can be done by one or more worker processes, which are
+called parallel workers.  Parallel workers are a subtype of dynamic
+background workers; see src/backend/access/transam/README.parallel for a
+fuller description.  Academic literature on parallel query suggests that
+that parallel execution strategies can be divided into essentially two
+categories: pipelined parallelism, where the execution of the query is
+divided into multiple stages and each stage is handled by a separate
+process; and partitioning parallelism, where the data is split between
+multiple processes and each process handles a subset of it.  The
+literature, however, suggests that gains from pipeline parallelism are
+often very limited due to the difficulty of avoiding pipeline stalls.
+Consequently, we do not currently attempt to generate query plans that
+use this technique.
+
+Instead, we focus on partitioning parallelism, which does not require
+that the underlying table be partitioned.  It only requires that (1)
+there is some method of dividing the data from at least one of the base
+tables involved in the relation across multiple processes, (2) allowing
+each process to handle its own portion of the data, and then (3)
+collecting the results.  Requirements (2) and (3) are satisfied by the
+executor node Gather, which launches any number of worker processes and
+executes its single child plan in all of them (and perhaps in the leader
+also, if the children aren't generating enough data to keep the leader
+busy).  Requirement (1) is handled by the SeqScan node: when invoked
+with parallel_aware = true, this node will, in effect, partition the
+table on a block by block basis, returning a subset of the tuples from
+the relation in each worker where that SeqScan is executed.  A similar
+scheme could be (and probably should be) implemented for bitmap heap
+scans.
+
+Just as we do for non-parallel access methods, we build Paths to
+represent access strategies that can be used in a parallel plan.  These
+are, in essence, the same strategies that are available in the
+non-parallel plan, but there is an important difference: a path that
+will run beneath a Gather node returns only a subset of the query
+results in each worker, not all of them.  To form a path that can
+actually be executed, the (rather large) cost of the Gather node must be
+accounted for.  For this reason among others, paths intended to run
+beneath a Gather node - which we call "partial" paths since they return
+only a subset of the results in each worker - must be kept separate from
+ordinary paths (see RelOptInfo's partial_pathlist and the function
+add_partial_path).
+
+One of the keys to making parallel query effective is to run as much of
+the query in parallel as possible.  Therefore, we expect it to generally
+be desirable to postpone the Gather stage until as near to the top of the
+plan as possible.  Expanding the range of cases in which more work can be
+pushed below the Gather (and costing them accurately) is likely to keep us
+busy for a long time to come.
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4516cd3..326de0c 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -72,6 +72,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 				 Index rti, RangeTblEntry *rte);
 static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
 				   RangeTblEntry *rte);
+static void create_parallel_paths(PlannerInfo *root, RelOptInfo *rel);
 static void set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
 						  RangeTblEntry *rte);
 static bool function_rte_parallel_ok(RangeTblEntry *rte);
@@ -612,7 +613,6 @@ static void
 set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 {
 	Relids		required_outer;
-	int			parallel_threshold = 1000;
 
 	/*
 	 * We don't support pushing join clauses into the quals of a seqscan, but
@@ -624,39 +624,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider sequential scan */
 	add_path(rel, create_seqscan_path(root, rel, required_outer, 0));
 
-	/* Consider parallel sequential scan */
-	if (rel->consider_parallel && rel->pages > parallel_threshold &&
-		required_outer == NULL)
-	{
-		Path *path;
-		int parallel_degree = 1;
-
-		/*
-		 * Limit the degree of parallelism logarithmically based on the size
-		 * of the relation.  This probably needs to be a good deal more
-		 * sophisticated, but we need something here for now.
-		 */
-		while (rel->pages > parallel_threshold * 3 &&
-			   parallel_degree < max_parallel_degree)
-		{
-			parallel_degree++;
-			parallel_threshold *= 3;
-			if (parallel_threshold >= PG_INT32_MAX / 3)
-				break;
-		}
-
-		/*
-		 * Ideally we should consider postponing the gather operation until
-		 * much later, after we've pushed joins and so on atop the parallel
-		 * sequential scan path.  But we don't have the infrastructure for
-		 * that yet, so just do this for now.
-		 */
-		path = create_seqscan_path(root, rel, required_outer, parallel_degree);
-		path = (Path *)
-			create_gather_path(root, rel, path, required_outer,
-							   parallel_degree);
-		add_path(rel, path);
-	}
+	/* If appropriate, consider parallel sequential scan */
+	if (rel->consider_parallel && required_outer == NULL)
+		create_parallel_paths(root, rel);
 
 	/* Consider index scans */
 	create_index_paths(root, rel);
@@ -666,6 +636,54 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 }
 
 /*
+ * create_parallel_paths
+ *	  Build parallel access paths for a plain relation
+ */
+static void
+create_parallel_paths(PlannerInfo *root, RelOptInfo *rel)
+{
+	int		parallel_threshold = 1000;
+	int		parallel_degree = 1;
+
+	/*
+	 * If this relation is too small to be worth a parallel scan, just
+	 * return without doing anything ... unless it's an inheritance child.
+	 * In that case, we want to generate a parallel path here anyway.  It
+	 * might not be worthwhile just for this relation, but when combined
+	 * with all of its inheritance siblings it may well pay off.
+	 */
+	if (rel->pages < parallel_threshold && rel->reloptkind == RELOPT_BASEREL)
+		return;
+
+	/*
+	 * Limit the degree of parallelism logarithmically based on the size
+	 * of the relation.  This probably needs to be a good deal more
+	 * sophisticated, but we need something here for now.
+	 */
+	while (rel->pages > parallel_threshold * 3 &&
+		   parallel_degree < max_parallel_degree)
+	{
+		parallel_degree++;
+		parallel_threshold *= 3;
+		if (parallel_threshold >= PG_INT32_MAX / 3)
+			break;
+	}
+
+	/* Add an unordered partial path based on a parallel sequential scan. */
+	add_partial_path(rel, create_seqscan_path(root, rel, NULL, parallel_degree));
+
+	/*
+	 * If this is a baserel, consider gathering any partial paths we may have
+	 * just created.  If we gathered an inheritance child, we could end up
+	 * with a very large number of gather nodes, each trying to grab its own
+	 * pool of workers, so don't do this in that case.  Instead, we'll
+	 * consider gathering partial paths for the appendrel.
+	 */
+	if (rel->reloptkind == RELOPT_BASEREL)
+		generate_gather_paths(root, rel);
+}
+
+/*
  * set_tablesample_rel_size
  *	  Set size estimates for a sampled relation
  */
@@ -1039,6 +1057,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	List	   *live_childrels = NIL;
 	List	   *subpaths = NIL;
 	bool		subpaths_valid = true;
+	List	   *partial_subpaths = NIL;
+	bool		partial_subpaths_valid = true;
 	List	   *all_child_pathkeys = NIL;
 	List	   *all_child_outers = NIL;
 	ListCell   *l;
@@ -1093,6 +1113,13 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		else
 			subpaths_valid = false;
 
+		/* Same idea, but for a partial plan. */
+		if (childrel->partial_pathlist != NIL)
+			partial_subpaths = accumulate_append_subpath(partial_subpaths,
+									   linitial(childrel->partial_pathlist));
+		else
+			partial_subpaths_valid = false;
+
 		/*
 		 * Collect lists of all the available path orderings and
 		 * parameterizations for all the children.  We use these as a
@@ -1164,7 +1191,39 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	 * if we have zero or one live subpath due to constraint exclusion.)
 	 */
 	if (subpaths_valid)
-		add_path(rel, (Path *) create_append_path(rel, subpaths, NULL));
+		add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0));
+
+	/*
+	 * Consider an append of partial unordered, unparameterized partial paths.
+	 */
+	if (partial_subpaths_valid)
+	{
+		AppendPath *appendpath;
+		ListCell   *lc;
+		int			parallel_degree = 0;
+
+		/*
+		 * Decide what parallel degree to request for this append path.  For
+		 * now, we just use the maximum parallel degree of any member.  It
+		 * might be useful to use a higher number if the Append node were
+		 * smart enough to spread out the workers, but it currently isn't.
+		 */
+		foreach(lc, partial_subpaths)
+		{
+			Path	   *path = lfirst(lc);
+
+			parallel_degree = Max(parallel_degree, path->parallel_degree);
+		}
+		Assert(parallel_degree > 0);
+
+		/* Generate a partial append path. */
+		appendpath = create_append_path(rel, partial_subpaths, NULL,
+										parallel_degree);
+		add_partial_path(rel, (Path *) appendpath);
+
+		/* Consider gathering it. */
+		generate_gather_paths(root, rel);
+	}
 
 	/*
 	 * Also build unparameterized MergeAppend paths based on the collected
@@ -1214,7 +1273,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 		if (subpaths_valid)
 			add_path(rel, (Path *)
-					 create_append_path(rel, subpaths, required_outer));
+					 create_append_path(rel, subpaths, required_outer, 0));
 	}
 }
 
@@ -1440,8 +1499,9 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
 
 	/* Discard any pre-existing paths; no further need for them */
 	rel->pathlist = NIL;
+	rel->partial_pathlist = NIL;
 
-	add_path(rel, (Path *) create_append_path(rel, NIL, NULL));
+	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
 
 	/*
 	 * We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
@@ -1844,6 +1904,36 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 }
 
 /*
+ * generate_gather_paths
+ *		Generate parallel access paths for a relation by pushing a Gather on
+ *		top of a partial path.
+ */
+void
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+{
+	Path	   *cheapest_partial_path;
+	Path	   *simple_gather_path;
+
+	/* If there are no partial paths, there's nothing to do here. */
+	if (rel->partial_pathlist == NIL)
+		return;
+
+	/*
+	 * The output of Gather is currently always unsorted, so there's only one
+	 * partial path of interest: the cheapest one.
+	 *
+	 * Eventually, we should have a Gather Merge operation that can merge
+	 * multiple tuple streams together while preserving their ordering.  We
+	 * could usefully generate such a path from each partial path that has
+	 * non-NIL pathkeys.
+	 */
+	cheapest_partial_path = linitial(rel->partial_pathlist);
+	simple_gather_path = (Path *)
+		create_gather_path(root, rel, cheapest_partial_path, NULL);
+	add_path(rel, simple_gather_path);
+}
+
+/*
  * make_rel_from_joinlist
  *	  Build access paths using a "joinlist" to guide the join path search.
  *
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 990486c..1e76c03 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -186,24 +186,34 @@ clamp_row_est(double nrows)
  */
 void
 cost_seqscan(Path *path, PlannerInfo *root,
-			 RelOptInfo *baserel, ParamPathInfo *param_info,
-			 int nworkers)
+			 RelOptInfo *baserel, ParamPathInfo *param_info)
 {
 	Cost		startup_cost = 0;
 	Cost		run_cost = 0;
 	double		spc_seq_page_cost;
 	QualCost	qpqual_cost;
 	Cost		cpu_per_tuple;
+	double		parallel_divisor = 1;
 
 	/* Should only be applied to base relations */
 	Assert(baserel->relid > 0);
 	Assert(baserel->rtekind == RTE_RELATION);
 
+	/*
+	 * Primitive parallel cost model.  Assume the leader will do half as much
+	 * work as a regular worker, because it will also need to read the tuples
+	 * returned by the workers when they percolate up to the gather node.
+	 * This is almost certainly not exactly the right way to model this, so
+	 * this will probably need to be changed at some point...
+	 */
+	if (path->parallel_degree > 0)
+		parallel_divisor = path->parallel_degree + 0.5;
+
 	/* Mark the path with the correct row estimate */
 	if (param_info)
-		path->rows = param_info->ppi_rows;
+		path->rows = param_info->ppi_rows / parallel_divisor;
 	else
-		path->rows = baserel->rows;
+		path->rows = baserel->rows / parallel_divisor;
 
 	if (!enable_seqscan)
 		startup_cost += disable_cost;
@@ -216,24 +226,14 @@ cost_seqscan(Path *path, PlannerInfo *root,
 	/*
 	 * disk costs
 	 */
-	run_cost += spc_seq_page_cost * baserel->pages;
+	run_cost += spc_seq_page_cost * baserel->pages / parallel_divisor;
 
 	/* CPU costs */
 	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
 
 	startup_cost += qpqual_cost.startup;
 	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
-	run_cost += cpu_per_tuple * baserel->tuples;
-
-	/*
-	 * Primitive parallel cost model.  Assume the leader will do half as much
-	 * work as a regular worker, because it will also need to read the tuples
-	 * returned by the workers when they percolate up to the gather ndoe.
-	 * This is almost certainly not exactly the right way to model this, so
-	 * this will probably need to be changed at some point...
-	 */
-	if (nworkers > 0)
-		run_cost = run_cost / (nworkers + 0.5);
+	run_cost += cpu_per_tuple * baserel->tuples / parallel_divisor;
 
 	path->startup_cost = startup_cost;
 	path->total_cost = startup_cost + run_cost;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 53d8fdd..33268d1 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -34,6 +34,12 @@ static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
 					 RelOptInfo *outerrel, RelOptInfo *innerrel,
 					 JoinType jointype, JoinPathExtraData *extra);
+static void consider_parallel_nestloop(PlannerInfo *root,
+						   RelOptInfo *joinrel,
+						   RelOptInfo *outerrel,
+						   RelOptInfo *innerrel,
+						   JoinType jointype,
+						   JoinPathExtraData *extra);
 static void hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 					 RelOptInfo *outerrel, RelOptInfo *innerrel,
 					 JoinType jointype, JoinPathExtraData *extra);
@@ -216,7 +222,12 @@ add_paths_to_joinrel(PlannerInfo *root,
 												 jointype, &extra);
 
 	/*
-	 * 6. Finally, give extensions a chance to manipulate the path list.
+	 * 6. Consider gathering partial paths.
+	 */
+	generate_gather_paths(root, joinrel);
+
+	/*
+	 * 7. Finally, give extensions a chance to manipulate the path list.
 	 */
 	if (set_join_pathlist_hook)
 		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
@@ -330,6 +341,62 @@ try_nestloop_path(PlannerInfo *root,
 }
 
 /*
+ * try_partial_nestloop_path
+ *	  Consider a partial nestloop join path; if it appears useful, push it into
+ *	  the joinrel's partial_pathlist via add_partial_path().
+ */
+static void
+try_partial_nestloop_path(PlannerInfo *root,
+				  RelOptInfo *joinrel,
+				  Path *outer_path,
+				  Path *inner_path,
+				  List *pathkeys,
+				  JoinType jointype,
+				  JoinPathExtraData *extra)
+{
+	JoinCostWorkspace workspace;
+
+	/*
+	 * If the inner path is parameterized, the parameterization must be fully
+	 * satisfied by the proposed outer path.  Parameterized partial paths are
+	 * not supported.  The caller should already have verified that no
+	 * extra_lateral_rels are required here.
+	 */
+	Assert(bms_is_empty(joinrel->lateral_relids));
+	if (inner_path->param_info != NULL)
+	{
+		Relids		inner_paramrels = inner_path->param_info->ppi_req_outer;
+
+		if (!bms_is_subset(inner_paramrels, outer_path->parent->relids))
+			return;
+	}
+
+	/*
+	 * Before creating a path, get a quick lower bound on what it is likely
+	 * to cost.  Bail out right away if it looks terrible.
+	 */
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_path,
+						  extra->sjinfo, &extra->semifactors);
+	if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+		return;
+
+	/* Might be good enough to be worth trying, so let's try it. */
+	add_partial_path(joinrel, (Path *)
+			 create_nestloop_path(root,
+								  joinrel,
+								  jointype,
+								  &workspace,
+								  extra->sjinfo,
+								  &extra->semifactors,
+								  outer_path,
+								  inner_path,
+								  extra->restrictlist,
+								  pathkeys,
+								  NULL));
+}
+
+/*
  * try_mergejoin_path
  *	  Consider a merge join path; if it appears useful, push it into
  *	  the joinrel's pathlist via add_path().
@@ -472,6 +539,62 @@ try_hashjoin_path(PlannerInfo *root,
 }
 
 /*
+ * try_partial_hashjoin_path
+ *	  Consider a partial hash join path; if it appears useful, push it into
+ *	  the joinrel's partial_pathlist via add_partial_path().
+ */
+static void
+try_partial_hashjoin_path(PlannerInfo *root,
+						  RelOptInfo *joinrel,
+						  Path *outer_path,
+						  Path *inner_path,
+						  List *hashclauses,
+						  JoinType jointype,
+						  JoinPathExtraData *extra)
+{
+	JoinCostWorkspace workspace;
+
+	/*
+	 * If the inner path is parameterized, the parameterization must be fully
+	 * satisfied by the proposed outer path.  Parameterized partial paths are
+	 * not supported.  The caller should already have verified that no
+	 * extra_lateral_rels are required here.
+	 */
+	Assert(bms_is_empty(joinrel->lateral_relids));
+	if (inner_path->param_info != NULL)
+	{
+		Relids		inner_paramrels = inner_path->param_info->ppi_req_outer;
+
+		if (!bms_is_empty(inner_paramrels))
+			return;
+	}
+
+	/*
+	 * Before creating a path, get a quick lower bound on what it is likely
+	 * to cost.  Bail out right away if it looks terrible.
+	 */
+	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
+						  outer_path, inner_path,
+						  extra->sjinfo, &extra->semifactors);
+	if (!add_partial_path_precheck(joinrel, workspace.total_cost, NIL))
+		return;
+
+	/* Might be good enough to be worth trying, so let's try it. */
+	add_partial_path(joinrel, (Path *)
+			 create_hashjoin_path(root,
+								  joinrel,
+								  jointype,
+								  &workspace,
+								  extra->sjinfo,
+								  &extra->semifactors,
+								  outer_path,
+								  inner_path,
+								  extra->restrictlist,
+								  NULL,
+								  hashclauses));
+}
+
+/*
  * clause_sides_match_join
  *	  Determine whether a join clause is of the right form to use in this join.
  *
@@ -1063,6 +1186,85 @@ match_unsorted_outer(PlannerInfo *root,
 				break;
 		}
 	}
+
+	/*
+	 * If the joinrel is parallel-safe and the join type supports nested loops,
+	 * we may be able to consider a partial nestloop plan.  However, we can't
+	 * handle JOIN_UNIQUE_OUTER, because the outer path will be partial, and
+	 * therefore we won't be able to properly guarantee uniqueness.  Nor can
+	 * we handle extra_lateral_rels, since partial paths must not be
+	 * parameterized.
+	 */
+	if (joinrel->consider_parallel && nestjoinOK &&
+		save_jointype != JOIN_UNIQUE_OUTER &&
+		bms_is_empty(joinrel->lateral_relids))
+		consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
+								   save_jointype, extra);
+}
+
+/*
+ * consider_parallel_nestloop
+ *	  Try to build partial paths for a joinrel by joining a partial path for the
+ *	  outer relation to a complete path for the inner relation.
+ *
+ * 'joinrel' is the join relation
+ * 'outerrel' is the outer join relation
+ * 'innerrel' is the inner join relation
+ * 'jointype' is the type of join to do
+ * 'extra' contains additional input values
+ */
+static void
+consider_parallel_nestloop(PlannerInfo *root,
+						   RelOptInfo *joinrel,
+						   RelOptInfo *outerrel,
+						   RelOptInfo *innerrel,
+						   JoinType jointype,
+						   JoinPathExtraData *extra)
+{
+	ListCell   *lc1;
+
+	foreach(lc1, outerrel->partial_pathlist)
+	{
+		Path	   *outerpath = (Path *) lfirst(lc1);
+		List	   *pathkeys;
+		ListCell   *lc2;
+
+		/* Figure out what useful ordering any paths we create will have. */
+		pathkeys = build_join_pathkeys(root, joinrel, jointype,
+									   outerpath->pathkeys);
+
+		/*
+		 * Try the cheapest parameterized paths; only those which will
+		 * produce an unparameterized path when joined to this outerrel
+		 * will survive try_partial_nestloop_path.  The cheapest
+		 * unparameterized path is also in this list.
+		 */
+		foreach(lc2, innerrel->cheapest_parameterized_paths)
+		{
+			Path	   *innerpath = (Path *) lfirst(lc2);
+
+			/* Can't join to an inner path that is not parallel-safe */
+			if (!innerpath->parallel_safe)
+				continue;
+
+			/*
+			 * Like match_unsorted_outer, we only consider a single nestloop
+			 * path when the jointype is JOIN_UNIQUE_INNER.  But we have to scan
+			 * cheapest_parameterized_paths to find the one we want to consider,
+			 * because cheapest_total_path might not be parallel-safe.
+			 */
+			if (jointype == JOIN_UNIQUE_INNER)
+			{
+				if (!bms_is_empty(PATH_REQ_OUTER(innerpath)))
+					continue;
+				innerpath = (Path *) create_unique_path(root, innerrel,
+											   innerpath, extra->sjinfo);
+			}
+
+			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
+									  pathkeys, jointype, extra);
+		}
+	}
 }
 
 /*
@@ -1240,6 +1442,55 @@ hash_inner_and_outer(PlannerInfo *root,
 				}
 			}
 		}
+
+		/*
+		 * If the joinrel is parallel-safe, we may be able to consider a
+		 * partial hash join.  However, we can't handle JOIN_UNIQUE_OUTER,
+		 * because the outer path will be partial, and therefore we won't be
+		 * able to properly guarantee uniqueness.  Also, the resulting path
+		 * must not be parameterized.
+		 */
+		if (joinrel->consider_parallel && jointype != JOIN_UNIQUE_OUTER &&
+			outerrel->partial_pathlist != NIL &&
+			bms_is_empty(joinrel->lateral_relids))
+		{
+			Path   *cheapest_partial_outer;
+			Path   *cheapest_safe_inner = NULL;
+
+			cheapest_partial_outer =
+				(Path *) linitial(outerrel->partial_pathlist);
+
+			/*
+			 * Normally, given that the joinrel is parallel-safe, the cheapest
+			 * total inner path will also be parallel-safe, but if not, we'll
+			 * have to search cheapest_parameterized_paths for the cheapest
+			 * parallel-safe unparameterized inner path.
+			 */
+			if (cheapest_total_inner->parallel_safe)
+				cheapest_safe_inner = cheapest_total_inner;
+			else
+			{
+				ListCell   *lc;
+
+				foreach(lc, innerrel->cheapest_parameterized_paths)
+				{
+					Path	   *innerpath = (Path *) lfirst(lc);
+
+					if (innerpath->parallel_safe &&
+						bms_is_empty(PATH_REQ_OUTER(innerpath)))
+					{
+						cheapest_safe_inner = innerpath;
+						break;
+					}
+				}
+			}
+
+			if (cheapest_safe_inner != NULL)
+				try_partial_hashjoin_path(root, joinrel,
+										  cheapest_partial_outer,
+										  cheapest_safe_inner,
+										  hashclauses, jointype, extra);
+		}
 	}
 }
 
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index ad58058..e32f445 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1194,9 +1194,10 @@ mark_dummy_rel(RelOptInfo *rel)
 
 	/* Evict any previously chosen paths */
 	rel->pathlist = NIL;
+	rel->partial_pathlist = NIL;
 
 	/* Set up the dummy path */
-	add_path(rel, (Path *) create_append_path(rel, NIL, NULL));
+	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
 
 	/* Set or update cheapest_total_path and related fields */
 	set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 32f903d..94e5f26 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1125,7 +1125,7 @@ create_gather_plan(PlannerInfo *root, GatherPath *best_path)
 
 	gather_plan = make_gather(subplan->targetlist,
 							  NIL,
-							  best_path->num_workers,
+							  best_path->path.parallel_degree,
 							  best_path->single_copy,
 							  subplan);
 
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 894f968..a8ff2ff 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -84,7 +84,8 @@ query_planner(PlannerInfo *root, List *tlist,
 
 		/* The only path for it is a trivial Result path */
 		add_path(final_rel, (Path *)
-				 create_result_path((List *) parse->jointree->quals));
+				 create_result_path(final_rel,
+									(List *) parse->jointree->quals));
 
 		/* Select cheapest path (pretty easy in this case...) */
 		set_cheapest(final_rel);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index ec0910d..d1c8e00 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -217,7 +217,12 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor)
  * The cheapest_parameterized_paths list collects all parameterized paths
  * that have survived the add_path() tournament for this relation.  (Since
  * add_path ignores pathkeys for a parameterized path, these will be paths
- * that have best cost or best row count for their parameterization.)
+ * that have best cost or best row count for their parameterization.  We
+ * may also have both a parallel-safe and a non-parallel-safe path for the
+ * same parameterization in some cases, but this should be relatively
+ * rare since, most typically, either all paths for the same relation
+ * will be parallel-safe or none of them will.)
+ *
  * cheapest_parameterized_paths always includes the cheapest-total
  * unparameterized path, too, if there is one; the users of that list find
  * it more convenient if that's included.
@@ -352,11 +357,12 @@ set_cheapest(RelOptInfo *parent_rel)
  *	  A path is worthy if it has a better sort order (better pathkeys) or
  *	  cheaper cost (on either dimension), or generates fewer rows, than any
  *	  existing path that has the same or superset parameterization rels.
+ *	  We also consider parallel-safe paths more worthy than others.
  *
  *	  We also remove from the rel's pathlist any old paths that are dominated
  *	  by new_path --- that is, new_path is cheaper, at least as well ordered,
- *	  generates no more rows, and requires no outer rels not required by the
- *	  old path.
+ *	  generates no more rows, requires no outer rels not required by the old
+ *	  path, and is no less parallel-safe.
  *
  *	  In most cases, a path with a superset parameterization will generate
  *	  fewer rows (since it has more join clauses to apply), so that those two
@@ -470,14 +476,16 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 						{
 							if ((outercmp == BMS_EQUAL ||
 								 outercmp == BMS_SUBSET1) &&
-								new_path->rows <= old_path->rows)
+								new_path->rows <= old_path->rows &&
+								new_path->parallel_safe >= old_path->parallel_safe)
 								remove_old = true;		/* new dominates old */
 						}
 						else if (keyscmp == PATHKEYS_BETTER2)
 						{
 							if ((outercmp == BMS_EQUAL ||
 								 outercmp == BMS_SUBSET2) &&
-								new_path->rows >= old_path->rows)
+								new_path->rows >= old_path->rows &&
+								new_path->parallel_safe <= old_path->parallel_safe)
 								accept_new = false;		/* old dominates new */
 						}
 						else	/* keyscmp == PATHKEYS_EQUAL */
@@ -487,19 +495,25 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 								/*
 								 * Same pathkeys and outer rels, and fuzzily
 								 * the same cost, so keep just one; to decide
-								 * which, first check rows and then do a fuzzy
-								 * cost comparison with very small fuzz limit.
-								 * (We used to do an exact cost comparison,
-								 * but that results in annoying
-								 * platform-specific plan variations due to
-								 * roundoff in the cost estimates.)  If things
-								 * are still tied, arbitrarily keep only the
-								 * old path.  Notice that we will keep only
-								 * the old path even if the less-fuzzy
-								 * comparison decides the startup and total
-								 * costs compare differently.
+								 * which, first check parallel-safety, then
+								 * rows, then do a fuzzy cost comparison with
+								 * very small fuzz limit.  (We used to do an
+								 * exact cost comparison, but that results in
+								 * annoying platform-specific plan variations
+								 * due to roundoff in the cost estimates.)	If
+								 * things are still tied, arbitrarily keep
+								 * only the old path.  Notice that we will
+								 * keep only the old path even if the
+								 * less-fuzzy comparison decides the startup
+								 * and total costs compare differently.
 								 */
-								if (new_path->rows < old_path->rows)
+								if (new_path->parallel_safe >
+									old_path->parallel_safe)
+									remove_old = true;	/* new dominates old */
+								else if (new_path->parallel_safe <
+										 old_path->parallel_safe)
+									accept_new = false; /* old dominates new */
+								else if (new_path->rows < old_path->rows)
 									remove_old = true;	/* new dominates old */
 								else if (new_path->rows > old_path->rows)
 									accept_new = false; /* old dominates new */
@@ -512,10 +526,12 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 														 * dominates new */
 							}
 							else if (outercmp == BMS_SUBSET1 &&
-									 new_path->rows <= old_path->rows)
+									 new_path->rows <= old_path->rows &&
+									 new_path->parallel_safe >= old_path->parallel_safe)
 								remove_old = true;		/* new dominates old */
 							else if (outercmp == BMS_SUBSET2 &&
-									 new_path->rows >= old_path->rows)
+									 new_path->rows >= old_path->rows &&
+									 new_path->parallel_safe <= old_path->parallel_safe)
 								accept_new = false;		/* old dominates new */
 							/* else different parameterizations, keep both */
 						}
@@ -527,7 +543,8 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 												   PATH_REQ_OUTER(old_path));
 							if ((outercmp == BMS_EQUAL ||
 								 outercmp == BMS_SUBSET1) &&
-								new_path->rows <= old_path->rows)
+								new_path->rows <= old_path->rows &&
+								new_path->parallel_safe >= old_path->parallel_safe)
 								remove_old = true;		/* new dominates old */
 						}
 						break;
@@ -538,7 +555,8 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 												   PATH_REQ_OUTER(old_path));
 							if ((outercmp == BMS_EQUAL ||
 								 outercmp == BMS_SUBSET2) &&
-								new_path->rows >= old_path->rows)
+								new_path->rows >= old_path->rows &&
+								new_path->parallel_safe <= old_path->parallel_safe)
 								accept_new = false;		/* old dominates new */
 						}
 						break;
@@ -685,6 +703,214 @@ add_path_precheck(RelOptInfo *parent_rel,
 	return true;
 }
 
+/*
+ * add_partial_path
+ *	  Like add_path, our goal here is to consider whether a path is worthy
+ *	  of being kept around, but the considerations here are a bit different.
+ *	  A partial path is one which can be executed in any number of workers in
+ *	  parallel such that each worker will generate a subset of the path's
+ *	  overall result.
+ *
+ *	  We don't generate parameterized partial paths for several reasons.  Most
+ *	  importantly, they're not safe to execute, because there's nothing to
+ *	  make sure that a parallel scan within the parameterized portion of the
+ *	  plan is running with the same value in every worker at the same time.
+ *	  Fortunately, it seems unlikely to be worthwhile anyway, because having
+ *	  each worker scan the entire outer relation and a subset of the inner
+ *	  relation will generally be a terrible plan.  The inner (parameterized)
+ *	  side of the plan will be small anyway.  There could be rare cases where
+ *	  this wins big - e.g. if join order constraints put a 1-row relation on
+ *	  the outer side of the topmost join with a parameterized plan on the inner
+ *	  side - but we'll have to be content not to handle such cases until somebody
+ *	  builds an executor infrastructure that can cope with them.
+ *
+ *	  Because we don't consider parameterized paths here, we also don't
+ *	  need to consider the row counts as a measure of quality: every path will
+ *	  produce the same number of rows.  Neither do we need to consider startup
+ *	  costs: parallelism is only used for plans that will be run to completion.
+ *	  Therefore, this routine is much simpler than add_path: it needs to
+ *	  consider only pathkeys and total cost.
+ */
+void
+add_partial_path(RelOptInfo *parent_rel, Path *new_path)
+{
+	bool		accept_new = true;		/* unless we find a superior old path */
+	ListCell   *insert_after = NULL;	/* where to insert new item */
+	ListCell   *p1;
+	ListCell   *p1_prev;
+	ListCell   *p1_next;
+
+	/* Check for query cancel. */
+	CHECK_FOR_INTERRUPTS();
+
+	/*
+	 * As in add_path, throw out any paths which are dominated by the new
+	 * path, but throw out the new path if some existing path dominates it.
+	 */
+	p1_prev = NULL;
+	for (p1 = list_head(parent_rel->partial_pathlist); p1 != NULL;
+		 p1 = p1_next)
+	{
+		Path	   *old_path = (Path *) lfirst(p1);
+		bool		remove_old = false; /* unless new proves superior */
+		PathKeysComparison keyscmp;
+
+		p1_next = lnext(p1);
+
+		/* Compare pathkeys. */
+		keyscmp = compare_pathkeys(new_path->pathkeys, old_path->pathkeys);
+
+		/* Unless pathkeys are incomparable, keep just one of the two paths. */
+		if (keyscmp != PATHKEYS_DIFFERENT)
+		{
+			if (new_path->total_cost > old_path->total_cost * STD_FUZZ_FACTOR)
+			{
+				/* New path costs more; keep it only if pathkeys are better. */
+				if (keyscmp != PATHKEYS_BETTER1)
+					accept_new = false;
+			}
+			else if (old_path->total_cost > new_path->total_cost
+					 * STD_FUZZ_FACTOR)
+			{
+				/* Old path costs more; keep it only if pathkeys are better. */
+				if (keyscmp != PATHKEYS_BETTER2)
+					remove_old = true;
+			}
+			else if (keyscmp == PATHKEYS_BETTER1)
+			{
+				/* Costs are about the same, new path has better pathkeys. */
+				remove_old = true;
+			}
+			else if (keyscmp == PATHKEYS_BETTER2)
+			{
+				/* Costs are about the same, old path has better pathkeys. */
+				accept_new = false;
+			}
+			else if (old_path->total_cost > new_path->total_cost * 1.0000000001)
+			{
+				/* Pathkeys are the same, and the old path costs more. */
+				remove_old = true;
+			}
+			else
+			{
+				/*
+				 * Pathkeys are the same, and new path isn't materially
+				 * cheaper.
+				 */
+				accept_new = false;
+			}
+		}
+
+		/*
+		 * Remove current element from partial_pathlist if dominated by new.
+		 */
+		if (remove_old)
+		{
+			parent_rel->partial_pathlist =
+				list_delete_cell(parent_rel->partial_pathlist, p1, p1_prev);
+			/* add_path has a special case for IndexPath; we don't need it */
+			Assert(!IsA(old_path, IndexPath));
+			pfree(old_path);
+			/* p1_prev does not advance */
+		}
+		else
+		{
+			/* new belongs after this old path if it has cost >= old's */
+			if (new_path->total_cost >= old_path->total_cost)
+				insert_after = p1;
+			/* p1_prev advances */
+			p1_prev = p1;
+		}
+
+		/*
+		 * If we found an old path that dominates new_path, we can quit
+		 * scanning the partial_pathlist; we will not add new_path, and we
+		 * assume new_path cannot dominate any later path.
+		 */
+		if (!accept_new)
+			break;
+	}
+
+	if (accept_new)
+	{
+		/* Accept the new path: insert it at proper place */
+		if (insert_after)
+			lappend_cell(parent_rel->partial_pathlist, insert_after, new_path);
+		else
+			parent_rel->partial_pathlist =
+				lcons(new_path, parent_rel->partial_pathlist);
+	}
+	else
+	{
+		/* add_path has a special case for IndexPath; we don't need it */
+		Assert(!IsA(new_path, IndexPath));
+		/* Reject and recycle the new path */
+		pfree(new_path);
+	}
+}
+
+/*
+ * add_partial_path_precheck
+ *	  Check whether a proposed new partial path could possibly get accepted.
+ *
+ * Unlike add_path_precheck, we can ignore startup cost and parameterization,
+ * since they don't matter for partial paths (see add_partial_path).  But
+ * we do want to make sure we don't add a partial path if there's already
+ * a complete path that dominates it, since in that case the proposed path
+ * is surely a loser.
+ */
+bool
+add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
+						  List *pathkeys)
+{
+	ListCell   *p1;
+
+	/*
+	 * Our goal here is twofold.  First, we want to find out whether this path
+	 * is clearly inferior to some existing partial path.  If so, we want to
+	 * reject it immediately.  Second, we want to find out whether this path
+	 * is clearly superior to some existing partial path -- at least, modulo
+	 * final cost computations.  If so, we definitely want to consider it.
+	 *
+	 * Unlike add_path(), we always compare pathkeys here.  This is because we
+	 * expect partial_pathlist to be very short, and getting a definitive
+	 * answer at this stage avoids the need to call add_path_precheck.
+	 */
+	foreach(p1, parent_rel->partial_pathlist)
+	{
+		Path	   *old_path = (Path *) lfirst(p1);
+		PathKeysComparison keyscmp;
+
+		keyscmp = compare_pathkeys(pathkeys, old_path->pathkeys);
+		if (keyscmp != PATHKEYS_DIFFERENT)
+		{
+			if (total_cost > old_path->total_cost * STD_FUZZ_FACTOR &&
+				keyscmp != PATHKEYS_BETTER1)
+				return false;
+			if (old_path->total_cost > total_cost * STD_FUZZ_FACTOR &&
+				keyscmp != PATHKEYS_BETTER2)
+				return true;
+		}
+	}
+
+	/*
+	 * This path is neither clearly inferior to an existing partial path nor
+	 * clearly good enough that it might replace one.  Compare it to
+	 * non-parallel plans.  If it loses even before accounting for the cost of
+	 * the Gather node, we should definitely reject it.
+	 *
+	 * Note that we pass the total_cost to add_path_precheck twice.  This is
+	 * because it's never advantageous to consider the startup cost of a
+	 * partial path; the resulting plans, if run in parallel, will be run to
+	 * completion.
+	 */
+	if (!add_path_precheck(parent_rel, total_cost, total_cost, pathkeys,
+						   NULL))
+		return false;
+
+	return true;
+}
+
 
 /*****************************************************************************
  *		PATH NODE CREATION ROUTINES
@@ -697,7 +923,7 @@ add_path_precheck(RelOptInfo *parent_rel,
  */
 Path *
 create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
-					Relids required_outer, int nworkers)
+					Relids required_outer, int parallel_degree)
 {
 	Path	   *pathnode = makeNode(Path);
 
@@ -705,10 +931,12 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->parent = rel;
 	pathnode->param_info = get_baserel_parampathinfo(root, rel,
 													 required_outer);
-	pathnode->parallel_aware = nworkers > 0 ? true : false;
+	pathnode->parallel_aware = parallel_degree > 0 ? true : false;
+	pathnode->parallel_safe = rel->consider_parallel;
+	pathnode->parallel_degree = parallel_degree;
 	pathnode->pathkeys = NIL;	/* seqscan has unordered result */
 
-	cost_seqscan(pathnode, root, rel, pathnode->param_info, nworkers);
+	cost_seqscan(pathnode, root, rel, pathnode->param_info);
 
 	return pathnode;
 }
@@ -727,6 +955,8 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer
 	pathnode->param_info = get_baserel_parampathinfo(root, rel,
 													 required_outer);
 	pathnode->parallel_aware = false;
+	pathnode->parallel_safe = rel->consider_parallel;
+	pathnode->parallel_degree = 0;
 	pathnode->pathkeys = NIL;	/* samplescan has unordered result */
 
 	cost_samplescan(pathnode, root, rel, pathnode->param_info);
@@ -781,6 +1011,8 @@ create_index_path(PlannerInfo *root,
 	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
 														  required_outer);
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_degree = 0;
 	pathnode->path.pathkeys = pathkeys;
 
 	/* Convert clauses to indexquals the executor can handle */
@@ -827,6 +1059,8 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
 														  required_outer);
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = bitmapqual->parallel_safe;
+	pathnode->path.parallel_degree = 0;
 	pathnode->path.pathkeys = NIL;		/* always unordered */
 
 	pathnode->bitmapqual = bitmapqual;
@@ -852,7 +1086,17 @@ create_bitmap_and_path(PlannerInfo *root,
 	pathnode->path.pathtype = T_BitmapAnd;
 	pathnode->path.parent = rel;
 	pathnode->path.param_info = NULL;	/* not used in bitmap trees */
+
+	/*
+	 * Currently, a BitmapHeapPath, BitmapAndPath, or BitmapOrPath will be
+	 * parallel-safe if and only if rel->consider_parallel is set.  So, we can
+	 * set the flag for this path based only on the relation-level flag,
+	 * without actually iterating over the list of children.
+	 */
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_degree = 0;
+
 	pathnode->path.pathkeys = NIL;		/* always unordered */
 
 	pathnode->bitmapquals = bitmapquals;
@@ -877,7 +1121,17 @@ create_bitmap_or_path(PlannerInfo *root,
 	pathnode->path.pathtype = T_BitmapOr;
 	pathnode->path.parent = rel;
 	pathnode->path.param_info = NULL;	/* not used in bitmap trees */
+
+	/*
+	 * Currently, a BitmapHeapPath, BitmapAndPath, or BitmapOrPath will be
+	 * parallel-safe if and only if rel->consider_parallel is set.  So, we can
+	 * set the flag for this path based only on the relation-level flag,
+	 * without actually iterating over the list of children.
+	 */
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_degree = 0;
+
 	pathnode->path.pathkeys = NIL;		/* always unordered */
 
 	pathnode->bitmapquals = bitmapquals;
@@ -903,6 +1157,8 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
 	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
 														  required_outer);
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_degree = 0;
 	pathnode->path.pathkeys = NIL;		/* always unordered */
 
 	pathnode->tidquals = tidquals;
@@ -921,7 +1177,8 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
  * Note that we must handle subpaths = NIL, representing a dummy access path.
  */
 AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer)
+create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
+				   int parallel_degree)
 {
 	AppendPath *pathnode = makeNode(AppendPath);
 	ListCell   *l;
@@ -931,6 +1188,8 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer)
 	pathnode->path.param_info = get_appendrel_parampathinfo(rel,
 															required_outer);
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_degree = parallel_degree;
 	pathnode->path.pathkeys = NIL;		/* result is always considered
 										 * unsorted */
 	pathnode->subpaths = subpaths;
@@ -955,6 +1214,8 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer)
 		if (l == list_head(subpaths))	/* first node? */
 			pathnode->path.startup_cost = subpath->startup_cost;
 		pathnode->path.total_cost += subpath->total_cost;
+		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
+			subpath->parallel_safe;
 
 		/* All child paths must have same parameterization */
 		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
@@ -985,6 +1246,8 @@ create_merge_append_path(PlannerInfo *root,
 	pathnode->path.param_info = get_appendrel_parampathinfo(rel,
 															required_outer);
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_degree = 0;
 	pathnode->path.pathkeys = pathkeys;
 	pathnode->subpaths = subpaths;
 
@@ -1008,6 +1271,8 @@ create_merge_append_path(PlannerInfo *root,
 		Path	   *subpath = (Path *) lfirst(l);
 
 		pathnode->path.rows += subpath->rows;
+		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
+			subpath->parallel_safe;
 
 		if (pathkeys_contained_in(pathkeys, subpath->pathkeys))
 		{
@@ -1052,7 +1317,7 @@ create_merge_append_path(PlannerInfo *root,
  *	  This is only used for the case of a query with an empty jointree.
  */
 ResultPath *
-create_result_path(List *quals)
+create_result_path(RelOptInfo *rel, List *quals)
 {
 	ResultPath *pathnode = makeNode(ResultPath);
 
@@ -1060,6 +1325,8 @@ create_result_path(List *quals)
 	pathnode->path.parent = NULL;
 	pathnode->path.param_info = NULL;	/* there are no other rels... */
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_degree = 0;
 	pathnode->path.pathkeys = NIL;
 	pathnode->quals = quals;
 
@@ -1094,6 +1361,8 @@ create_material_path(RelOptInfo *rel, Path *subpath)
 	pathnode->path.parent = rel;
 	pathnode->path.param_info = subpath->param_info;
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = subpath->parallel_safe;
+	pathnode->path.parallel_degree = 0;
 	pathnode->path.pathkeys = subpath->pathkeys;
 
 	pathnode->subpath = subpath;
@@ -1155,6 +1424,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.parent = rel;
 	pathnode->path.param_info = subpath->param_info;
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = subpath->parallel_safe;
+	pathnode->path.parallel_degree = 0;
 
 	/*
 	 * Assume the output is unsorted, since we don't necessarily have pathkeys
@@ -1328,19 +1599,30 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
  */
 GatherPath *
 create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
-				   Relids required_outer, int nworkers)
+				   Relids required_outer)
 {
 	GatherPath *pathnode = makeNode(GatherPath);
 
+	Assert(subpath->parallel_safe);
+
 	pathnode->path.pathtype = T_Gather;
 	pathnode->path.parent = rel;
 	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
 														  required_outer);
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = false;
+	pathnode->path.parallel_degree = subpath->parallel_degree;
 	pathnode->path.pathkeys = NIL;		/* Gather has unordered result */
 
 	pathnode->subpath = subpath;
-	pathnode->num_workers = nworkers;
+	pathnode->single_copy = false;
+
+	if (pathnode->path.parallel_degree == 0)
+	{
+		pathnode->path.parallel_degree = 1;
+		pathnode->path.pathkeys = subpath->pathkeys;
+		pathnode->single_copy = true;
+	}
 
 	cost_gather(pathnode, root, rel, pathnode->path.param_info);
 
@@ -1393,6 +1675,8 @@ create_subqueryscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->param_info = get_baserel_parampathinfo(root, rel,
 													 required_outer);
 	pathnode->parallel_aware = false;
+	pathnode->parallel_safe = rel->consider_parallel;
+	pathnode->parallel_degree = 0;
 	pathnode->pathkeys = pathkeys;
 
 	cost_subqueryscan(pathnode, root, rel, pathnode->param_info);
@@ -1416,6 +1700,8 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->param_info = get_baserel_parampathinfo(root, rel,
 													 required_outer);
 	pathnode->parallel_aware = false;
+	pathnode->parallel_safe = rel->consider_parallel;
+	pathnode->parallel_degree = 0;
 	pathnode->pathkeys = pathkeys;
 
 	cost_functionscan(pathnode, root, rel, pathnode->param_info);
@@ -1439,6 +1725,8 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->param_info = get_baserel_parampathinfo(root, rel,
 													 required_outer);
 	pathnode->parallel_aware = false;
+	pathnode->parallel_safe = rel->consider_parallel;
+	pathnode->parallel_degree = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
 
 	cost_valuesscan(pathnode, root, rel, pathnode->param_info);
@@ -1461,6 +1749,8 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
 	pathnode->param_info = get_baserel_parampathinfo(root, rel,
 													 required_outer);
 	pathnode->parallel_aware = false;
+	pathnode->parallel_safe = rel->consider_parallel;
+	pathnode->parallel_degree = 0;
 	pathnode->pathkeys = NIL;	/* XXX for now, result is always unordered */
 
 	cost_ctescan(pathnode, root, rel, pathnode->param_info);
@@ -1484,6 +1774,8 @@ create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->param_info = get_baserel_parampathinfo(root, rel,
 													 required_outer);
 	pathnode->parallel_aware = false;
+	pathnode->parallel_safe = rel->consider_parallel;
+	pathnode->parallel_degree = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
 
 	/* Cost is the same as for a regular CTE scan */
@@ -1517,6 +1809,8 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
 														  required_outer);
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_degree = 0;
 	pathnode->path.rows = rows;
 	pathnode->path.startup_cost = startup_cost;
 	pathnode->path.total_cost = total_cost;
@@ -1653,6 +1947,10 @@ create_nestloop_path(PlannerInfo *root,
 								  required_outer,
 								  &restrict_clauses);
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = joinrel->consider_parallel &&
+		outer_path->parallel_safe && inner_path->parallel_safe;
+	/* This is a foolish way to estimate parallel_degree, but for now... */
+	pathnode->path.parallel_degree = outer_path->parallel_degree;
 	pathnode->path.pathkeys = pathkeys;
 	pathnode->jointype = jointype;
 	pathnode->outerjoinpath = outer_path;
@@ -1711,6 +2009,9 @@ create_mergejoin_path(PlannerInfo *root,
 								  required_outer,
 								  &restrict_clauses);
 	pathnode->jpath.path.parallel_aware = false;
+	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
+		outer_path->parallel_safe && inner_path->parallel_safe;
+	pathnode->jpath.path.parallel_degree = 0;
 	pathnode->jpath.path.pathkeys = pathkeys;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.outerjoinpath = outer_path;
@@ -1768,6 +2069,10 @@ create_hashjoin_path(PlannerInfo *root,
 								  required_outer,
 								  &restrict_clauses);
 	pathnode->jpath.path.parallel_aware = false;
+	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
+		outer_path->parallel_safe && inner_path->parallel_safe;
+	/* This is a foolish way to estimate parallel_degree, but for now... */
+	pathnode->jpath.path.parallel_degree = outer_path->parallel_degree;
 
 	/*
 	 * A hashjoin never has pathkeys, since its output ordering is
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f2bdfcc..29ce75f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -107,6 +107,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->reltargetlist = NIL;
 	rel->pathlist = NIL;
 	rel->ppilist = NIL;
+	rel->partial_pathlist = NIL;
 	rel->cheapest_startup_path = NULL;
 	rel->cheapest_total_path = NULL;
 	rel->cheapest_unique_path = NULL;
@@ -370,6 +371,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->reltargetlist = NIL;
 	joinrel->pathlist = NIL;
 	joinrel->ppilist = NIL;
+	joinrel->partial_pathlist = NIL;
 	joinrel->cheapest_startup_path = NULL;
 	joinrel->cheapest_total_path = NULL;
 	joinrel->cheapest_unique_path = NULL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5393005..f9f13b4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -458,6 +458,7 @@ typedef struct RelOptInfo
 	List	   *reltargetlist;	/* Vars to be output by scan of relation */
 	List	   *pathlist;		/* Path structures */
 	List	   *ppilist;		/* ParamPathInfos used in pathlist */
+	List	   *partial_pathlist;	/* partial Paths */
 	struct Path *cheapest_startup_path;
 	struct Path *cheapest_total_path;
 	struct Path *cheapest_unique_path;
@@ -759,6 +760,8 @@ typedef struct Path
 	RelOptInfo *parent;			/* the relation this path can build */
 	ParamPathInfo *param_info;	/* parameterization info, or NULL if none */
 	bool		parallel_aware; /* engage parallel-aware logic? */
+	bool		parallel_safe;	/* OK to use as part of parallel plan? */
+	int			parallel_degree; /* desired parallel degree; 0 = not parallel */
 
 	/* estimated size/costs for path (see costsize.c for more info) */
 	double		rows;			/* estimated number of result tuples */
@@ -1062,7 +1065,6 @@ typedef struct GatherPath
 {
 	Path		path;
 	Path	   *subpath;		/* path for each worker */
-	int			num_workers;	/* number of workers sought to help */
 	bool		single_copy;	/* path must not be executed >1x */
 } GatherPath;
 
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ac21a3a..25a7303 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -72,7 +72,7 @@ extern double clamp_row_est(double nrows);
 extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
 					double index_pages, PlannerInfo *root);
 extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
-			 ParamPathInfo *param_info, int nworkers);
+			 ParamPathInfo *param_info);
 extern void cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 				ParamPathInfo *param_info);
 extern void cost_index(IndexPath *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8fb9eda..4b24da0 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -29,9 +29,12 @@ extern void add_path(RelOptInfo *parent_rel, Path *new_path);
 extern bool add_path_precheck(RelOptInfo *parent_rel,
 				  Cost startup_cost, Cost total_cost,
 				  List *pathkeys, Relids required_outer);
+extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path);
+extern bool add_partial_path_precheck(RelOptInfo *parent_rel,
+						  Cost total_cost, List *pathkeys);
 
 extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
-					Relids required_outer, int nworkers);
+					Relids required_outer, int parallel_degree);
 extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
 					   Relids required_outer);
 extern IndexPath *create_index_path(PlannerInfo *root,
@@ -59,19 +62,18 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 					List *tidquals, Relids required_outer);
 extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
-				   Relids required_outer);
+				   Relids required_outer, int parallel_degree);
 extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
 						 RelOptInfo *rel,
 						 List *subpaths,
 						 List *pathkeys,
 						 Relids required_outer);
-extern ResultPath *create_result_path(List *quals);
+extern ResultPath *create_result_path(RelOptInfo *rel, List *quals);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 				   Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
-				   RelOptInfo *rel, Path *subpath, Relids required_outer,
-				   int nworkers);
+				   RelOptInfo *rel, Path *subpath, Relids required_outer);
 extern Path *create_subqueryscan_path(PlannerInfo *root, RelOptInfo *rel,
 						 List *pathkeys, Relids required_outer);
 extern Path *create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7757741..4b01850 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -50,6 +50,8 @@ extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+
 #ifdef OPTIMIZER_DEBUG
 extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
 #endif
-- 
2.5.4 (Apple Git-61)
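
A note on that first patch, since the rule is easy to miss in the hunks
above: a join path may be marked parallel_safe only when the join
relation permits parallelism at all and both input paths are themselves
parallel-safe; plain scan paths simply inherit rel->consider_parallel.
Here is the invariant pulled out into a hypothetical helper -- purely
illustrative, since the patch open-codes this test in
create_nestloop_path, create_mergejoin_path, and create_hashjoin_path
rather than adding a function like this:

/*
 * Illustrative only; not part of the patch.  Because
 * create_gather_path() sets parallel_safe = false, this test also
 * keeps a Gather from being planned beneath another Gather.
 */
static bool
join_path_parallel_safe(RelOptInfo *joinrel,
						Path *outer_path, Path *inner_path)
{
	return joinrel->consider_parallel &&
		outer_path->parallel_safe &&
		inner_path->parallel_safe;
}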

From 0e006c14df600419b56efc9ce8d2eaffb1891a7a Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Fri, 18 Dec 2015 10:02:35 -0500
Subject: [PATCH 2/3] Don't advance to the next reader until we can't read
 anything from the current one.

---
 src/backend/executor/nodeGather.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index f32da1e..db5883d 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -359,14 +359,20 @@ gather_readnext(GatherState *gatherstate)
 			continue;
 		}
 
-		/* Advance nextreader pointer in round-robin fashion. */
-		gatherstate->nextreader =
-			(gatherstate->nextreader + 1) % gatherstate->nreaders;
-
 		/* If we got a tuple, return it. */
 		if (tup)
 			return tup;
 
+		/*
+		 * Advance nextreader pointer in round-robin fashion.  Note that we
+		 * only reach this code if we weren't able to get a tuple from the
+		 * current worker.  We used to advance the nextreader pointer after
+		 * every tuple, but it turns out to be much more efficient to keep
+		 * reading from the same queue until that would require blocking.
+		 */
+		gatherstate->nextreader =
+			(gatherstate->nextreader + 1) % gatherstate->nreaders;
+
 		/* Have we visited every TupleQueueReader? */
 		if (gatherstate->nextreader == waitpos)
 		{
-- 
2.5.4 (Apple Git-61)
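
To make the behavioral change in this second patch concrete, the
reading loop now has roughly this shape -- a simplified sketch of
gather_readnext; the real function also handles workers shutting down,
notices when every queue has been visited, and blocks on the process
latch before retrying:

for (;;)
{
	TupleQueueReader *reader;
	HeapTuple	tup;
	bool		readerdone = false;

	reader = gatherstate->reader[gatherstate->nextreader];
	tup = TupleQueueReaderNext(reader, true, &readerdone);

	/* Keep draining the same queue for as long as it has tuples. */
	if (tup)
		return tup;

	/* Rotate to the next worker only once this queue would block. */
	gatherstate->nextreader =
		(gatherstate->nextreader + 1) % gatherstate->nreaders;

	/* ... if we've come full circle, wait on the latch ... */
}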

From 5deb0e7bc1d289616d6fc9b94b83377a47d221fe Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Fri, 18 Dec 2015 10:06:21 -0500
Subject: [PATCH 3/3] Make Gather disuse_physical_tlist.

---
 src/backend/optimizer/plan/createplan.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 94e5f26..0c82529 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -558,7 +558,8 @@ use_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
  * If the plan node immediately above a scan would prefer to get only
  * needed Vars and not a physical tlist, it must call this routine to
  * undo the decision made by use_physical_tlist().  Currently, Hash, Sort,
- * and Material nodes want this, so they don't have to store useless columns.
+ * Material, and Gather nodes want this, so they don't have to store or
+ * transfer useless columns.
  */
 static void
 disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
@@ -1123,6 +1124,8 @@ create_gather_plan(PlannerInfo *root, GatherPath *best_path)
 
 	subplan = create_plan_recurse(root, best_path->subpath);
 
+	disuse_physical_tlist(root, subplan, best_path->subpath);
+
 	gather_plan = make_gather(subplan->targetlist,
 							  NIL,
 							  best_path->path.parallel_degree,
-- 
2.5.4 (Apple Git-61)
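
And for completeness, here is roughly what the disuse_physical_tlist()
call added by this third patch buys us -- a simplified sketch of that
function's existing logic in createplan.c, shown only to make the
effect clear:

/*
 * Simplified sketch, not new code: for the scan node types handled
 * by create_scan_plan(), swap the scan's physical tlist for one built
 * from just the Vars the query references, so unused columns are
 * projected away before each row is copied into the shared-memory
 * tuple queue.
 */
switch (path->pathtype)
{
	case T_SeqScan:
	case T_IndexScan:
		/* ... likewise for the other scan node types ... */
		plan->targetlist = build_path_tlist(root, path);
		break;
	default:
		break;
}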
