Amit-san,

(2019/01/09 9:30), Amit Langote wrote:
(sorry about the repeated email, but my previous attempt failed due to
trying to send to the -hackers and -performance lists at the same time, so
trying again after removing -performance)

Thanks!  (Actually, I also failed to send my post to those lists...)

On 2019/01/08 20:07, Etsuro Fujita wrote:
(2018/12/07 20:14), Ashutosh Bapat wrote:
On Fri, Dec 7, 2018 at 11:13 AM Ashutosh Bapat
<ashutosh.bapat....@gmail.com<mailto:ashutosh.bapat....@gmail.com>>  wrote:

         Robert, Ashutosh, any comments on this?  I'm unfamiliar with the
         partitionwise join code.

     As the comment says it has to do with the equivalence classes being
     used during merge append. EC's are used to create pathkeys used for
     sorting. Creating a sort node which has column on the nullable side
     of an OUTER join will fail if it doesn't find corresponding
     equivalence class. You may not notice this if both the partitions
     being joined are pruned for some reason. Amit's idea to make
     partition-wise join code do this may work, but will add a similar
     overhead esp. in N-way partition-wise join once those equivalence
     classes are added.

I looked at the patch. The problem there is that for a given relation,
we will add child ec member multiple times, as many times as the number
of joins it participates in. We need to avoid that to keep ec_member
list length in check.

Amit-san, are you still working on this, perhaps as part of the
speeding-up-planning-with-partitions patch [1]?

I had tried to continue working on it after PGConf.ASIA last month, but
got distracted by something else.

So, while the patch at [1] can take care of this issue as I also mentioned
upthread, I was trying to come up with a solution that can be back-patched
to PG 11.  The patch I posted above is one such solution and as Ashutosh
points out it's perhaps not the best, because it can result in potentially
creating many copies of the same child EC member if we do it in joinrel.c,
as the patch proposes.  I will try to respond to the concerns he raised in
the next week if possible.

Thanks for working on this!

I like your patch in general. I think one way to address Ashutosh's concerns would be to use the consider_partitionwise_join flag: originally, that was introduced for partitioned relations to show that they can be partitionwise-joined, but I think that flag could also be used for non-partitioned relations to show that they have been set up properly for partitionwise-joins, and I think by checking that flag we could avoid creating those copies for child dummy rels in try_partitionwise_join. Please find attached an updated version of the patch. I modified your version so that building tlists for child dummy rels are also postponed until after they actually participate in partitionwise-joins, to avoid that possibly-useless work as well. I haven't done any performance tests yet though.

Best regards,
Etsuro Fujita
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 256fe16..feccc60 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1017,38 +1017,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
 
 		/*
-		 * Copy/Modify targetlist. Even if this child is deemed empty, we need
-		 * its targetlist in case it falls on nullable side in a child-join
-		 * because of partitionwise join.
-		 *
-		 * NB: the resulting childrel->reltarget->exprs may contain arbitrary
-		 * expressions, which otherwise would not occur in a rel's targetlist.
-		 * Code that might be looking at an appendrel child must cope with
-		 * such.  (Normally, a rel's targetlist would only include Vars and
-		 * PlaceHolderVars.)  XXX we do not bother to update the cost or width
-		 * fields of childrel->reltarget; not clear if that would be useful.
-		 */
-		childrel->reltarget->exprs = (List *)
-			adjust_appendrel_attrs(root,
-								   (Node *) rel->reltarget->exprs,
-								   1, &appinfo);
-
-		/*
-		 * We have to make child entries in the EquivalenceClass data
-		 * structures as well.  This is needed either if the parent
-		 * participates in some eclass joins (because we will want to consider
-		 * inner-indexscan joins on the individual children) or if the parent
-		 * has useful pathkeys (because we should try to build MergeAppend
-		 * paths that produce those sort orderings). Even if this child is
-		 * deemed dummy, it may fall on nullable side in a child-join, which
-		 * in turn may participate in a MergeAppend, where we will need the
-		 * EquivalenceClass data structures.
-		 */
-		if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
-			add_child_rel_equivalences(root, appinfo, rel, childrel);
-		childrel->has_eclass_joins = rel->has_eclass_joins;
-
-		/*
 		 * We have to copy the parent's quals to the child, with appropriate
 		 * substitution of variables.  However, only the baserestrictinfo
 		 * quals are needed before we can check for constraint exclusion; so
@@ -1186,11 +1154,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 			continue;
 		}
 
-		/* CE failed, so finish copying/modifying join quals. */
+		/*
+		 * CE failed, so finish copying/modifying targetlist and join quals.
+		 *
+		 * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+		 * expressions, which otherwise would not occur in a rel's targetlist.
+		 * Code that might be looking at an appendrel child must cope with
+		 * such.  (Normally, a rel's targetlist would only include Vars and
+		 * PlaceHolderVars.)  XXX we do not bother to update the cost or width
+		 * fields of childrel->reltarget; not clear if that would be useful.
+		 */
 		childrel->joininfo = (List *)
 			adjust_appendrel_attrs(root,
 								   (Node *) rel->joininfo,
 								   1, &appinfo);
+		childrel->reltarget->exprs = (List *)
+			adjust_appendrel_attrs(root,
+								   (Node *) rel->reltarget->exprs,
+								   1, &appinfo);
+
+		/*
+		 * We have to make child entries in the EquivalenceClass data
+		 * structures as well.  This is needed either if the parent
+		 * participates in some eclass joins (because we will want to consider
+		 * inner-indexscan joins on the individual children) or if the parent
+		 * has useful pathkeys (because we should try to build MergeAppend
+		 * paths that produce those sort orderings).
+		 */
+		if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+			add_child_rel_equivalences(root, appinfo, rel, childrel);
+		childrel->has_eclass_joins = rel->has_eclass_joins;
 
 		/*
 		 * Note: we could compute appropriate attr_needed data for the child's
@@ -1204,8 +1197,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 * If we consider partitionwise joins with the parent rel, do the same
 		 * for partitioned child rels.
 		 */
-		if (rel->consider_partitionwise_join &&
-			childRTE->relkind == RELKIND_PARTITIONED_TABLE)
+		if (rel->consider_partitionwise_join)
 			childrel->consider_partitionwise_join = true;
 
 		/*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 67b9ca8..6645c5a 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -44,6 +44,8 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
 					   RelOptInfo *rel2, RelOptInfo *joinrel,
 					   SpecialJoinInfo *parent_sjinfo,
 					   List *parent_restrictlist);
+static void update_child_rel_info(PlannerInfo *root,
+					  RelOptInfo *rel, RelOptInfo *childrel);
 static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
 							 bool strict_op);
 
@@ -1312,6 +1314,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 					   RelOptInfo *joinrel, SpecialJoinInfo *parent_sjinfo,
 					   List *parent_restrictlist)
 {
+	bool		rel1_is_simple = IS_SIMPLE_REL(rel1);
+	bool		rel2_is_simple = IS_SIMPLE_REL(rel2);
 	int			nparts;
 	int			cnt_parts;
 
@@ -1376,6 +1380,27 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		AppendRelInfo **appinfos;
 		int			nappinfos;
 
+		/*
+		 * If a child table has consider_partitionwise_join=false, it means
+		 * that it's a dummy relation for which we skipped setting up tlist
+		 * expressions and adding EC members in set_append_rel_size(), so do
+		 * that now for use later.
+		 */
+		if (rel1_is_simple && !child_rel1->consider_partitionwise_join)
+		{
+			Assert(child_rel1->reloptkind == RELOPT_OTHER_MEMBER_REL);
+			Assert(IS_DUMMY_REL(child_rel1));
+			update_child_rel_info(root, rel1, child_rel1);
+			child_rel1->consider_partitionwise_join = true;
+		}
+		if (rel2_is_simple && !child_rel2->consider_partitionwise_join)
+		{
+			Assert(child_rel2->reloptkind == RELOPT_OTHER_MEMBER_REL);
+			Assert(IS_DUMMY_REL(child_rel2));
+			update_child_rel_info(root, rel2, child_rel2);
+			child_rel2->consider_partitionwise_join = true;
+		}
+
 		/* We should never try to join two overlapping sets of rels. */
 		Assert(!bms_overlap(child_rel1->relids, child_rel2->relids));
 		child_joinrelids = bms_union(child_rel1->relids, child_rel2->relids);
@@ -1418,6 +1443,28 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 }
 
 /*
+ * Set up tlist expressions for the childrel, and add EC members referencing
+ * the childrel.
+ */
+static void
+update_child_rel_info(PlannerInfo *root,
+					  RelOptInfo *rel, RelOptInfo *childrel)
+{
+	AppendRelInfo *appinfo = root->append_rel_array[childrel->relid];
+
+	/* Make child tlist expressions */
+	childrel->reltarget->exprs = (List *)
+		adjust_appendrel_attrs(root,
+							   (Node *) rel->reltarget->exprs,
+							   1, &appinfo);
+
+	/* Make child entries in the EquivalenceClass as well */
+	if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+		add_child_rel_equivalences(root, appinfo, rel, childrel);
+	childrel->has_eclass_joins = rel->has_eclass_joins;
+}
+
+/*
  * Returns true if there exists an equi-join condition for each pair of
  * partition keys from given relations being joined.
  */

Reply via email to