Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

Etsuro Fujita Wed, 04 Jul 2018 05:06:59 -0700

(2018/07/04 19:04), Ashutosh Bapat wrote:

On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
<[email protected]>  wrote:

(2018/06/22 22:54), Ashutosh Bapat wrote:

+       if (enable_partitionwise_join&&   rel->top_parent_is_partitioned)
+       {
+           build_childrel_tlist(root, rel, childrel, 1,&appinfo);
+       }


Why do we need rel->top_parent_is_partitioned? If a relation is
partitioned (if (rel->part_scheme), it's either the top parent or is
partition of some other partitioned table. In either case this
condition will be true.



This would be needed to avoid unnecessarily applying build_childrel_tlist to
child rels of a partitioned table for which we don't consider partitionwise
join.  Consider:

postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
CREATE TABLE
postgres=# create table lpt_p1 partition of lpt for values in (1);
CREATE TABLE
postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
CREATE TABLE
postgres=# create table test (c1 int, c2 text);
CREATE TABLE
postgres=# explain verbose select * from (select * from lpt union all select
* from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);


--- plan clipped


In this example, the top parent is not a partitioned table and we don't need
to consider partitionwise join for that, so we don't need to apply that to
the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2).


FWIW, the test is not sufficient. In the above example, a simple
select * from lpt inner join test where lpt.c1 = test.c1 would not use
partition-wise join technique. Why not to avoid build_childrel_tlist()
in that case as well?

I might misunderstand your words, but in the above example the patchdoesn't apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason forthat is because we can avoid adjusting the tlists for the correspondingsubplans at plan creation time so that whole-row Vars in the tlists aretransformed into CREs. I think the overhead of the adjustment is notthat big, but not zero, so it would be worth avoiding applyingbuild_childrel_tlist to partitions if the top parent won't participatein a partitionwise-join at all.

Worst we can change the criteria to use
partition-wise join in future e.g. above case would use partition-wise
join by 1. merging lpt_p1 into corresponding partition of lpt so that
ss is partitioned and 2. repartitioning test or joining test with each
partition of lpt separately. In those cases the changes you are doing
here need to be reverted and put somewhere else. There's already a
patch to reverse the order of join and grouping out there. How would
this work with that.


Interesting, but that seems like a matter of PG12+.

In general I think set_append_rel_size() or build_simple_rel() is not
the place where we should decide whether the given relation will
participate in a PWJ. Also we should not selectively add a
ConvertRowtypeExpr on top of a whole-row Var of some a child relation.
We should do it for all the child rels or shouldn't do it at all.

One thing I thought was to apply build_childrel_tlist for all the childrels whether its top parent is a partitioned table or not, but as Imentioned above, that would incur needless overhead for adjusting thetlists for the child rels that don't involve in a partitionwise joinsuch as lpt_p1 and lpt_p2, which I think is not good.

An
in-between state will produce a hell lot of confusion for any further
optimization. Whenever we change the code around partition-wise
operations in future, we will have to check whether or not a given
child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
have mentioned earlier, I am also not comfortable with the targetlist
of child relations being type inconsistent with that of the parent,
which is a fundamental rule in inheritance. Worst keep them
inconsistent during path creation and make them consistent at the time
of creating plans. A child's whole-row Var has datatype of the child
where as that of parent has datatype of parent.

I don't see any critical issue here. Could you elaborate a bit more onthat point?

A ConvertRowtypeExpr
is used to fix this inconsistency. That's why I chose to use
pull_var_clause() as a place to fix the problem and not fix
ConvertRowtypeExpr in targetlist itself.

I think the biggest issue in the original patch you proposed is that wespend extra cycles where partitioning is not involved, which is thebiggest reason why I think the original patch isn't the right way to go.

I am also not comfortable in just avoiding ConvertRowtypeExprs in
targetlist and leave them as is in the conditions and ECs etc. This
means searching a ConvertRowtypeExpr e.g. for creating pathkeys in
targetlist will fail at the time of path creation but will succeed at
the time of plan creation.

This is turning more invasive that my approach of fixing it through PVC.

Sorry, I don't understand this. Could you show me places where theproblem happens?

+   /* No work if the child plan is an Append or MergeAppend */
+   if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
+       return;

Why? Probably it's related to the answer to the first question, But I
don't see the connection. Given that partition-wise join will be
applicable level by level, we need to recurse in
adjust_subplan_tlist().



I don't think so; I think if the child plan is an Append or MergeAppend, the
tlist for each subplan of the Append or MergeAppend would have already been
adjusted while create_plan_recurse before we are called here.


Ok. Thanks for the clarification. I think we should add a comment.


Will do.

+   /* The child plan should be able to do projection */
+   Assert(is_projection_capable_plan(subplan));
+
Again why? A MergeAppend's subplan could be a Sort plan, which will
not be projection capable.



IIUC, since we are called here right after create_plan_recurse, the child
plan would be a scan or join unless it's neither Append nor MergeAppend.  So
it should be projection-capable.  Maybe I'm missing something, though.


That's not true. add_paths_to_append_rel() adds sort paths on scan if
necessary and creates merge append path.

Really? I think create_merge_append_path called fromgenerate_mergeappend_paths called from add_paths_to_append_rel uses adummy sort path just to compute the cost, but I don't thinkcreate_merge_append_path (or generate_mergeappend_paths oradd_paths_to_append_rel) insert a sort path to a scan (or join) path.


Thanks for the comments!

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

Reply via email to