Re: [HACKERS] Push down more full joins in postgres_fdw

Etsuro Fujita Tue, 25 Oct 2016 05:27:55 -0700

On 2016/10/25 18:58, Ashutosh Bapat wrote:

You wrote:

13. The comment below is missing the main purpose i.e. creating a a unique
alias, in case the relation gets converted into a subquery. Lowest or
highest
relid will create a unique alias at given level of join and that would be
more
future proof. If we decide to consider paths for every join order,
following
comment will no more be true.
+    /*
+     * Set the relation index.  This is defined as the position of this
+     * joinrel in the join_rel_list list plus the length of the rtable
list.
+     * Note that since this joinrel is at the end of the list when we are
+     * called, we can get the position by list_length.
+     */
+    fpinfo->relation_index =
+        list_length(root->parse->rtable) +
list_length(root->join_rel_list);


I wrote:

I don't agree on that point.  As I said before, the approach using the
lowest/highest relid would make a remote query having nested subqueries
difficult to read.  And even if we decided to consider paths for every join
order, we could define the relation_index safely, by modifying
postgresGetForeignJoinPaths so that it (1) sets the relation_index the
proposed way at the first time it is called and (2) avoids setting the
relation_index after the first call, by checking to see
fpinfo->relation_index > 0.

OK. Although, the alias which contains a number, which apparently
doesn't show any relation with the relation (no pun intended :)) being
deparsed, won't make much sense in the deparsed query. But I am fine
leaving this to the committer's judgement.


Fine with me.

May be you want to add an
assert here making sure that relation_index is set only once.
Something like Assert(fpinfo->relation_index == 0) just before setting
it.


Will do.

15. While deparsing every Var, we are descending the RelOptInfo hierarchy.

Right.  I think that not only avoids creating redundant data to find the
alias of a given Var node but also would be able to find it in optimal cost.

Instead of that, we should probably gather all the alias information once
and
store it in context as an array indexed by relid. While deparsing a Var we
can
get the targetlist and alias for a given relation by using var->varno as
index.
And then search for given Var node in the targetlist to deparse the column
name
by its position in the targetlist. For the relations that are not
converted
into subqueries, this array will not hold any information and the Var will
be
deparsed as it's being done today.

Sorry, I don't understand this part.  How does that work when creating a
remote query having nested subqueries?  Is that able to search for a given
Var node efficiently?

For every relation that is deparsed as a subquery, we will create a
separate context. Each such context will have an array described
above. This array will contain the targetlist and aliases for all the
relations, covered by that relation, that are required to be deparsed
as subqueries. These arrays bubble up to topmost relation that is not
required to be deparsed as subquery. When a relation is required to be
deparsed as a subquery, its immediate upper relation sets the alias
and targetlist chosen for that relation at all the indexes
corresponding to all the base relations covered by that relation.


Thanks for the explanation!

For
example, let's assume that a relation (1, 2, 3) is required to be
deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5)
(thus the other joining relation being (4,5)). While deparsing for
relation (1,2,3,4,5), the context will contain a 5 element array, with
positions 1, 2, 3 filled by same targetlist and alias whereas
positions 4 and 5 will not be filled as those relations are not
deparsed as subqueries.

Sorry, I don't understand this. In that case, the immediate upperrelation (1, 2, 3, 4, 5) would need to fill the targetlist and aliasesfor *the join relation (1, 2, 3) somewhere*, not the targetlist andaliases for each of the component relations 1, 2, and 3, because thejoin relation is deparsed as a subquery. Maybe I'm missing something,though.

Let's assume in relation (1, 2, 3), (1, 3) in
turn requires subquery but (2) does not. Thus the context created
while deparsing (1, 2, 3) will have a 3 element array with positions 1
and 3 containing the same targetlist and alias, where as position 2
will be empty.

When deparsing a Var node with varno = N and varattno =
m, if the nth position in the array in the context is empty, that Var
node will be deparsed as rN.<column name>.

What happens when deparsing eg, a Var with varno = 2 at the topmostrelation (1, 2, 3, 4, 5)? The second position of the array is empty,but the join relation (1, 2, 3) is deparsed as a subquery, so the Varshould be deparsed as an alias of an output column of the subquery atthe topmost relation, I think.

But if that position is has
alias sZ, then we search for that Var node in the targetlist and if
it's found at kth position in the targetlist, we will deparse it as
sZ.ck. The search in the targetlist can be performed using
tlist_member, and then fetching the position by TargetEntry::resno.

This does not require any recursion and thus saves stack space and
some CPU cycles required for recursion.


Is that true?

some CPU cycles required for recursion. I guess, the arrays need to be
computed only once for any relation when the query for that relation
is deparsed the first time.

Does this algorithm extend to the case where we consider paths for everyjoin order?

Thanks for considering other suggestions.


Your suggestions are really helpful to improve the patch.  Thanks!

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Push down more full joins in postgres_fdw

Reply via email to