Re: [HACKERS] Push down more full joins in postgres_fdw

Etsuro Fujita Thu, 08 Sep 2016 03:57:16 -0700

On 2016/09/06 22:07, Ashutosh Bapat wrote:

On Fri, Sep 2, 2016 at 3:55 PM, Etsuro Fujita
<[email protected] <mailto:[email protected]>> wrote:

    On 2016/08/22 15:49, Ashutosh Bapat wrote:
        1. deparsePlaceHolderVar looks odd - each of the deparse*
        function is
        named as deparse + <name of the parser node the string would parse
        into>. PlaceHolderVar is not a parser node, so no string is
        going to be
        parsed as a PlaceHolderVar. May be you want to name it as
        deparseExerFromPlaceholderVar or something like that.

    The name "deparseExerFromPlaceholderVar" seems long to me.  How
    about "deparsePlaceHolderExpr"?

There isn't any node with name PlaceHolderExpr.


I'll rename it to "deparseExerInPlaceholderVar", then.

        3. I think registerAlias stuff should happen really at the time of
        creating paths and should be stored in fpinfo. Without that it
        will be
        computed every time we deparse the query. This means every time
        we try
        to EXPLAIN the query at various levels in the join tree and once
        for the
        query to be sent to the foreign server.

    Hmm.  I think the overhead in calculating aliases would be
    negligible compared to the overhead in explaining each remote query
    for costing or sending the remote query for execution.  So, I
    created aliases in the same way as remote params created and stored
    into params_list in deparse_expr_cxt.  I'm not sure it's worth
    complicating the code.

We defer remote parameter creation till deparse time since the the
parameter numbers are dependent upon the sequence in which we deparse
the query. Creating them at the time of path creation and storing them
in fpinfo is not possible because a. those present in the joining
relations will conflict with each other and need some kind of
serialization at the time of deparsing b. those defer for differently
parameterized paths or paths with different pathkeys. We don't defer it
because it's very light on performance.

That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique
aliases e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery
can have alias r123 without conflicting with any other alias.

Hmm. But another concern about the approach of generating an subselectalias for a path, if needed, at the path creation time would be that thepath might be rejected by add_path, which would result in totallywasting cycles for generating the subselect alias for the path.

However minimum overhead it might have, we will deparse the query every
time we create a path involving that kind of relation i.e. for different
pathkeys, different parameterization and different joins in which that
relation participates in. This number can be significant. We want to
avoid this overhead if we can.

Exactly. As I said above, I think the overhead would be negligiblecompared to the overhead in explaining each remote query for costing orthe overhead in sending the final remote query for execution, though.

        5. The blocks related to inner and outer relations in
        deparseFromExprForRel() look same. We should probably separate
        that code
        out into a function and call it at two places.

    Done.

I see you have created function deparseOperandRelation() for the
same. I guess, this should be renamed as deparseRangeTblRef() since it
will create RangeTblRef node when parsed back.

OK, if there no opinions of others, I'll rename it to the name proposedby you, "deparseRangeTblRef".

        6.
        !     if (is_placeholder)
        !         errcontext("placeholder expression at position %d in
        select list",
        !                    errpos->cur_attno);
        A user wouldn't know what a placeholder expression is, there is
        no such
        term in the documentation. We have to device a better way to
        provide an
        error context for an expression in general.

    Though I proposed that, I don't think that it's that important to
    let users know that the expression is from a PHV.  How about just
    saying "expression", not "placeholder expression", so that we have
    the message "expression at position %d in select list" in the context?

Hmm, that's better than placeholder expression, but not as explanatory
as it should be since we won't be printing the "select list" in the
error context and user won't have a clue as to what error context is
about.

I don't think so. Consider an example of the conversion error message,which is from the regression test:

SELECT ft1.c1, ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 ANDft1.c1 = 1;

ERROR:  invalid input syntax for integer: "foo"
CONTEXT:  whole-row reference to foreign table "ft1"

As shown in the example, the error message is displayed under a remotequery for execution. So, ISTM it's reasonable to print something like"expression at position %d in select list" in the context if anexpression in a PHV.

But I don't have any good suggestions right now. May be we should
print the whole expression? But that can be very long, I don't know.

ISTM that it's a bit too expensive to print the whole expression in theerror context.

This patch tries to do two things at a time 1. support join pushdown for
FULL join when the joining relations have remote conditions 2. better
support for fetching placeholder vars, whole row references and some
system columns. To make reviews easy, I think we should split the patch
into two 1. supporting subqueries to be deparsed with support for one of
the above (I will suggest FULL join support) 2. support for the other.


OK, will try.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Push down more full joins in postgres_fdw

Reply via email to