Re: [HACKERS] Performance improvement for joins where outer side is unique

Tomas Vondra Wed, 16 Dec 2015 08:04:24 -0800

Hi,

On 12/16/2015 01:27 AM, David Rowley wrote:


I've attached a rebased patch against current master as there were
some conflicts from the recent changes to LATERAL join.


Thanks. I've looked at the rebased patch and have a few minor comments.

0) I know the patch does not tweak costing - any plans in this
   direction? Would it be possible to simply use the costing used by
   semijoin?

1) nodeHashjoin.c (and other join nodes)

I've noticed we have this in the ExecHashJoin() method:

 /*
  * When the inner side is unique or we're performing a
  * semijoin, we'll consider returning the first match, but
  * after that we're done with this outer tuple.
  */
 if (node->js.unique_inner)
     node->hj_JoinState = HJ_NEED_NEW_OUTER;

That seems a bit awkward because the comment speaks about unique joins*OR* semijoins, but the check only references unique_inner. That ofcourse works because we do this in ExecInitHashJoin():


        switch (node->join.jointype)
        {
                case JOIN_SEMI:
                        hjstate->js.unique_inner = true;
                        /* fall through */

Either some semijoins may not be unique - in that case the naming ismisleading at best, and we either need to find a better name or simplycheck two conditions like this:


 if (node->js.unique_inner || node->join.type == JOIN_SEMI)
    node->hj_JoinState = HJ_NEED_NEW_OUTER;

Or all semijoins are unique joins, and in that case the comment may needrephrasing. But more importantly, it begs the question why we'redetecting this in the executor and not in the planner? Because if wedetect it in executor, we can't use this bit in costing, for example.

FWIW the misleading naming was actually mentioned in the thread byHoriguchi-san, but I don't see any response to that (might have missedit though).


2) analyzejoins.c

I see that this code in join_is_removable()

    innerrel = find_base_rel(root, innerrelid);

    if (innerrel->reloptkind != RELOPT_BASEREL)
        return false;

was rewritten like this:

    innerrel = find_base_rel(root, innerrelid);

    Assert(innerrel->reloptkind == RELOPT_BASEREL);

That suggests that previously the function expected cases wherereloptkind may not be RELOPT_BASEREL, but now it'll error out int suchcases. I haven't noticed any changes in the surrounding code that'dguarantee this won't happen, but I also haven't been able to come upwith an example triggering the assert (haven't been trying too hard).How do we know the assert() makes sense?



3) joinpath.c

- either "were" or "been" seems redundant

/* left joins were already been analyzed for uniqueness inmark_unique_joins() */



4) analyzejoins.c

comment format broken

/*
 * mark_unique_joins
        Analyze joins in order to determine if their inner side is unique based
        on the join condition.
 */

5) analyzejoins.c

missing line before the comment

}
/*
 * rel_supports_distinctness
 *              Returns true if rel has some properties which can prove the 
relation
 *              to be unique over some set of columns.
 *

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Performance improvement for joins where outer side is unique

Reply via email to