Re: [HACKERS] Performance improvement for joins where outer side is unique

David Rowley Sat, 14 Mar 2015 03:06:49 -0700

On 14 March 2015 at 14:51, David Rowley <[email protected]> wrote:

> On 13 March 2015 at 20:34, Kyotaro HORIGUCHI <
> [email protected]> wrote:
>
>> Unfortunately I can't decide this to be 'ready for commiter' for
>>
> now. I think we should have this on smaller footprint, in a
>> method without separate exhauxtive searching. I also had very
>> similar problem in the past but I haven't find such a way for my
>> problem..
>>
>>
> I don't think it's ready yet either. I've just been going over a few
> things and looking at Tom's recent commit b557226 in pathnode.c I've got a
> feeling that this patch would need to re-factor some code that's been
> modified around the usage of relation_has_unique_index_for() as when this
> code is called, the semi joins have already been analysed to see if they're
> unique, so it could be just a case of ripping all of that out
> of create_unique_path() and just putting a check to say rel->is_unique_join
> in there. But if I do that then I'm not quite sure if
> SpecialJoinInfo->semi_rhs_exprs and SpecialJoinInfo->semi_operators would
> still be needed at all. These were only added in b557226. Changing this
> would help reduce the extra planning time when the query contains
> semi-joins. To be quite honest, this type of analysis belongs in
> analyzejoin.c anyway. I'm tempted to hack at this part some more, but I'd
> rather Tom had a quick glance at what I'm trying to do here first.
>
>

I decided to hack away any change the code Tom added in b557226. I've
changed it so that create_unique_path() now simply just uses if
(rel->is_unique_join), instead off all the calls to
relation_has_unique_index_for() and query_is_distinct_for().  This vastly
simplifies that code. One small change is that Tom's checks for uniqueness
on semi joins included checks for volatile functions, this check didn't
exist in the original join removal code, so I've left it out. We'll never
match a expression with a volatile function to a unique index as indexes
don't allow volatile function expressions anyway. So as I understand it
this only serves as a fast path out if the join condition has a volatile
function... But I'd assume that check is not all that cheap.

I ended up making query_supports_distinctness() and query_is_distinct_for()
static in analyzejoins.c as they're not used in any other files.
relation_has_unique_index_for() is also now only used in analyzejoins.c,
but I've not moved it into that file yet as I don't want to bloat the
patch. I just added a comment to say it needs moved.

I've also added a small amount of code to query_is_distinct_for() which
allows subqueries such as (select 1 a offset 0) to be marked as unique. I
thought it was a little silly that these were not being detected as unique,
so I fixed it. This has the side effect of allowing left join removals for
queries such as: select t1.* from t1 left join (select 1 a offset 0) a on
t1.id=a.a;

Updated patch attached.

Regards

David Rowley

unijoin_2015-03-14_81bd96a.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Performance improvement for joins where outer side is unique

Reply via email to