Re: UniqueKey v2

Antonin Houska Tue, 04 Jun 2024 07:14:42 -0700

Andy Fan <[email protected]> wrote:
> Antonin Houska <[email protected]> writes:
> 
> >> Could you make the reason clearer for adding 'List *opfamily_lists;'
> >> into UniqueKey?  You said "This is needed to create ECs in the parent
> >> query if the upper relation represents a subquery." but I didn't get the
> >> it. Since we need to maintain the UniqueKey in the many places, I'd like
> >> to keep it as simple as possbile. Of course, anything essentical should
> >> be added for sure. 
> >
> > If unique keys are generated for a subquery output, they also need to be
> > created for the corresponding relation in the upper query ("sub" in the
> > following example):
> 
> OK.
> >
> > select * from tab1 left join (select * from tab2) sub;
> >
> > However, to create an unique key for "sub", you need an EC for each 
> > expression
> > of the key.
> 
> OK.
> > And to create an EC, you in turn need the list of operator
> > families.
> 
> I'm thinking if we need to "create" any EC. Can you find out a user case
> where the outer EC is missed and the UniqueKey is still interesting? I
> don't have an example now.  
> 
> convert_subquery_pathkeys has a similar sistuation and has the following
> codes:
> 
>                               outer_ec =
>                                       get_eclass_for_sort_expr(root,
>                                                                               
>          (Expr *) outer_var,
>                                                                               
>          sub_eclass->ec_opfamilies,
>                                                                               
>          sub_member->em_datatype,
>                                                                               
>          sub_eclass->ec_collation,
>                                                                               
>          0,
>                                                                               
>          rel->relids,
>                                                                               
>          NULL,
>                                                                               
>          false);
> 
>                               /*
>                                * If we don't find a matching EC, sub-pathkey 
> isn't
>                                * interesting to the outer query
>                                */
>                               if (outer_ec)
>                                       best_pathkey =
>                                               make_canonical_pathkey(root,
>                                                                               
>            outer_ec,
>                                                                               
>            sub_pathkey->pk_opfamily,
>                                                                               
>            sub_pathkey->pk_strategy,
>                                                                               
>            sub_pathkey->pk_nulls_first);
>                       }


I think that convert_subquery_pathkeys() just does not try that hard to
achieve its goal.

The example where it's important to create the EC in the outer query is what I
added to the subselect.sql regression test in the 0004- diff in [1]:

create table tabx as select * from generate_series(1,100) idx;
create table taby as select * from generate_series(1,100) idy;
create unique index on taby using btree (idy);
create view view_barrier with (security_barrier=true) as select * from taby;
analyze tabx, taby;
explain (costs off, verbose on) select * from tabx x left join view_barrier y 
on idy = idx;

If you modify find_ec_position_matching_expr() to return -1 instead of
creating the EC, you will get this plan

Hash Left Join
   Output: x.idx, taby.idy
   Hash Cond: (x.idx = taby.idy)
   ->  Seq Scan on public.tabx x
         Output: x.idx
   ->  Hash
         Output: taby.idy
         ->  Seq Scan on public.taby
               Output: taby.idy

instead of this

Hash Left Join
   Output: x.idx, taby.idy
   Inner Unique: true
   Hash Cond: (x.idx = taby.idy)
   ->  Seq Scan on public.tabx x
         Output: x.idx
   ->  Hash
         Output: taby.idy
         ->  Seq Scan on public.taby
               Output: taby.idy

> >> > * uniquekey_useful_for_merging()
> >> >
> >> >   How does uniqueness relate to merge join? In README.uniquekey you seem 
> >> > to
> >> >   point out that a single row is always sorted, but I don't think this
> >> >   function is related to that fact. (Instead, I'd expect that pathkeys 
> >> > are
> >> >   added to all paths for a single-row relation, but I'm not sure you do 
> >> > that
> >> >   in the current version of the patch.)
> >> 
> >> The merging is for "mergejoinable join clauses", see function
> >> eclass_useful_for_merging. Usually I think it as operator "t1.a = t2.a";
> >
> > My question is: why is the uniqueness important specifically to merge join? 
> > I
> > understand that join evaluation can be more efficient if we know that one
> > input relation is unique (i.e. we only scan that relation until we find the
> > first match), but this is not specific to merge join.
> 
> So the answer is the "merging" in uniquekey_useful_for_merging() has
> nothing with merge join. 

I don't understand. The function comment does mention merge join:

/*
 * uniquekey_useful_for_merging
 *      Check if the uniquekey is useful for mergejoins above the given 
relation.
 *
 * similar with pathkeys_useful_for_merging.
 */
static bool
uniquekey_useful_for_merging(PlannerInfo *root, UniqueKey * ukey, RelOptInfo 
*rel)


[1] https://www.postgresql.org/message-id/7971.1713526758%40antos

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

Re: UniqueKey v2

Reply via email to