On Tue, Jul 6, 2021 at 5:34 PM David Rowley <dgrowle...@gmail.com> wrote:
>
> On Sun, 4 Jul 2021 at 02:08, Andy Fan <zhihui.fan1...@gmail.com> wrote:
> >  I'd start to work on UniqueKey again, it would be great that we can target 
> > it
> >  to PG 15. The attached patch is just for the notnull_attrs. Since we can't 
> > say
> >  a column is nullable or not without saying in which resultset, So I think 
> > attaching
> > it to RelOptInfo is unavoidable. Here is how my patch works.
>
> I'd also like to see this work progress for PG15.


Thank you David!

I am re-designing/implementing the UniqueKey, but it is better to have
a design review as soon as possible. This writing is for that. To make the
review easier, I also uploaded my in-completed patch (Correct, runable
with testcase).

Main changes are:
1. Use EC instead of expr, to cover more UniqueKey case.
2. Redesign the UniqueKey as below:

@@ -246,6 +246,7 @@ struct PlannerInfo
* subquery outputs */

List   *eq_classes; /* list of active EquivalenceClasses */
+ List   *unique_exprs; /* List of unique expr */

  bool ec_merging_done; /* set true once ECs are canonical */

+typedef struct UniqueKey
+{
+ NodeTag type;
+ Bitmapset *unique_expr_indexes;
+ bool multi_nulls;
+} UniqueKey;
+

PlannerInfo.unique_exprs is a List of unique exprs.  Unique Exprs is a set of
EquivalenceClass. for example:

CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT,  pk INT primary key);
CREATE UNIQUE INDEX ON t1(a, b);

SELECT DISTINCT * FROM T1 WHERE a = c;

Then we would have PlannerInfo.unique_exprs as below
[
[EC(a, c), EC(b)],
[EC(pk)]
]

RelOptInfo(t1) would have 2 UniqueKeys.
UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false]
UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false]

The design will benefit many table joins cases. For example, 10 tables
join. Each table has a primary key (a, b).  Then we would have a UniqueKey like
this.

JoinRel{1,2,3,4} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
JoinRel{1,2,3,4, 5} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a t5.b}

This would be memory consuming and building such UniqueKey is CPU consuming as
well. With the new design, we can store it as

PlannerInfo.unique_exprs =
[
[t1.a, t1.b], -- EC is ignored in document.
[t2.a, t2.b],
[t3.a, t3.b],
[t4.a, t4.b],
[t5.a, t5.b],
[t6.a, t6.b],
[t7.a, t7.b],
[t8.a, t8.b],
[t9.a, t9.b],
[t10.a, t10.b],
]

JoinRel{1,2,3,4} -  Bitmapset{0,1,2,3} -- one bitmapword.
JoinRel{1,2,3,4,5} -  Bitmapset{0,1,2,3,4} -- one bitmapword.

3. Define a new SingleRow node and use it in joinrel as well.

+typedef struct SingleRow
+{
+ NodeTag type;
+ Index relid;
+} SingleRow;

SELECT * FROM t1, t2 WHERE t2.pk = 1;

PlannerInfo.unique_exprs
[
[t1.a, t1.b],
SingleRow{relid=2}
]

JoinRel{t1} - Bitmapset{0}
JoinRel{t2} - Bitmapset{1}
JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to dedicated
exprs.

4. Cut the useless UniqueKey totally on the baserel stage based on
   root->distinct_pathkey.  If we want to use it anywhere else, I think this
   design is OK as well. for example: group by UniqueKey.

5. Implemented the relation_is_distinct_for(root, rel, distinct_pathkey)
   effectively.  Here I used distinct_pathkey rather than
   Query->distinctClause.

Since I implemented the EC in PlannerInfo.unique_exprs point to the
PathKey.pk_eqclass, so we can compare the address directly with '=', rather than
equal(a, b). (since qual would check the address as well, so even I use equal,
the performance is good as well). SingleRow is handled as well for this case.

You can check the more details in the attached patch.  Any feedback is welcome.

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)

Attachment: v1-0001-design-uniquekey-v2.patch
Description: Binary data

Reply via email to