On Tue, Jul 6, 2021 at 5:34 PM David Rowley <dgrowle...@gmail.com> wrote: > > On Sun, 4 Jul 2021 at 02:08, Andy Fan <zhihui.fan1...@gmail.com> wrote: > > I'd start to work on UniqueKey again, it would be great that we can target > > it > > to PG 15. The attached patch is just for the notnull_attrs. Since we can't > > say > > a column is nullable or not without saying in which resultset, So I think > > attaching > > it to RelOptInfo is unavoidable. Here is how my patch works. > > I'd also like to see this work progress for PG15.
Thank you David! I am re-designing/implementing the UniqueKey, but it is better to have a design review as soon as possible. This writing is for that. To make the review easier, I also uploaded my in-completed patch (Correct, runable with testcase). Main changes are: 1. Use EC instead of expr, to cover more UniqueKey case. 2. Redesign the UniqueKey as below: @@ -246,6 +246,7 @@ struct PlannerInfo * subquery outputs */ List *eq_classes; /* list of active EquivalenceClasses */ + List *unique_exprs; /* List of unique expr */ bool ec_merging_done; /* set true once ECs are canonical */ +typedef struct UniqueKey +{ + NodeTag type; + Bitmapset *unique_expr_indexes; + bool multi_nulls; +} UniqueKey; + PlannerInfo.unique_exprs is a List of unique exprs. Unique Exprs is a set of EquivalenceClass. for example: CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT, pk INT primary key); CREATE UNIQUE INDEX ON t1(a, b); SELECT DISTINCT * FROM T1 WHERE a = c; Then we would have PlannerInfo.unique_exprs as below [ [EC(a, c), EC(b)], [EC(pk)] ] RelOptInfo(t1) would have 2 UniqueKeys. UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false] UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false] The design will benefit many table joins cases. For example, 10 tables join. Each table has a primary key (a, b). Then we would have a UniqueKey like this. JoinRel{1,2,3,4} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b} JoinRel{1,2,3,4, 5} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a t5.b} This would be memory consuming and building such UniqueKey is CPU consuming as well. With the new design, we can store it as PlannerInfo.unique_exprs = [ [t1.a, t1.b], -- EC is ignored in document. [t2.a, t2.b], [t3.a, t3.b], [t4.a, t4.b], [t5.a, t5.b], [t6.a, t6.b], [t7.a, t7.b], [t8.a, t8.b], [t9.a, t9.b], [t10.a, t10.b], ] JoinRel{1,2,3,4} - Bitmapset{0,1,2,3} -- one bitmapword. JoinRel{1,2,3,4,5} - Bitmapset{0,1,2,3,4} -- one bitmapword. 3. Define a new SingleRow node and use it in joinrel as well. +typedef struct SingleRow +{ + NodeTag type; + Index relid; +} SingleRow; SELECT * FROM t1, t2 WHERE t2.pk = 1; PlannerInfo.unique_exprs [ [t1.a, t1.b], SingleRow{relid=2} ] JoinRel{t1} - Bitmapset{0} JoinRel{t2} - Bitmapset{1} JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to dedicated exprs. 4. Cut the useless UniqueKey totally on the baserel stage based on root->distinct_pathkey. If we want to use it anywhere else, I think this design is OK as well. for example: group by UniqueKey. 5. Implemented the relation_is_distinct_for(root, rel, distinct_pathkey) effectively. Here I used distinct_pathkey rather than Query->distinctClause. Since I implemented the EC in PlannerInfo.unique_exprs point to the PathKey.pk_eqclass, so we can compare the address directly with '=', rather than equal(a, b). (since qual would check the address as well, so even I use equal, the performance is good as well). SingleRow is handled as well for this case. You can check the more details in the attached patch. Any feedback is welcome. -- Best Regards Andy Fan (https://www.aliyun.com/)
v1-0001-design-uniquekey-v2.patch
Description: Binary data