Hi Zhihong, On Mon, Aug 16, 2021 at 12:35 AM Zhihong Yu <z...@yugabyte.com> wrote: > > > > On Sun, Aug 15, 2021 at 7:33 AM Andy Fan <zhihui.fan1...@gmail.com> wrote: >> >> Hi: >> >> I have finished the parts for baserel, joinrel, subquery, distinctrel. I >> think >> the hardest ones have been verified. Here is the design overview. >> >> 1. Use EC instead of expr to cover more UniqueKey cases. >> 2. Redesign the UniqueKey as below: >> >> @@ -246,6 +246,7 @@ struct PlannerInfo >> >> List *eq_classes; /* list of active EquivalenceClasses */ >> + List *unique_exprs; /* List of unique expr */ >> >> bool ec_merging_done; /* set true once ECs are canonical */ >> >> +typedef struct UniqueKey >> +{ >> + NodeTag type; >> + Bitmapset *unique_expr_indexes; >> + bool multi_nulls; >> +} UniqueKey; >> + >> >> PlannerInfo.unique_exprs is a List of unique exprs. Unique Exprs is a set of >> EquivalenceClass. for example: >> >> CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT, pk INT primary key); >> CREATE UNIQUE INDEX ON t1(a, b); >> >> SELECT DISTINCT * FROM T1 WHERE a = c; >> >> Then we would have PlannerInfo.unique_exprs as below >> [ >> [EC(a, c), EC(b)], >> [EC(pk)] >> ] >> >> RelOptInfo(t1) would have 2 UniqueKeys. >> UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false] >> UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false] >> >> The design will benefit many table joins cases. For instance a 10- tables >> join, >> each table has a primary key (a, b). Then we would have a UniqueKey like >> this. >> >> JoinRel{1,2,3,4} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b} >> JoinRel{1,2,3,4,5} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a >> t5.b} >> >> When more tables are joined, the list would be longer and longer, build the >> list >> consumes both CPU cycles and memory. >> >> With the method as above, we can store it as: >> >> root->unique_exprs = /* All the UniqueKey is stored once */ >> [ >> [t1.a, t1.b], -- EC is ignored in document. >> [t2.a, t2.b], >> [t3.a, t3.b], >> [t4.a, t4.b], >> [t5.a, t5.b], >> [t6.a, t6.b], >> [t7.a, t7.b], >> [t8.a, t8.b], >> [t9.a, t9.b], >> [t10.a, t10.b], >> ] >> >> JoinRel{1,2,3,4} - Bitmapset{0,1,2,3} -- one bitmapword. >> JoinRel{1,2,3,4,5} - Bitmapset{0,1,2,3,4} -- one bitmapword. >> >> The member in the bitmap is the index of root->unique_exprs rather than >> root->eq_classes because we need to store the SingleRow case in >> root->unqiue_exprs as well. >> >> 3. Define a new SingleRow node and use it in joinrel as well. >> >> +typedef struct SingleRow >> +{ >> + NodeTag type; >> + Index relid; >> +} SingleRow; >> >> SELECT * FROM t1, t2 WHERE t2.pk = 1; >> >> root->unique_exprs >> [ >> [t1.a, t1.b], >> SingleRow{relid=2} >> ] >> >> JoinRel{t1} - Bitmapset{0} >> JoinRel{t2} - Bitmapset{1} >> >> JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to >> dedicated >> exprs. >> >> 4. Interesting UniqueKey to remove the Useless UniqueKey as soon as possible. >> >> The overall idea is similar with PathKey, I distinguish the UniqueKey for >> distinct purpose and useful_for_merging purpose. >> >> SELECT DISTINCT pk FROM t; -- OK, maintain it all the time, just like >> root->query_pathkey. >> >> SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- T2's UniqueKey PK is >> use before t1 join t2, but not useful after it. >> >> Currently the known issue I didn't pass the "interesting UniqueKey" info to >> subquery well [2], I'd like to talk more about this when we discuss about >> UnqiueKey on subquery part. >> >> >> 5. relation_is_distinct_for >> >> Now I design the function as >> >> + bool >> + relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List >> *distinct_pathkey) >> >> It is "List *distinct_pathkey", rather than "List *exprs". The reason pathkey >> has EC in it, and all the UniqueKey has EC as well. if so, the subset-like >> checking is very effective. As for the distinct/group as no-op case, we have >> pathkey all the time. The only drawback of it is some clauses are >> not-sortable, >> in this case, the root->distinct_pathkey and root->group_pathkey is not >> set. However it should be rare by practice, so I ignore this part for >> now. After all, I can have a relation_is_disticnt_for version for Exprs. I >> just >> not implemented it so far. >> >> 6. EC overhead in UnqiueKey & UNION case. >> >> Until now I didn't create any new EC for the UniqueKey case, I just used the >> existing ones. However I can't handle the case like >> >> SELECT a, b FROM t1 >> UNION >> SELECT x, y FROM t2; >> >> In this case, there is no EC created with existing code. and I don't want to >> create them for the UniqueKey case as well. so my plan is just not to handle >> the case for UNION. >> >> Since we need some big effort from the reviewer, I split the patch into many >> smaller chunks. >> >> Patch 1 / Patch 2: I just split some code which UniqueKey uses but not very >> interrelated. Splitting them out to reduce the core patch size. >> Patch 3: still the notnull stuff. This one doesn't play a big role overall, >> even if the design is changed at last, we can just modify some small stuff. >> IMO, >> I don't think it is a blocker issue to move on. >> Patch 4: Support the UniqueKey for baserel. >> Patch 5: Support the UniqueKey for joinrel. >> Patch 6: Support the UniqueKey for some upper relation, like distinctrel, >> groupby rel. >> >> I'd suggest moving on like this: >> 1. Have an overall review to see if any outstanding issues the patch has. >> 2. If not, we can review and commit patch 1 & patch 2 to reduce the patch >> size. >> 3. Decide which method to take for not null stuff. IMO Tom's method >> would be a bit >> complex and the benefit is not very clear to me[1]. So the choices >> include: a). move on UniqueKey stuff until Tom's method is ready. b). Move >> on the UniqueKey with my notnull way, and changes to Tom's method when >> necessary. I prefer method b). >> 4. Review & Commit the UniqueKey for BaseRel part. >> 5. Review & Commit the UniqueKey for JoinRel part. >> 6. Review & Commit the UniqueKey for SubQuery part *without* the Interesting >> UniqueKey well handled. >> 7. Review & Commit the UniqueKey for SubQuery part *with* the Interesting >> UniqueKey well handled. >> 8. Discuss about the UniqueKey on partitioned rel case and >> develop/review/commit >> it. >> 9. Apply UniqueKey stuff on more user cases rather than DISTINCT. >> >> What do you think about this? >> >> [1] >> https://www.postgresql.org/message-id/CAApHDvo5b2pYX%2BdbPy%2BysGBSarezRSfXthX32TZNFm0wPdfKGQ%40mail.gmail.com >> [2] >> https://www.postgresql.org/message-id/CAKU4AWo6-%3D9mg3UQ5UJhGCMw6wyTPyPGgV5oh6dFvwEN%3D%2Bhb_w%40mail.gmail.com >> >> >> Thanks > > Hi, > For v3-0005-Support-UniqueKey-on-JoinRel.patch : > > +static void populate_joinrel_composited_uniquekey(PlannerInfo *root, > > populate_joinrel_composited_uniquekey -> populate_joinrel_composite_uniquekey > (without the trailing d for composite) > > For populate_joinrel_uniquekeys(): > > + foreach(lc, outerrel->uniquekeys) > + { > ... > + return; > > Should the remaining unique keys be considered ? > > For populate_joinrel_uniquekey_for_rel(): > > + else if (bms_equal(r->right_relids, rel->relids) && r->left_ec != > NULL) > + { > + other_ecs = lappend(other_ecs, r->right_ec); > + other_relids = bms_add_members(other_relids, r->left_relids); > > It seems the append to other_ecs is the same as the one for the > `bms_equal(r->left_relids, rel->relids) && r->right_ec != NULL` case. Is this > correct ? >
Correct, both of them are bugs, I will fix them in the next version, including the "tailing d". Thanks for your review! -- Best Regards Andy Fan (https://www.aliyun.com/)