Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Andy Fan Sun, 15 Aug 2021 19:24:57 -0700

Hi Zhihong,

On Mon, Aug 16, 2021 at 12:35 AM Zhihong Yu <[email protected]> wrote:
>
>
>
> On Sun, Aug 15, 2021 at 7:33 AM Andy Fan <[email protected]> wrote:
>>
>> Hi:
>>
>> I have finished the parts for baserel, joinrel, subquery, distinctrel. I 
>> think
>> the hardest ones have been verified.  Here is the design overview.
>>
>> 1. Use EC instead of expr to cover more UniqueKey cases.
>> 2. Redesign the UniqueKey as below:
>>
>> @@ -246,6 +246,7 @@ struct PlannerInfo
>>
>> List   *eq_classes; /* list of active EquivalenceClasses */
>> + List   *unique_exprs; /* List of unique expr */
>>
>>   bool ec_merging_done; /* set true once ECs are canonical */
>>
>> +typedef struct UniqueKey
>> +{
>> + NodeTag type;
>> + Bitmapset *unique_expr_indexes;
>> + bool multi_nulls;
>> +} UniqueKey;
>> +
>>
>> PlannerInfo.unique_exprs is a List of unique exprs.  Unique Exprs is a set of
>> EquivalenceClass. for example:
>>
>> CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT,  pk INT primary key);
>> CREATE UNIQUE INDEX ON t1(a, b);
>>
>> SELECT DISTINCT * FROM T1 WHERE a = c;
>>
>> Then we would have PlannerInfo.unique_exprs as below
>> [
>> [EC(a, c), EC(b)],
>> [EC(pk)]
>> ]
>>
>> RelOptInfo(t1) would have 2 UniqueKeys.
>> UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false]
>> UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false]
>>
>> The design will benefit many table joins cases. For instance a 10- tables 
>> join,
>> each table has a primary key (a, b).  Then we would have a UniqueKey like
>> this.
>>
>> JoinRel{1,2,3,4} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
>> JoinRel{1,2,3,4,5} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a 
>> t5.b}
>>
>> When more tables are joined, the list would be longer and longer, build the 
>> list
>> consumes both CPU cycles and memory.
>>
>> With the method as above, we can store it as:
>>
>> root->unique_exprs =  /* All the UniqueKey is stored once */
>> [
>> [t1.a, t1.b], -- EC is ignored in document.
>> [t2.a, t2.b],
>> [t3.a, t3.b],
>> [t4.a, t4.b],
>> [t5.a, t5.b],
>> [t6.a, t6.b],
>> [t7.a, t7.b],
>> [t8.a, t8.b],
>> [t9.a, t9.b],
>> [t10.a, t10.b],
>> ]
>>
>> JoinRel{1,2,3,4} -  Bitmapset{0,1,2,3} -- one bitmapword.
>> JoinRel{1,2,3,4,5} -  Bitmapset{0,1,2,3,4} -- one bitmapword.
>>
>> The member in the bitmap is the index of root->unique_exprs rather than
>> root->eq_classes because we need to store the SingleRow case in
>> root->unqiue_exprs as well.
>>
>> 3. Define a new SingleRow node and use it in joinrel as well.
>>
>> +typedef struct SingleRow
>> +{
>> + NodeTag type;
>> + Index relid;
>> +} SingleRow;
>>
>> SELECT * FROM t1, t2 WHERE t2.pk = 1;
>>
>> root->unique_exprs
>> [
>> [t1.a, t1.b],
>> SingleRow{relid=2}
>> ]
>>
>> JoinRel{t1} - Bitmapset{0}
>> JoinRel{t2} - Bitmapset{1}
>>
>> JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to 
>> dedicated
>> exprs.
>>
>> 4. Interesting UniqueKey to remove the Useless UniqueKey as soon as possible.
>>
>> The overall idea is similar with PathKey, I distinguish the UniqueKey for
>> distinct purpose and useful_for_merging purpose.
>>
>> SELECT DISTINCT pk FROM  t; -- OK, maintain it all the time, just like
>> root->query_pathkey.
>>
>> SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- T2's UniqueKey PK is
>> use before t1 join t2, but not useful after it.
>>
>> Currently the known issue I didn't pass the "interesting UniqueKey" info to
>> subquery well [2], I'd like to talk more about this when we discuss about
>> UnqiueKey on subquery part.
>>
>>
>> 5. relation_is_distinct_for
>>
>> Now I design the function as
>>
>> + bool
>> + relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List
>>   *distinct_pathkey)
>>
>> It is "List *distinct_pathkey", rather than "List *exprs". The reason pathkey
>> has EC in it, and all the UniqueKey has EC as well. if so, the subset-like
>> checking is very effective.  As for the distinct/group as no-op case, we have
>> pathkey all the time. The only drawback of it is some clauses are 
>> not-sortable,
>> in this case, the root->distinct_pathkey and root->group_pathkey is not
>> set. However it should be rare by practice, so I ignore this part for
>> now. After all, I can have a relation_is_disticnt_for version for Exprs. I 
>> just
>> not implemented it so far.
>>
>> 6. EC overhead in UnqiueKey & UNION case.
>>
>> Until now I didn't create any new EC for the UniqueKey case, I just used the
>> existing ones. However I can't handle the case like
>>
>> SELECT a, b FROM t1
>> UNION
>> SELECT x, y FROM t2;
>>
>> In this case, there is no EC created with existing code. and I don't want to
>> create them for the UniqueKey case as well.  so my plan is just not to handle
>> the case for UNION.
>>
>> Since we need some big effort from the reviewer, I split the patch into many
>> smaller chunks.
>>
>> Patch 1 / Patch 2:  I just split some code which UniqueKey uses but not very
>> interrelated. Splitting them out to reduce the core patch size.
>> Patch 3:  still the notnull stuff.  This one doesn't play a big role overall,
>> even if the design is changed at last, we can just modify some small stuff. 
>> IMO,
>> I don't think it is a blocker issue to move on.
>> Patch 4:  Support the UniqueKey for baserel.
>> Patch 5: Support the UniqueKey for joinrel.
>> Patch 6: Support the UniqueKey for some upper relation, like distinctrel,
>> groupby rel.
>>
>> I'd suggest moving on like this:
>> 1. Have an overall review to see if any outstanding issues the patch has.
>> 2. If not, we can review and commit patch 1 & patch 2 to reduce the patch 
>> size.
>> 3. Decide which method to take for not null stuff. IMO Tom's method
>> would be a bit
>>    complex and the benefit is not very clear to me[1].  So the choices
>>    include: a). move on UniqueKey stuff until Tom's method is ready. b). Move
>>    on the UniqueKey with my notnull way, and changes to Tom's method when
>>    necessary. I prefer method b).
>> 4. Review & Commit the UniqueKey for BaseRel part.
>> 5. Review & Commit the UniqueKey for JoinRel part.
>> 6. Review & Commit the UniqueKey for SubQuery part *without* the Interesting
>>    UniqueKey well handled.
>> 7. Review & Commit the UniqueKey for SubQuery part *with* the Interesting
>>    UniqueKey well handled.
>> 8. Discuss about the UniqueKey on partitioned rel case and 
>> develop/review/commit
>>    it.
>> 9. Apply UniqueKey stuff on more user cases rather than DISTINCT.
>>
>> What do you think about this?
>>
>> [1] 
>> https://www.postgresql.org/message-id/CAApHDvo5b2pYX%2BdbPy%2BysGBSarezRSfXthX32TZNFm0wPdfKGQ%40mail.gmail.com
>> [2] 
>> https://www.postgresql.org/message-id/CAKU4AWo6-%3D9mg3UQ5UJhGCMw6wyTPyPGgV5oh6dFvwEN%3D%2Bhb_w%40mail.gmail.com
>>
>>
>> Thanks
>
> Hi,
> For v3-0005-Support-UniqueKey-on-JoinRel.patch :
>
> +static void populate_joinrel_composited_uniquekey(PlannerInfo *root,
>
> populate_joinrel_composited_uniquekey -> populate_joinrel_composite_uniquekey 
> (without the trailing d for composite)
>
> For populate_joinrel_uniquekeys():
>
> +       foreach(lc, outerrel->uniquekeys)
> +       {
> ...
> +           return;
>
> Should the remaining unique keys be considered ?
>
> For populate_joinrel_uniquekey_for_rel():
>
> +       else if (bms_equal(r->right_relids, rel->relids) && r->left_ec != 
> NULL)
> +       {
> +           other_ecs = lappend(other_ecs, r->right_ec);
> +           other_relids = bms_add_members(other_relids, r->left_relids);
>
> It seems the append to other_ecs is the same as the one for the 
> `bms_equal(r->left_relids, rel->relids) && r->right_ec != NULL` case. Is this 
> correct ?
>


Correct, both of them are bugs, I will fix them in the next version,
including the "tailing d".
Thanks for your review!

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Reply via email to