On 03/26/2018 09:01 PM, Dean Rasheed wrote: > On 18 March 2018 at 23:57, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: >> Attached is an updated version of the patch series, addressing issues >> pointed out by Alvaro. > > I've just been reading the new code in > statext_clauselist_selectivity() and mcv_clauselist_selectivity(), and > I'm having a hard time convincing myself that it's correct. > > This code in statext_clauselist_selectivity() looks a bit odd: > > /* > * Evaluate the MCV selectivity. See if we got a full match and the > * minimal selectivity. > */ > if (stat->kind == STATS_EXT_MCV) > s1 = mcv_clauselist_selectivity(root, stat, clauses, varRelid, > jointype, sjinfo, rel, > &fullmatch, &mcv_lowsel); > > /* > * If we got a full equality match on the MCV list, we're done (and the > * estimate is likely pretty good). > */ > if (fullmatch && (s1 > 0.0)) > return s1; > > /* > * If it's a full match (equalities on all columns) but we haven't found > * it in the MCV, then we limit the selectivity by frequency of the last > * MCV item. Otherwise reset it to 1.0. > */ > if (!fullmatch) > mcv_lowsel = 1.0; > > return Min(s1, mcv_lowsel); > > So if fullmatch is true and s1 is greater than 0, it will return s1. > If fullmatch is true and s1 equals 0, it will return Min(s1, > mcv_lowsel) which will also be s1. If fullmatch is false, mcv_lowsel > will be set to 1 and it will return Min(s1, mcv_lowsel) which will > also be s1. So it always just returns s1, no? Maybe there's no point > in computing fullmatch. >
Hmmm, I think you're right. It probably got broken in the last rebase, when I moved a bunch of code from the histogram part to the MCV one. I'll take a look. > Also, wouldn't mcv_lowsel potentially be a significant overestimate > anyway? Perhaps 1 minus the sum of the MCV frequencies might be > closer, but even that ought to take into account the number of > distinct values remaining, although that information may not always be > available. > That is definitely true. 1 minus the sum of the MCV frequencies, and I suppose we might even improve that if we had some ndistinct estimate on those columns to compute an average. > Also, just above that, in statext_clauselist_selectivity(), it > computes the list stat_clauses, then doesn't appear to use it > anywhere. I think that would have been the appropriate thing to pass > to mcv_clauselist_selectivity(). Otherwise, passing unrelated clauses > into mcv_clauselist_selectivity() will cause it to fail to find any > matches and then underestimate. > Will check. > I've also come across a few incorrect/out-of-date comments: > > /* > * mcv_clauselist_selectivity > * Return the estimated selectivity of the given clauses using MCV list > * statistics, or 1.0 if no useful MCV list statistic exists. > */ > > -- I can't see any code path that returns 1.0 if there are no MCV > stats. The last part of that comment is probably more appropriate to > statext_clauselist_selectivity() > > > /* > * mcv_update_match_bitmap > * [snip] > * The function returns the number of items currently marked as 'match', and > * ... > > -- it doesn't seem to return the number of items marked as 'match'. > > Then inside that function, this comment is wrong (copied from the > preceding comment): > > /* AND clauses assume nothing matches, initially */ > memset(bool_matches, STATS_MATCH_FULL, sizeof(char) * > mcvlist->nitems); > > Still reading... > > Regards, > Dean > Yeah, sorry about that - I forgot to fix those comments after removing the match counting to simplify the patches. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services