On 28 March 2018 at 15:50, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > After thinking about this a bit more, I'm not sure if updating the info > based on recursive calls makes sense. The fullmatch flag was supposed to > answer a simple question - can there be just a single matching item? > > If there are equality conditions on all columns, there can be just a > single matching item - if we have found it in the MCV (i.e. s1 > 0.0), > then we don't need to inspect the non-MCV part. > > But handling this in recursive manner breaks this entirely, because with > something like > > (a=1) AND (b=1 OR b=2) > > you suddenly can have multiple matching items. Which makes the fullmatch > flag somewhat useless. > > So I think we should be looking at top-level equality clauses only, just > like number_of_groups() does. >
I'm not quite sure what you mean by that, but it sounds a bit limiting in terms of the kinds of user queries that would be supported. > I think we can remove the fullmatch flag from mcv_update_bitmap > entirely. All we need to know is the presence of equality clauses and > whether there was a match in MCV (which we know from s1 > 0.0). > I agree with removing the fullmatch flag, but I don't think we actually need to know about the presence of equality clauses: The way that mcv_update_bitmap() recursively computes the set of matching MCVs seems to be correct. That gives us a value (call it mcv_matchsel) for the proportion of the table's rows that are in the MCV list and satisfy the clauses in stat_clauses. We can also estimate that there are (1-mcv_totalsel)*N rows that are not in the MCV list, for which the MCV stats therefore tell us nothing. The best way to estimate those rows would seem to be to use the logic from the guts of clauselist_selectivity(), without consulting any extended MCV stats (but still consulting other extended stats, I think). Doing that would return a selectivity value (call it nonmcv_sel) for those remaining rows. Then a reasonable estimate for the overall selectivity would seem to be mcv_matchsel + (1-mcv_totalsel) * nonmcv_sel and there would be no need for mcv_update_bitmap() to track eqmatches or return fullmatch, and it wouldn't actually matter whether or not we had equality clauses or if all the MCV columns were used. Regards, Dean