Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Tomas Vondra Sun, 06 Jan 2019 16:45:44 -0800

FWIW the main unsolved issue (at least on the MCV part) is how it
decides which items to keep in the list.


As explained in [1], in the multivariate case we can't simply look at
the group frequency and compare it to the average frequency (of the
non-MCV items), which is what analyze_mcv_list() does in the
single-column case. In the multivariate case we also case about observed
vs. base frequency, i.e. we want the MCV list to include groups that are
present singificantly more/less than product of per-column stats.

I've repeatedly tried to come up with a criteria that would address
that, but it seems rather difficult because we can't abandon the other
criteria either. So the MCV list should include groups that match both

(a) items that are statistically more common than the non-MCV part (i.e.
the rule from per-column analyze_mcv_list)

(b) items that are statistically more/less common than estimated from
per-column stats (i.e. the new rule)

Enforcing rule (a) seems reasonable because it ensures the MCV list
includes all items more frequent than the last one. Without it, it's
difficult to decide know whether the absent item is very common (but
close to base frequency) or very uncommon (so less frequent than the
last MCV item).

So it's not clear to me how to best marry these two things. So far the
only thing I came up with is looking for the last item where the
frequency and base frequency are very different (not sure how exactly to
decide when the difference becomes statistically significant), include
all items with higher frequencies, and then do analyze_mcv_list() to
also enforce (a). But it seems a bit cumbersome :-(

[1]
https://www.postgresql.org/message-id/8ac8bd94-478d-215d-e6bd-339f1f20a74c%402ndquadrant.com


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Reply via email to