FWIW the main unsolved issue (at least on the MCV part) is how it decides which items to keep in the list.
As explained in [1], in the multivariate case we can't simply look at the group frequency and compare it to the average frequency (of the non-MCV items), which is what analyze_mcv_list() does in the single-column case. In the multivariate case we also case about observed vs. base frequency, i.e. we want the MCV list to include groups that are present singificantly more/less than product of per-column stats. I've repeatedly tried to come up with a criteria that would address that, but it seems rather difficult because we can't abandon the other criteria either. So the MCV list should include groups that match both (a) items that are statistically more common than the non-MCV part (i.e. the rule from per-column analyze_mcv_list) (b) items that are statistically more/less common than estimated from per-column stats (i.e. the new rule) Enforcing rule (a) seems reasonable because it ensures the MCV list includes all items more frequent than the last one. Without it, it's difficult to decide know whether the absent item is very common (but close to base frequency) or very uncommon (so less frequent than the last MCV item). So it's not clear to me how to best marry these two things. So far the only thing I came up with is looking for the last item where the frequency and base frequency are very different (not sure how exactly to decide when the difference becomes statistically significant), include all items with higher frequencies, and then do analyze_mcv_list() to also enforce (a). But it seems a bit cumbersome :-( [1] https://www.postgresql.org/message-id/8ac8bd94-478d-215d-e6bd-339f1f20a74c%402ndquadrant.com regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services