Re: [HACKERS] multivariate statistics (v25)

Tomas Vondra Thu, 02 Mar 2017 06:54:08 -0800

On 03/02/2017 07:42 AM, Kyotaro HORIGUCHI wrote:

Hello,


At Thu, 2 Mar 2017 04:05:34 +0100, Tomas Vondra <[email protected]> wrote 
in <[email protected]>

OK,

attached is v24 of the patch series, addressing most of the reported
issues and comments (at least I believe so). The main changes are:


Unfortunately, 0002 conflicts with the current master
(4461a9b). Could you rebase them or tell us the commit where this
patches stand on?


Attached is a rebased patch series, otherwise it's the same as v24.

FWIW it was based on 016c990834 from Feb 28, but apparently some recentpatch caused a minor conflict.

I only saw the patch files but have some comments.

1) I've mostly abandoned the "multivariate" name in favor of
"extended", particularly in places referring to stats stored in the
pg_statistic_ext in general. "Multivariate" is now used only in places
talking about particular types (e.g. multivariate histograms).

The "extended" name is more widely used for this type of statistics,
and the assumption is that we'll also add other (non-multivariate)
types of statistics - e.g. statistics on custom expressions, or some
for of join statistics.


In 0005, and

@@ -184,14 +208,43 @@ clauselist_selectivity(PlannerInfo *root,
         * If there are no such stats or not enough attributes, don't waste time
         * simply skip to estimation using the plain per-column stats.
         */
+       if (has_stats(stats, STATS_TYPE_MCV) &&
...
+                       /* compute the multivariate stats */
+                       s1 *= clauselist_ext_selectivity(root, mvclauses, stat);
====
@@ -1080,10 +1136,71 @@ clauselist_ext_selectivity_deps(PlannerInfo *root, 
Index relid,
 }

 /*
+ * estimate selectivity of clauses using multivariate statistic

These comment is left unchanged?  or on purpose? 0007 adds very
similar texts.


Hmm, those comments should be probably changed to "extended".

2) Catalog pg_mv_statistic was renamed to pg_statistic_ext (and
pg_mv_stats view renamed to pg_stats_ext).


FWIW, "extended statistic" would be abbreviated as
"ext_statistic" or "extended_stats". Why have you exchanged the
words?

Because this way it's clear it's a version of pg_statistic, and it willbe sorted right next to it.

3) The structure of pg_statistic_ext was changed as proposed by
Alvaro, i.e. the boolean flags were removed and instead we have just a
single "char[]" column with list of enabled statistics.

4) I also got rid of the "mv" part in most variable/function/constant
names, replacing it by "ext" or something similar. Also mvstats.h got
renamed to stats.h.

5) Moved the files from src/backend/utils/mvstats to
backend/statistics.

6) Fixed the n_choose_k() overflow issues by using the algorithm
proposed by Dean. Also, use the simple formula for num_combinations().

7) I've tweaked data types for a few struct members (in stats.h). I've
kept most of the uint32 fields at the top level though, because int16
might not be large enough for large statistics and the overhead is
minimal (compared to the space needed e.g. for histogram buckets).


Some formulated proof or boundary value test cases might be
needed (to prevent future trouble). Or any defined behavior on
overflow of them might be enough. I belive all (or most) of
overflow-able data has such behavior.


That is probably a good idea and I plan to do that.

The renames/changes were quite widespread, but I've done my best to
fix all the comments and various other places.

regards


regards,


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multivariate statistics (v25)

Reply via email to