On Wed, 24 Mar 2021 at 10:22, Tomas Vondra
<[email protected]> wrote:
>
> Thanks, it seems to be some thinko in handling in PlaceHolderVars, which
> seem to break the code's assumptions about varnos. This fixes it for me,
> but I need to look at it more closely.
>
I think that makes sense.
Reviewing the docs, I noticed a couple of omissions, and had a few
other suggestions (attached).
Regards,
Dean
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
new file mode 100644
index dadca67..382cbd7
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7377,6 +7377,15 @@ SCRAM-SHA-256$<replaceable><iteration
<literal>m</literal> for most common values (MCV) list statistics
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>stxexprs</structfield> <type>pg_node_tree</type>
+ </para>
+ <para>
+ A list of any expressions covered by this statistics object.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -7474,6 +7483,16 @@ SCRAM-SHA-256$<replaceable><iteration
<structname>pg_mcv_list</structname> type
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>stxdexpr</structfield> <type>pg_statistic[]</type>
+ </para>
+ <para>
+ Per-expression statistics, serialized as an array of
+ <structname>pg_statistic</structname> type
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -12843,7 +12862,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p
<para>
The view <structname>pg_stats_ext</structname> provides access to
- the information stored in the <link
+ information about each extended statistics object in the database,
+ combining information stored in the <link
linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link>
and <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link>
catalogs. This view allows access only to rows of
@@ -12930,7 +12950,16 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p
(references <link linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.<structfield>attname</structfield>)
</para>
<para>
- Names of the columns the extended statistics is defined on
+ Names of the columns included in the extended statistics
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>exprs</structfield> <type>text[]</type>
+ </para>
+ <para>
+ Expressions included in the extended statistics
</para></entry>
</row>
@@ -13033,7 +13062,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p
<para>
The view <structname>pg_stats_ext_exprs</structname> provides access to
- the information stored in the <link
+ information about all expressions included in extended statistics objects,
+ combining information stored in the <link
linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link>
and <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link>
catalogs. This view allows access only to rows of
@@ -13119,7 +13149,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_p
<structfield>expr</structfield> <type>text</type>
</para>
<para>
- Expression the extended statistics is defined on
+ Expression included in the extended statistics
</para></entry>
</row>
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
new file mode 100644
index 5f3aefd..f561599
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -27,7 +27,7 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep
CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="parameter">statistics_name</replaceable>
[ ( <replaceable class="parameter">statistics_kind</replaceable> [, ... ] ) ]
- ON { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [, ...]
+ ON { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) }, { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [, ...]
FROM <replaceable class="parameter">table_name</replaceable>
</synopsis>
@@ -45,12 +45,15 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep
<para>
The <command>CREATE STATISTICS</command> command has two basic forms. The
- simple variant allows building statistics for a single expression, does
- not allow specifying any statistics kinds and provides benefits similar
- to an expression index. The full variant allows defining statistics objects
- on multiple columns and expressions, and selecting which statistics kinds will
- be built. The per-expression statistics are built automatically when there
- is at least one expression.
+ first form allows univariate statistics for a single expression to be
+ collected, providing benefits similar to an expression index without the
+ overhead of index maintenance. This form does not allow the statistics
+ kind to be specified, since the various statistics kinds refer only to
+ multivariate statistics. The second form of the command allows
+ multivariate statistics on multiple columns and/or expressions to be
+ collected, optionally specifying which statistics kinds to include. This
+ form will also automatically cause univariate statistics to be collected on
+ any expressions included in the list.
</para>
<para>
@@ -93,16 +96,16 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep
<term><replaceable class="parameter">statistics_kind</replaceable></term>
<listitem>
<para>
- A statistics kind to be computed in this statistics object.
+ A multivariate statistics kind to be computed in this statistics object.
Currently supported kinds are
<literal>ndistinct</literal>, which enables n-distinct statistics,
<literal>dependencies</literal>, which enables functional
dependency statistics, and <literal>mcv</literal> which enables
most-common values lists.
If this clause is omitted, all supported statistics kinds are
- included in the statistics object. Expression statistics are built
- automatically when the statistics definition includes complex
- expressions and not just simple column references.
+ included in the statistics object. Univariate expression statistics are
+ built automatically if the statistics definition includes any complex
+ expressions rather than just simple column references.
For more information, see <xref linkend="planner-stats-extended"/>
and <xref linkend="multivariate-statistics-examples"/>.
</para>
@@ -114,8 +117,9 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep
<listitem>
<para>
The name of a table column to be covered by the computed statistics.
- At least two column names must be given; the order of the column names
- is insignificant.
+ This is only allowed when building multivariate statistics. At least
+ two column names or expressions must be specified, and their order is
+ not significant.
</para>
</listitem>
</varlistentry>
@@ -124,9 +128,11 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep
<term><replaceable class="parameter">expression</replaceable></term>
<listitem>
<para>
- The expression to be covered by the computed statistics. In this case
- only a single expression is required, in which case only statistics
- for the expression are built.
+ An expression to be covered by the computed statistics. This may be
+ used to build univariate statistics on a single expression, or as part
+ of a list of multiple column names and/or expressions to build
+ multivariate statistics. In the latter case, separate univariate
+ statistics are built automatically for each expression in the list.
</para>
</listitem>
</varlistentry>
@@ -156,8 +162,8 @@ CREATE STATISTICS [ IF NOT EXISTS ] <rep
<para>
Expression statistics are per-expression and are similar to creating an
index on the expression, except that they avoid the overhead of index
- maintenance. Expression statistics are built automatically when there
- is at least one expression in the statistics object definition.
+ maintenance. Expression statistics are built automatically for each
+ expression in the statistics object definition.
</para>
</refsect1>
@@ -232,13 +238,12 @@ EXPLAIN ANALYZE SELECT * FROM t2 WHERE (
<para>
Create table <structname>t3</structname> with a single timestamp column,
- and run a query using an expression on that column. Without extended
- statistics, the planner has no information about data distribution for
- results of those expression, and uses default estimates as illustrated
- by the first query. The planner also does not realize that the value of
- the second column fully determines the value of the other column, because
- date truncated to day still identifies the month. Then expression and
- ndistinct statistics are built on those two columns:
+ and run queries using expressions on that column. Without extended
+ statistics, the planner has no information about the data distribution for
+ the expressions, and uses default estimates. The planner also does not
+ realize that the value of the date truncated to the month is fully
+ determined by the value of the date truncated to the day. Then expression
+ and ndistinct statistics are built on those two expressions:
<programlisting>
CREATE TABLE t3 (
@@ -262,7 +267,8 @@ EXPLAIN ANALYZE SELECT * FROM t3
EXPLAIN ANALYZE SELECT date_trunc('month', a), date_trunc('day', a)
FROM t3 GROUP BY 1, 2;
--- per-expression statistics are built automatically
+-- build ndistinct statistics on the pair of expressions (per-expression
+-- statistics are built automatically)
CREATE STATISTICS s3 (ndistinct) ON date_trunc('month', a), date_trunc('day', a) FROM t3;
ANALYZE t3;