On Mon, Aug 9, 2021 at 8:40 PM David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Mon, Aug 9, 2021 at 11:05 AM Bruce Momjian <br...@momjian.us> wrote: > >> >> > selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/ >> > num_distinct1, >> > 1/num_distinct2) >> > = (1 - 0) * (1 - 0) / max(10000, 10000) >> > = 0.0001 >> >> Nice, can you provide a patch please? >> >> > Change the line: > > Concretely, as attached and inline. David J. commit 73fa486a855d75d74a1a695bb350bfbfe27c7751 Author: David G. Johnston <david.g.johns...@gmail.com> Date: Tue Apr 12 21:23:53 2022 +0000 doc: make unique non-null join selectivity example match the prose The description of the computation for the unique, non-null, join selectivity describes a division by the maximum of two values, while the example shows a multiplication by their reciprocal. While equivalent the max phrasing is easier to understand; which seems more important here than precisely adhering to the formula use in the code (for which either variant is still an approximation). While both num_distinct and num_rows are equal for a unique column both the concept and formula use row count (10,000) and the field num_distinct has already been set to mean the specific value present in the pg_stats table (i.e, -1), so use num_rows here. diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml index 78053d7c49..f72bc4b274 100644 --- a/doc/src/sgml/planstats.sgml +++ b/doc/src/sgml/planstats.sgml @@ -391,18 +391,20 @@ tablename | null_frac | n_distinct | most_common_vals </programlisting> In this case there is no <acronym>MCV</acronym> information for - <structfield>unique2</structfield> because all the values appear to be - unique, so we use an algorithm that relies only on the number of - distinct values for both relations together with their null fractions: + <structname>unique2</structname> and all the values appear to be + unique (n_distinct = -1), so we use an algorithm that relies on the row + count estimates for both relations (num_rows, not shown, but "tenk") + together with the column null fractions (zero for both): <programlisting> -selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2) +selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_rows1, num_rows2) = (1 - 0) * (1 - 0) / max(10000, 10000) = 0.0001 </programlisting> This is, subtract the null fraction from one for each of the relations, - and divide by the maximum of the numbers of distinct values. + and divide by the row count of the larger relation (this value does get + scaled in the non-unique case). The number of rows that the join is likely to emit is calculated as the cardinality of the Cartesian product of the two inputs, multiplied by the
v0001-doc-make-row-estimation-example-match-prose.patch
Description: Binary data