Re: Improve selectivity estimate for range queries

Kyotaro HORIGUCHI Tue, 08 Jan 2019 18:56:47 -0800

At Tue, 08 Jan 2019 16:26:38 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
<[email protected]> wrote in 
<[email protected]>
> Hello.
> 
> At Fri, 21 Dec 2018 11:50:28 -0500, Tom Lane <[email protected]> wrote in 
> <[email protected]>
> > seem that that's just moving the problem around, but I think it
> > might be possible to show that such a value couldn't be computed
> > by scalarltsel given a histogram with no more than 10000 members.
> > (I haven't tried to actually prove that, but it seems intuitive
> > that the set of possible results would be quantized with no more
> > than about 5 digits precision.)


I think we don't need a perfect proof for that. The fact that
exactly 1/3 is a quite natural and common result, whereas 1/3 + ε
is not, would be enough.

> FWIW, I got the following result on my environment. It seems
> different enough if this holds on all supported platforms, though
> there still is a case where the result of a sequence of
> arithmetics makes false match.

A simple selectivity computed for a relation theoretically cannot
match the value with the epsilon added. (Of course, this is on *my*
environment.)

(0.333..)
binary format: 3f d5 55 55 55 55 55 55
x = 0.333333333333333315
   231 matches, 79 no_matches

(0.3{13}42..)
binary format: 3f d5 55 55 55 55 55 f1
x = 0.333333333333341975
   0 matches, 310 no_matches

(0.3{15}42..)
binary format: 3f d5 55 55 55 55 55 57
x = 0.333333333333333426
   0 matches, 310 no_matches


It seems that 0.3{13}42 should correctly be 0.3{15}42, which differs
from 1/3 by just two units in the last place (the low byte is 57
rather than 55). I believe C is well standardized on how a decimal
literal is translated to a double. Other DEFAULT_*_SELs are not
compared in this way.

The attached small patch fixes the case mentioned in this thread,
but I'm not sure where to put a test. A static assertion is not
usable. An assertion somewhere, perhaps in clauselist_selectivity,
seems somewhat overdone. I can't find a suitable place in the
regression tests, either.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

#include <stdio.h>
#include <math.h>

void test(double x)
{
  double d = 1.0;
  double d0 = 0;
  unsigned char *c_x = (unsigned char *) &x;
  int nmatches = 0;
  int nnomatches = 0;
  int i;
  
  fprintf(stderr, "binary format: ");
  for (i = 7 ; i >= 0 ; i--)
        fprintf(stderr, "%s%02x", i < 7 ? " " : "", c_x[i]);
  fprintf(stderr, "\n");

  fprintf(stderr, "x = %20.18f\n", x);
  while (d != d0)
  {
        double z = floor(d * 3);
        double z1 = z + 1.0;
        double y = d / z;
        double y1 = d / z1;

        /* Check whether the divisor on either side of d * 3 yields a match */
        if (y == x || y1 == x)
          nmatches++;
        else
          nnomatches++;

        d0 = d;
        d = d * 10;
  }

  fprintf(stderr, "   %d matches, %d no_matches\n", nmatches, nnomatches);

}

int main(void)
{
  test(0.3333333333333333);
  test(0.333333333333342);
  test(0.33333333333333342);

  return 0;
}

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 3739b9817a..cdeaac22c8 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -109,7 +109,7 @@ clauselist_selectivity(PlannerInfo *root,
 	ListCell   *l;
 	int			listidx;
 
-	/*
+ 	/*
 	 * If there's exactly one clause, just go directly to
 	 * clause_selectivity(). None of what we might do below is relevant.
 	 */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 5cc4cf15e2..15a8d2402a 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -33,8 +33,12 @@
 /* default selectivity estimate for equalities such as "A = b" */
 #define DEFAULT_EQ_SEL	0.005
 
-/* default selectivity estimate for inequalities such as "A < b" */
-#define DEFAULT_INEQ_SEL  0.3333333333333333
+/*
+ * default selectivity estimate for inequalities such as "A < b"
+ * The last two digits prevent it from making a false match with 1/3 computed
+ * from histogram and/or MCV.
+ */
+#define DEFAULT_INEQ_SEL  0.33333333333333342
 
 /* default selectivity estimate for range inequalities "A > b AND A < c" */
 #define DEFAULT_RANGE_INEQ_SEL	0.005
