Re: [HACKERS] Making clausesel.c Smarter

David Rowley Thu, 06 Apr 2017 16:54:06 -0700

On 7 April 2017 at 09:05, Claudio Freire <klaussfre...@gmail.com> wrote:
> On Tue, Apr 4, 2017 at 1:00 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
>>>> If you ask me, I'd just leave:
>>>>
>>>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>>>> DEFAULT_INEQ_SEL)
>>>> + {
>>>> +     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>>>> implies we have no stats */
>>>> +     s2 = DEFAULT_RANGE_INEQ_SEL;
>>>> + }
>>>> + else
>>>> + {
>>>> ...
>>>> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
>>>> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>>>> +   notnullsel = 1.0 - nullsel;
>>>> +
>>>> +   /* Adjust for double-exclusion of NULLs */
>>>> +   s2 += nullsel;
>>>> +   if (s2 <= 0.0) {
>>>> +      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>>>> +      {
>>>> +           /* Most likely contradictory clauses found */
>>>> +           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>>>> +      }
>>>> +      else
>>>> +      {
>>>> +           /* Could be a rounding error */
>>>> +           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>>> +      }
>>>> +   }
>>>> + }
>>>>
>>>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>>>> cautious) estimation of the amount of rounding error that could be
>>>> there with 32-bit floats.
>>>>
>>>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>>>> an estimation based on stats, maybe approximate, but one that makes
>>>> sense (ie: preserves the monotonicity of the CPF), and as such
>>>> negative values are either sign of a contradiction or rounding error.
>>>> The previous code left the possibility of negatives coming out of
>>>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>>>> but that doesn't seem possible coming out of scalarineqsel.
>>>>
>>>> But I'd like it if Tom could comment on that, since he's the one that
>>>> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
>>>> in 2004).
>>>>
>>>> Barring that, I'd just change the
>>>>
>>>> s2 = DEFAULT_RANGE_INEQ_SEL;
>>>>
>>>> To
>>>>
>>>> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>>>
>>>> Which makes a lot more sense.
>>>
>>> I think to fix this properly the selfuncs would have to return the
>>> estimate, and nullfrac separately, so the nullfrac could just be
>>> applied once per expression. There's already hacks in there to get rid
>>> of the double null filtering. I'm not proposing that as something for
>>> this patch. It would be pretty invasive to change this.
>>
>> There's no need, you can compute the nullfrac with nulltestsel. While
>> there's some rework involved, it's not a big deal (it just reads the
>> stats tuple), and it's a clean solution.
>
> I'm marking this Waiting on Author until we figure out what to do with
> those issues (the null issue quoted above, and the quadratic behavior)


I've attached a rebased patch as the existing one no longer applies.

I've not yet had time to wrap my head around the null handling stuff.
I'll try to get to that today.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

smarter_clausesel_2017-04-07.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Making clausesel.c Smarter

Reply via email to