On 7 April 2017 at 09:05, Claudio Freire <klaussfre...@gmail.com> wrote: > On Tue, Apr 4, 2017 at 1:00 PM, Claudio Freire <klaussfre...@gmail.com> wrote: >>>> If you ask me, I'd just leave: >>>> >>>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound == >>>> DEFAULT_INEQ_SEL) >>>> + { >>>> + /* No point in checking null selectivity, DEFAULT_INEQ_SEL >>>> implies we have no stats */ >>>> + s2 = DEFAULT_RANGE_INEQ_SEL; >>>> + } >>>> + else >>>> + { >>>> ... >>>> + s2 = rqlist->hibound + rqlist->lobound - 1.0; >>>> + nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid); >>>> + notnullsel = 1.0 - nullsel; >>>> + >>>> + /* Adjust for double-exclusion of NULLs */ >>>> + s2 += nullsel; >>>> + if (s2 <= 0.0) { >>>> + if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6)) >>>> + { >>>> + /* Most likely contradictory clauses found */ >>>> + s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel; >>>> + } >>>> + else >>>> + { >>>> + /* Could be a rounding error */ >>>> + s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel; >>>> + } >>>> + } >>>> + } >>>> >>>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly >>>> cautious) estimation of the amount of rounding error that could be >>>> there with 32-bit floats. >>>> >>>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is >>>> an estimation based on stats, maybe approximate, but one that makes >>>> sense (ie: preserves the monotonicity of the CPF), and as such >>>> negative values are either sign of a contradiction or rounding error. >>>> The previous code left the possibility of negatives coming out of >>>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates, >>>> but that doesn't seem possible coming out of scalarineqsel. >>>> >>>> But I'd like it if Tom could comment on that, since he's the one that >>>> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back >>>> in 2004). >>>> >>>> Barring that, I'd just change the >>>> >>>> s2 = DEFAULT_RANGE_INEQ_SEL; >>>> >>>> To >>>> >>>> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel; >>>> >>>> Which makes a lot more sense. >>> >>> I think to fix this properly the selfuncs would have to return the >>> estimate, and nullfrac separately, so the nullfrac could just be >>> applied once per expression. There's already hacks in there to get rid >>> of the double null filtering. I'm not proposing that as something for >>> this patch. It would be pretty invasive to change this. >> >> There's no need, you can compute the nullfrac with nulltestsel. While >> there's some rework involved, it's not a big deal (it just reads the >> stats tuple), and it's a clean solution. > > I'm marking this Waiting on Author until we figure out what to do with > those issues (the null issue quoted above, and the quadratic behavior)
I've attached a rebased patch as the existing one no longer applies. I've not yet had time to wrap my head around the null handling stuff. I'll try to get to that today. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
smarter_clausesel_2017-04-07.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers