[HACKERS] Should planner fold stable functions for estimation purposes?

2004-03-12 Thread Tom Lane
I've been toying with the notion of allowing the planner to compute the
current values of stable functions when it's trying to estimate
selectivities.  For instance, in a query like

select ... where timestampcol = now() - interval '1 day';

we currently throw up our hands and treat the righthand side as an
unknown quantity for estimation purposes, which leads to selection of
a very conservative default selectivity estimate.  That often
discourages the planner from selecting an indexscan, and can lead to
unreasonably slow join choices at upper levels of the plan.
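
To make the gap concrete, here is a minimal sketch (not PostgreSQL source code) of an inequality estimate of the form col > value, the case discussed further down for scalarineqsel. It compares the fallback used when the comparison value is unknown against an estimate read off an equi-depth histogram. The function name, the histogram layout, and the 1/3 fallback are assumptions made for this illustration.

```python
from bisect import bisect_right

# Illustrative fallback for "col > <unknown>"; the value 1/3 is an
# assumption for this sketch, not taken from the message above.
DEFAULT_INEQ_SEL = 1.0 / 3.0


def ineq_selectivity_gt(histogram_bounds, value=None):
    """Estimate the fraction of rows satisfying col > value.

    histogram_bounds: sorted boundaries of an equi-depth histogram, so
        each bucket is assumed to hold the same number of rows.
    value: the comparison value, or None when it cannot be determined
        at planning time.
    """
    if value is None or len(histogram_bounds) < 2:
        # Nothing to compare against: fall back to the blind default.
        return DEFAULT_INEQ_SEL
    if value < histogram_bounds[0]:
        return 1.0
    if value >= histogram_bounds[-1]:
        return 0.0
    nbuckets = len(histogram_bounds) - 1
    # Bucket i satisfies bounds[i] <= value < bounds[i + 1].
    i = bisect_right(histogram_bounds, value) - 1
    lo, hi = histogram_bounds[i], histogram_bounds[i + 1]
    # Assume values are spread evenly inside the bucket.
    frac_in_bucket = (value - lo) / (hi - lo)
    fraction_below = (i + frac_in_bucket) / nbuckets
    return 1.0 - fraction_below


# Column ages in days, four equal-population buckets (made-up data).
bounds = [0, 10, 20, 30, 40]
blind = ineq_selectivity_gt(bounds)         # comparison value unknown
informed = ineq_selectivity_gt(bounds, 36)  # comparison value known
```

With these made-up boundaries the informed estimate is about 10% of the table, against a third for the blind default. A difference of that size is the kind that can flip the choice between a seqscan and an indexscan.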

It would not be correct to reduce the righthand side to a constant in
advance of execution, of course, but is it reasonable to compute its
current value solely for purposes of comparison to column statistics?

The risk we take if we do so is that the estimate we thereby derive
could be stale by the time the generated plan is used, and in the worst
case the plan could be really inappropriate.  On the other hand, in most
of the practical examples that I've seen, the current planner behavior
is producing a pretty inappropriate plan.

A possibly useful compromise is to do this reduction only in
scalarineqsel, where not having any comparison value is really a serious
blow, and not risk it in eqsel, where we can often generate a not-too-awful
estimate without any specific comparison value.
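
As a rough illustration of that compromise, the sketch below decides whether a function call may be evaluated purely for estimation. The catalog dictionary, the function names, and the "inequality"/"equality" labels are hypothetical stand-ins for this example; they do not correspond to actual planner code.

```python
# Hypothetical volatility catalog for this sketch.
VOLATILITY = {
    "now": "stable",
    "random": "volatile",
    "abs": "immutable",
}


def value_for_estimation(func_name, func, estimator_kind):
    """Return the function's current value for estimation only, or None.

    estimator_kind: "inequality" (standing in for scalarineqsel) or
        "equality" (standing in for eqsel).
    The returned value must never be substituted into the plan itself;
    it is only compared against column statistics.
    """
    volatility = VOLATILITY[func_name]
    if volatility == "volatile":
        # May have side effects or change per call: never evaluate early.
        return None
    if volatility == "stable" and estimator_kind != "inequality":
        # The compromise: equality estimates get by without a value,
        # so do not risk a stale one there.
        return None
    return func()


# Stand-in for now(); a fixed number keeps the example deterministic.
fake_now = lambda: 1079049600

ineq_value = value_for_estimation("now", fake_now, "inequality")
eq_value = value_for_estimation("now", fake_now, "equality")
volatile_value = value_for_estimation("random", lambda: 0.5, "inequality")
```

Here only the inequality estimator gets a concrete value for the stable function. The equality estimator and the volatile function both get None and fall back to the generic estimate.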

Comments?

regards, tom lane



Re: [HACKERS] Should planner fold stable functions for estimation purposes?

2004-03-12 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
>> It would not be correct to reduce the righthand side to a constant in
>> advance of execution, of course, but is it reasonable to compute its
>> current value solely for purposes of comparison to column statistics?
>
> So this means it would be double evaluated? A flag will be required to
> prevent this for functions that do more than just return a value or have
> a high cost in execution.

Functions with side-effects had better be marked volatile anyway, so I'm
not worried about that case.  As for the expense argument, keep in mind
that the one extra evaluation in the planner is likely to save you an
awful lot of evaluations at runtime, if it convinces the planner to use
an indexscan and not a seqscan.  We are after all talking about
functions appearing in WHERE, and I wouldn't think that people can
reasonably expect those to get evaluated just once.

regards, tom lane
