Tom Lane <t...@sss.pgh.pa.us> wrote:

> florian.schoppm...@emc.com (Florian Schoppmann) writes:
> > [VOLATILE function in WHERE clause *does* get optimized]
> 
> I can't get excited about this.  Any time you put a volatile function
> into WHERE, you're playing with fire.  The docs warn against it:
> http://www.postgresql.org/docs/9.2/static/sql-expressions.html#SYNTAX-
> EXPRESS-EVAL

All this section tells me is that one cannot rely on the evaluation
order in expressions, and that side effects are dangerous in WHERE and
HAVING clauses.

I do not read in this section that VOLATILE functions are unsafe per se.
After all, a VOLATILE function is not required to have side effects
(suppose, e.g., somebody implemented a true random nubmer generator).

However,
<http://www.postgresql.org/docs/9.2/interactive/sql-createfunction.html>
is actually very clear in its wording:

| VOLATILE indicates that the function value can change even within a
| single table scan, so no optimizations can be made.

I therefore tend to see the behavior as a bug. I concede though that
there is also good reason to keep the current behavior (VOLATILE is the
default for UDFs, etc.). But then I think the documentation needs to be
changed, and it has to be made explicit where the optimizer may change
semantics and what, on the other hand, is defined behavior.

E.g., users should need to know if the following rewrite causes defined
or undefined behavior. Does it give correct results by accident, or are
correct results guaranteed?

--8<--
WITH source AS (
    SELECT i FROM generate_series(1,10) AS i
)
SELECT
    i
FROM
    source AS _stats
WHERE
    random() < 5::DOUBLE PRECISION 
               / (SELECT count(*) FROM source);
-->8--

> To do what you want, I'd suggest wrapping the join into a sub-select
> with an "OFFSET 0" clause, which will serve as an optimization fence
> that prevents the random() call from being pushed down.

My interpretation so far is that VOLATILE functions in a WHERE clause
can always be avoided: E.g., move the condition to the SELECT list and
embed in an outer query that then filters on the condition column. Or is
my assumption wrong, and the optimizer could theoretically interfere
even here?

Florian



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to