Hi all,
In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query
--8<--
WITH source AS (
    SELECT i FROM generate_series(1,10) AS i
)
SELECT
    i
FROM
    source, (
        SELECT
            count(*) AS _n
        FROM source
    ) AS _stats
WHERE
    random() < 5::DOUBLE PRECISION/_n;
-->8--
translates into the following query plan:
--8<--
 Nested Loop  (cost=35.00..65.03 rows=1000 width=4)
   CTE source
     ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4)
   ->  Aggregate  (cost=25.00..25.02 rows=1 width=0)
         Filter: (random() < (5::double precision / (count(*))::double precision))
         ->  CTE Scan on source  (cost=0.00..20.00 rows=1000 width=0)
   ->  CTE Scan on source  (cost=0.00..20.00 rows=1000 width=4)
-->8--
In other words, the condition is evaluated only once, as the filter attached to the
Aggregate node, so the query returns either all 10 rows or none at all, each outcome
with probability 0.5. I would instead have expected each row to be sampled
independently with probability 0.5.

Since random() is volatile, the whole WHERE expression is volatile as well, so I
wonder why the condition is pushed down into the subquery even though this changes
the results. Is this behavior correct, i.e., specified somewhere? Or is this a bug?
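
For what it's worth, here is a rewrite that I would expect to keep the condition at
the join level. This is only a sketch, assuming that OFFSET 0 still acts as a barrier
against pushing quals into a subquery:
--8<--
-- Possible workaround (untested sketch): the OFFSET 0 should keep the planner
-- from pushing the volatile condition into the subquery, so random() gets
-- evaluated once per joined row rather than once in the Aggregate's filter.
WITH source AS (
    SELECT i FROM generate_series(1,10) AS i
)
SELECT
    i
FROM
    source, (
        SELECT
            count(*) AS _n
        FROM source
        OFFSET 0
    ) AS _stats
WHERE
    random() < 5::DOUBLE PRECISION/_n;
-->8--
If that holds, the filter should show up on the Nested Loop rather than on the
Aggregate, and each row gets an independent 0.5 chance of being returned. But that
only works around the question above.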
Florian