Re: Wrong results with grouping sets

Richard Guo Wed, 17 Jul 2024 17:31:45 -0700

On Wed, Jul 17, 2024 at 8:50 AM Paul George <p.a.georg...@gmail.com> wrote:
> > Since a subquery is a volatile expression, each of its instances
> should be evaluated separately.


I don't think this conclusion is correct.  Look at:

select random(), random() from t group by random();
       random       |       random
--------------------+--------------------
 0.7972330769936766 | 0.7972330769936766
(1 row)

> This seems like a valid point, though "query 2" below which groups over a 
> RANDOM() column and outputs an additional RANDOM() column a potential, albeit 
> contrived, counter-example? [NOTE: this was done on Postgres 16.3] I've 
> included a few different combinations of GROUP BYs.

Interesting.  I looked into the scenarios with multiple instances of
the same volatile grouping expressions and here is what I observed.

create table t (a int, b int);
insert into t select 1,1;

-- on master, with plain volatile functions
select random() as c1,
       random() as c2,
       random() as c3
from t t1 group by c1;
        c1         |        c2         |        c3
-------------------+-------------------+-------------------
 0.567478050404431 | 0.567478050404431 | 0.567478050404431
(1 row)

So the random() function is evaluated only once, even though it
appears three times.

-- on master, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
       (select random() from t t2 where a = t1.a) as c2,
       (select random() from t t2 where a = t1.a) as c3
from t t1 group by c1;
         c1         |         c2         |         c3
--------------------+--------------------+--------------------
 0.8420177313766823 | 0.2969648209746336 | 0.3499675329093421
(1 row)

So on master the subquery is evaluated three times.  Why isn't this
consistent with the behavior of the first query?

-- on patched, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
       (select random() from t t2 where a = t1.a) as c2,
       (select random() from t t2 where a = t1.a) as c3
from t t1 group by c1;
         c1         |         c2         |         c3
--------------------+--------------------+--------------------
 0.5203586066423254 | 0.5203586066423254 | 0.5203586066423254
(1 row)

So on patched the subquery is evaluated only once, which is consistent
with the behavior of the first query.

Does this suggest that the patched version is more 'correct' for this
case?


Now let's look at the scenario with two grouping keys.

-- on master, with plain volatile functions
select random() as c1,
       random() as c2,
       random() as c3
from t t1 group by c1, c2;
         c1         |         c2         |         c3
--------------------+--------------------+--------------------
 0.9388558105069595 | 0.2900389441597979 | 0.9388558105069595
(1 row)

So the first two random() functions are evaluated independently, and
the third random() function references the result of the first one.

-- on master, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
       (select random() from t t2 where a = t1.a) as c2,
       (select random() from t t2 where a = t1.a) as c3
from t t1 group by c1, c2;
         c1          |         c2         |         c3
---------------------+--------------------+--------------------
 0.46275163300894073 | 0.5083760995112951 | 0.6752682696191123
(1 row)

So on master the subquery is evaluated three times.

-- on patched, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
       (select random() from t t2 where a = t1.a) as c2,
       (select random() from t t2 where a = t1.a) as c3
from t t1 group by c1, c2;
         c1         |         c2         |         c3
--------------------+--------------------+--------------------
 0.9887848690744176 | 0.9887848690744176 | 0.9887848690744176
(1 row)

So on patched the subquery is evaluated only once.

It seems that in this scenario, neither the master nor the patched
version handles volatile subqueries in grouping expressions the same
way as it handles plain volatile functions.

I am confused.  Does the SQL standard explicitly define or standardize
the behavior of grouping by volatile expressions?  Does anyone know
about that?

Thanks
Richard

Re: Wrong results with grouping sets

Reply via email to