Re: Add proper planner support for ORDER BY / DISTINCT aggregates

Ronan Dunklau Mon, 07 Nov 2022 06:53:58 -0800

Le samedi 5 novembre 2022, 09:51:23 CET Pavel Luzanov a écrit :
> While playing with the patch I found a situation where the performance
> may be degraded compared to previous versions.
> 
> The test case below.
> If you create a proper index for the query (a,c), version 16 wins. On my
> notebook, the query runs ~50% faster.
> But if there is no index (a,c), but only (a,b), in previous versions the
> planner uses it, but with this patch a full table scan is selected.


Hello,

In your exact use case, the combo incremental-sort + Index scan is evaluated 
to cost more than doing a full sort + seqscan. 

If you try for example to create an index on (b, a) and group by b, you will 
get the expected behaviour:

ro=# create index on t (b, a);
CREATE INDEX
ro=# explain  select b, array_agg(c order by c) from t group by b;
                                       QUERY PLAN                               
         
-----------------------------------------------------------------------------------------
 GroupAggregate  (cost=10.64..120926.80 rows=9970 width=36)
   Group Key: b
   ->  Incremental Sort  (cost=10.64..115802.17 rows=1000000 width=6)
         Sort Key: b, c
         Presorted Key: b
         ->  Index Scan using t_b_a_idx on t  (cost=0.42..47604.12 
rows=1000000 width=6)
(6 rows)

I think we can trace that back to incremental sort being pessimistic about 
it's performance. If you try the same query, but with set enable_seqscan = off, 
you will get a full sort over an index scan:

                                       QUERY PLAN                               
         
-----------------------------------------------------------------------------------------
 GroupAggregate  (cost=154944.94..162446.19 rows=100 width=34)
   Group Key: a
   ->  Sort  (cost=154944.94..157444.94 rows=1000000 width=4)
         Sort Key: a, c
         ->  Index Scan using t_a_b_idx on t  (cost=0.42..41612.60 
rows=1000000 width=4)
(5 rows)


This probably comes from the overly pessimistic behaviour that the number of 
tuples per group will be 1.5 times as much as we should estimate:

        /*
         * Estimate average cost of sorting of one group where presorted 
keys are
         * equal.  Incremental sort is sensitive to distribution of tuples 
to the
         * groups, where we're relying on quite rough assumptions.  Thus, 
we're
         * pessimistic about incremental sort performance and increase its 
average
         * group size by half.
         */

I can't see why an incrementalsort could be more expensive than a full sort, 
using the same presorted path. It looks to me that in that case we should 
always prefer an incrementalsort. Maybe we should bound incremental sorts cost 
to make sure they are never more expensive than the full sort ?

Also, prior to this commit I don't think it made a real difference, because 
worst case scenario we would have missed an incremental sort, which we didn't 
have beforehand. But with this patch, we may actually replace a "hidden" 
incremental sort which was done in the agg codepath by a full sort. 

Best regards,

--
Ronan Dunklau

Re: Add proper planner support for ORDER BY / DISTINCT aggregates

Reply via email to