Re: POC: GROUP BY optimization

Andrei Lepikhov Wed, 21 Feb 2024 23:05:19 -0800

On 22/2/2024 13:35, Richard Guo wrote:

The avg() function on integer argument is commonly used in
aggregates.sql.  I don't think this is an issue.  See the first test
query in aggregates.sql.

Make sense

     > it should be parallel to the test cases for utilize the ordering of
     > index scan and subquery scan.


    Also, I'm unsure about removing the disabling of the
    max_parallel_workers_per_gather parameter. Have you discovered the
    domination of the current plan over the partial one? Do the cost
    fluctuations across platforms not trigger a parallel plan?


The table used for testing contains only 100 tuples, which is the size
of only one page.  I don't believe it would trigger any parallel plans,
unless we manually change min_parallel_table_scan_size.

I don't intend to argue it, but just for the information, I frequentlyreduce it to zero, allowing PostgreSQL to make a decision based oncosts. It sometimes works much better, because one small table in multijoin can disallow an effective parallel plan.


    What's more, I suggest to address here the complaint from [1]. As I
    see,
    cost difference between Sort and IncrementalSort strategies in that
    case
    is around 0.5. To make the test more stable I propose to change it a
    bit
    and add a limit:
    SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
    It makes efficacy of IncrementalSort more obvious difference around 10
    cost points.


I don't think that's necessary.  With Incremental Sort the final cost
is:

     GroupAggregate  (cost=1.66..19.00 rows=100 width=25)

while with full Sort it is:

     GroupAggregate  (cost=16.96..19.46 rows=100 width=25)

With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
is cheaper on total cost.  Not to say that even if somehow we decide the
two paths are fuzzily the same on total cost, the first path still
dominates because its startup cost is much cheaper.

As before, I won't protest here - it needs some computations about howmuch cost can be added by bulk extension of the relation blocks. IfMaxim will answer that it's enough to resolve his issue, why not?


--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

Reply via email to