Re: POC: GROUP BY optimization

Andrei Lepikhov Wed, 21 Feb 2024 20:19:07 -0800

On 22/2/2024 09:09, Richard Guo wrote:

On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov <aekorot...@gmail.com<mailto:aekorot...@gmail.com>> wrote:


    Hi, Richard!

     > What do you think about the revisions for the test cases?

    I've rebased your patch upthread.  Did some minor beautifications.

     > * The table 'btg' is inserted with 10000 tuples, which seems a bit
     > expensive for a test.  I don't think we need such a big table to test
     > what we want.

    Your patch reduces the number of rows to 1000 tuples.  I found it
    possible to further reduce it to 100 tuples.  That also allowed me to
    save the plan in the test case introduced by e1b7fde418.

    Please check if you're OK with the patch attached.


I looked through the v2 patch and have two comments.

* The test case under "Check we don't pick aggregate path key instead of
grouping path key" does not have EXPLAIN to show the plan.  So how can
we ensure we do not mistakenly select the aggregate pathkeys instead of
the grouping pathkeys?

I confirm it works correctly. I am not sure about the stability of thezeros number in the output on different platforms here:

       avg
--------------------
 4.0000000000000000
 5.0000000000000000

It was why I'd used the format() function before. So, may we elaborateon the test and restrict the output?


* I don't think the test case introduced by e1b7fde418 is still needed,
because we already have one under "Utilize the ordering of merge join to
avoid a full Sort operation".  This kind of test case is just to ensure
that we are able to utilize the ordering of the subplans underneath.  So
it should be parallel to the test cases for utilize the ordering of
index scan and subquery scan.

I confirm, this version also checks ec_sortref initialization in thecase when ec are contructed from WHERE clause. Generally, I like moreone test for one issue instead of one test for all at once. But it worksand I don't have any reason to dispute it.

Also, I'm unsure about removing the disabling of themax_parallel_workers_per_gather parameter. Have you discovered thedomination of the current plan over the partial one? Do the costfluctuations across platforms not trigger a parallel plan?

What's more, I suggest to address here the complaint from [1]. As I see,cost difference between Sort and IncrementalSort strategies in that caseis around 0.5. To make the test more stable I propose to change it a bitand add a limit:

SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;

It makes efficacy of IncrementalSort more obvious difference around 10cost points.

[1]https://www.postgresql.org/message-id/CACG=ezaYM1tr6Lmp8PRH1aeZq=rbkxeotwgzmclad5mphfw...@mail.gmail.com


--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

Reply via email to