Re: POC: GROUP BY optimization

Andrei Lepikhov Thu, 01 Feb 2024 20:40:40 -0800

On 2/2/2024 11:06, Richard Guo wrote:

On Fri, Feb 2, 2024 at 11:32 AM Richard Guo <[email protected]<mailto:[email protected]>> wrote:


    On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <[email protected]
    <mailto:[email protected]>> wrote:

        One of the test cases added by this commit has not been very
        stable in the buildfarm.  Latest example is here:

        
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
 
<https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04>

        and I've seen similar failures intermittently on other machines.

        I'd suggest building this test atop a table that is more stable
        than pg_class.  You're just waving a red flag in front of a bull
        if you expect stable statistics from that during a regression run.
        Nor do I see any particular reason for pg_class to be especially
        suited to the test.


    Yeah, it's not a good practice to use pg_class in this place.  While
    looking through the test cases added by this commit, I noticed some
    other minor issues that are not great.  Such as

    * The table 'btg' is inserted with 10000 tuples, which seems a bit
    expensive for a test.  I don't think we need such a big table to test
    what we want.

    * I don't see why we need to manipulate GUC max_parallel_workers and
    max_parallel_workers_per_gather.

    * I think we'd better write the tests with the keywords being all upper
    or all lower.  A mixed use of upper and lower is not great. Such as in

         explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;

    * Some comments for the test queries are not easy to read.

    * For this statement

         CREATE INDEX idx_y_x_z ON btg(y,x,w);

    I think the index name would cause confusion.  It creates an index on
    columns y, x and w, but the name indicates an index on y, x and z.

    I'd like to write a draft patch for the fixes.


Here is the draft patch that fixes the issues I complained about in
upthread.

I passed through the patch. Looks like it doesn't break anything. Why doyou prefer to use count(*) in EXPLAIN instead of plain targetlist, like"SELECT x,y,..."?

Also, according to the test mentioned by Tom:

1. I see, PG uses IndexScan on (x,y). So, column x will be alreadysorted before the MergeJoin. Why not use Incremental Sort on (x,z,w)instead of full sort?2. For memo, IMO, this test shows us the future near perspective of thisfeature: It is cheaper to use grouping order (w,z) instead of (z,w).


--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

Reply via email to