On 2015/09/28 13:31, David Rowley wrote:
> I've been spending time working on allowing the planner to perform
> aggregation before the final join relation is created.
>
...

>
> The patch is however so far capable of giving us extremely nice performance
> improvements for some (likely artificial) queries.
> 
> Let's look at a quick example:
> 
> CREATE TABLE product (product_id INT NOT NULL,product_code VARCHAR(64) NOT
> NULL, PRIMARY KEY(product_id));
> CREATE UNIQUE INDEX product_product_code_uidx ON product (product_code);
> -- create small list of products
> INSERT INTO product SELECT g.id,'ABC' || CAST(g.id AS TEXT) FROM
> generate_series(1,100) g(id);
> 
> CREATE TABLE sale (sale_id INT NOT NULL, product_id INT NOT NULL, quantity
> INT NOT NULL);
> 
> INSERT INTO sale (sale_id, product_id,quantity) SELECT
> x.x,x.x%100+1,CAST(random() * 1000 AS INT) FROM
> generate_series(1,100000000) x(x);
> 
> ALTER TABLE sale ADD CONSTRAINT sale_pkey PRIMARY KEY(sale_id);
> 
> test=# SELECT count(sale.sale_id) FROM sale, product;
>     count
> -------------
>  10000000000
> (1 row)
> Time: 10323.053 ms
> 
> 
> And if I disable the optimisation:
> 
> test=# set enable_earlygrouping = off;
> SET
> Time: 0.302 ms
> test=# SELECT count(sale.sale_id) FROM sale, product;
>     count
> -------------
>  10000000000
> (1 row)
> Time: 775790.764 ms
> 
> So, in this probably rather unlikely query, we get something around a 7500%
> performance increase. Of course as the ratio of groups per underlying
> tuples increase, the performance increase will tail off.
> 
> The explain output from the optimised version is as follows:
> 
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=1790544.37..1790544.38 rows=1 width=4)
>    ->  Nested Loop  (cost=1790541.10..1790544.12 rows=100 width=4)
>          ->  Partial Aggregate  (cost=1790541.10..1790541.11 rows=1 width=4)
>                ->  Seq Scan on sale  (cost=0.00..1540541.08 rows=100000008
> width=4)
>          ->  Seq Scan on product  (cost=0.00..2.00 rows=100 width=0)
> 
> 

Did you perhaps attach a version of the patch you didn't intend to?

I get the following plan and hence a different result from what's shown above:

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;
                                   QUERY PLAN

--------------------------------------------------------------------------------
 Aggregate  (cost=17909.27..17909.28 rows=1 width=4)
   ->  Nested Loop  (cost=17906.00..17909.02 rows=100 width=4)
         ->  Aggregate  (cost=17906.00..17906.01 rows=1 width=4)
               ->  Seq Scan on sale  (cost=0.00..15406.00 rows=1000000
width=4)
         ->  Seq Scan on product  (cost=0.00..2.00 rows=100 width=0)

postgres=# SELECT count(sale.sale_id) FROM sale, product;
 count
-------
   100
(1 row)

postgres=# set enable_earlygrouping = off;
SET

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;
                                QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=1515408.25..1515408.26 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..1265408.25 rows=100000000 width=4)
         ->  Seq Scan on sale  (cost=0.00..15406.00 rows=1000000 width=4)
         ->  Materialize  (cost=0.00..2.50 rows=100 width=0)
               ->  Seq Scan on product  (cost=0.00..2.00 rows=100 width=0)
(5 rows)

postgres=# SELECT count(sale.sale_id) FROM sale, product;
   count
-----------
 100000000
(1 row)

Am I missing something?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to