On 2015/09/28 13:31, David Rowley wrote: > I've been spending time working on allowing the planner to perform > aggregation before the final join relation is created. > ...
> > The patch is however so far capable of giving us extremely nice performance > improvements for some (likely artificial) queries. > > Let's look at a quick example: > > CREATE TABLE product (product_id INT NOT NULL,product_code VARCHAR(64) NOT > NULL, PRIMARY KEY(product_id)); > CREATE UNIQUE INDEX product_product_code_uidx ON product (product_code); > -- create small list of products > INSERT INTO product SELECT g.id,'ABC' || CAST(g.id AS TEXT) FROM > generate_series(1,100) g(id); > > CREATE TABLE sale (sale_id INT NOT NULL, product_id INT NOT NULL, quantity > INT NOT NULL); > > INSERT INTO sale (sale_id, product_id,quantity) SELECT > x.x,x.x%100+1,CAST(random() * 1000 AS INT) FROM > generate_series(1,100000000) x(x); > > ALTER TABLE sale ADD CONSTRAINT sale_pkey PRIMARY KEY(sale_id); > > test=# SELECT count(sale.sale_id) FROM sale, product; > count > ------------- > 10000000000 > (1 row) > Time: 10323.053 ms > > > And if I disable the optimisation: > > test=# set enable_earlygrouping = off; > SET > Time: 0.302 ms > test=# SELECT count(sale.sale_id) FROM sale, product; > count > ------------- > 10000000000 > (1 row) > Time: 775790.764 ms > > So, in this probably rather unlikely query, we get something around a 7500% > performance increase. Of course as the ratio of groups per underlying > tuples increase, the performance increase will tail off. > > The explain output from the optimised version is as follows: > > QUERY PLAN > ------------------------------------------------------------------------------------ > Finalize Aggregate (cost=1790544.37..1790544.38 rows=1 width=4) > -> Nested Loop (cost=1790541.10..1790544.12 rows=100 width=4) > -> Partial Aggregate (cost=1790541.10..1790541.11 rows=1 width=4) > -> Seq Scan on sale (cost=0.00..1540541.08 rows=100000008 > width=4) > -> Seq Scan on product (cost=0.00..2.00 rows=100 width=0) > > Did you perhaps attach a version of the patch you didn't intend to? I get the following plan and hence a different result from what's shown above: postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product; QUERY PLAN -------------------------------------------------------------------------------- Aggregate (cost=17909.27..17909.28 rows=1 width=4) -> Nested Loop (cost=17906.00..17909.02 rows=100 width=4) -> Aggregate (cost=17906.00..17906.01 rows=1 width=4) -> Seq Scan on sale (cost=0.00..15406.00 rows=1000000 width=4) -> Seq Scan on product (cost=0.00..2.00 rows=100 width=0) postgres=# SELECT count(sale.sale_id) FROM sale, product; count ------- 100 (1 row) postgres=# set enable_earlygrouping = off; SET postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product; QUERY PLAN --------------------------------------------------------------------------- Aggregate (cost=1515408.25..1515408.26 rows=1 width=4) -> Nested Loop (cost=0.00..1265408.25 rows=100000000 width=4) -> Seq Scan on sale (cost=0.00..15406.00 rows=1000000 width=4) -> Materialize (cost=0.00..2.50 rows=100 width=0) -> Seq Scan on product (cost=0.00..2.00 rows=100 width=0) (5 rows) postgres=# SELECT count(sale.sale_id) FROM sale, product; count ----------- 100000000 (1 row) Am I missing something? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers