Hello, I have a query to aggregate data wich is too slow :-)
Here a simplified example: create table test ( time int8, --store the time as epoch a_group varchar, category varchar ) For each group, I need the first/last times and categories , the number of distinct categories and the number of records. Here my best solution until now: SELECT FIRST.a_group, FIRST.time as first_time, FIRST.category as first_category, LAST.time as last_time, LAST.category as last_category, AGG.c_count, AGG.c_all FROM ( select distinct on (a_group) a_group, time, category from test order by a_group, time ) FIRST, ( select distinct on (a_group) a_group, time, category from test order by a_group, time DESC ) LAST, ( select a_group, count(distinct category) as c_count, count(*) as c_all from test group by a_group order by a_group ) AGG where FIRST.a_group = LAST.a_group and LAST.a_group=AGG.a_group each sub query is quite fast -- thanks for the DISTINCT ON feature :-) , but the whole is really slow as Postgres start to swap due to the large amount of data to merge. I guess there must be a better solution as the three sub queries return exactly one row for each 'a_group' and are moreover already sorted (The table does not contain any NULL value). But in the query plan below, most of the cost comes form the merges. I imagine there must be a way using custom aggregation functions, but I'm not confident with those: Is it possible to define aggregate in order to retrieve the first/last values of an ordered result set? This would allow to make a single scan of the table. something like select a_group, first(category) as first_category, last(category) as last_category, ... from test order by a_group,time Many thanks for any hints. Marc Mamin Here are some dummy values if you'd like to play with this issue: insert into test select s,'G'||s , 'C1' from(select generate_series(1,10000)as s)s; insert into test select s+10,'G'||s , 'C2' from(select generate_series(1,10000)as s)s; insert into test select s+13,'G'||s , 'C3' from(select generate_series(1,10000)as s)s; insert into test select s+1,'G'||s , 'C2' from(select generate_series(1,10000,5)as s)s; insert into test select s,'G'||s%10 , 'C3' from(select generate_series(1,10000,5)as s)s; insert into test select s+1,'G'||s%5 , 'C2' from(select generate_series(1,10000,5)as s)s; insert into test select s+1,'G'||s , 'C1' from(select generate_series(1,1000000)as s)s; --10^6 !! create index test_i on test(a_group); analyze test; => Merge Join (cost=259000.31..34904377039.75 rows=1550421099181 width=128) Merge Cond: ((test.a_group)::text = (last.a_group)::text) -> Merge Join (cost=129500.16..17814340.14 rows=783387153 width=120) Merge Cond: ((test.a_group)::text = (test.a_group)::text) -> GroupAggregate (cost=0.00..53681.23 rows=395825 width=10) -> Index Scan using test_i on test (cost=0.00..39973.53 rows=1036043 width=10) -> Materialize (cost=129500.16..133458.41 rows=395825 width=72) -> Unique (cost=119965.87..125146.08 rows=395825 width=18) -> Sort (cost=119965.87..122555.97 rows=1036043 width=18) Sort Key: test.a_group, test."time" -> Seq Scan on test (cost=0.00..16451.43 rows=1036043 width=18) -> Materialize (cost=129500.16..133458.41 rows=395825 width=72) -> Subquery Scan last (cost=119965.87..129104.33 rows=395825 width=72) -> Unique (cost=119965.87..125146.08 rows=395825 width=18) -> Sort (cost=119965.87..122555.97 rows=1036043 width=18) Sort Key: test.a_group, test."time" -> Seq Scan on test (cost=0.00..16451.43 rows=1036043 width=18) -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql