I have PostgreSQL 8.3.9 [PostgreSQL 8.3.9 on i386-apple-darwin10.3.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1)]
and the custom first and last aggregates from: http://wiki.postgresql.org/wiki/First_(aggregate) http://wiki.postgresql.org/wiki/Last_(aggregate) I have a simple table, of two columns. The first is a timestamp and is the primary key, the second is an integer. I've loaded the table up with values, one for every minute, for a whole year. Some SQL to recreate the table and the aggregates can be retrieved from: http://blog.devauld.ca/aggregate_test.zip (File is approximately 180KB) Now when I try to make use of the first and last aggregates, I get: # select first(t), last(t) from test group by extract(day from t); first | last ---------------------+--------------------- 2009-01-01 00:00:00 | 2009-01-01 17:02:00 2009-01-02 10:07:00 | 2009-01-02 10:06:00 2009-01-03 20:15:00 | 2009-01-03 20:14:00 2009-01-04 00:00:00 | 2009-01-04 23:59:00 2009-01-05 00:00:00 | 2009-01-05 23:59:00 2009-01-06 16:31:00 | 2009-01-06 16:30:00 2009-01-07 00:00:00 | 2009-01-07 23:49:00 2009-01-08 11:09:00 | 2009-01-08 11:42:00 2009-01-09 11:08:00 | 2009-01-09 00:51:00 2009-01-10 11:33:00 | 2009-01-10 23:37:00 2009-01-11 13:05:00 | 2009-01-11 23:59:00 2009-01-12 23:55:00 | 2009-01-12 23:47:00 2009-01-13 01:50:00 | 2009-01-13 23:36:00 2009-01-14 23:55:00 | 2009-01-14 23:41:00 2009-01-15 00:47:00 | 2009-01-15 23:40:00 2009-01-16 00:29:00 | 2009-01-16 23:38:00 2009-01-17 00:09:00 | 2009-01-17 23:37:00 2009-01-18 23:48:00 | 2009-01-18 23:37:00 2009-01-19 23:56:00 | 2009-01-19 23:39:00 2009-01-20 07:14:00 | 2009-01-20 23:36:00 2009-01-21 23:40:00 | 2009-01-21 23:41:00 2009-01-22 02:57:00 | 2009-01-22 23:40:00 2009-01-23 23:56:00 | 2009-01-23 23:38:00 2009-01-24 09:34:00 | 2009-01-24 23:37:00 2009-01-25 23:50:00 | 2009-01-25 23:37:00 2009-01-26 23:48:00 | 2009-01-26 23:39:00 2009-01-27 06:36:00 | 2009-01-27 23:37:00 2009-01-28 23:59:00 | 2009-01-28 23:41:00 2009-01-29 16:12:00 | 2009-01-29 23:40:00 2009-01-30 21:11:00 | 2009-01-30 23:39:00 2009-01-31 20:12:00 | 2009-01-31 16:20:00 (31 rows) For some reason the aggregates are not falling into the proper group. I can't blame timezones as the results are all over the map, and first/last relationship is broken as in some cases 'last' is chronologically before 'first' If I explicitly retrieve the values for midnight each day: # select t, v from test where extract(hour from t) = 0 and extract(minute from t) = 0; t | v ---------------------+------- 2009-01-01 00:00:00 | 0 2009-01-02 00:00:00 | 1440 2009-01-03 00:00:00 | 2880 2009-01-04 00:00:00 | 4320 2009-01-05 00:00:00 | 5760 2009-01-06 00:00:00 | 7200 2009-01-07 00:00:00 | 8640 2009-01-08 00:00:00 | 10080 2009-01-09 00:00:00 | 11520 2009-01-10 00:00:00 | 12960 2009-01-11 00:00:00 | 14400 2009-01-12 00:00:00 | 15840 2009-01-13 00:00:00 | 17280 2009-01-14 00:00:00 | 18720 2009-01-15 00:00:00 | 20160 2009-01-16 00:00:00 | 21600 2009-01-17 00:00:00 | 23040 2009-01-18 00:00:00 | 24480 2009-01-19 00:00:00 | 25920 2009-01-20 00:00:00 | 27360 2009-01-21 00:00:00 | 28800 2009-01-22 00:00:00 | 30240 2009-01-23 00:00:00 | 31680 2009-01-24 00:00:00 | 33120 2009-01-25 00:00:00 | 34560 2009-01-26 00:00:00 | 36000 2009-01-27 00:00:00 | 37440 2009-01-28 00:00:00 | 38880 2009-01-29 00:00:00 | 40320 2009-01-30 00:00:00 | 41760 2009-01-31 00:00:00 | 43200 (31 rows) I get back the values for which I am seeking. The pain is in finding the last record in the day before. I would have thought that grouping by date_trunc on month would have yeilded similar results to above: # select first(t), first(v) from test group by date_trunc('day', t); first | first ---------------------+------- 2009-01-01 00:00:00 | 0 2009-01-02 10:07:00 | 2047 2009-01-03 20:15:00 | 4095 2009-01-04 00:00:00 | 4320 2009-01-05 00:00:00 | 5760 2009-01-06 17:33:00 | 8253 2009-01-07 16:56:00 | 9656 2009-01-08 17:28:00 | 11128 2009-01-09 21:14:00 | 12794 2009-01-10 05:47:00 | 13307 2009-01-11 16:42:00 | 15402 2009-01-12 16:30:00 | 16830 2009-01-13 20:14:00 | 18494 2009-01-14 23:59:00 | 20159 2009-01-15 22:17:00 | 21497 2009-01-16 23:57:00 | 23037 2009-01-17 18:32:00 | 24152 2009-01-18 20:15:00 | 25695 2009-01-19 07:58:00 | 26398 2009-01-20 22:16:00 | 28696 2009-01-21 17:31:00 | 29851 2009-01-22 16:37:00 | 31237 2009-01-23 23:59:00 | 33119 2009-01-24 21:13:00 | 34393 2009-01-25 22:17:00 | 35897 2009-01-26 16:42:00 | 37002 2009-01-27 16:30:00 | 38430 2009-01-28 16:52:00 | 39892 2009-01-29 23:59:00 | 41759 2009-01-30 10:19:00 | 42379 2009-01-31 14:58:00 | 44098 (31 rows) Looking at the plan: # explain select first(t), first(v) from test group by date_trunc('day', t); QUERY PLAN ----------------------------------------------------------------------- GroupAggregate (cost=5010.56..6238.16 rows=44640 width=12) -> Sort (cost=5010.56..5122.16 rows=44640 width=12) Sort Key: (date_trunc('day'::text, t)) -> Seq Scan on test (cost=0.00..800.00 rows=44640 width=12) (4 rows) Now, if I have a much smaller data set: # delete from test where t > '2009-01-03'; vacuum analyze; # select first(t), first(v) from test group by date_trunc('day', t); first | first ---------------------+------- 2009-01-01 00:00:00 | 0 2009-01-03 00:00:00 | 2880 2009-01-02 00:00:00 | 1440 (3 rows) # explain select first(t), first(v) from test group by date_trunc('day', t); QUERY PLAN --------------------------------------------------------------- HashAggregate (cost=73.62..124.04 rows=2881 width=12) -> Seq Scan on test (cost=0.00..52.01 rows=2881 width=12) (2 rows) So, is there a way I can set up my query to deterministically return the same results each time? Is there another way to get 'first of the month' or 'last for the day' aggregates that work reliably on varying data set sizes? Can I force the optimizer to not use a GroupAggregate? Any help would be appreciated, -W