Hi, sc.taken_end <= '1996-12-31'::date AND > m.taken BETWEEN sc.taken_start AND sc.taken_end AND > > category of data at a certain time. But I'm afraid this makes the planning > much more difficult, as the select from measurements depend on the data > returned by other parts of the query (rows from category). >
Right. Users can select 1900 - 2009. Station data hardly ever spans that range. The *station_category* is used to create a unique key into the measurement data for every station: station_id, category_id, and taken_start. The measurement data should be contiguous until taken_end. I thought that that combination would be a pointer to the exact spot in the measurement table where the data starts, which should be ridiculously fast to find. See this http://explain.depesz.com/s/H1 and this > http://explain.depesz.com/s/GGx > I was getting some red lines when I looked at a different plan. It's a great site. How many rows does the query return without the group by clause? About > 140000 in both cases, right? > SELECT * FROM climate.measurement m WHERE m.station_id = 5148 AND m.taken BETWEEN '1900-08-01'::date AND '2009-12-31'::date AND m.category_id = 1 5397 rows (10 seconds cold; 0.5 seconds hot); estimated too high by 2275 rows? http://explain.depesz.com/s/uq OK, I haven't noticed the table is already partitioned by category_id and > I didn't mean to partition by (taken, category_id) - that would produce a > lot of partitions. Yes, that might cause problems related to number of > files, but that's rather a filesystem related issue. > Constrained as: CONSTRAINT measurement_013_category_id_ck CHECK (category_id = 7) > I'd expect rather issues related to RULEs or triggers (not sure which of > them you use to redirect the data into partitions). But when partitioning > I created seven child tables of measurement. Each of these has a constraint by category_id. This makes it extremely fast to select the correct partition. > I'm not sure what you mean by 'year wrapping issue' but I think it might > work quite well - right not the problem is PostgreSQL decides to scan the > whole partition (all data for a given category_id). > I'll give it another try. :-) *Use Case #1* User selects: Mar 22 to Dec 22 User selects: 1900 to 2009 Result: Query should average *9 months* of climate data per year between Mar 22 and Dec 22 of Year. *Use Case #2* User selects: Dec 22 to Mar 22 User selects: 1900 to 2009 Result: Query should average *3 months* of climate data per year between Dec 22 of Year and Mar 22 of Year+1. So if a user selects 1950 to *1960*: - first case should average between 1950 and *1960*; and - second case should average between 1950 and *1961*. Dave