Re: [PERFORM] Random Page Cost and Planner

David Jarvis Wed, 26 May 2010 11:56:04 -0700

Hi,

sc.taken_end <= '1996-12-31'::date AND
> m.taken BETWEEN sc.taken_start AND sc.taken_end AND
>
> category of data at a certain time. But I'm afraid this makes the planning
> much more difficult, as the select from measurements depend on the data
> returned by other parts of the query (rows from category).
>


Right. Users can select 1900 - 2009. Station data hardly ever spans that
range.

The *station_category* is used to create a unique key into the measurement
data for every station: station_id, category_id, and taken_start. The
measurement data should be contiguous until taken_end.

I thought that that combination would be a pointer to the exact spot in the
measurement table  where the data starts, which should be ridiculously fast
to find.

See this http://explain.depesz.com/s/H1 and this
> http://explain.depesz.com/s/GGx
>

I was getting some red lines when I looked at a different plan. It's a great
site.

How many rows does the query return without the group by clause? About
> 140000 in both cases, right?
>

SELECT
  *
FROM
  climate.measurement m
WHERE
  m.station_id = 5148 AND
  m.taken BETWEEN '1900-08-01'::date AND '2009-12-31'::date AND
  m.category_id = 1

5397 rows (10 seconds cold; 0.5 seconds hot); estimated too high by 2275
rows?

http://explain.depesz.com/s/uq

 OK, I haven't noticed the table is already partitioned by category_id and
> I didn't mean to partition by (taken, category_id) - that would produce a
> lot of partitions. Yes, that might cause problems related to number of
> files, but that's rather a filesystem related issue.
>

Constrained as:

  CONSTRAINT measurement_013_category_id_ck CHECK (category_id = 7)


> I'd expect rather issues related to RULEs or triggers (not sure which of
> them you use to redirect the data into partitions). But when partitioning
>

I created seven child tables of measurement. Each of these has a constraint
by category_id. This makes it extremely fast to select the correct
partition.


> I'm not sure what you mean by 'year wrapping issue' but I think it might
> work quite well - right not the problem is PostgreSQL decides to scan the
> whole partition (all data for a given category_id).
>

I'll give it another try. :-)

*Use Case #1*
User selects: Mar 22 to Dec 22
User selects: 1900 to 2009

Result: Query should average *9 months* of climate data per year between Mar
22 and Dec 22 of Year.

*Use Case #2*
User selects: Dec 22 to Mar 22
User selects: 1900 to 2009

Result: Query should average *3 months* of climate data per year between Dec
22 of Year and Mar 22 of Year+1.

So if a user selects 1950 to *1960*:

   - first case should average between 1950 and *1960*; and
   - second case should average between 1950 and *1961*.

Dave

Re: [PERFORM] Random Page Cost and Planner

Reply via email to