Re: [PERFORM] Index isn't used during a join.

2006-01-11 Thread Michael Fuhr
On Wed, Jan 11, 2006 at 12:56:55AM -0700, Michael Fuhr wrote:
 WHERE ...
   AND doy >= EXTRACT(doy FROM now() - '24 hour'::interval)
   AND doy <= EXTRACT(doy FROM now())

To work on 1 Jan this should be more like

WHERE ...
  AND (doy = EXTRACT(doy FROM now() - '24 hour'::interval) OR
   doy = EXTRACT(doy FROM now()))

In any case the point is to add conditions to the WHERE clause that
will use an index on the table for which you're currently getting
a sequential scan.
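
Against the windspeed/doy_agg join posted later in this thread, that would look
something like the following (an untested sketch):

SELECT *, unmunge_time( time_group ) AS time,
       EXTRACT( doy FROM unmunge_time( time_group ) )
FROM minute.windspeed
JOIN doy_agg ON ( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy )
WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
  AND ( doy = EXTRACT( doy FROM now() - '24 hour'::interval )
     OR doy = EXTRACT( doy FROM now() ) )
ORDER BY time_group;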

-- 
Michael Fuhr



Re: [PERFORM] Index isn't used during a join.

2006-01-11 Thread Robert Creager
When grilled further on (Wed, 11 Jan 2006 00:56:55 -0700),
Michael Fuhr [EMAIL PROTECTED] confessed:

 On Tue, Jan 10, 2006 at 10:10:55PM -0700, Robert Creager wrote:
  The query is now correct, but still is slow because of lack of
  index usage.  I don't know how to structure the query correctly to
  use the index.
 
 Have you tried adding restrictions on doy in the WHERE clause?
 Something like this, I think:

I cannot.  That's what I thought I would get from the join.  The query shown 
will always have two days involved, and only grows from there.  The data is 
graphed at http://www.logicalchaos.org/weather/index.html, and I'm looking at 
adding historical data to the graphs.

Oops, never mind.  You hit the nail on the head:

weather-# SELECT *, unmunge_time( time_group ) AS time,
weather-# EXTRACT( doy FROM unmunge_time( time_group ) )
weather-# FROM minute.windspeed
weather-# JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy )
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# AND doy BETWEEN EXTRACT( doy FROM now() - '24 hour'::interval )
weather-# AND EXTRACT( doy FROM now() )
weather-# ORDER BY time_group;

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=21914.09..21914.10 rows=1 width=48) (actual time=76.595..76.662 rows=286 loops=1)
   Sort Key: windspeed.time_group
   ->  Hash Join  (cost=21648.19..21914.08 rows=1 width=48) (actual time=64.656..75.562 rows=286 loops=1)
         Hash Cond: (date_part('doy'::text, unmunge_time(outer.time_group)) = inner.doy)
         ->  Bitmap Heap Scan on windspeed  (cost=2.27..267.40 rows=74 width=28) (actual time=0.585..1.111 rows=286 loops=1)
               Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
               ->  Bitmap Index Scan on minute_windspeed_unmunge_index  (cost=0.00..2.27 rows=74 width=0) (actual time=0.566..0.566 rows=287 loops=1)
                     Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
         ->  Hash  (cost=21645.92..21645.92 rows=3 width=20) (actual time=63.849..63.849 rows=2 loops=1)
               ->  HashAggregate  (cost=21645.84..21645.89 rows=3 width=20) (actual time=63.832..63.834 rows=2 loops=1)
                     ->  Bitmap Heap Scan on readings  (cost=59.21..21596.85 rows=6532 width=20) (actual time=15.174..53.249 rows=7613 loops=1)
                           Recheck Cond: ((date_part('doy'::text, when) >= date_part('doy'::text, (now() - '24:00:00'::interval))) AND (date_part('doy'::text, when) <= date_part('doy'::text, now())))
                           ->  Bitmap Index Scan on readings_doy_index  (cost=0.00..59.21 rows=6532 width=0) (actual time=12.509..12.509 rows=10530 loops=1)
                                 Index Cond: ((date_part('doy'::text, when) >= date_part('doy'::text, (now() - '24:00:00'::interval))) AND (date_part('doy'::text, when) <= date_part('doy'::text, now())))
 Total runtime: 77.177 ms

What I had thought is that PG would (could?) be smart enough to realize that 
one query was restricted, and apply that restriction to the other based on the 
join.  I know it works in other cases (using indexes on both tables using the 
join)...

 
 Something else occurred to me: do you (or will you) have more than
 one year of data?  If so then matching on doy could be problematic
 unless you also check for the year, or unless you want to match
 more than one year.

Yes and yes.  I'm doing both: aggregating by day of year across all data, and 
aggregating by day of year within each year.  The examples are:

weather=# select * from doy_agg where doy = extract( doy from now() );
 doy |  avg_windspeed   | max_windspeed
-----+------------------+---------------
  11 | 6.14058239764748 |            69
(1 row)

weather=# select * from doy_day_agg where extract( doy from day ) = extract( doy from now() );
         day         |  avg_windspeed   | max_windspeed
---------------------+------------------+---------------
 2004-01-11 00:00:00 | 5.03991313397539 |            17
 2006-01-11 00:00:00 |  18.532050716667 |            69
 2005-01-11 00:00:00 |  3.6106763448041 |            13
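
If only the current year were wanted from doy_day_agg, a year restriction
could be added as well, e.g. (just a sketch):

SELECT *
FROM doy_day_agg
WHERE EXTRACT( doy FROM day ) = EXTRACT( doy FROM now() )
  AND EXTRACT( year FROM day ) = EXTRACT( year FROM now() );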

Thanks for your help Michael.

Cheers,
Rob

-- 
 07:07:30 up 3 days, 23:34,  9 users,  load average: 2.29, 2.44, 2.43
Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006




Re: [PERFORM] Index isn't used during a join.

2006-01-11 Thread Robert Creager
When grilled further on (Wed, 11 Jan 2006 07:26:59 -0700),
Robert Creager [EMAIL PROTECTED] confessed:

 
 weather-# SELECT *, unmunge_time( time_group ) AS time,
 weather-# EXTRACT( doy FROM unmunge_time( time_group ) )
 weather-# FROM minute.windspeed
 weather-# JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy )
 weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
 weather-# AND doy BETWEEN EXTRACT( doy FROM now() - '24 hour'::interval )
 weather-# AND EXTRACT( doy FROM now() )
 weather-# ORDER BY time_group;

The more I think about it, the more I believe PG is missing an opportunity.  
The query is adequately constrained without the BETWEEN clause.  Why doesn't PG 
see that?  I realize I'm a hack and my db organization shows that...

The query is wrong as stated, as it won't work when the interval crosses a year 
boundary, but it's a stop gap for now.

Cheers,
Rob

-- 
 07:58:30 up 4 days, 25 min,  9 users,  load average: 2.13, 2.15, 2.22
Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006




Re: [PERFORM] Index isn't used during a join.

2006-01-11 Thread Tom Lane
Robert Creager [EMAIL PROTECTED] writes:
 What I had thought is that PG would (could?) be smart enough to realize that
 one query was restricted, and apply that restriction to the other based on
 the join.  I know it works in other cases (using indexes on both tables
 using the join)...

The planner understands about transitivity of equality, ie given a = b
and b = c it can infer a = c.  It doesn't do any such thing for
inequalities though, nor does it deduce f(a) = f(b) for arbitrary
functions f.  The addition Michael suggested requires much more
understanding of the properties of the functions in your query than
I think would be reasonable to put into the planner.
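
For what it's worth, a tiny illustration of the distinction (hypothetical
tables t1 and t2, hypothetical function f):

-- Equality is transitive, so the planner can derive t1.x = 5 here
-- and use an index on t1.x:
SELECT * FROM t1 JOIN t2 ON t1.x = t2.y WHERE t2.y = 5;

-- But it won't push an inequality through an arbitrary function:
-- no restriction on t2.y is derived from the range on t1.x.
SELECT * FROM t1 JOIN t2 ON f(t1.x) = t2.y
WHERE t1.x > now() - '24 hour'::interval;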

regards, tom lane



Re: [PERFORM] Index isn't used during a join.

2006-01-11 Thread Michael Fuhr
On Wed, Jan 11, 2006 at 08:02:37AM -0700, Robert Creager wrote:
 The query is wrong as stated, as it won't work when the interval
 crosses a year boundary, but it's a stop gap for now.

Yeah, I realized that shortly after I posted the original and posted
a correction.

http://archives.postgresql.org/pgsql-performance/2006-01/msg00104.php

-- 
Michael Fuhr



Re: [PERFORM] Index isn't used during a join.

2006-01-11 Thread Robert Creager
When grilled further on (Wed, 11 Jan 2006 10:33:03 -0500),
Tom Lane [EMAIL PROTECTED] confessed:

 The planner understands about transitivity of equality, ie given a = b
 and b = c it can infer a = c.  It doesn't do any such thing for
 inequalities though, nor does it deduce f(a) = f(b) for arbitrary
 functions f.  The addition Michael suggested requires much more
 understanding of the properties of the functions in your query than
 I think would be reasonable to put into the planner.
 

OK.  I think I've reached a point where I need to reorganize how the data is stored,
maybe ridding myself of the schema and switching entirely to views.  At that
point, I likely could rid myself of the function (unmunge_time) I'm using, and
work with times and doy fields.
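
For example, one possible direction (hypothetical table and column names, just
a sketch): store the sample time directly and index the extracted doy, so no
unmunge_time() call is needed in the WHERE clause:

CREATE TABLE minute.windspeed_new (
    sample_time timestamp NOT NULL PRIMARY KEY,  -- plain timestamp instead of a munged integer
    min_reading double precision NOT NULL,
    max_reading double precision NOT NULL,
    avg_reading double precision NOT NULL
);

-- expression index so WHERE date_part('doy', sample_time) = ... can use an index
CREATE INDEX windspeed_new_doy_index
    ON minute.windspeed_new (date_part('doy', sample_time));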

Thanks,
Rob

-- 
 21:17:00 up 4 days, 13:43,  9 users,  load average: 2.02, 2.18, 2.23
Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006




Re: [PERFORM] Index isn't used during a join.

2006-01-10 Thread Robert Creager
When grilled further on (Mon, 9 Jan 2006 22:58:18 -0700),
Michael Fuhr [EMAIL PROTECTED] confessed:

 On Mon, Jan 09, 2006 at 09:23:38PM -0700, Robert Creager wrote:
  I'm working with a query to get more info out with a join.  The base
  query works great speed wise because of index usage.  When the join is
  tossed in, the index is no longer used, so the query performance tanks.
 
 The first query you posted returns 285 rows and the second returns
 over one million; index usage aside, that difference surely accounts
 for a performance penalty.  And as is often pointed out, index scans
 aren't always faster than sequential scans: the more of a table a
 query has to fetch, the more likely a sequential scan will be faster.

Thanks for pointing out the obvious that I missed.  Too much data in the second 
query.  It's supposed to match (row wise) what was returned from the first 
query.

Just ignore me for now...

Thanks,
Rob

-- 
 08:15:24 up 3 days, 42 min,  9 users,  load average: 2.07, 2.20, 2.25
Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006




Re: [PERFORM] Index isn't used during a join.

2006-01-10 Thread Robert Creager

Ok, I'm back, and in a little better shape.

The query is now correct, but still is slow because of lack of index usage.  I 
don't know how to structure the query correctly to use the index.

Taken individually:

weather=# explain analyze select * from doy_agg where doy = extract( doy from now() );
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=13750.67..13750.71 rows=2 width=20) (actual time=123.134..123.135 rows=1 loops=1)
   ->  Bitmap Heap Scan on readings  (cost=25.87..13720.96 rows=3962 width=20) (actual time=6.384..116.559 rows=4175 loops=1)
         Recheck Cond: (date_part('doy'::text, when) = date_part('doy'::text, now()))
         ->  Bitmap Index Scan on readings_doy_index  (cost=0.00..25.87 rows=3962 width=0) (actual time=5.282..5.282 rows=4215 loops=1)
               Index Cond: (date_part('doy'::text, when) = date_part('doy'::text, now()))
 Total runtime: 123.366 ms

produces the data:

weather=# select * from doy_agg where doy = extract( doy from now() );
 doy |  avg_windspeed   | max_windspeed
-----+------------------+---------------
  10 | 8.53403056583666 |            59

and:

weather=# EXPLAIN ANALYZE
weather-# SELECT *,
weather-# unmunge_time( time_group ) AS time
weather-# FROM minute.windspeed
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# ORDER BY time_group;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=595.33..595.77 rows=176 width=28) (actual time=4.762..4.828 rows=283 loops=1)
   Sort Key: time_group
   ->  Bitmap Heap Scan on windspeed  (cost=2.62..588.76 rows=176 width=28) (actual time=0.901..3.834 rows=283 loops=1)
         Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
         ->  Bitmap Index Scan on minute_windspeed_unmunge_index  (cost=0.00..2.62 rows=176 width=0) (actual time=0.745..0.745 rows=284 loops=1)
               Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
 Total runtime: 5.108 ms

produces:

 time_group |    min_reading    | max_reading |    avg_reading    |        time
------------+-------------------+-------------+-------------------+---------------------
 1136869500 |               0.8 |           6 |  2.62193548387097 | 2006-01-09 22:05:00
 1136869800 |                 0 |           3 | 0.406021505376343 | 2006-01-09 22:10:00
 1136870100 |                 0 |           5 |              1.68 | 2006-01-09 22:15:00
...

But I want the composite of the two queries, and I'm stuck on:

weather=# EXPLAIN ANALYZE
weather-# SELECT *,
weather-# unmunge_time( time_group ) AS time
weather-# FROM minute.windspeed
weather-# JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy )
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# ORDER BY time_group;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=153627.67..153628.48 rows=322 width=48) (actual time=10637.681..10637.748 rows=286 loops=1)
   Sort Key: windspeed.time_group
   ->  Merge Join  (cost=153604.82..153614.26 rows=322 width=48) (actual time=10633.375..10636.728 rows=286 loops=1)
         Merge Cond: (outer.?column5? = inner.doy)
         ->  Sort  (cost=594.89..595.33 rows=176 width=28) (actual time=5.539..5.612 rows=286 loops=1)
               Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group))
               ->  Bitmap Heap Scan on windspeed  (cost=2.62..588.32 rows=176 width=28) (actual time=0.918..4.637 rows=286 loops=1)
                     Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
                     ->  Bitmap Index Scan on minute_windspeed_unmunge_index  (cost=0.00..2.62 rows=176 width=0) (actual time=0.739..0.739 rows=287 loops=1)
                           Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
         ->  Sort  (cost=153009.93..153010.84 rows=366 width=20) (actual time=10627.699..10627.788 rows=295 loops=1)
               Sort Key: doy_agg.doy
               ->  HashAggregate  (cost=152984.28..152990.69 rows=366 width=20) (actual time=10625.649..10626.601 rows=366 loops=1)
                     ->  Seq Scan on readings  (cost=0.00..145364.93 rows=1015914 width=20) (actual 

Re: [PERFORM] Index isn't used during a join.

2006-01-10 Thread Michael Fuhr
On Tue, Jan 10, 2006 at 10:10:55PM -0700, Robert Creager wrote:
 The query is now correct, but still is slow because of lack of
 index usage.  I don't know how to structure the query correctly to
 use the index.

Have you tried adding restrictions on doy in the WHERE clause?
Something like this, I think:

WHERE ...
  AND doy >= EXTRACT(doy FROM now() - '24 hour'::interval)
  AND doy <= EXTRACT(doy FROM now())

Something else occurred to me: do you (or will you) have more than
one year of data?  If so then matching on doy could be problematic
unless you also check for the year, or unless you want to match
more than one year.

-- 
Michael Fuhr



[PERFORM] Index isn't used during a join.

2006-01-09 Thread Robert Creager


Hey folks,

I'm working with a query to get more info out with a join.  The base query 
works great speed wise because of index usage.  When the join is tossed in, the 
index is no longer used, so the query performance tanks.

Can anyone advise on how to get the index usage back?

weather=# select version();
                                                       version
----------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.1 (4.0.1-5mdk for Mandriva Linux release 2006.0)
(1 row)

The base query is:

weather=# EXPLAIN ANALYZE
weather-# SELECT min_reading, max_reading, avg_reading, -- doy,
weather-#        unmunge_time( time_group ) AS time
weather-# FROM minute.windspeed
weather-# --JOIN readings_doy ON EXTRACT( doy FROM unmunge_time( time_group ) ) = doy
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# ORDER BY time_group;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=10995.29..11155.58 rows=64117 width=28) (actual time=4.509..4.574 rows=285 loops=1)
   Sort Key: time_group
   ->  Bitmap Heap Scan on windspeed  (cost=402.42..5876.05 rows=64117 width=28) (actual time=0.784..3.639 rows=285 loops=1)
         Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
         ->  Bitmap Index Scan on minute_windspeed_index  (cost=0.00..402.42 rows=64117 width=0) (actual time=0.675..0.675 rows=285 loops=1)
               Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
 Total runtime: 4.880 ms
(7 rows)

When I add in the join, the query tosses out the nice quick index in favor of 
sequence scans:

weather=# EXPLAIN ANALYZE
weather-# SELECT min_reading, max_reading, avg_reading, -- doy,
weather-#        unmunge_time( time_group ) AS time
weather-# FROM minute.windspeed
weather-# JOIN readings_doy ON EXTRACT( doy FROM unmunge_time( time_group ) ) = doy
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# ORDER BY time_group;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=98239590.88..99052623.66 rows=325213113 width=28) (actual time=60136.484..61079.845 rows=1030656 loops=1)
   Sort Key: windspeed.time_group
   ->  Merge Join  (cost=265774.21..8396903.54 rows=325213113 width=28) (actual time=34318.334..47113.277 rows=1030656 loops=1)
         Merge Cond: (outer.?column5? = inner.?column2?)
         ->  Sort  (cost=12997.68..13157.98 rows=64120 width=28) (actual time=2286.155..2286.450 rows=284 loops=1)
               Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group))
               ->  Seq Scan on windspeed  (cost=0.00..7878.18 rows=64120 width=28) (actual time=2279.275..2285.271 rows=284 loops=1)
                     Filter: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
         ->  Sort  (cost=252776.54..255312.51 rows=1014389 width=8) (actual time=32001.370..33473.407 rows=1051395 loops=1)
               Sort Key: date_part('doy'::text, readings.when)
               ->  Seq Scan on readings  (cost=0.00..142650.89 rows=1014389 width=8) (actual time=0.053..13759.015 rows=1014448 loops=1)
 Total runtime: 61720.935 ms
(12 rows)

weather=# \d minute.windspeed
       Table "minute.windspeed"
   Column    |       Type       | Modifiers
-------------+------------------+-----------
 time_group  | integer          | not null
 min_reading | double precision | not null
 max_reading | double precision | not null
 avg_reading | double precision | not null
Indexes:
    "windspeed_pkey" PRIMARY KEY, btree (time_group)
    "minute_windspeed_index" btree (unmunge_time(time_group))

CREATE OR REPLACE FUNCTION unmunge_time( integer )
RETURNS timestamp AS '
DECLARE
   input ALIAS FOR $1;
BEGIN
   RETURN (''epoch''::timestamptz + input * ''1sec''::interval)::timestamp;
END;
' LANGUAGE plpgsql IMMUTABLE STRICT;
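
(For reference, the expression index shown in \d above would have been created
along these lines -- a guess at the original DDL, legal only because
unmunge_time() is declared IMMUTABLE:)

CREATE INDEX minute_windspeed_index
    ON minute.windspeed ( unmunge_time( time_group ) );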

weather=# \d readings
                                    Table "public.readings"
    Column    |            Type             |                          Modifiers
--------------+-----------------------------+--------------------------------------------------------------
 when         | timestamp without time zone | not null default (timeofday())::timestamp without time zone
 hour_group   | integer                     |
 minute_group

Re: [PERFORM] Index isn't used during a join.

2006-01-09 Thread Michael Fuhr
On Mon, Jan 09, 2006 at 09:23:38PM -0700, Robert Creager wrote:
 I'm working with a query to get more info out with a join.  The base
 query works great speed wise because of index usage.  When the join is
 tossed in, the index is no longer used, so the query performance tanks.

The first query you posted returns 285 rows and the second returns
over one million; index usage aside, that difference surely accounts
for a performance penalty.  And as is often pointed out, index scans
aren't always faster than sequential scans: the more of a table a
query has to fetch, the more likely a sequential scan will be faster.

Have the tables been vacuumed and analyzed?  The planner's estimates
for windspeed are pretty far off, which could be affecting the query
plan:

 ->  Sort  (cost=12997.68..13157.98 rows=64120 width=28) (actual time=2286.155..2286.450 rows=284 loops=1)
       Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group))
       ->  Seq Scan on windspeed  (cost=0.00..7878.18 rows=64120 width=28) (actual time=2279.275..2285.271 rows=284 loops=1)
             Filter: (unmunge_time(time_group) > (now() - '24:00:00'::interval))

That's a small amount of the total query time, however, so although
an index scan might help it probably won't provide the big gain
you're looking for.

Have you done any tests with enable_seqscan disabled?  That'll show
whether an index or bitmap scan would be faster.  And have you
verified that the join condition is correct?  Should the query be
returning over a million rows?
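
For example, something along these lines in a psql session (a sketch;
enable_seqscan is per-session and only discourages, not forbids, seq scans):

SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT min_reading, max_reading, avg_reading,
       unmunge_time( time_group ) AS time
FROM minute.windspeed
JOIN readings_doy ON EXTRACT( doy FROM unmunge_time( time_group ) ) = doy
WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
ORDER BY time_group;

RESET enable_seqscan;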

-- 
Michael Fuhr
