Re: [PERFORM] Index isn't used during a join.
On Wed, Jan 11, 2006 at 12:56:55AM -0700, Michael Fuhr wrote: WHERE ... AND doy = EXTRACT(doy FROM now() - '24 hour'::interval) AND doy = EXTRACT(doy FROM now()) To work on 1 Jan this should be more like WHERE ... AND (doy = EXTRACT(doy FROM now() - '24 hour'::interval) OR doy = EXTRACT(doy FROM now())) In any case the point is to add conditions to the WHERE clause that will use an index on the table for which you're currently getting a sequential scan. -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Index isn't used during a join.
When grilled further on (Wed, 11 Jan 2006 00:56:55 -0700), Michael Fuhr [EMAIL PROTECTED] confessed: On Tue, Jan 10, 2006 at 10:10:55PM -0700, Robert Creager wrote: The query is now correct, but still is slow because of lack of index usage. I don't know how to structure the query correctly to use the index. Have you tried adding restrictions on doy in the WHERE clause? Something like this, I think: I cannot. That's what I thought I would get from the join. The query shown will always have two days involved, and only grows from there. The data is graphed at http://www.logicalchaos.org/weather/index.html, and I'm looking at adding historical data to the graphs. Opps, never mind. You hit the nail on the head: weather-# SELECT *, unmunge_time( time_group ) AS time, weather-# EXTRACT( doy FROM unmunge_time( time_group ) ) weather-# FROM minute.windspeed weather-# JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy ) weather-# WHERE unmunge_time( time_group ) ( now() - '24 hour'::interval ) weather-# AND doy BETWEEN EXTRACT( doy FROM now() - '24 hour'::interval) weather-# AND EXTRACT( doy FROM now() ) weather-# ORDER BY time_group; QUERY PLAN Sort (cost=21914.09..21914.10 rows=1 width=48) (actual time=76.595..76.662 rows=286 loops=1) Sort Key: windspeed.time_group - Hash Join (cost=21648.19..21914.08 rows=1 width=48) (actual time=64.656..75.562 rows=286 loops=1) Hash Cond: (date_part('doy'::text, unmunge_time(outer.time_group)) = inner.doy) - Bitmap Heap Scan on windspeed (cost=2.27..267.40 rows=74 width=28) (actual time=0.585..1.111 rows=286 loops=1) Recheck Cond: (unmunge_time(time_group) (now() - '24:00:00'::interval)) - Bitmap Index Scan on minute_windspeed_unmunge_index (cost=0.00..2.27 rows=74 width=0) (actual time=0.566..0.566 rows=287 loops=1) Index Cond: (unmunge_time(time_group) (now() - '24:00:00'::interval)) - Hash (cost=21645.92..21645.92 rows=3 width=20) (actual time=63.849..63.849 rows=2 loops=1) - HashAggregate (cost=21645.84..21645.89 rows=3 width=20) (actual time=63.832..63.834 rows=2 loops=1) - Bitmap Heap Scan on readings (cost=59.21..21596.85 rows=6532 width=20) (actual time=15.174..53.249 rows=7613 loops=1) Recheck Cond: ((date_part('doy'::text, when) = date_part('doy'::text, (now() - '24:00:00'::interval))) AND (date_part('doy'::text, when) = date_part('doy'::text, now( - Bitmap Index Scan on readings_doy_index (cost=0.00..59.21 rows=6532 width=0) (actual time=12.509..12.509 rows=10530 loops=1) Index Cond: ((date_part('doy'::text, when) = date_part('doy'::text, (now() - '24:00:00'::interval))) AND (date_part('doy'::text, when) = date_part('doy'::text, now( Total runtime: 77.177 ms What I had thought is that PG would (could?) be smart enough to realize that one query was restricted, and apply that restriction to the other based on the join. I know it works in other cases (using indexes on both tables using the join)... Something else occurred to me: do you (or will you) have more than one year of data? If so then matching on doy could be problematic unless you also check for the year, or unless you want to match more than one year. Yes and yes. I'm doing both aggregate by day of the year for all data, and aggregate by day of year within each year. The examples are: weather=# select * from doy_agg where doy = extract( doy from now() ); doy | avg_windspeed | max_windspeed -+--+--- 11 | 6.14058239764748 |69 (1 row) weather=# select * from doy_day_agg where extract( doy from day ) = extract( doy from now() ); day | avg_windspeed | max_windspeed -+--+--- 2004-01-11 00:00:00 | 5.03991313397539 |17 2006-01-11 00:00:00 | 18.532050716667 |69 2005-01-11 00:00:00 | 3.6106763448041 |13 Thanks for your help Michael. Cheers, Rob -- 07:07:30 up 3 days, 23:34, 9 users, load average: 2.29, 2.44, 2.43 Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006 pgpbEkxD1FaVM.pgp Description: PGP signature
Re: [PERFORM] Index isn't used during a join.
When grilled further on (Wed, 11 Jan 2006 07:26:59 -0700), Robert Creager [EMAIL PROTECTED] confessed: weather-# SELECT *, unmunge_time( time_group ) AS time, weather-# EXTRACT( doy FROM unmunge_time( time_group ) ) weather-# FROM minute.windspeed weather-# JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy ) weather-# WHERE unmunge_time( time_group ) ( now() - '24 hour'::interval ) weather-# AND doy BETWEEN EXTRACT( doy FROM now() - '24 hour'::interval) weather-# AND EXTRACT( doy FROM now() ) weather-# ORDER BY time_group; The more I think about it, the more I believe PG is missing an opportunity. The query is adequately constrained without the BETWEEN clause. Why doesn't PG see that? I realize I'm a hack and by db organization shows that... The query is wrong as stated, as it won't work when the interval crosses a year boundary, but it's a stop gap for now. Cheers, Rob -- 07:58:30 up 4 days, 25 min, 9 users, load average: 2.13, 2.15, 2.22 Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006 pgpvvd8OrY8zM.pgp Description: PGP signature
Re: [PERFORM] Index isn't used during a join.
Robert Creager [EMAIL PROTECTED] writes: What I had thought is that PG would (could?) be smart enough to realize tha= t one query was restricted, and apply that restriction to the other based o= n the join. I know it works in other cases (using indexes on both tables u= sing the join)... The planner understands about transitivity of equality, ie given a = b and b = c it can infer a = c. It doesn't do any such thing for inequalities though, nor does it deduce f(a) = f(b) for arbitrary functions f. The addition Michael suggested requires much more understanding of the properties of the functions in your query than I think would be reasonable to put into the planner. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Index isn't used during a join.
On Wed, Jan 11, 2006 at 08:02:37AM -0700, Robert Creager wrote: The query is wrong as stated, as it won't work when the interval crosses a year boundary, but it's a stop gap for now. Yeah, I realized that shortly after I posted the original and posted a correction. http://archives.postgresql.org/pgsql-performance/2006-01/msg00104.php -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Index isn't used during a join.
When grilled further on (Wed, 11 Jan 2006 10:33:03 -0500), Tom Lane [EMAIL PROTECTED] confessed: The planner understands about transitivity of equality, ie given a = b and b = c it can infer a = c. It doesn't do any such thing for inequalities though, nor does it deduce f(a) = f(b) for arbitrary functions f. The addition Michael suggested requires much more understanding of the properties of the functions in your query than I think would be reasonable to put into the planner. OK. I think reached a point that I need to re-organize how the data is stored, maybe ridding myself of the schema and switching entirely to views. At that point, I likely could rid myself of the function (unmunge_time) I'm using, and work with times and doy fields. Thanks, Rob -- 21:17:00 up 4 days, 13:43, 9 users, load average: 2.02, 2.18, 2.23 Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006 pgpFfPrTCdHVy.pgp Description: PGP signature
Re: [PERFORM] Index isn't used during a join.
When grilled further on (Mon, 9 Jan 2006 22:58:18 -0700), Michael Fuhr [EMAIL PROTECTED] confessed: On Mon, Jan 09, 2006 at 09:23:38PM -0700, Robert Creager wrote: I'm working with a query to get more info out with a join. The base query works great speed wise because of index usage. When the join is tossed in, the index is no longer used, so the query performance tanks. The first query you posted returns 285 rows and the second returns over one million; index usage aside, that difference surely accounts for a performance penalty. And as is often pointed out, index scans aren't always faster than sequential scans: the more of a table a query has to fetch, the more likely a sequential scan will be faster. Thanks for pointing out the obvious that I missed. Too much data in the second query. It's supposed to match (row wise) what was returned from the first query. Just ignore me for now... Thanks, Rob -- 08:15:24 up 3 days, 42 min, 9 users, load average: 2.07, 2.20, 2.25 Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006 pgpQ6nKK3glcj.pgp Description: PGP signature
Re: [PERFORM] Index isn't used during a join.
Ok, I'm back, and in a little better shape. The query is now correct, but still is slow because of lack of index usage. I don't know how to structure the query correctly to use the index. Taken individually: weather=# explain analyze select * from doy_agg where doy = extract( doy from now() ); QUERY PLAN HashAggregate (cost=13750.67..13750.71 rows=2 width=20) (actual time=123.134..123.135 rows=1 loops=1) - Bitmap Heap Scan on readings (cost=25.87..13720.96 rows=3962 width=20) (actual time=6.384..116.559 rows=4175 loops=1) Recheck Cond: (date_part('doy'::text, when) = date_part('doy'::text, now())) - Bitmap Index Scan on readings_doy_index (cost=0.00..25.87 rows=3962 width=0) (actual time=5.282..5.282 rows=4215 loops=1) Index Cond: (date_part('doy'::text, when) = date_part('doy'::text, now())) Total runtime: 123.366 ms produces the data: weather=# select * from doy_agg where doy = extract( doy from now() ); doy | avg_windspeed | max_windspeed -+--+--- 10 | 8.53403056583666 |59 and: weather=# EXPLAIN ANALYZE weather-# SELECT *, weather-# unmunge_time( time_group ) AS time weather-# FROM minute.windspeed weather-# WHERE unmunge_time( time_group ) ( now() - '24 hour'::interval ) weather-# ORDER BY time_group; QUERY PLAN - Sort (cost=595.33..595.77 rows=176 width=28) (actual time=4.762..4.828 rows=283 loops=1) Sort Key: time_group - Bitmap Heap Scan on windspeed (cost=2.62..588.76 rows=176 width=28) (actual time=0.901..3.834 rows=283 loops=1) Recheck Cond: (unmunge_time(time_group) (now() - '24:00:00'::interval)) - Bitmap Index Scan on minute_windspeed_unmunge_index (cost=0.00..2.62 rows=176 width=0) (actual time=0.745..0.745 rows=284 loops=1) Index Cond: (unmunge_time(time_group) (now() - '24:00:00'::interval)) Total runtime: 5.108 ms produces: time_group |min_reading| max_reading |avg_reading|time +---+-+---+- 1136869500 | 0.8 | 6 | 2.62193548387097 | 2006-01-09 22:05:00 1136869800 | 0 | 3 | 0.406021505376343 | 2006-01-09 22:10:00 1136870100 | 0 | 5 | 1.68 | 2006-01-09 22:15:00 ... But I want the composite of the two queries, and I'm stuck on: weather=# EXPLAIN ANALYZE weather-# SELECT *, weather-# unmunge_time( time_group ) AS time weather-# FROM minute.windspeed weather-# JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy ) weather-# WHERE unmunge_time( time_group ) ( now() - '24 hour'::interval ) weather-# ORDER BY time_group; QUERY PLAN - Sort (cost=153627.67..153628.48 rows=322 width=48) (actual time=10637.681..10637.748 rows=286 loops=1) Sort Key: windspeed.time_group - Merge Join (cost=153604.82..153614.26 rows=322 width=48) (actual time=10633.375..10636.728 rows=286 loops=1) Merge Cond: (outer.?column5? = inner.doy) - Sort (cost=594.89..595.33 rows=176 width=28) (actual time=5.539..5.612 rows=286 loops=1) Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group)) - Bitmap Heap Scan on windspeed (cost=2.62..588.32 rows=176 width=28) (actual time=0.918..4.637 rows=286 loops=1) Recheck Cond: (unmunge_time(time_group) (now() - '24:00:00'::interval)) - Bitmap Index Scan on minute_windspeed_unmunge_index (cost=0.00..2.62 rows=176 width=0) (actual time=0.739..0.739 rows=287 loops=1) Index Cond: (unmunge_time(time_group) (now() - '24:00:00'::interval)) - Sort (cost=153009.93..153010.84 rows=366 width=20) (actual time=10627.699..10627.788 rows=295 loops=1) Sort Key: doy_agg.doy - HashAggregate (cost=152984.28..152990.69 rows=366 width=20) (actual time=10625.649..10626.601 rows=366 loops=1) - Seq Scan on readings (cost=0.00..145364.93 rows=1015914 width=20) (actual
Re: [PERFORM] Index isn't used during a join.
On Tue, Jan 10, 2006 at 10:10:55PM -0700, Robert Creager wrote: The query is now correct, but still is slow because of lack of index usage. I don't know how to structure the query correctly to use the index. Have you tried adding restrictions on doy in the WHERE clause? Something like this, I think: WHERE ... AND doy = EXTRACT(doy FROM now() - '24 hour'::interval) AND doy = EXTRACT(doy FROM now()) Something else occurred to me: do you (or will you) have more than one year of data? If so then matching on doy could be problematic unless you also check for the year, or unless you want to match more than one year. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] Index isn't used during a join.
Hey folks, I'm working with a query to get more info out with a join. The base query works great speed wise because of index usage. When the join is tossed in, the index is no longer used, so the query performance tanks. Can anyone advise on how to get the index usage back? weather=# select version(); version --- PostgreSQL 8.1.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.1 (4.0.1-5mdk for Mandriva Linux release 2006.0) (1 row) The base query is: weather=# EXPLAIN ANALYZE weather-# SELECT min_reading, max_reading, avg_reading, -- doy, weather-#unmunge_time( time_group ) AS time weather-# FROM minute.windspeed weather-# --JOIN readings_doy ON EXTRACT( doy FROM unmunge_time( time_group ) ) = doy weather-# WHERE unmunge_time( time_group ) ( now() - '24 hour'::interval ) weather-# ORDER BY time_group; QUERY PLAN - Sort (cost=10995.29..11155.58 rows=64117 width=28) (actual time=4.509..4.574 rows=285 loops=1) Sort Key: time_group - Bitmap Heap Scan on windspeed (cost=402.42..5876.05 rows=64117 width=28) (actual time=0.784..3.639 rows=285 loops=1) Recheck Cond: (unmunge_time(time_group) (now() - '24:00:00'::interval)) - Bitmap Index Scan on minute_windspeed_index (cost=0.00..402.42 rows=64117 width=0) (actual time=0.675..0.675 rows=285 loops=1) Index Cond: (unmunge_time(time_group) (now() - '24:00:00'::interval)) Total runtime: 4.880 ms (7 rows) When I add in the join, the query tosses out the nice quick index in favor of sequence scans: weather=# EXPLAIN ANALYZE weather-# SELECT min_reading, max_reading, avg_reading, -- doy, weather-#unmunge_time( time_group ) AS time weather-# FROM minute.windspeed weather-# JOIN readings_doy ON EXTRACT( doy FROM unmunge_time( time_group ) ) = doy weather-# WHERE unmunge_time( time_group ) ( now() - '24 hour'::interval ) weather-# ORDER BY time_group; QUERY PLAN - Sort (cost=98239590.88..99052623.66 rows=325213113 width=28) (actual time=60136.484..61079.845 rows=1030656 loops=1) Sort Key: windspeed.time_group - Merge Join (cost=265774.21..8396903.54 rows=325213113 width=28) (actual time=34318.334..47113.277 rows=1030656 loops=1) Merge Cond: (outer.?column5? = inner.?column2?) - Sort (cost=12997.68..13157.98 rows=64120 width=28) (actual time=2286.155..2286.450 rows=284 loops=1) Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group)) - Seq Scan on windspeed (cost=0.00..7878.18 rows=64120 width=28) (actual time=2279.275..2285.271 rows=284 loops=1) Filter: (unmunge_time(time_group) (now() - '24:00:00'::interval)) - Sort (cost=252776.54..255312.51 rows=1014389 width=8) (actual time=32001.370..33473.407 rows=1051395 loops=1) Sort Key: date_part('doy'::text, readings.when) - Seq Scan on readings (cost=0.00..142650.89 rows=1014389 width=8) (actual time=0.053..13759.015 rows=1014448 loops=1) Total runtime: 61720.935 ms (12 rows) weather=# \d minute.windspeed Table minute.windspeed Column| Type | Modifiers -+--+--- time_group | integer | not null min_reading | double precision | not null max_reading | double precision | not null avg_reading | double precision | not null Indexes: windspeed_pkey PRIMARY KEY, btree (time_group) minute_windspeed_index btree (unmunge_time(time_group)) CREATE OR REPLACE FUNCTION unmunge_time( integer ) RETURNS timestamp AS ' DECLARE input ALIAS FOR $1; BEGIN RETURN (''epoch''::timestamptz + input * ''1sec''::interval)::timestamp; END; ' LANGUAGE plpgsql IMMUTABLE STRICT; weather=# \d readings Table public.readings Column|Type | Modifiers --+-+- when | timestamp without time zone | not null default (timeofday())::timestamp without time zone hour_group | integer | minute_group
Re: [PERFORM] Index isn't used during a join.
On Mon, Jan 09, 2006 at 09:23:38PM -0700, Robert Creager wrote: I'm working with a query to get more info out with a join. The base query works great speed wise because of index usage. When the join is tossed in, the index is no longer used, so the query performance tanks. The first query you posted returns 285 rows and the second returns over one million; index usage aside, that difference surely accounts for a performance penalty. And as is often pointed out, index scans aren't always faster than sequential scans: the more of a table a query has to fetch, the more likely a sequential scan will be faster. Have the tables been vacuumed and analyzed? The planner's estimates for windspeed are pretty far off, which could be affecting the query plan: - Sort (cost=12997.68..13157.98 rows=64120 width=28) (actual time=2286.155..2286.450 rows=284 loops=1) Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group)) - Seq Scan on windspeed (cost=0.00..7878.18 rows=64120 width=28) (actual time=2279.275..2285.271 rows=284 loops=1) Filter: (unmunge_time(time_group) (now() - '24:00:00'::interval)) That's a small amount of the total query time, however, so although an index scan might help it probably won't provide the big gain you're looking for. Have you done any tests with enable_seqscan disabled? That'll show whether an index or bitmap scan would be faster. And have you verified that the join condition is correct? Should the query be returning over a million rows? -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org