[SQL] Multi-column index not used, new flipped column index is
Hi,

I'm using PG 8.0.3 and recently spotted a query that was not using a multi-column index I had created. The index looks like:

  CREATE INDEX . ON FooTable(fkColumnOne, fkColumnTwo);

The query that was not using the index was of the form:

  SELECT a bunch of columns and joins WHERE FooTable.fkColumnTwo=1000;

So I added another index where the indexed columns are flipped:

  CREATE INDEX . ON FooTable(fkColumnTwo, fkColumnOne);

Now the query started to use that index -- good! But now I have 2 indices that are nearly the same, and that means overhead during INSERTs/DELETEs. :(

Is there a way to get this query to use that first index, so I don't have to keep this additional index? If I had PG 8.1.3, would PG know how to use that first index? I seem to recall something about this somewhere... but I don't see it at http://www.postgresql.org/docs/whatsnew .

Thanks,
Otis
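A quick way to confirm which plan the planner picks for this predicate. The table and column names just follow the sanitized names above, and the expectation in the comment is an assumption about typical 8.0 behaviour, not output captured from the original poster's system:

  EXPLAIN
  SELECT *
  FROM FooTable
  WHERE fkColumnTwo = 1000;
  -- With only the (fkColumnOne, fkColumnTwo) index in place, 8.0 will
  -- usually fall back to a Seq Scan here; with the flipped
  -- (fkColumnTwo, fkColumnOne) index it can use an Index Scan instead.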
Re: [SQL] Multi-column index not used, new flipped column index is
Hi, Otis,

[EMAIL PROTECTED] wrote:

> I'm using PG 8.0.3 and recently spotted a query that was not using a
> multi-column index I had created. The index looks like:
>
> CREATE INDEX . ON FooTable(fkColumnOne, fkColumnTwo);
>
> The query that was not using the index was using:
>
> SELECT a bunch of columns and joins WHERE FooTable.fkColumnTwo=1000;

That's how it is supposed to work currently. AFAIK, Oracle has a technique called "index skip scan" that could make (limited) use of that first index when only the second column is constrained, but PostgreSQL currently does not support it.

> So I added another index where the indexed columns are flipped:
>
> CREATE INDEX . ON FooTable(fkColumnTwo, fkColumnOne);
>
> Now the query started to use that index -- good! But now I have 2
> indices that are nearly the same, and that means overhead during
> INSERTs/DELETEs. :(
> Is there a way to get this to use that first index, so I don't have
> to have this additional index? If I had PG 8.1.3, would PG know how
> to use that first index? I seem to recall something about this
> somewhere... but I don't see it at
> http://www.postgresql.org/docs/whatsnew .

Bitmap Index Scans can be your solution, but AFAIK they were introduced in 8.1. For bitmap index scans, you have one index on fkColumnOne and one on fkColumnTwo, and the query planner knows to combine them when both columns are given in a WHERE clause.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
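A minimal sketch of the two single-column indexes Markus describes. The index names are made up for illustration, and the commented plan shape is the kind of output 8.1's bitmap scans are expected to produce, not output from this thread:

  CREATE INDEX foo_col1_idx ON FooTable (fkColumnOne);
  CREATE INDEX foo_col2_idx ON FooTable (fkColumnTwo);

  EXPLAIN
  SELECT *
  FROM FooTable
  WHERE fkColumnOne = 5
    AND fkColumnTwo = 1000;
  -- On 8.1 and later the planner can combine both indexes, roughly:
  --   Bitmap Heap Scan on footable
  --     Recheck Cond: ...
  --     ->  BitmapAnd
  --           ->  Bitmap Index Scan on foo_col2_idx
  --           ->  Bitmap Index Scan on foo_col1_idx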
[SQL] Help with a seq scan on multi-million row table
Hello,

I have a little 2-table JOIN, GROUP BY, ORDER BY query that does a sequential scan on a multi-million row table. I _thought_ I had all the appropriate indices, but apparently I do not. I was wondering if anyone can spot a way I can speed up this query. The query currently takes... *gulp*: 381119.201 ms :(

There are only 2 tables in the game: user_url and user_url_tag. The latter has FKs pointing to the former. The sequential scan happens on the latter - user_url_tag:

EXPLAIN ANALYZE select DISTINCT userurltag0_.tag as x0_0_, COUNT(*) as x1_0_
from user_url_tag userurltag0_, user_url userurl1_
WHERE (((userurl1_.user_id=1 )) AND ((userurltag0_.user_url_id=userurl1_.id )))
GROUP BY userurltag0_.tag ORDER BY count(*) DESC;

                                QUERY PLAN
---------------------------------------------------------------------------
 Unique  (cost=140972.22..140996.28 rows=3207 width=10) (actual time=381082.868..381110.094 rows=2546 loops=1)
   ->  Sort  (cost=140972.22..140980.24 rows=3207 width=10) (actual time=381082.858..381091.733 rows=2546 loops=1)
         Sort Key: count(*), userurltag0_.tag
         ->  HashAggregate  (cost=140777.45..140785.46 rows=3207 width=10) (actual time=381032.844..381064.068 rows=2546 loops=1)
               ->  Hash Join  (cost=2797.65..140758.50 rows=3790 width=10) (actual time=248.530..380635.132 rows=8544 loops=1)
                     Hash Cond: ("outer".user_url_id = "inner".id)
                     ->  Seq Scan on user_url_tag userurltag0_  (cost=0.00..106650.30 rows=6254530 width=14) (actual time=0.017..212256.630 rows=6259553 loops=1)
                     ->  Hash  (cost=2795.24..2795.24 rows=962 width=4) (actual time=199.840..199.840 rows=0 loops=1)
                           ->  Index Scan using ix_user_url_user_id_url_id on user_url userurl1_  (cost=0.00..2795.24 rows=962 width=4) (actual time=0.048..193.707 rows=1666 loops=1)
                                 Index Cond: (user_id = 1)
 Total runtime: 381119.201 ms
(11 rows)

This is what the two tables look like (extra columns removed):

            Table "public.user_url_tag"
   Column    |         Type          |                          Modifiers
-------------+-----------------------+---------------------------------------------------------------
 id          | integer               | not null default nextval('public.user_url_tag_id_seq'::text)
 user_url_id | integer               |
 tag         | character varying(64) |
Indexes:
    "pk_user_url_tag_id" PRIMARY KEY, btree (id)
    "ix_user_url_tag_tag" btree (tag)
    "ix_user_url_tag_user_url_id" btree (user_url_id)
Foreign-key constraints:
    "fk_user_url_tag_user_url_id" FOREIGN KEY (user_url_id) REFERENCES user_url(id)

            Table "public.user_url"
 Column  |  Type   |                        Modifiers
---------+---------+-----------------------------------------------------------
 id      | integer | not null default nextval('public.user_url_id_seq'::text)
 user_id | integer |
 url_id  | integer |
Indexes:
    "pk_user_url_id" PRIMARY KEY, btree (id)
    "ix_user_url_url_id_user_id" UNIQUE, btree (url_id, user_id)
    "ix_user_url_user_id_url_id" UNIQUE, btree (user_id, url_id)

Does anyone see a way to speed up this s-l-o-w query? I cache DB results, but I'd love to get rid of that sequential scan.

Thanks,
Otis
Re: [SQL] Multi-column index not used, new flipped column index is
Markus Schaber <[EMAIL PROTECTED]> schrieb:

> Bitmap Index Scans can be your solution, but AFAIK they were invented in
> 8.1.

Right.

> For bitmap index scans, you have one index on fkColumnOne and one on
> fkColumnTwo, and the query planner knows to combine them when both
> columns are given in a WHERE clause.

Yeah! The Bitmap Index Scan is a great thing. With some selects I see a performance boost of 5 to 10 times compared with 8.0.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Re: [SQL] Help with a seq scan on multi-million row table
On Wed, May 10, 2006 at 13:13:59 -0500, [EMAIL PROTECTED] wrote:

> Hello,
>
> I have a little 2-table JOIN, GROUP BY, ORDER BY query that does a sequential
> scan on a multi-million row table. I _thought_ I had all the appropriate
> indices, but apparently I do not. I was wondering if anyone can spot a way I
> can speed up this query.
> The query currently takes... *gulp*: 381119.201 ms :(
>
> There are only 2 tables in the game: user_url and user_url_tag. The latter
> has FKs pointing to the former. The sequential scan happens on the latter -
> user_url_tag:
>
> EXPLAIN ANALYZE select DISTINCT userurltag0_.tag as x0_0_, COUNT(*) as x1_0_
> from user_url_tag userurltag0_, user_url userurl1_ WHERE
> (((userurl1_.user_id=1 )) AND ((userurltag0_.user_url_id=userurl1_.id )))
> GROUP BY userurltag0_.tag ORDER BY count(*) DESC;

While this isn't a big issue, it looks like DISTINCT is redundant in your query and seems to be adding some extra work.
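For reference, the same query with the redundant DISTINCT dropped -- the GROUP BY already returns each tag at most once, so this is only a simplification, not a fix for the sequential scan:

  SELECT userurltag0_.tag AS x0_0_, COUNT(*) AS x1_0_
  FROM user_url_tag userurltag0_, user_url userurl1_
  WHERE userurl1_.user_id = 1
    AND userurltag0_.user_url_id = userurl1_.id
  GROUP BY userurltag0_.tag
  ORDER BY COUNT(*) DESC;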
Re: [SQL] Help with a seq scan on multi-million row table
Hi,

Thanks Bruno. That was indeed a redundant DISTINCT. It did reduce the amount of work, but as you said it doesn't get rid of the sequential scan, which is the real problem with this query.

Otis
Re: [SQL] Help with a seq scan on multi-million row table
<[EMAIL PROTECTED]> writes:

> ->  Hash Join  (cost=2797.65..140758.50 rows=3790 width=10) (actual time=248.530..380635.132 rows=8544 loops=1)
>       Hash Cond: ("outer".user_url_id = "inner".id)
>       ->  Seq Scan on user_url_tag userurltag0_  (cost=0.00..106650.30 rows=6254530 width=14) (actual time=0.017..212256.630 rows=6259553 loops=1)
>       ->  Hash  (cost=2795.24..2795.24 rows=962 width=4) (actual time=199.840..199.840 rows=0 loops=1)
>             ->  Index Scan using ix_user_url_user_id_url_id on user_url userurl1_  (cost=0.00..2795.24 rows=962 width=4) (actual time=0.048..193.707 rows=1666 loops=1)
>                   Index Cond: (user_id = 1)

Hm, I'm not sure why it's choosing that join plan. A nestloop indexscan wouldn't be terribly cheap, but just counting on my fingers it seems like it ought to come in at less than 10 cost units. What do you get if you set enable_hashjoin off? (Then try disabling its second-choice join type too --- I'm interested to see EXPLAIN ANALYZE output for all three join types.)

What PG version is this exactly?

regards, tom lane
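A sketch of the session-level experiment Tom suggests. enable_hashjoin, enable_mergejoin, and enable_nestloop are standard planner settings; turning them off one at a time exposes the planner's second and third choices for the same query:

  SET enable_hashjoin = off;
  -- run the EXPLAIN ANALYZE from above; the planner now picks its second-choice join

  SET enable_mergejoin = off;
  -- run it again; with hash and merge joins disabled, only a nested loop remains

  RESET enable_hashjoin;
  RESET enable_mergejoin;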
Re: [SQL] Help with a seq scan on multi-million row table
Aha! set enable_hashjoin=off did the trick. The PG version is: 8.0.3

NB: I removed that redundant "DISTINCT" after the SELECT.

EXPLAIN ANALYZE select userurltag0_.tag as x0_0_, COUNT(*) as x1_0_
from user_url_tag userurltag0_, user_url userurl1_
where (((userurl1_.user_id=1 )) AND ((userurltag0_.user_url_id=userurl1_.id )))
group by userurltag0_.tag order by count(*) DESC;

                                QUERY PLAN
---------------------------------------------------------------------------
 Sort  (cost=155766.79..155774.81 rows=3207 width=10) (actual time=2387.756..2396.578 rows=2546 loops=1)
   Sort Key: count(*)
   ->  HashAggregate  (cost=155572.02..155580.03 rows=3207 width=10) (actual time=2365.643..2376.626 rows=2546 loops=1)
         ->  Nested Loop  (cost=0.00..12.68 rows=3867 width=10) (actual time=0.135...028 rows=8544 loops=1)
               ->  Index Scan using ix_user_url_user_id_url_id on user_url userurl1_  (cost=0.00..2798.12 rows=963 width=4) (actual time=0.067..9.744 rows=1666 loops=1)
                     Index Cond: (user_id = 1)
               ->  Index Scan using ix_user_url_tag_user_url_id on user_url_tag userurltag0_  (cost=0.00..157.34 rows=103 width=14) (actual time=1.223..1.281 rows=5 loops=1666)
                     Index Cond: (userurltag0_.user_url_id = "outer".id)
 Total runtime: 2405.691 ms
(9 rows)

Are you still interested in the other, "second-choice" join type? If so, please tell me which join types those are -- this is a bit beyond me. :(

Is there a way to force PG to use the index automatically? This query is executed from something called Hibernate, and I'm not sure it will let me set enable_hashjoin=off through its API...

Thanks,
Otis
Re: [SQL] Help with a seq scan on multi-million row table
<[EMAIL PROTECTED]> writes:

> Aha! set enable_hashjoin=off did the trick.

>               ->  Index Scan using ix_user_url_tag_user_url_id on user_url_tag userurltag0_  (cost=0.00..157.34 rows=103 width=14) (actual time=1.223..1.281 rows=5 loops=1666)
>                     Index Cond: (userurltag0_.user_url_id = "outer".id)

This seems to be the problem right here: the estimate of matching rows is off by a factor of 20, and that inflates the overall cost estimate for this plan about the same, causing the planner to think the other way is cheaper.

What does the pg_stats row for user_url_tag.user_url_id contain? Have you analyzed that table recently?

regards, tom lane
Re: [SQL] Help with a seq scan on multi-million row table
Not sure if I'm showing you what you asked for, but here it is:

select * from pg_stats where tablename='user_url_tag' and attname='user_url_id';

 schemaname        | public
 tablename         | user_url_tag
 attname           | user_url_id
 null_frac         | 0
 avg_width         | 4
 n_distinct        | 60825
 most_common_vals  | {458321,1485346,16304,68027,125417,153465,182503,201175,202973,218423}
 most_common_freqs | {0.0013,0.001,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067}
 histogram_bounds  | {195,195993,325311,480323,647778,782598,1014527,1201726,1424822,1614712,1853719}
 correlation       | 0.795521

You asked if the table has been analyzed recently. I think so -- I run ANALYZE on the whole DB every night, like this:

$ psql -U me -c "ANALYZE;" mydb

For good measure, I just analyzed the table now:

$ psql -U me -c "ANALYZE user_url_tag;" mydb

Then I set enable_hashjoin back to ON and re-ran the EXPLAIN ANALYZE. I still get the sequential scan, even after analyzing the table :(

I'm not sure which numbers you are referring to when you said the estimate is off, but here are some numbers: the whole table has 6-7 M rows, and that query matches about 2500 rows.

If there are other things I can play with to help narrow this down, please let me know.

Thanks,
Otis
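One knob that is sometimes turned when row estimates for a skewed column are badly off is the per-column statistics target, which makes ANALYZE sample the column more finely. This is not something suggested in the thread itself, so the target value and the expectation in the comments are assumptions:

  -- Collect more detailed statistics for the join column, then re-analyze:
  ALTER TABLE user_url_tag ALTER COLUMN user_url_id SET STATISTICS 200;
  ANALYZE user_url_tag;

  -- Re-check the plan: the rows= figure on the index scan over
  -- ix_user_url_tag_user_url_id may move closer to the actual ~5 rows per loop.
  EXPLAIN ANALYZE
  SELECT userurltag0_.tag AS x0_0_, COUNT(*) AS x1_0_
  FROM user_url_tag userurltag0_, user_url userurl1_
  WHERE userurl1_.user_id = 1
    AND userurltag0_.user_url_id = userurl1_.id
  GROUP BY userurltag0_.tag
  ORDER BY COUNT(*) DESC;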
[SQL] ORDER BY question
Hi,

For example, I have a table like this:

ID  Name
 1  A
 1  B
 2  C
 2  D

And I have a query like this:

SELECT * FROM table ORDER BY ID;

Would my result always come back in the same order, or could I get a different order, for example:

 1  B
 1  A
 2  D
 2  C
Re: [SQL] ORDER BY question
On May 11, 2006, at 21:00, Christian Paul Cosinas wrote:

> For example, I have a table like this:
>
> ID  Name
>  1  A
>  1  B
>  2  C
>  2  D
>
> And I have a query like this:
>
> SELECT * FROM table ORDER BY ID;
>
> Would my result always come back in the same order, or could I get a different order?

It may give you a different result: if the order is not explicitly specified by the ORDER BY clause, you are not guaranteed to get the rows back in any particular order. So for cases where you may not be ordering on a unique column (such as above), you may want to do something like:

SELECT * FROM table ORDER BY "ID", "Name";

Hope this helps.

Michael Glaesemann
grzm seespotcode net