Re: [GENERAL] optimizer choosing the wrong index
Heyho! On Wednesday 07 July 2010 13.42:59 Martin Below wrote: > I'm facing a strange problem where the optimizer does pick the wrong > index. What's not clear to me is why you want pg to chose a particular index. Are you having performance issues? (The case you show is a table with 200k rows - this doesn't look like very much.) cheers -- vbi -- Verbing weirds language. -- Calvin & Hobbes signature.asc Description: This is a digitally signed message part.
Re: [GENERAL] optimizer choosing the wrong index
Merlin Moncure writes: > I think so too, but suppose we wanted to force the other plan anyways: > select * from ps where (client_id, expires_on) >= > ('123', '24.11.2010'::timestamp) and (client_id, expires_on) < ('123', > null) order by client_id, expires_on; A simpler way to force use of the other index is begin; drop index wrong_index; explain analyze my_query; rollback; > I'd be curious to see explain analyze (not explain) comparisons for > the 'wrong' index vs above. I suspect the plan is 'correct' for > *most* of the data, or you cherry picked (or unluckily drew) a bad > value to get your 22 times speed difference. Yeah, it would be interesting to see explain analyze output for both cases, and for a few different values of the expires_on date if this one is unlike the typical value. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] optimizer choosing the wrong index
On Thu, Jul 8, 2010 at 10:11 AM, Tom Lane wrote: > Martin Below writes: >> test=# select count(*) total, count(distinct client_id) ids, >> count(distinct expires_on) dates from ps; >> total | ids | dates >> ++ >> 213645 | 123366 | 213549 > > That says the expires_on column is practically unique, which makes me > think the planner is indeed making the right choice. I think so too, but suppose we wanted to force the other plan anyways: select * from ps where (client_id, expires_on) >= ('123', '24.11.2010'::timestamp) and (client_id, expires_on) < ('123', null) order by client_id, expires_on; I'd be curious to see explain analyze (not explain) comparisons for the 'wrong' index vs above. I suspect the plan is 'correct' for *most* of the data, or you cherry picked (or unluckily drew) a bad value to get your 22 times speed difference. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] optimizer choosing the wrong index
Martin Below writes: > test=# select count(*) total, count(distinct client_id) ids, > count(distinct expires_on) dates from ps; > total | ids | dates > ++ > 213645 | 123366 | 213549 That says the expires_on column is practically unique, which makes me think the planner is indeed making the right choice. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] optimizer choosing the wrong index
Hello Merlin, thanks for your help. > can you supply the plans on the actual tables? the 'wrong' index > might actually be the 'right' one if expires_on is of high cardinality > (perhaps it's distributed badly and the table needs a stats tweak to > make it correct). test=# select count(*) total, count(distinct client_id) ids, count(distinct expires_on) dates from ps; total | ids | dates ++ 213645 | 123366 | 213549 I played arround with set statistics on both columns, but that didn't seem to help. (I did run analyze) > You can probably force the right index like this: > explain analyze select * from ps where (client_id, expires_on) = > ('foo', '2010-11-24'::timestamp); That didn't seem to work either: test=# explain select * from ps where (client_id, expires_on) = ('123', '24.11.2010'::timestamp); QUERY PLAN - Index Scan using idx_wrong on ps (cost=0.00..8.29 rows=1 width=53) Index Cond: (expires_on = '2010-11-24 00:00:00'::timestamp without time zone) Filter: ((client_id)::text = '123'::text) > > merlin > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] optimizer choosing the wrong index
Martin Below writes: > Why is "idx_wrong" used (which only includes one of the fields > queried) instead of idx_correct (which contains both fields)? It's not immediately clear that that choice is "wrong". A two-column index is bigger and hence more expensive to search than a one-column index --- perhaps quite substantially so, given this particular combination of datatypes. The planner is probably estimating that the expires_on condition is selective enough that nothing much will be gained by using the two-column index. If that's wrong (which is not proven by your example) you might be able to fix it by increasing the stats target for the table. Also, if you haven't customized effective_cache_size, increasing that makes large indexes look cheaper to use. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] optimizer choosing the wrong index
On Wed, Jul 7, 2010 at 7:42 AM, Martin Below wrote: > Hello, > > I'm facing a strange problem where the optimizer does pick the wrong index. > Im using postgres 8.4, and my schema look like this: > > client_id | character varying(36) | not null > key | character varying(16) | not null > expires_on | timestamp without time zone | not null > > Indexe: > "ps_pkey" PRIMARY KEY, btree (client_id, key) > "idx_correct" btree (client_id, expires_on) > "idx_wrong" btree (expires_on) > > > the query: > explain analyze select * from ps where client_id='foo' and expires_on > = timestamp '2010-11-24'; > > QUERY PLAN > --- > Index Scan using idx_wrong on ps (cost=0.00..8.29 rows=1 width=61) > (actual time=0.010..0.010 rows=0 loops=1) > Index Cond: (expires_on = '2010-11-24 00:00:00'::timestamp without time > zone) > Filter: ((client_id)::text = 'foo'::text) > Total runtime: 0.089 ms > > > Why is "idx_wrong" used (which only includes one of the fields > queried) instead of idx_correct (which contains both fields)? > If I drop idx_wrong, the correct index is choosen: > > test=# explain analyze select * from ps where client_id='foo' and > expires_on = timestamp '2010-11-24'; > > QUERY PLAN > - > Index Scan using idx_correct on ps (cost=0.00..8.34 rows=1 width=53) > (actual time=0.023..0.023 rows=0 loops=1) > Index Cond: (((client_id)::text = 'foo'::text) AND (expires_on = > '2010-11-24 00:00:00'::timestamp without time zone)) > Total runtime: 0.058 ms > > > > The problem seems to me that the estimates costs are not correct. > With the table containing about 200.000 records, using the "wrong" > index takes about 22 times as long as using the "right" index. I did > run "vacuum analyze", without any effect. > > Any help would be very much appreciated. can you supply the plans on the actual tables? the 'wrong' index might actually be the 'right' one if expires_on is of high cardinality (perhaps it's distributed badly and the table needs a stats tweak to make it correct). btw, consider using 'date' type for dates vs non timezone timestamp, which is a bit of a kludge imo. You can probably force the right index like this: explain analyze select * from ps where (client_id, expires_on) = ('foo', '2010-11-24'::timestamp); merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] optimizer choosing the wrong index
Hello, I'm facing a strange problem where the optimizer does pick the wrong index. Im using postgres 8.4, and my schema look like this: client_id | character varying(36) | not null key | character varying(16) | not null expires_on | timestamp without time zone | not null Indexe: "ps_pkey" PRIMARY KEY, btree (client_id, key) "idx_correct" btree (client_id, expires_on) "idx_wrong" btree (expires_on) the query: explain analyze select * from ps where client_id='foo' and expires_on = timestamp '2010-11-24'; QUERY PLAN --- Index Scan using idx_wrong on ps (cost=0.00..8.29 rows=1 width=61) (actual time=0.010..0.010 rows=0 loops=1) Index Cond: (expires_on = '2010-11-24 00:00:00'::timestamp without time zone) Filter: ((client_id)::text = 'foo'::text) Total runtime: 0.089 ms Why is "idx_wrong" used (which only includes one of the fields queried) instead of idx_correct (which contains both fields)? If I drop idx_wrong, the correct index is choosen: test=# explain analyze select * from ps where client_id='foo' and expires_on = timestamp '2010-11-24'; QUERY PLAN - Index Scan using idx_correct on ps (cost=0.00..8.34 rows=1 width=53) (actual time=0.023..0.023 rows=0 loops=1) Index Cond: (((client_id)::text = 'foo'::text) AND (expires_on = '2010-11-24 00:00:00'::timestamp without time zone)) Total runtime: 0.058 ms The problem seems to me that the estimates costs are not correct. With the table containing about 200.000 records, using the "wrong" index takes about 22 times as long as using the "right" index. I did run "vacuum analyze", without any effect. Any help would be very much appreciated. Thanks, Martin Below -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general