On 24 Jun 2010, at 11:13, kaifeng.zhu wrote:

> <resend to mailing list>
> 
> On Thu, Jun 24, 2010 at 16:57, Thom Brown <thombr...@gmail.com> wrote:
>> Sounds like the planner took a wrong turn in the 2nd case.  Which
>> version of PostgreSQL are you running?
> 
> PostgreSQL version 8.1.21 (With schemas)
> 
> 
> The explain commands show that:

For a next time, explain analyse would have been more useful.

> 
> db1=> explain select * from emails where email_msg_id =
> '4dba381b-f55e-02d3-4b4a-95e2e98178e2' limit 1;
>                                                      QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------
> Limit  (cost=0.00..4.00 rows=1 width=184)
>   ->  Index Scan using idx_emails_email_msg_id on emails
> (cost=0.00..370016.42 rows=92413 width=184)

I think your problem is here. I assume that msg-id's are fairly unique between 
messages, but the planner expects there are over 90,000 records matching this 
ID. Hence the planner thinks this index has a low selectivity.

>         Index Cond: (email_msg_id =
> '4dba381b-f55e-02d3-4b4a-95e2e98178e2'::bpchar)
> (3 rows)
> 
> db1=> explain select * from emails where email_msg_id =
> '4dba381b-f55e-02d3-4b4a-95e2e98178e2' and email_sender =
> 'sen...@domain.com' limit 1;
> 
>    QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..801.06 rows=1 width=184)
>   ->  Index Scan using idx_emails_email_sender on emails
> (cost=0.00..370089.46 rows=462 width=184)

For this index the planner only expects 462 rows, so it has a higher 
selectivity than the index on msg_id and therefore the planner prefers this 
index over the other one.

>         Index Cond: (email_sender = 'sen...@domain.com'::text)
>         Filter: (email_msg_id = 
> '4dba381b-f55e-02d3-4b4a-95e2e98178e2'::bpchar)

Apparently there are quite a few records matching that sender, and the database 
has to check each of them whether they have the requested msg_id or not. 
Apparently the planner decided that a seqscan on the results would be the most 
efficient here - and it would indeed be fairly quick if there are only 462 rows 
with this sender.

In summary, I think your statistics are off. Do you vacuum frequently enough? 
Autovacuum helps here, but there have been large improvements to that in later 
versions.

Another approach would be an index on (email_sender, email_msg_id) - that would 
particularly help the second query and it shouldn't hurt queries on just 
email_sender much.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4c2325a8286216564294622!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to