Hello,

I am running into a query for which a useful index is not being used by PostgreSQL. The query has the form:

    select count(*) from foo
    where foo.account_id in (
        select id from accounts where system = 'abc');

and the size of the tables it works on is:

  - 270 records in "accounts", 22 of which match the condition system = 'abc';
  - 5.3M records in "foo", 92K of which match the overall query condition.
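
For reference, those figures come from straightforward counts, along the lines of:

    select count(*) from accounts;                        -- 270
    select count(*) from accounts where system = 'abc';   -- 22
    select count(*) from foo;                             -- 5313514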

There is an index on the field "foo.account_id", but it is not used. The resulting query plan is:

 Aggregate  (cost=300940.70..300940.71 rows=1 width=0) (actual time=13412.088..13412.089 rows=1 loops=1)
   ->  Hash IN Join  (cost=11.97..299858.32 rows=432953 width=0) (actual time=0.678..13307.074 rows=92790 loops=1)
         Hash Cond: (foo.account_id = accounts.id)
         ->  Seq Scan on foo  (cost=0.00..275591.14 rows=5313514 width=4) (actual time=0.014..7163.538 rows=5313514 loops=1)
         ->  Hash  (cost=11.70..11.70 rows=22 width=4) (actual time=0.199..0.199 rows=22 loops=1)
               ->  Bitmap Heap Scan on accounts  (cost=1.42..11.70 rows=22 width=4) (actual time=0.092..0.160 rows=22 loops=1)
                     Recheck Cond: (("system")::text = 'abc'::text)
                     ->  Bitmap Index Scan on iaccounts_x1  (cost=0.00..1.42 rows=22 width=0) (actual time=0.077..0.077 rows=22 loops=1)
                           Index Cond: (("system")::text = 'abc'::text)
 Total runtime: 13412.226 ms


There is a seqscan on the large table. If seqscans are disabled (see the snippet after the plan below), the plan becomes much more acceptable:

 Aggregate  (cost=2471979.99..2471980.00 rows=1 width=0) (actual time=630.977..630.978 rows=1 loops=1)
   ->  Nested Loop  (cost=1258.12..2470897.61 rows=432953 width=0) (actual time=0.164..526.174 rows=92790 loops=1)
         ->  HashAggregate  (cost=12.75..12.97 rows=22 width=4) (actual time=0.131..0.169 rows=22 loops=1)
               ->  Bitmap Heap Scan on accounts  (cost=2.42..12.70 rows=22 width=4) (actual time=0.047..0.091 rows=22 loops=1)
                     Recheck Cond: (("system")::text = 'abc'::text)
                     ->  Bitmap Index Scan on iaccounts_x1  (cost=0.00..2.42 rows=22 width=0) (actual time=0.036..0.036 rows=22 loops=1)
                           Index Cond: (("system")::text = 'abc'::text)
         ->  Bitmap Heap Scan on foo  (cost=1245.37..111275.14 rows=83024 width=4) (actual time=3.086..14.391 rows=4218 loops=22)
               Recheck Cond: (foo.account_id = accounts.id)
               ->  Bitmap Index Scan on ifoo_x1  (cost=0.00..1224.61 rows=83024 width=0) (actual time=2.962..2.962 rows=4218 loops=22)
                     Index Cond: (foo.account_id = accounts.id)
 Total runtime: 631.121 ms

where the index "ifoo_x1" is used.
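
To be clear, "disabling seqscans" above just means flipping the planner switch for the session while testing, something like:

    set enable_seqscan = off;
    explain analyze select count(*) from foo
    where foo.account_id in (
        select id from accounts where system = 'abc');
    set enable_seqscan = on;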


A similar query plan can also be obtained by running the inner query first and hardcoding its result into a new query:

    explain analyze select count(*) from foo
    where account_id in (70,33,190,21,191,223,203,202,148,246,85,281,280,319,234,67,245,310,318,279,320,9);


I have tried to:

  - rewrite the query with a JOIN instead of an IN (no change in the plan),
  - rewrite the query using EXISTS (it gets worse),
  - raise the statistics target for the foo.account_id field to 100 and then to 1000,
  - decrease random_page_cost down to 1,
  - VACUUM ANALYZE the tables after each change,

none of which has changed the situation (rough sketches of these attempts are below).
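
For reference, the rewrites and tweaks listed above were roughly the following (from memory, so the exact form may differ slightly):

    -- JOIN instead of IN (no change in the plan):
    select count(*) from foo
    join accounts on accounts.id = foo.account_id
    where accounts.system = 'abc';

    -- EXISTS instead of IN (planned even worse):
    select count(*) from foo
    where exists (
        select 1 from accounts
        where accounts.id = foo.account_id
          and accounts.system = 'abc');

    -- statistics target on the join column, followed by a new analyze:
    alter table foo alter column account_id set statistics 1000;
    analyze foo;

    -- random_page_cost experiment (session only):
    set random_page_cost = 1;

    -- after each change:
    vacuum analyze accounts;
    vacuum analyze foo;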

The system is a 64-bit Ubuntu Hardy box running PG 8.3. The issue has also been confirmed on Mac OS X 10.5 with PG 8.3. Although I ran fewer tests on the PG 8.2 installation we recently switched from, I think the issue is present on that version too.

This is the first time I have seen the query planner miss such an obvious plan: is there any other setting I could tweak to make it choose the better one? (A sensible tweak, that is: setting random_page_cost to 1 was just an attempt to get the index used, not something to really put in production.)

If you want to try reproducing the issue, an anonymized dataset is available at http://piro.develer.com/test.sql.bz2 . The file size is 46MB (1.5GB uncompressed). Chris Mair, who tested it on Mac OS X, also noticed that PG behaved correctly with the freshly imported data: as soon as he VACUUMed the database he started experiencing the issue described above.

Thank you very much.

--
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com

