At 10:11 PM +0100 11/6/04, Pierre-Frédéric Caillaud wrote:
explain  select notificationID from NOTIFICATION n, ITEM i where
n.itemID = i.itemID;
                                  QUERY PLAN

------------------------------------------------------------------------
------
 Hash Join  (cost=47162.85..76291.32 rows=223672 width=44)
   Hash Cond: ("outer".itemid = "inner".itemid)
   ->  Seq Scan on notification n  (cost=0.00..12023.71 rows=223671
width=48)
   ->  Hash  (cost=42415.28..42415.28 rows=741028 width=4)
         ->  Seq Scan on item i  (cost=0.00..42415.28 rows=741028
width=4)

This query takes about 20 seconds to run.

Well, you're joining the entire two tables, so yes, the seq scan might be faster.
Try your query with enable_seqscan=0 so it'll use an index scan and compare the times.
You may be surprised to find that the planner has indeed made the right choice.
This query selects 223672 rows, are you surprised it's slow ?

I'm not a SQL guru by any stretch but would a constrained sub-select be appropriate here?


e.g. a simple test setup where each record in table test1 has a FK referenced to an entry in test:

joels=# \d test
        Table "public.test"
 Column |     Type     | Modifiers
--------+--------------+-----------
 id     | integer      | not null
 foo    | character(3) |
Indexes:
    "test_pkey" primary key, btree (id)

joels=# \d test1
     Table "public.test1"
 Column  |  Type   | Modifiers
---------+---------+-----------
 id      | integer | not null
 test_id | integer |
Indexes:
    "test1_pkey" primary key, btree (id)
    "test1_test_id_idx" btree (test_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (test_id) REFERENCES test(id) ON DELETE CASCADE

joels=# select count(*) from test;
 count
-------
 10001
(1 row)

joels=# select count(*) from test1;
 count
-------
 10001
(1 row)

joels=# explain select test_id from test1 t1, test t where t1.test_id = t.id;
                               QUERY PLAN
------------------------------------------------------------------------
 Hash Join  (cost=170.01..495.05 rows=10002 width=4)
   Hash Cond: ("outer".test_id = "inner".id)
   ->  Seq Scan on test1 t1  (cost=0.00..150.01 rows=10001 width=4)
   ->  Hash  (cost=145.01..145.01 rows=10001 width=4)
         ->  Seq Scan on test t  (cost=0.00..145.01 rows=10001 width=4)
(5 rows)

joels=# explain select test_id from test1 t1 where test_id in (select id from test where id = t1.test_id);
QUERY PLAN ------------------------------------------------------------------------------
Seq Scan on test1 t1 (cost=0.00..15269.02 rows=5001 width=4)
Filter: (subplan)
SubPlan
-> Index Scan using test_pkey on test (cost=0.00..3.01 rows=2 width=4)
Index Cond: (id = $0)
(5 rows)



So with the subselect the query planner would use the primary key index on test when finding referencing records in the test1 table.


Pierre, I seen the advice to use an additional where condition in certain cases to induce an index scan; how is this done?

my 1.2 pennies,
-Joel

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Reply via email to