I have 3 tables that I am trying to join together: 

------------------------------------------
  Table "caturljoin"
 Attribute |  Type   |
-----------+---------+
 category  | integer |
 url       | integer |
Index: caturljoin_url 
       caturljoin_cat

           Table "stories"
  Attribute  |          Type          |
-------------+------------------------+
 urn         | integer                |
 headline    | character varying      |
 author      | character varying      |
 source      | integer                |
 story       | text                   |
 added       | date                   |
 description | character varying      |
 displayall  | smallint               |
 fullurl     | character varying(255) |
 publish     | smallint               |
 error       | integer                |
 sourceurl   | character varying(255) |
 sourcename  | character varying(100) |
 rank        | smallint               |
Indices: stories_added,
         stories_source,
         stories_unique_story,
         stories_urn_key

           Table "urllist"
  Attribute   |          Type          |
--------------+------------------------+
 urn          | integer                |
 url          | character varying(255) |
 friendlyname | character varying(30)  |
 homepage     | character varying(255) |
 method       | smallint               |
 script       | character varying(20)  |
 params       | character varying(500) |
 collect      | smallint               |
 section      | smallint               |
 index_script | character varying      |
 regexp       | character varying(100) |
 baseurl      | character varying(75)  |
Index: urllist_urn
------------------------------------------

With the following SQL:

------------------------------------------
SELECT a.category, b.headline, b.added, c.friendlyname
FROM caturljoin as a
        INNER JOIN stories as b ON (a.url = b.source)
        INNER JOIN urllist as c ON (a.url = d.urn)
WHERE a.category = 93 ORDER BY b.added DESC LIMIT 1;
------------------------------------------

The results of explain for the above are:

------------------------------------------
psql:scratch.sql:5: NOTICE:  QUERY PLAN:
 
Limit  (cost=1587.30..1587.30 rows=1 width=44)
  ->  Sort  (cost=1587.30..1587.30 rows=1 width=44)
        ->  Merge Join  (cost=249.89..1587.29 rows=1 width=44)
              ->  Sort  (cost=249.89..249.89 rows=409 width=28)
                    ->  Nested Loop  (cost=0.00..232.15 rows=409 width=28)
                          ->  Index Scan using caturljoin_cat on caturljoin 
a  (cost=0.00..5.09 rows=7 width=8)
                          ->  Index Scan using stories_source on stories b  
(cost=0.00..34.41 rows=29 width=20)
              ->  Index Scan using urllist_urn on urllist c  
(cost=0.00..1323.69 rows=505 width=16)
 
EXPLAIN
------------------------------------------

and as you might be able to guess the query takes an age to complete.

If I remove the table urllist from the query, I get a much better response:

------------------------------------------
psql:scratch.sql:4: NOTICE:  QUERY PLAN:
 
Limit  (cost=0.00..207.74 rows=1 width=28)
  ->  Nested Loop  (cost=0.00..84945.18 rows=409 width=28)
        ->  Index Scan Backward using stories_added on stories b  
(cost=0.00..2310.04 rows=16149 width=20)
        ->  Index Scan using caturljoin_url on caturljoin a  
(cost=0.00..5.10 rows=1 width=8)
 
EXPLAIN
------------------------------------------

Currently the tables contain the following rows of data:

------------------------------------------
caturljoin: 653 rows
urllist: 505 rows
stories: 21554 rows
------------------------------------------

Can anyone tell me why the inclusion of urllist would slow it down so much, 
and what can I do to improve the speed of the query?

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Reply via email to