I've got a very slow query, which I can make faster by doing something
seemingly trivial. 
The query has been trouble for years (always slow, sometimes taking hours):

 512,600ms Original, filter on articles.indexed (622 results)
   7,500ms Remove "AND articles.indexed" (726 results, undesirable).
   7,675ms Extra join for "AND articles.indexed" (622 results, same as
original).

Hardware is Postgres 8.3 on a Sunfire X4240 under Debian Lenny, with a
fresh ANALYZE. What I don't understand is why the improvement? Is the
second way of doing things actually superior, or is this just a query
planner edge case?


Original (512,600ms)
---------------------
EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts
JOIN articles ON (articles.context_key=contexts.context_key)
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
  (SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'insider'
   INTERSECT
   SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'trading')
AND contexts.context_key IN
  (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON
(a.ancestor_key = bp_categories.context_key)
   WHERE lower(bp_categories.category) = 'law')
AND articles.indexed;

Extra join (7,675ms)
---------------------------------------------
EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts JOIN articles using (context_key)
WHERE contexts.context_key IN
(
SELECT contexts.context_key FROM contexts
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
    (SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'insider'
     INTERSECT
     SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'trading')
AND contexts.context_key IN
    (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON
(a.ancestor_key = bp_categories.context_key)
     WHERE lower(bp_categories.category) = 'law')
)
AND articles.indexed;



# select indexed,count(*) from articles group by indexed;
 indexed | count
---------+--------
 t       | 354605
 f       | 513552


QUERY PLAN                                                       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=131663.39..140005.40 rows=4769 width=4) (actual
time=511893.241..512599.348 rows=622 loops=1)
   Hash Cond: (matview_82034.context_key = articles.context_key)
   ->  Seq Scan on matview_82034  (cost=0.00..6596.00 rows=465600
width=4) (actual time=0.019..463.278 rows=438220 loops=1)
   ->  Hash  (cost=131663.38..131663.38 rows=1 width=16) (actual
time=511659.671..511659.671 rows=622 loops=1)
         ->  Nested Loop IN Join  (cost=46757.70..131663.38 rows=1
width=16) (actual time=1142.789..511656.211 rows=622 loops=1)
               Join Filter: (a.context_key = articles.context_key)
               ->  Nested Loop  (cost=46757.70..46789.06 rows=2
width=12) (actual time=688.057..839.297 rows=1472 loops=1)
                     ->  Nested Loop  (cost=46757.70..46780.26 rows=2
width=8) (actual time=688.022..799.945 rows=1472 loops=1)
                           ->  Subquery Scan "IN_subquery" 
(cost=46757.70..46757.97 rows=5 width=4) (actual time=687.963..743.587
rows=1652 loops=1)
                                 ->  SetOp Intersect 
(cost=46757.70..46757.93 rows=5 width=4) (actual time=687.961..738.955
rows=1652 loops=1)
                                       ->  Sort 
(cost=46757.70..46757.81 rows=46 width=4) (actual time=687.943..709.972
rows=19527 loops=1)
                                             Sort Key: "*SELECT*
1".context_key
                                             Sort Method:  quicksort 
Memory: 1684kB
                                             ->  Append 
(cost=0.00..46756.43 rows=46 width=4) (actual time=8.385..657.839
rows=19527 loops=1)
                                                   ->  Subquery Scan
"*SELECT* 1"  (cost=0.00..23378.21 rows=23 width=4) (actual
time=8.383..215.613 rows=4002 loops=1)
                                                         ->  Nested
Loop  (cost=0.00..23377.98 rows=23 width=4) (actual time=8.380..207.499
rows=4002 loops=1)
                                                               ->  Index
Scan using words_word on words  (cost=0.00..5.47 rows=1 width=4) (actual
time=0.102..0.105 rows=1 loops=1)
                                                                    
Index Cond: ((word)::text = 'insider'::text)
                                                               ->  Index
Scan using article_words_wc on article_words  (cost=0.00..23219.17
rows=12268 width=8) (actual time=8.272..199.224 rows=4002 loops=1)
                                                                    
Index Cond: (public.article_words.word_key = public.words.word_key)
                                                   ->  Subquery Scan
"*SELECT* 2"  (cost=0.00..23378.21 rows=23 width=4) (actual
time=5.397..404.164 rows=15525 loops=1)
                                                         ->  Nested
Loop  (cost=0.00..23377.98 rows=23 width=4) (actual time=5.394..372.883
rows=15525 loops=1)
                                                               ->  Index
Scan using words_word on words  (cost=0.00..5.47 rows=1 width=4) (actual
time=0.054..0.056 rows=1 loops=1)
                                                                    
Index Cond: ((word)::text = 'trading'::text)
                                                               ->  Index
Scan using article_words_wc on article_words  (cost=0.00..23219.17
rows=12268 width=8) (actual time=5.331..341.535 rows=15525 loops=1)
                                                                    
Index Cond: (public.article_words.word_key = public.words.word_key)
                           ->  Index Scan using article_key_idx on
articles  (cost=0.00..4.44 rows=1 width=4) (actual time=0.026..0.029
rows=1 loops=1652)
                                 Index Cond: (articles.context_key =
"IN_subquery".context_key)
                                 Filter: articles.indexed
                     ->  Index Scan using contexts_pkey on contexts 
(cost=0.00..4.39 rows=1 width=4) (actual time=0.018..0.021 rows=1
loops=1472)
                           Index Cond: (contexts.context_key =
articles.context_key)
               ->  Nested Loop  (cost=0.00..111539.51 rows=1261757
width=4) (actual time=0.019..306.679 rows=39189 loops=1472)
                     ->  Seq Scan on bp_categories  (cost=0.00..1315.59
rows=16613 width=4) (actual time=0.008..57.960 rows=14529 loops=1472)
                           Filter: (lower(category) = 'law'::text)
                     ->  Index Scan using virtual_ancestor_key_idx on
virtual_ancestors a  (cost=0.00..5.18 rows=116 width=8) (actual
time=0.005..0.010 rows=3 loops=21386112)
                           Index Cond: (a.ancestor_key =
bp_categories.context_key)
 Total runtime: 512600.354 ms
(37 rows)


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to