So as the op, back to the original posting....

In the real world, what should I do?  Does it make sense to pull the "AND articles.indexed" clause into an outer query?  Will that query simply perform poorly on other arbitrary combinations of words?


I'm happy to test any given query against the same set of servers. If it involves a persistent change
it has to run on a test server).  For example, the Robert Haas method:
# ...
Total runtime: 254207.857 ms

# ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 1000;
# ANALYZE VERBOSE article_words
INFO:  analyzing "public.article_words"
INFO:  "article_words": scanned 300000 of 1342374 pages, containing 64534899 live rows and 3264839 dead rows; 300000 rows in sample, 288766568 estimated total rows
ANALYZE
# ...
Total runtime: 200591.751 ms

# ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 50;
# ANALYZE VERBOSE article_words
# ...
Total runtime: 201204.972 ms


Sadly, it made essentially zero difference.  Attached.


preproduction-20091214=# EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts
JOIN articles ON (articles.context_key=contexts.context_key)
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
    (SELECT context_key FROM article_words JOIN words using (word_key) WHERE 
word = 'insider'
     INTERSECT
     SELECT context_key FROM article_words JOIN words using (word_key) WHERE 
word = 'trading')
AND contexts.context_key IN
         (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON 
(a.ancestor_key = bp_categories.context_key)
         WHERE lower(bp_categories.category) = 'law') AND articles.indexed;
                                                                                
              QUERY PLAN                                                        
       
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=12861.91..62150.21 rows=2 width=4) (actual 
time=136.789..200744.455 rows=546 loops=1)
   ->  Nested Loop IN Join  (cost=12861.91..62148.32 rows=1 width=16) (actual 
time=136.777..200737.004 rows=546 loops=1)
         Join Filter: (a.context_key = articles.context_key)
         ->  Nested Loop  (cost=12861.91..12875.48 rows=2 width=12) (actual 
time=55.674..116.443 rows=1306 loops=1)
               ->  Nested Loop  (cost=12861.91..12871.68 rows=2 width=8) 
(actual time=55.662..97.863 rows=1306 loops=1)
                     ->  Subquery Scan "IN_subquery"  (cost=12861.91..12862.18 
rows=5 width=4) (actual time=55.639..75.777 rows=1473 loops=1)
                           ->  SetOp Intersect  (cost=12861.91..12862.14 rows=5 
width=4) (actual time=55.638..73.724 rows=1473 loops=1)
                                 ->  Sort  (cost=12861.91..12862.02 rows=46 
width=4) (actual time=55.631..62.140 rows=17892 loops=1)
                                       Sort Key: "*SELECT* 1".context_key
                                       Sort Method:  quicksort  Memory: 1607kB
                                       ->  Append  (cost=0.00..12860.63 rows=46 
width=4) (actual time=0.040..42.026 rows=17892 loops=1)
                                             ->  Subquery Scan "*SELECT* 1"  
(cost=0.00..6430.32 rows=23 width=4) (actual time=0.039..6.909 rows=3583 
loops=1)
                                                   ->  Nested Loop  
(cost=0.00..6430.09 rows=23 width=4) (actual time=0.038..5.110 rows=3583 
loops=1)
                                                         ->  Index Scan using 
words_word on words  (cost=0.00..2.22 rows=1 width=4) (actual time=0.019..0.020 
rows=1 loops=1)
                                                               Index Cond: 
((word)::text = 'insider'::text)
                                                         ->  Index Scan using 
article_words_wc on article_words  (cost=0.00..6323.81 rows=8325 width=8) 
(actual time=0.015..3.243 rows=3583 loops=1)
                                                               Index Cond: 
(public.article_words.word_key = public.words.word_key)
                                             ->  Subquery Scan "*SELECT* 2"  
(cost=0.00..6430.32 rows=23 width=4) (actual time=0.036..27.166 rows=14309 
loops=1)
                                                   ->  Nested Loop  
(cost=0.00..6430.09 rows=23 width=4) (actual time=0.035..20.037 rows=14309 
loops=1)
                                                         ->  Index Scan using 
words_word on words  (cost=0.00..2.22 rows=1 width=4) (actual time=0.014..0.015 
rows=1 loops=1)
                                                               Index Cond: 
((word)::text = 'trading'::text)
                                                         ->  Index Scan using 
article_words_wc on article_words  (cost=0.00..6323.81 rows=8325 width=8) 
(actual time=0.017..12.618 rows=14309 loops=1)
                                                               Index Cond: 
(public.article_words.word_key = public.words.word_key)
                     ->  Index Scan using article_key_idx on articles  
(cost=0.00..1.89 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1473)
                           Index Cond: (articles.context_key = 
"IN_subquery".context_key)
                           Filter: articles.indexed
               ->  Index Scan using contexts_pkey on contexts  (cost=0.00..1.89 
rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=1306)
                     Index Cond: (contexts.context_key = articles.context_key)
         ->  Nested Loop  (cost=0.00..59848.83 rows=1124834 width=4) (actual 
time=0.012..142.436 rows=39193 loops=1306)
               ->  Seq Scan on bp_categories  (cost=0.00..1231.49 rows=16669 
width=4) (actual time=0.006..19.730 rows=14552 loops=1306)
                     Filter: (lower(category) = 'law'::text)
               ->  Index Scan using virtual_ancestor_key_idx on 
virtual_ancestors a  (cost=0.00..2.57 rows=76 width=8) (actual 
time=0.004..0.006 rows=3 loops=19004304)
                     Index Cond: (a.ancestor_key = bp_categories.context_key)
   ->  Index Scan using matview_82034_ck on matview_82034  (cost=0.00..1.88 
rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=546)
         Index Cond: (matview_82034.context_key = articles.context_key)
 Total runtime: 200745.372 ms
(36 rows)


preproduction-20091214=# ALTER TABLE article_words ALTER COLUMN word_key SET 
STATISTICS 1000;
ALTER TABLE
preproduction-20091214=# ANALYZE VERBOSE article_words;
INFO:  analyzing "public.article_words"
INFO:  "article_words": scanned 300000 of 1342374 pages, containing 64534899 
live rows and 3264839 dead rows; 300000 rows in sample, 288766568 estimated 
total rows
ANALYZE


preproduction-20091214=# EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts
JOIN articles ON (articles.context_key=contexts.context_key)
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
    (SELECT context_key FROM article_words JOIN words using (word_key) WHERE 
word = 'insider'
     INTERSECT
     SELECT context_key FROM article_words JOIN words using (word_key) WHERE 
word = 'trading')
AND contexts.context_key IN
         (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON 
(a.ancestor_key = bp_categories.context_key)
         WHERE lower(bp_categories.category) = 'law') AND articles.indexed;
                                                                                
              QUERY PLAN                                                        
       
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5120.81..54409.11 rows=2 width=4) (actual 
time=136.666..200590.858 rows=546 loops=1)
   ->  Nested Loop IN Join  (cost=5120.81..54407.22 rows=1 width=16) (actual 
time=136.654..200583.588 rows=546 loops=1)
         Join Filter: (a.context_key = articles.context_key)
         ->  Nested Loop  (cost=5120.81..5134.38 rows=2 width=12) (actual 
time=55.577..116.349 rows=1306 loops=1)
               ->  Nested Loop  (cost=5120.81..5130.58 rows=2 width=8) (actual 
time=55.563..97.425 rows=1306 loops=1)
                     ->  Subquery Scan "IN_subquery"  (cost=5120.81..5121.08 
rows=5 width=4) (actual time=55.500..75.615 rows=1473 loops=1)
                           ->  SetOp Intersect  (cost=5120.81..5121.04 rows=5 
width=4) (actual time=55.499..73.464 rows=1473 loops=1)
                                 ->  Sort  (cost=5120.81..5120.92 rows=46 
width=4) (actual time=55.491..62.078 rows=17892 loops=1)
                                       Sort Key: "*SELECT* 1".context_key
                                       Sort Method:  quicksort  Memory: 1607kB
                                       ->  Append  (cost=0.00..5119.54 rows=46 
width=4) (actual time=0.041..42.203 rows=17892 loops=1)
                                             ->  Subquery Scan "*SELECT* 1"  
(cost=0.00..2559.77 rows=23 width=4) (actual time=0.040..7.040 rows=3583 
loops=1)
                                                   ->  Nested Loop  
(cost=0.00..2559.54 rows=23 width=4) (actual time=0.039..5.188 rows=3583 
loops=1)
                                                         ->  Index Scan using 
words_word on words  (cost=0.00..2.22 rows=1 width=4) (actual time=0.018..0.019 
rows=1 loops=1)
                                                               Index Cond: 
((word)::text = 'insider'::text)
                                                         ->  Index Scan using 
article_words_wc on article_words  (cost=0.00..2516.17 rows=3292 width=8) 
(actual time=0.017..3.299 rows=3583 loops=1)
                                                               Index Cond: 
(public.article_words.word_key = public.words.word_key)
                                             ->  Subquery Scan "*SELECT* 2"  
(cost=0.00..2559.77 rows=23 width=4) (actual time=0.036..27.282 rows=14309 
loops=1)
                                                   ->  Nested Loop  
(cost=0.00..2559.54 rows=23 width=4) (actual time=0.035..19.972 rows=14309 
loops=1)
                                                         ->  Index Scan using 
words_word on words  (cost=0.00..2.22 rows=1 width=4) (actual time=0.014..0.016 
rows=1 loops=1)
                                                               Index Cond: 
((word)::text = 'trading'::text)
                                                         ->  Index Scan using 
article_words_wc on article_words  (cost=0.00..2516.17 rows=3292 width=8) 
(actual time=0.018..12.545 rows=14309 loops=1)
                                                               Index Cond: 
(public.article_words.word_key = public.words.word_key)
                     ->  Index Scan using article_key_idx on articles  
(cost=0.00..1.89 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1473)
                           Index Cond: (articles.context_key = 
"IN_subquery".context_key)
                           Filter: articles.indexed
               ->  Index Scan using contexts_pkey on contexts  (cost=0.00..1.89 
rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1306)
                     Index Cond: (contexts.context_key = articles.context_key)
         ->  Nested Loop  (cost=0.00..59848.83 rows=1124834 width=4) (actual 
time=0.012..142.295 rows=39193 loops=1306)
               ->  Seq Scan on bp_categories  (cost=0.00..1231.49 rows=16669 
width=4) (actual time=0.006..19.743 rows=14552 loops=1306)
                     Filter: (lower(category) = 'law'::text)
               ->  Index Scan using virtual_ancestor_key_idx on 
virtual_ancestors a  (cost=0.00..2.57 rows=76 width=8) (actual 
time=0.004..0.006 rows=3 loops=19004304)
                     Index Cond: (a.ancestor_key = bp_categories.context_key)
   ->  Index Scan using matview_82034_ck on matview_82034  (cost=0.00..1.88 
rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=546)
         Index Cond: (matview_82034.context_key = articles.context_key)
 Total runtime: 200591.751 ms
(36 rows)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to