If you guys succeed in making this class of query perform, you'll have
beat out the professional consulting firm we hired, which was all but
useless!  The query is usually slow, but particular combinations of
words seem to make it obscenely slow.

The query plans are now attached (sorry I did not start there: many
lists reject attachments). Or you can click on "text" at the query
planner analysis site http://explain.depesz.com/s/qYq


_Here's typical server load:_
Tasks: 166 total,   1 running, 165 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.2%us,  0.9%sy,  0.0%ni, 86.5%id, 11.2%wa,  0.0%hi,  0.1%si, 
0.0%st
Mem:  32966936k total, 32873860k used,    93076k free,     2080k buffers
Swap: 33554424k total,      472k used, 33553952k free, 30572904k cached

_
Configurations modified from Postgres 8.3 default are:_
listen_addresses = '10.100.2.11, 10.101.2.11'   # what IP address(es) to
listen on;
port = 5432                             # (change requires restart)
max_connections = 400                   # (change requires restart)
shared_buffers = 4096MB                 # min 128kB or max_connections*16kB
work_mem = 16MB                         # min 64kB
max_fsm_pages =  500000                 # default:20000
min:max_fsm_relations*16,6 bytes see:MAIN-5740
max_fsm_relations = 2700                # min 100, ~70 bytes each
checkpoint_segments = 20                # in logfile segments, min 1,
16MB each
random_page_cost = 2.0                  # same scale as above
effective_cache_size = 28672MB
default_statistics_target = 150         # range 1-1000
log_destination = 'syslog'              # Valid values are combinations of
log_min_error_statement = error # values in order of decreasing detail:
log_min_duration_statement = 5000       # -1 is disabled, 0 logs all
statements
log_checkpoints = on                    # default off
autovacuum_naptime = 5min               # time between autovacuum runs
escape_string_warning = off             # default:on  (See bepress
MAIN-4857)
standard_conforming_strings = off       # deafult:off (See bepress
MAIN-4857)



production=# EXPLAIN ANALYZE SELECT context_key FROM article_words
JOIN words using (word_key) WHERE word = 'insider';
-------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..23393.15 rows=23 width=4) (actual
time=0.077..15.637 rows=4003 loops=1)
   ->  Index Scan using words_word on words  (cost=0.00..5.47 rows=1
width=4) (actual time=0.049..0.051 rows=1 loops=1)
         Index Cond: ((word)::text = 'insider'::text)
   ->  Index Scan using article_words_wc on article_words 
(cost=0.00..23234.38 rows=12264 width=8) (actual time=0.020..7.237
rows=4003 loops=1)
         Index Cond: (article_words.word_key = words.word_key)
 Total runtime: 19.776 ms

production=# EXPLAIN ANALYZE SELECT context_key FROM article_words
WHERE word_key = 3675;
-------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using article_words_wc on article_words 
(cost=0.00..21433.53 rows=11309 width=4) (actual time=0.025..7.579
rows=4003 loops=1)
   Index Cond: (word_key = 3675)
 Total runtime: 11.704 ms



production=# explain analyze select count(*) from article_words;
Aggregate  (cost=263831.63..263831.64 rows=1 width=0) (actual
time=35851.654..35851.655 rows=1 loops=1)
   ->  Seq Scan on words  (cost=0.00..229311.30 rows=13808130 width=0)
(actual time=0.043..21281.124 rows=13808184 loops=1)
 Total runtime: 35851.723 ms

production=# select count(*) from words;
13,808,184


production=# explain analyze select count(*) from article_words;
Aggregate  (cost=5453242.40..5453242.41 rows=1 width=0) (actual
time=776504.017..776504.018 rows=1 loops=1)
   ->  Seq Scan on article_words  (cost=0.00..4653453.52 rows=319915552
width=0) (actual time=0.034..438969.347 rows=319956663 loops=1)
 Total runtime: 776504.177 ms

production=# select count(*) from article_words;
319,956,720

Aggregate  (cost=139279.85..139279.86 rows=1 width=4) (actual 
time=502493.000..502493.001 rows=1 loops=1)
   ->  Hash Join  (cost=131314.31..139279.84 rows=2 width=4) (actual 
time=501787.397..502492.316 rows=622 loops=1)
         Hash Cond: (matview_82034.context_key = articles.context_key)
         ->  Seq Scan on matview_82034  (cost=0.00..6322.20 rows=438220 
width=4) (actual time=0.014..462.312 rows=438220 loops=1)
         ->  Hash  (cost=131314.30..131314.30 rows=1 width=16) (actual 
time=501553.755..501553.755 rows=622 loops=1)
               ->  Nested Loop IN Join  (cost=46291.79..131314.30 rows=1 
width=16) (actual time=467.546..501550.735 rows=622 loops=1)
                     Join Filter: (a.context_key = articles.context_key)
                     ->  Nested Loop  (cost=46291.79..46323.15 rows=2 width=12) 
(actual time=179.760..303.474 rows=1473 loops=1)
                           ->  Nested Loop  (cost=46291.79..46314.35 rows=2 
width=8) (actual time=179.743..273.866 rows=1473 loops=1)
                                 ->  Subquery Scan "IN_subquery"  
(cost=46291.79..46292.06 rows=5 width=4) (actual time=179.715..233.942 
rows=1653 loops=1)
                                       ->  SetOp Intersect  
(cost=46291.79..46292.02 rows=5 width=4) (actual time=179.712..229.030 
rows=1653 loops=1)
                                             ->  Sort  (cost=46291.79..46291.90 
rows=46 width=4) (actual time=179.695..201.068 rows=19529 loops=1)
                                                   Sort Key: "*SELECT* 
1".context_key
                                                   Sort Method:  quicksort  
Memory: 1684kB
                                                   ->  Append  
(cost=0.00..46290.52 rows=46 width=4) (actual time=0.065..150.644 rows=19529 
loops=1)
                                                         ->  Subquery Scan 
"*SELECT* 1"  (cost=0.00..23145.26 rows=23 width=4) (actual time=0.063..22.643 
rows=4003 loops=1)
                                                               ->  Nested Loop  
(cost=0.00..23145.03 rows=23 width=4) (actual time=0.061..14.740 rows=4003 
loops=1)
                                                                     ->  Index 
Scan using words_word on words  (cost=0.00..5.47 rows=1 width=4) (actual 
time=0.037..0.039 rows=1 loops=1)
                                                                           
Index Cond: ((word)::text = 'insider'::text)
                                                                     ->  Index 
Scan using article_words_wc on article_words  (cost=0.00..22987.86 rows=12136 
width=8) (actual time=0.017..6.754 rows=4003 loops=1)
                                                                           
Index Cond: (public.article_words.word_key = public.words.word_key)
                                                         ->  Subquery Scan 
"*SELECT* 2"  (cost=0.00..23145.26 rows=23 width=4) (actual time=0.045..88.083 
rows=15526 loops=1)
                                                               ->  Nested Loop  
(cost=0.00..23145.03 rows=23 width=4) (actual time=0.042..56.956 rows=15526 
loops=1)
                                                                     ->  Index 
Scan using words_word on words  (cost=0.00..5.47 rows=1 width=4) (actual 
time=0.024..0.027 rows=1 loops=1)
                                                                           
Index Cond: ((word)::text = 'trading'::text)
                                                                     ->  Index 
Scan using article_words_wc on article_words  (cost=0.00..22987.86 rows=12136 
width=8) (actual time=0.013..26.043 rows=15526 loops=1)
                                                                           
Index Cond: (public.article_words.word_key = public.words.word_key)
                                 ->  Index Scan using article_key_idx on 
articles  (cost=0.00..4.44 rows=1 width=4) (actual time=0.016..0.019 rows=1 
loops=1653)
                                       Index Cond: (articles.context_key = 
"IN_subquery".context_key)
                                       Filter: articles.indexed
                           ->  Index Scan using contexts_pkey on contexts  
(cost=0.00..4.39 rows=1 width=4) (actual time=0.011..0.014 rows=1 loops=1473)
                                 Index Cond: (contexts.context_key = 
articles.context_key)
                     ->  Nested Loop  (cost=0.00..111660.42 rows=1266104 
width=4) (actual time=0.018..300.036 rows=39201 loops=1473)
                           ->  Seq Scan on bp_categories  (cost=0.00..1315.59 
rows=16645 width=4) (actual time=0.008..57.440 rows=14530 loops=1473)
                                 Filter: (lower(category) = 'law'::text)
                           ->  Index Scan using virtual_ancestor_key_idx on 
virtual_ancestors a  (cost=0.00..5.18 rows=116 width=8) (actual 
time=0.005..0.009 rows=3 loops=21402942)
                                 Index Cond: (a.ancestor_key = 
bp_categories.context_key)
 Total runtime: 502493.383 ms
 Nested Loop  (cost=180190.54..180221.68 rows=2 width=4) (actual 
time=7300.081..7317.508 rows=622 loops=1)
   ->  Nested Loop  (cost=180190.54..180212.88 rows=2 width=20) (actual 
time=7300.061..7309.989 rows=622 loops=1)
         ->  HashAggregate  (cost=180190.54..180190.59 rows=5 width=16) (actual 
time=7300.017..7300.959 rows=726 loops=1)
               ->  Hash IN Join  (cost=167273.49..180190.53 rows=5 width=16) 
(actual time=6584.948..7298.849 rows=726 loops=1)
                     Hash Cond: (public.contexts.context_key = a.context_key)
                     ->  Hash Join  (cost=46810.38..54775.93 rows=2 width=12) 
(actual time=246.691..1176.241 rows=1635 loops=1)
                           Hash Cond: (matview_82034.context_key = 
public.contexts.context_key)
                           ->  Seq Scan on matview_82034  (cost=0.00..6322.20 
rows=438220 width=4) (actual time=0.015..448.607 rows=438220 loops=1)
                           ->  Hash  (cost=46810.32..46810.32 rows=5 width=8) 
(actual time=244.912..244.912 rows=1653 loops=1)
                                 ->  Nested Loop  (cost=46788.03..46810.32 
rows=5 width=8) (actual time=177.216..242.771 rows=1653 loops=1)
                                       ->  Subquery Scan "IN_subquery"  
(cost=46788.03..46788.31 rows=5 width=4) (actual time=177.189..223.965 
rows=1653 loops=1)
                                             ->  SetOp Intersect  
(cost=46788.03..46788.26 rows=5 width=4) (actual time=177.185..220.717 
rows=1653 loops=1)
                                                   ->  Sort  
(cost=46788.03..46788.15 rows=46 width=4) (actual time=177.167..196.638 
rows=19530 loops=1)
                                                         Sort Key: "*SELECT* 
1".context_key
                                                         Sort Method:  
quicksort  Memory: 1684kB
                                                         ->  Append  
(cost=0.00..46786.76 rows=46 width=4) (actual time=0.070..148.234 rows=19530 
loops=1)
                                                               ->  Subquery 
Scan "*SELECT* 1"  (cost=0.00..23393.38 rows=23 width=4) (actual 
time=0.068..23.265 rows=    4003 loops=1)
                                                                     ->  Nested 
Loop  (cost=0.00..23393.15 rows=23 width=4) (actual time=0.066..15.061 
rows=4003 loop    s=1)
                                                                           ->  
Index Scan using words_word on words  (cost=0.00..5.47 rows=1 width=4) (actual 
time=0.    040..0.043 rows=1 loops=1)
                                                                                
 Index Cond: ((word)::text = 'insider'::text)
                                                                           ->  
Index Scan using article_words_wc on article_words  (cost=0.00..23234.38 
rows=12264 wi    dth=8) (actual time=0.019..7.065 rows=4003 loops=1)
                                                                                
 Index Cond: (public.article_words.word_key = public.words.word_key)
                                                               ->  Subquery 
Scan "*SELECT* 2"  (cost=0.00..23393.38 rows=23 width=4) (actual 
time=0.059..87.614 rows=    15527 loops=1)
                                                                     ->  Nested 
Loop  (cost=0.00..23393.15 rows=23 width=4) (actual time=0.055..56.982 
rows=15527 loo    ps=1)
                                                                           ->  
Index Scan using words_word on words  (cost=0.00..5.47 rows=1 width=4) (actual 
time=0.    032..0.036 rows=1 loops=1)
                                                                                
 Index Cond: ((word)::text = 'trading'::text)
                                                                           ->  
Index Scan using article_words_wc on article_words  (cost=0.00..23234.38 
rows=12264 wi    dth=8) (actual time=0.016..26.146 rows=15527 loops=1)
                                                                                
 Index Cond: (public.article_words.word_key = public.words.word_key)
                                       ->  Index Scan using contexts_pkey on 
contexts  (cost=0.00..4.39 rows=1 width=4) (actual time=0.006..0.007 rows=1 
loops=1653)
                                             Index Cond: 
(public.contexts.context_key = "IN_subquery".context_key)
                     ->  Hash  (cost=99680.28..99680.28 rows=1266706 width=4) 
(actual time=6107.125..6107.125 rows=48701 loops=1)
                           ->  Merge Join  (cost=17126.27..99680.28 
rows=1266706 width=4) (actual time=1130.265..6041.874 rows=48701 loops=1)
                                 Merge Cond: (bp_categories.context_key = 
a.ancestor_key)
                                 ->  Index Scan using 
bp_categories_context_key_idx on bp_categories  (cost=0.00..2543.46 rows=16625 
width=4) (actual time=0.027..102    .464 rows=16670 loops=1)
                                       Filter: (lower(category) = 'law'::text)
                                 ->  Index Scan using virtual_ancestor_key_idx 
on virtual_ancestors a  (cost=0.00..79159.65 rows=2297502 width=8) (actual 
time=0.105.    .3400.891 rows=2286890 loops=1)
         ->  Index Scan using article_key_idx on articles  (cost=0.00..4.44 
rows=1 width=4) (actual time=0.008..0.009 rows=1 loops=726)
               Index Cond: (articles.context_key = public.contexts.context_key)
               Filter: articles.indexed
   ->  Index Scan using contexts_pkey on contexts  (cost=0.00..4.39 rows=1 
width=4) (actual time=0.007..0.008 rows=1 loops=622)
         Index Cond: (public.contexts.context_key = articles.context_key)
 Total runtime: 7318.618 ms
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to