[SQL] Seq scan on join, not on subselect? analyze this
I'm a bit confused why the query planner is not restricting my join, and not using the index. Two explain analyze statements follow. Why is the second so much better? lyell5=> select version(); PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21) lyell5=> explain analyze select * from article_words join words using (word_key) where context_key=535462; ++ | QUERY PLAN | ++ | Hash Join (cost=192092.90..276920.93 rows=45327 width=17) (actual time=6020.932..60084.817 rows=777 loops=1) | | Hash Cond: (article_words.word_key = words.word_key) | | -> Index Scan using article_word_idx on article_words (cost=0.00..55960.50 rows=45327 width=8) (actual time=0.031..0.547 rows=777 loops=1) | | Index Cond: (context_key = 535462) | | -> Hash (cost=93819.62..93819.62 rows=5653462 width=13) (actual time=6020.605..6020.605 rows=5651551 loops=1) | | -> Seq Scan on words (cost=0.00..93819.62 rows=5653462 width=13) (actual time=0.006..2010.962 rows=5651551 loops=1) | | Total runtime: 60085.616 ms | ++ lyell5=> explain analyze select * from words where word_key in (select word_key from article_words where context_key=535462); +--+ | QUERY PLAN | +--+ | Nested Loop (cost=56073.81..56091.41 rows=2 width=13) (actual time=0.808..4.723 rows=777 loops=1) | | -> HashAggregate (cost=56073.81..56073.83 rows=2 width=4) (actual time=0.795..1.072 rows=777 loops=1)| | -> Index Scan using article_word_idx on article_words (cost=0.00..55960.50 rows=45327 width=4) (actual time=0.030..0.344 rows=777 loops=1) | | Index Cond: (context_key = 535462) | | -> Index Scan using words_pkey on words (cost=0.00..8.78 rows=1 width=13) (actual time=0.003..0.004 rows=1 loops=777)| | Index Cond: (words.word_key = article_words.word_key) | | Total runtime: 4.936 ms | +--+ -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
[SQL] Seq scan on join, not on subselect? analyze this
Dear Postgres Folks, I'm a bit confused why the query planner is not restricting my join, and not using the index. Two explain analyze statements follow. Why is the second so much better? lyell5=> vacuum analyze; lyell5=> select version(); PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21) lyell5=> explain analyze select * from words where word_key in (select word_key from article_words where context_key=535462); +--+ | QUERY PLAN | +--+ | Nested Loop (cost=56073.81..56091.41 rows=2 width=13) (actual time=0.808..4.723 rows=777 loops=1) | | -> HashAggregate (cost=56073.81..56073.83 rows=2 width=4) (actual time=0.795..1.072 rows=777 loops=1)| | -> Index Scan using article_word_idx on article_words (cost=0.00..55960.50 rows=45327 width=4) (actual time=0.030..0.344 rows=777 loops=1) | | Index Cond: (context_key = 535462) | | -> Index Scan using words_pkey on words (cost=0.00..8.78 rows=1 width=13) (actual time=0.003..0.004 rows=1 loops=777)| | Index Cond: (words.word_key = article_words.word_key) | | Total runtime: 4.936 ms | +--+ lyell5=> explain analyze select words.* from article_words join words using (word_key) where context_key=535462; ++ | QUERY PLAN | ++ | Hash Join (cost=192092.90..276920.93 rows=45327 width=17) (actual time=6020.932..60084.817 rows=777 loops=1) | | Hash Cond: (article_words.word_key = words.word_key) | | -> Index Scan using article_word_idx on article_words (cost=0.00..55960.50 rows=45327 width=8) (actual time=0.031..0.547 rows=777 loops=1) | | Index Cond: (context_key = 535462) | | -> Hash (cost=93819.62..93819.62 rows=5653462 width=13) (actual time=6020.605..6020.605 rows=5651551 loops=1) | | -> Seq Scan on words (cost=0.00..93819.62 rows=5653462 width=13) (actual time=0.006..2010.962 rows=5651551 loops=1) | | Total runtime: 60085.616 ms | ++ lyell5=> \d article_words; Table "public.article_words" +-+-+---+ | Column| Type | Modifiers | +-+-+---+ | word_key| integer | | | context_key | integer | | +-+-+---+ Indexes: "article_word_idx" btree (context_key) "article_word_key_idx" btree (word_key) CLUSTER Foreign-key constraints: "article_words_context_key_constraint" FOREIGN KEY (context_key) REFERENCES contexts(context_key) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED "article_words_word_key_constraint" FOREIGN KEY (word_key) REFERENCES words(word_key) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED lyell5=> \d words; Table "public.words" +--+++ | Column | Type | Modifiers | +--+++ | word_key |
Re: [SQL] Seq scan on join, not on subselect? analyze this
Bryce Nesbitt <[EMAIL PROTECTED]> writes: > I'm a bit confused why the query planner is not restricting my join, and > not using the index. Two explain analyze statements follow. Why are the rowcount estimates so far off? Maybe you need to increase the statistics target for this table. regards, tom lane -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Seq scan on join, not on subselect? analyze this
Tom Lane wrote: Why are the rowcount estimates so far off? Maybe you need to increase the statistics target for this table. regards, tom lane Tom, How does one tell the rowcount is off in a query plan? I've never found a great reference on interpreting the query analyze output! Upping the stats target made little difference in the estimated row count, but a huge difference in the query. It also mattered only on the context_key column, not the other columns. Why would that be? lyell5=> \pset format wrapped lyell5=> select * from pg_stats where tablename='article_words'; ++-+--+---+---+++---+-+-+ | schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | ++-+--+---+---+++---+-+-+ | public | article_wor | word_key | 0 | 4 | 6361 | {382,226,1155,2130 | {0.003,0.0023 | {3,623,1376,2074,31 | 0.0292482 | | ; ds ; ,2246,3328,8999,73 ; ,0.0023,0.002 ; 54,4601,7269,12770, | | ; 3,1760,1796} ; 3,0.0023, ; 30017,212496,582068 | | ; 0.0023,0.0023 ; 0} | | ; ,0.002,0.002, | | ; 0.002} | | public | article_wor | context_ | 0 | 4 | 4671 | {639302,113013,133 | {0.0023,0.001 | {28,42838,92697,140 | 0.867505 | | ; ds ; key ; 052,211978,508496, ; 7,0.0017, ; 684,202950,248442,3 | | ; 545123,590829,5985 ; 0.0017,0.0016 ; 38954,403025,498952 | | ; 95,649645,37658} ; 6667,0.0017,0 ; ,584048,654070} | | ; .0017,0.00166 | | ; 667,0.0017,0. | | ; 0013} | ++-+--+---+---+++---+-+-+ lyell5=> analyze verbose article_words; INFO: "article_words": scanned 3000 of 1125489 pages, containing 555000 live rows and 0 dead rows; 3000 rows in sample, 208,215,465 estimated total rows lyell5=> set default_statistics_target to 500; lyell5=> analyze verbose article_words; INFO: "article_words": scanned 15 of 1125489 pages, containing 27749820 live rows and 180 dead rows; 15 rows in sample, 208,214,114 estimated total rows But it makes a huge impact on this query: stats target sampled query duration after analyze 10 3000 80600ms 15 4500 64000ms 20 6000 4.2ms 30 9000 4.2ms 250 75000 4.2ms 500 ? 4.2ms 1000 30 4.2ms Is there any good reason not to set stats target to 1000 always? lyell5=> alter table article_words alter word_key set statistics 30;analyze verbose article_words; explain analyze select words.* from article_words join words using (word_key) where context_key=535462; INFO: "article_words": scanned 9000 of 1125489 pages, containing 1665000 live rows and 0 dead rows; 9000 rows in sample, 208215465 estimated total rows ++ | QUERY PLAN | +---