Bryce Nesbitt <bry...@obviously.com> writes:
> The query plans are now attached (sorry I did not start there: many
> lists reject attachments). Or you can click on "text" at the query
> planner analysis site http://explain.depesz.com/s/qYq

At least some of the problem is the terrible quality of the rowcount
estimates in the IN subquery, as you extracted here:

>  Nested Loop  (cost=0.00..23393.15 rows=23 width=4) (actual 
> time=0.077..15.637 rows=4003 loops=1)
>    ->  Index Scan using words_word on words  (cost=0.00..5.47 rows=1 width=4) 
> (actual time=0.049..0.051 rows=1 loops=1)
>          Index Cond: ((word)::text = 'insider'::text)
>    ->  Index Scan using article_words_wc on article_words 
> (cost=0.00..23234.38 rows=12264 width=8) (actual time=0.020..7.237 rows=4003 
> loops=1)
>          Index Cond: (article_words.word_key = words.word_key)
>  Total runtime: 19.776 ms

Given that it estimated 1 row out of "words" (quite correctly) and 12264
rows out of each scan on article_words, you'd think that the join size
estimate would be 12264, which would be off by "only" a factor of 3 from
the true result.  Instead it's 23, off by a factor of 200 :-(.

Running a roughly similar test case here, I see that 8.4 gives
significantly saner estimates, which I think is because of this patch:
http://archives.postgresql.org/pgsql-committers/2008-10/msg00191.php

At the time I didn't want to risk back-patching it, because there
were a lot of other changes in the same general area in 8.4.  But
it would be interesting to see what happens with your example if
you patch 8.3 similarly.  (Note: I think only the first diff hunk
is relevant to 8.3.)

                        regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to