Fw: [PERFORM] Getting rid of a seq scan in query on a large table

Denis de Bernardy Mon, 27 Jun 2011 16:01:16 -0700


----- Forwarded Message -----
>From: Denis de Bernardy <ddeberna...@yahoo.com>
>To: Jens Hoffrichter <jens.hoffrich...@gmail.com>
>Sent: Tuesday, June 28, 2011 12:59 AM
>Subject: Re: [PERFORM] Getting rid of a seq scan in query on a large table
>
>
>> Hash Cond: (posts.poster_id = posters.poster_id)
>
>>                     ->  Seq Scan on posts  (cost=0.00..11862.12 rows=112312 
>> width=24) (actual time=0.019..60.092 rows=112312 loops=1)
>
>
>Unless I am mistaking, you've very few poster ids in there (since the two rows 
>arguments are equal). The Postgres planner will identify this and just seq 
>scan the whole thing instead of bothering to randomly access the rows one by 
>one using the index. This looks like a case where you actually do not want it 
>to use an index scan -- doing so will be slower.
>
>
>D
>
>
>
>
>
>
>
>>________________________________
>>From: Jens Hoffrichter <jens.hoffrich...@gmail.com>
>>To: pgsql-performance@postgresql.org
>>Sent: Monday, June 27, 2011 2:46 PM
>>Subject: [PERFORM] Getting rid of a seq scan in query on a large table
>>
>>
>>Hi everyone,
>>
>>
>>I'm having trouble getting rid of a sequential scan on a table with roughly 
>>120k entries it. Creation of an index on that particular column which 
>>triggers the sequential scan doesn't do anything, VACUUM and ANALYZE has been 
>>done on the table.
>>
>>
>>The table in question has the following definition:
>>
>>
>>       Column       |           Type           |                            
>>Modifiers
>>--------------------+--------------------------+------------------------------------------------------------------
>> post_id            | bigint                   | not null default 
>>nextval('posts_post_id_seq'::regclass)
>> forum_id           | bigint                   | not null
>> threadlink         | character varying(255)   | not null
>> timestamp          | timestamp with time zone | not null
>> poster_id          | bigint                   |
>> thread_id          | bigint                   | not null
>> subject            | text                     | not null
>> text               | text                     | not null
>> postername         | character varying(255)   |
>> internal_post_id   | bigint                   | not null default 
>>nextval('posts_internal_post_id_seq'::regclass)
>> internal_thread_id | bigint                   |
>>Indexes:
>>    "posts_pkey" PRIMARY KEY, btree (internal_post_id)
>>    "posts_forum_id_key" UNIQUE, btree (forum_id, post_id)
>>    "idx_internal_thread_id" btree (internal_thread_id)
>>    "idx_posts_poster_id" btree (poster_id)
>>Foreign-key constraints:
>>    "posts_forum_id_fkey" FOREIGN KEY (forum_id) REFERENCES forums(forum_id)
>>    "posts_internal_thread_id_fkey" FOREIGN KEY (internal_thread_id) 
>>REFERENCES threads(internal_thread_id)
>>    "posts_poster_id_fkey" FOREIGN KEY (poster_id) REFERENCES 
>>posters(poster_id)
>>
>>
>>The query is this:
>>
>>
>>SELECT threads.internal_thread_id AS threads_internal_thread_id, 
>>threads.forum_id AS threads_forum_id, threads.thread_id AS threads_thread_id, 
>>threads.title AS threads_title, threads.poster_id AS threads_poster_id, 
>>threads.postername AS threads_postername, threads.category AS 
>>threads_category, threads.posttype AS threads_posttype                        
>>                                                                              
>>                                                     FROM threads JOIN posts 
>>ON threads.internal_thread_id = posts.internal_thread_id JOIN posters ON 
>>posts.poster_id = posters.poster_id JOIN posters_groups AS posters_groups_1 
>>ON posters.poster_id = posters_groups_1.poster_id JOIN groups ON 
>>groups.group_id = posters_groups_1.group_id WHERE groups.group_id = 4 ORDER 
>>BY posts.timestamp DESC;
>>
>>
>>The query plan (with an explain analyze) gives me the following:
>>
>>
>>                                                                              
>>QUERY PLAN
>>----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Sort  (cost=13995.93..14006.63 rows=4279 width=108) (actual 
>>time=79.927..79.947 rows=165 loops=1)
>>   Sort Key: posts."timestamp"
>>   Sort Method:  quicksort  Memory: 50kB
>>   ->  Nested Loop  (cost=6.97..13737.84 rows=4279 width=108) (actual 
>>time=0.605..79.693 rows=165 loops=1)
>>         ->  Seq Scan on groups  (cost=0.00..1.05 rows=1 width=8) (actual 
>>time=0.013..0.014 rows=1 loops=1)
>>               Filter: (group_id = 4)
>>         ->  Nested Loop  (cost=6.97..13694.00 rows=4279 width=116) (actual 
>>time=0.587..79.616 rows=165 loops=1)
>>               ->  Hash Join  (cost=6.97..12343.10 rows=4279 width=24) 
>>(actual time=0.568..78.230 rows=165 loops=1)
>>                     Hash Cond: (posts.poster_id = posters.poster_id)
>>                     ->  Seq Scan on posts  (cost=0.00..11862.12 rows=112312 
>>width=24) (actual time=0.019..60.092 rows=112312 loops=1)
>>                     ->  Hash  (cost=6.79..6.79 rows=14 width=24) (actual 
>>time=0.101..0.101 rows=14 loops=1)
>>                           ->  Hash Join  (cost=2.14..6.79 rows=14 width=24) 
>>(actual time=0.060..0.093 rows=14 loops=1)
>>                                 Hash Cond: (posters.poster_id = 
>>posters_groups_1.poster_id)
>>                                 ->  Seq Scan on posters  (cost=0.00..3.83 
>>rows=183 width=8) (actual time=0.006..0.023 rows=185 loops=1)
>>                                 ->  Hash  (cost=1.96..1.96 rows=14 width=16) 
>>(actual time=0.025..0.025 rows=14 loops=1)
>>                                       ->  Seq Scan on posters_groups 
>>posters_groups_1  (cost=0.00..1.96 rows=14 width=16) (actual 
>>time=0.016..0.021 rows=14 loops=1)
>>                                             Filter: (group_id = 4)
>>               ->  Index Scan using threads_pkey on threads  (cost=0.00..0.30 
>>rows=1 width=100) (actual time=0.006..0.007 rows=1 loops=165)
>>                     Index Cond: (threads.internal_thread_id = 
>>posts.internal_thread_id)
>> Total runtime: 80.137 ms
>>(20 rows)
>>
>>
>>So the big time lost is in this line:
>>
>>
>>Seq Scan on posts  (cost=0.00..11862.12 rows=112312 width=24) (actual 
>>time=0.019..60.092 rows=112312 loops=1)
>>
>>
>>which I can understand why it slow ;)
>>
>>
>>But I haven't yet managed to convert the Seq Scan into an Index Scan, and I'm 
>>not sure how to continue there.
>>
>>
>>As I am not a big expert on psql optimization, any input would be greatly 
>>appreciated.
>>
>>
>>Best regards,
>>Jens
>>
>>
>
>
Fw: [PERFORM] Getting rid of a seq scan in query on a large table

Reply via email to