----- Forwarded Message -----
>From: Denis de Bernardy <ddeberna...@yahoo.com>
>To: Jens Hoffrichter <jens.hoffrich...@gmail.com>
>Sent: Tuesday, June 28, 2011 12:59 AM
>Subject: Re: [PERFORM] Getting rid of a seq scan in query on a large table
>
>
>> Hash Cond: (posts.poster_id = posters.poster_id)
>
>> -> Seq Scan on posts (cost=0.00..11862.12 rows=112312
>> width=24) (actual time=0.019..60.092 rows=112312 loops=1)
>
>
>Unless I am mistaking, you've very few poster ids in there (since the two rows
>arguments are equal). The Postgres planner will identify this and just seq
>scan the whole thing instead of bothering to randomly access the rows one by
>one using the index. This looks like a case where you actually do not want it
>to use an index scan -- doing so will be slower.
>
>
>D
>
>
>
>
>
>
>
>>________________________________
>>From: Jens Hoffrichter <jens.hoffrich...@gmail.com>
>>To: pgsql-performance@postgresql.org
>>Sent: Monday, June 27, 2011 2:46 PM
>>Subject: [PERFORM] Getting rid of a seq scan in query on a large table
>>
>>
>>Hi everyone,
>>
>>
>>I'm having trouble getting rid of a sequential scan on a table with roughly
>>120k entries it. Creation of an index on that particular column which
>>triggers the sequential scan doesn't do anything, VACUUM and ANALYZE has been
>>done on the table.
>>
>>
>>The table in question has the following definition:
>>
>>
>> Column | Type |
>>Modifiers
>>--------------------+--------------------------+------------------------------------------------------------------
>> post_id | bigint | not null default
>>nextval('posts_post_id_seq'::regclass)
>> forum_id | bigint | not null
>> threadlink | character varying(255) | not null
>> timestamp | timestamp with time zone | not null
>> poster_id | bigint |
>> thread_id | bigint | not null
>> subject | text | not null
>> text | text | not null
>> postername | character varying(255) |
>> internal_post_id | bigint | not null default
>>nextval('posts_internal_post_id_seq'::regclass)
>> internal_thread_id | bigint |
>>Indexes:
>> "posts_pkey" PRIMARY KEY, btree (internal_post_id)
>> "posts_forum_id_key" UNIQUE, btree (forum_id, post_id)
>> "idx_internal_thread_id" btree (internal_thread_id)
>> "idx_posts_poster_id" btree (poster_id)
>>Foreign-key constraints:
>> "posts_forum_id_fkey" FOREIGN KEY (forum_id) REFERENCES forums(forum_id)
>> "posts_internal_thread_id_fkey" FOREIGN KEY (internal_thread_id)
>>REFERENCES threads(internal_thread_id)
>> "posts_poster_id_fkey" FOREIGN KEY (poster_id) REFERENCES
>>posters(poster_id)
>>
>>
>>The query is this:
>>
>>
>>SELECT threads.internal_thread_id AS threads_internal_thread_id,
>>threads.forum_id AS threads_forum_id, threads.thread_id AS threads_thread_id,
>>threads.title AS threads_title, threads.poster_id AS threads_poster_id,
>>threads.postername AS threads_postername, threads.category AS
>>threads_category, threads.posttype AS threads_posttype
>>
>> FROM threads JOIN posts
>>ON threads.internal_thread_id = posts.internal_thread_id JOIN posters ON
>>posts.poster_id = posters.poster_id JOIN posters_groups AS posters_groups_1
>>ON posters.poster_id = posters_groups_1.poster_id JOIN groups ON
>>groups.group_id = posters_groups_1.group_id WHERE groups.group_id = 4 ORDER
>>BY posts.timestamp DESC;
>>
>>
>>The query plan (with an explain analyze) gives me the following:
>>
>>
>>
>>QUERY PLAN
>>----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Sort (cost=13995.93..14006.63 rows=4279 width=108) (actual
>>time=79.927..79.947 rows=165 loops=1)
>> Sort Key: posts."timestamp"
>> Sort Method: quicksort Memory: 50kB
>> -> Nested Loop (cost=6.97..13737.84 rows=4279 width=108) (actual
>>time=0.605..79.693 rows=165 loops=1)
>> -> Seq Scan on groups (cost=0.00..1.05 rows=1 width=8) (actual
>>time=0.013..0.014 rows=1 loops=1)
>> Filter: (group_id = 4)
>> -> Nested Loop (cost=6.97..13694.00 rows=4279 width=116) (actual
>>time=0.587..79.616 rows=165 loops=1)
>> -> Hash Join (cost=6.97..12343.10 rows=4279 width=24)
>>(actual time=0.568..78.230 rows=165 loops=1)
>> Hash Cond: (posts.poster_id = posters.poster_id)
>> -> Seq Scan on posts (cost=0.00..11862.12 rows=112312
>>width=24) (actual time=0.019..60.092 rows=112312 loops=1)
>> -> Hash (cost=6.79..6.79 rows=14 width=24) (actual
>>time=0.101..0.101 rows=14 loops=1)
>> -> Hash Join (cost=2.14..6.79 rows=14 width=24)
>>(actual time=0.060..0.093 rows=14 loops=1)
>> Hash Cond: (posters.poster_id =
>>posters_groups_1.poster_id)
>> -> Seq Scan on posters (cost=0.00..3.83
>>rows=183 width=8) (actual time=0.006..0.023 rows=185 loops=1)
>> -> Hash (cost=1.96..1.96 rows=14 width=16)
>>(actual time=0.025..0.025 rows=14 loops=1)
>> -> Seq Scan on posters_groups
>>posters_groups_1 (cost=0.00..1.96 rows=14 width=16) (actual
>>time=0.016..0.021 rows=14 loops=1)
>> Filter: (group_id = 4)
>> -> Index Scan using threads_pkey on threads (cost=0.00..0.30
>>rows=1 width=100) (actual time=0.006..0.007 rows=1 loops=165)
>> Index Cond: (threads.internal_thread_id =
>>posts.internal_thread_id)
>> Total runtime: 80.137 ms
>>(20 rows)
>>
>>
>>So the big time lost is in this line:
>>
>>
>>Seq Scan on posts (cost=0.00..11862.12 rows=112312 width=24) (actual
>>time=0.019..60.092 rows=112312 loops=1)
>>
>>
>>which I can understand why it slow ;)
>>
>>
>>But I haven't yet managed to convert the Seq Scan into an Index Scan, and I'm
>>not sure how to continue there.
>>
>>
>>As I am not a big expert on psql optimization, any input would be greatly
>>appreciated.
>>
>>
>>Best regards,
>>Jens
>>
>>
>
>