Hi Neyman, thank you for your answer. Unfortunately this query runs almost at the same time:
Sort (cost=4877693.98..4877702.60 rows=3449 width=20) (actual time=25820.291..25821.845 rows=1640 loops=1) Sort Key: tt.tweet_id Sort Method: quicksort Memory: 97kB Buffers: shared hit=1849 read=32788 -> Nested Loop (cost=247.58..4877491.32 rows=3449 width=20) (actual time=486.839..25814.120 rows=1640 loops=1) Buffers: shared hit=1849 read=32788 -> Hash Semi Join (cost=229.62..88553.23 rows=1681 width=8) (actual time=431.654..13209.159 rows=597 loops=1) Hash Cond: (t.user_id = relationship.followed_id) Buffers: shared hit=3 read=31870 -> Index Scan using tweet_creation_time_index on tweet t (cost=0.57..83308.25 rows=1781234 width=16) (actual time=130.144..10037.764 rows=1759645 loops=1) Index Cond: ((creation_time >= '2013-05-05 00:00:00-03'::timestamp with time zone) AND (creation_time <= '2013-05-06 00:00:00-03'::timestamp with time zone)) Buffers: shared hit=1 read=31867 -> Hash (cost=227.12..227.12 rows=154 width=8) (actual time=94.365..94.365 rows=106 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 3kB Buffers: shared hit=2 read=3 -> Index Only Scan using relationship_id on relationship (cost=0.42..227.12 rows=154 width=8) (actual time=74.540..94.101 rows=106 loops=1) Index Cond: (follower_id = 335093362) Heap Fetches: 0 Buffers: shared hit=2 read=3 -> Bitmap Heap Scan on tweet_topic tt (cost=17.96..2841.63 rows=723 width=20) (actual time=21.014..21.085 rows=3 loops=597) Recheck Cond: (tweet_id = t.id) Buffers: shared hit=1846 read=918 -> Bitmap Index Scan on tweet_topic_pk (cost=0.00..17.78 rows=723 width=0) (actual time=15.012..15.012 rows=3 loops=597) Index Cond: (tweet_id = t.id) Buffers: shared hit=1763 read=632 Total runtime: 25823.386 ms I have noticed that in both queries the index scan on tweet_creation_time_index is very expensive. Is there anything I can do to make the planner choose a index only scan? Thank you, Caio On Mon, Nov 4, 2013 at 6:52 PM, Igor Neyman <iney...@perceptron.com> wrote: > > > From: pgsql-performance-ow...@postgresql.org [mailto: > pgsql-performance-ow...@postgresql.org] On Behalf Of Caio Casimiro > Sent: Monday, November 04, 2013 3:44 PM > To: Jeff Janes > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Slow index scan on B-Tree index over timestamp field > > Thank you very much for your answers guys! > > On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes <jeff.ja...@gmail.com> wrote: > On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro <casimiro.lis...@gmail.com > > wrote: > Hello all, > > I have one query running at ~ 7 seconds and I would like to know if it's > possible to make it run faster, once this query runs lots of time in my > experiment. > > > Do you mean you want it to be fast because it runs many times, or that you > want it to become fast after it runs many times (i.e. once the data is > fully cached)? The plan you show takes 24 seconds, not 7 seconds. > > I want it to be fast because it runs many times. I have an experiment that > evaluates recommendation algorithms for a set of twitter users. This query > returns recommendation candidates so it is called a lot of times for > different users and time intervals. > > > > Basically the query return the topics of tweets published by users that > the user N follows and that are published between D1 and D2. > > Query: > > SELECT tt.tweet_id, tt.topic, tt.topic_value > FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id = > t.id > WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in > (SELECT followed_id FROM relationship WHERE follower_id = N) > ORDER BY tt.tweet_id; > > > I don't know if this affects the plan at all, but it is silly to do a left > join to "tweet" when the WHERE clause has conditions that can't be > satisfied with a null row. Also, you could try changing the IN-list to an > EXISTS subquery. > > I'm sorry the ignorance, but I don't understand the issue with the left > join, could you explain more? > ........................................... > Thank you very much again! > Caio > > > Just try the following: > > SELECT tt.tweet_id, tt.topic, tt.topic_value > FROM tweet_topic AS tt JOIN tweet AS t ON (tt.tweet_id = t.id > AND t.creation_time > BETWEEN 'D1' AND 'D2' AND t.user_id in > (SELECT followed_id FROM > relationship WHERE follower_id = N)) > ORDER BY tt.tweet_id; > > And see if it helps with performance. > > Regards, > Igor Neyman > >