Hi,

(PostgreSQL 8.3)
I'm trying to optimize one of the most frequently used queries in our system. (The full minimized pastable schema and data generation are at the end of this mail.)

create table feeds_users (
    user_id int references users(id) not null,
    feed_id int references feeds(id) not null,
    unique(user_id, feed_id)
);

create table items (
    id serial primary key,
    feed_id int references feeds(id) not null,
    title text,
    pub_date timestamp
);

create index items_feed_id_idx on items(feed_id);
create index items_pub_date_idx on items(pub_date);
create index items_pub_date_feed_id_idx on items(pub_date, feed_id);
create index feeds_users_feed_id on feeds_users(feed_id);

-- Query variant 1:
EXPLAIN ANALYZE
SELECT i.*
FROM items i
WHERE feed_id IN (
    SELECT f.feed_id FROM feeds_users f WHERE f.user_id = ?)
ORDER BY pub_date DESC
LIMIT 20 OFFSET 100;

-- Query variant 2:
EXPLAIN ANALYZE
SELECT i.*
FROM items i
JOIN feeds_users f ON i.feed_id = f.feed_id AND f.user_id = ?
ORDER BY pub_date DESC
LIMIT 20 OFFSET 100;

The table items contains 700000 rows, feeds_users 99000, feeds 10000 and users 1000. The number of feeds per user is distributed logarithmically, i.e. many users have no feeds or only a few, and some users have many. In reality, 99% of the rows in items are inserted in pub_date order, and the correlation of user_id in feeds_users is not 1 (it is 1 with the test data).

I need this query to be blisteringly fast for the common case, and at least not too slow for the extreme cases. The extreme cases are:

* no feeds for a user
* very few feeds for a user, with the top 20 items spread over >10% of the items table
* a normal number of feeds for a user, but a big offset (1000, 10000 or 100000). The maximum offset could be capped in the application if needed, but a solution without that cap would be preferred.

The common case is that the needed rows are within the first (by pub_date desc) <1% of items.

I ran tests of both query variants on a Pentium M 1.6 GHz notebook with 1280 MB RAM, with shared_buffers = 32MB, temp_buffers = 8MB and work_mem = 8MB.

Three different user_ids were used for testing; the needed rows for each user are either 1) nonexistent, 2) spread over 18% of the table, or 3) spread over 0.064% of the table. I also tried statistics targets of 10 and 100 for the two columns in feeds_users. Two query variants were used, one with an inner join and one with IN. I got 4 different plans in total.

Results:

 no. | stat   | user_id | item rows    | result rows | query   | plan | time
     | target |         | scanned      | w/o limit   | variant |      |
-----+--------+---------+--------------+-------------+---------+------+----------
   1 |   10   |     3   | 700000       |       0     | in      |  1   | 20000 ms
   2 |        |         |              |             | join    |  2   | 15000 ms
   3 |        |    49   | 46855 (18%)  |     630     | in      |  1   |  2300 ms
   4 |        |         |              |             | join    |  2   |  2300 ms
   5 |        |   109   | 448 (0.064%) |  206780     | in      |  1   |     6 ms
   6 |        |         |              |             | join    |  2   |     9 ms
   7 |  100   |     3   | 700000       |       0     | in      |  3   |   0.2 ms
   8 |        |         |              |             | join    |  2   | 16500 ms
   9 |        |    49   | 46855 (18%)  |     630     | in      |  4   |    10 ms
  10 |        |         |              |             | join    |  2   |  2300 ms
  11 |        |   109   | 448 (0.064%) |  206780     | in      |  1   |     6 ms
  12 |        |         |              |             | join    |  2   |     9 ms

Plans are below. Now the questions:

Do the differences in characteristics between the test data and the real data somehow invalidate these numbers?

I observe that the query variant with IN is faster in all cases. What is the difference between the two that leads to plans being chosen that differ so much performance-wise?

Can I somehow trigger items_pub_date_feed_id_idx to be used? ISTM that scanning that index in pub_date desc order and using the same index to test for a needed feed_id would be faster than accessing the heap for each tuple. (A diagnostic sketch for this question is appended after the reproduction script at the end of this mail.)

With a statistics target of 100, a different and very much faster plan was chosen (compare tests no. 3 and 9). How is the statistics target to be determined such that the faster plan is chosen?
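To make that last question concrete: so far all I have done is raise the per-column targets and re-analyze (see the "later" part of the script at the end). My assumption is that judging whether a target is high enough means looking at pg_stats to see how much of the user_id skew the sample actually captures, roughly like this:

alter table feeds_users alter column user_id set statistics 100;
alter table feeds_users alter column feed_id set statistics 100;
analyze feeds_users;

-- inspect the gathered statistics; most_common_vals / most_common_freqs
-- show how much of the skew in user_id the sample has picked up
select attname, n_distinct, most_common_vals, most_common_freqs
from pg_stats
where tablename = 'feeds_users';

If there is a more principled way to pick the target than eyeballing those arrays, I would like to know it.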
Am I going to have to increase the statistics target as one or more of the tables receive more rows?

Thanks
Markus

Plans:

1 (for no. 3)

Limit  (cost=1304.78..1565.74 rows=20 width=27) (actual time=2121.866..2377.740 rows=20 loops=1)
   ->  Nested Loop IN Join  (cost=0.00..57984.39 rows=4444 width=27) (actual time=9.856..2377.421 rows=120 loops=1)
         ->  Index Scan Backward using items_pub_date_idx on items i  (cost=0.00..37484.20 rows=700071 width=27) (actual time=0.131..1152.933 rows=127337 loops=1)
         ->  Index Scan using feeds_users_user_id_key on feeds_users f  (cost=0.00..0.29 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=127337)
               Index Cond: ((f.user_id = 49) AND (f.feed_id = i.feed_id))
 Total runtime: 2377.899 ms
(6 rows)

2 (for no. 4)

Limit  (cost=542.78..651.33 rows=20 width=27) (actual time=2133.759..2393.259 rows=20 loops=1)
   ->  Nested Loop  (cost=0.00..249705.41 rows=46005 width=27) (actual time=24.087..2392.950 rows=120 loops=1)
         ->  Index Scan Backward using items_pub_date_idx on items i  (cost=0.00..37484.20 rows=700071 width=27) (actual time=0.067..1171.572 rows=127337 loops=1)
         ->  Index Scan using feeds_users_user_id_key on feeds_users f  (cost=0.00..0.29 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=127337)
               Index Cond: ((f.user_id = 49) AND (f.feed_id = i.feed_id))
 Total runtime: 2393.392 ms

3 (for no. 7)

Limit  (cost=2227.06..2227.11 rows=20 width=27) (actual time=0.052..0.052 rows=0 loops=1)
   ->  Sort  (cost=2226.81..2229.43 rows=1048 width=27) (actual time=0.047..0.047 rows=0 loops=1)
         Sort Key: i.pub_date
         Sort Method:  quicksort  Memory: 17kB
         ->  Nested Loop  (cost=236.45..2185.37 rows=1048 width=27) (actual time=0.036..0.036 rows=0 loops=1)
               ->  HashAggregate  (cost=231.72..231.87 rows=15 width=4) (actual time=0.032..0.032 rows=0 loops=1)
                     ->  Index Scan using feeds_users_user_id_key on feeds_users f  (cost=0.00..231.35 rows=148 width=4) (actual time=0.027..0.027 rows=0 loops=1)
                           Index Cond: (user_id = 3)
               ->  Bitmap Heap Scan on items i  (cost=4.73..129.30 rows=75 width=27) (never executed)
                     Recheck Cond: (i.feed_id = f.feed_id)
                     ->  Bitmap Index Scan on items_feed_id_idx  (cost=0.00..4.71 rows=75 width=0) (never executed)
                           Index Cond: (i.feed_id = f.feed_id)
 Total runtime: 0.136 ms
4 (for no. 9)

Limit  (cost=2227.06..2227.11 rows=20 width=27) (actual time=8.806..8.906 rows=20 loops=1)
   ->  Sort  (cost=2226.81..2229.43 rows=1048 width=27) (actual time=8.456..8.662 rows=120 loops=1)
         Sort Key: i.pub_date
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  Nested Loop  (cost=236.45..2185.37 rows=1048 width=27) (actual time=0.225..6.142 rows=630 loops=1)
               ->  HashAggregate  (cost=231.72..231.87 rows=15 width=4) (actual time=0.104..0.126 rows=9 loops=1)
                     ->  Index Scan using feeds_users_user_id_key on feeds_users f  (cost=0.00..231.35 rows=148 width=4) (actual time=0.037..0.062 rows=9 loops=1)
                           Index Cond: (user_id = 49)
               ->  Bitmap Heap Scan on items i  (cost=4.73..129.30 rows=75 width=27) (actual time=0.076..0.369 rows=70 loops=9)
                     Recheck Cond: (i.feed_id = f.feed_id)
                     ->  Bitmap Index Scan on items_feed_id_idx  (cost=0.00..4.71 rows=75 width=0) (actual time=0.046..0.046 rows=70 loops=9)
                           Index Cond: (i.feed_id = f.feed_id)
 Total runtime: 9.061 ms

-- Full pastable schema and data generation:

create table feeds (
    id serial primary key,
    title text
);

create table users (
    id serial primary key,
    name text
);

create table feeds_users (
    user_id int references users(id) not null,
    feed_id int references feeds(id) not null,
    unique(user_id, feed_id)
);

create table items (
    id serial primary key,
    feed_id int references feeds(id) not null,
    title text,
    pub_date timestamp
);

insert into users (name)
select 'User ' || i::text as name
from generate_series(1, 1000) as i;

insert into feeds (title)
select 'Feed ' || i::text as name
from generate_series(1, 10000) as i;

insert into feeds_users (user_id, feed_id)
select
    --(i / 100) + 1 as user_id,
    floor(log(i)/log(1.1))+1 as user_id,
    ((i + floor(log(i)/log(1.1)))::int % 10000) + 1 as feed_id
from generate_series(1, 99000) as i;

insert into items (feed_id, title, pub_date)
select
    ((i * 17) % 10000) + 1 as feed_id,
    'Item ' || i::text as title,
    '12/12/2006'::timestamp
        + cast(((i * 547) % 12343)::text || ' hours' as interval)
        + cast((random()*60)::numeric(6,3)::text || ' minutes' as interval) as pub_date
from generate_series(1, 700000) as i;

create index items_feed_id_idx on items(feed_id);
create index items_pub_date_idx on items(pub_date);
create index items_pub_date_feed_id_idx on items(pub_date, feed_id);
create index feeds_users_feed_id on feeds_users(feed_id);

analyze;

-- later
alter table feeds_users alter column feed_id set statistics 100;
alter table feeds_users alter column user_id set statistics 100;
analyze;
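Appendix, regarding the question about items_pub_date_feed_id_idx above: one diagnostic I can think of is to hide the single-column pub_date index inside a transaction and look at the plan that remains. This is only a sketch against the generated test data, not something for a live system (DROP INDEX takes an exclusive lock on items, and the ROLLBACK puts the index back); user_id 109 is just one of the test users from above, and of course this only shows what the planner falls back to, it does not force the two-column index as such:

begin;

-- hide the single-column index for this transaction only, so that the
-- two-column index is the only pub_date-ordered access path left
drop index items_pub_date_idx;

EXPLAIN ANALYZE
SELECT i.*
FROM items i
WHERE feed_id IN (
    SELECT f.feed_id FROM feeds_users f WHERE f.user_id = 109)
ORDER BY pub_date DESC
LIMIT 20 OFFSET 100;

rollback;

The same experiment can be repeated for the other test users and for the join variant.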