Re: [PERFORM] Slow query with backwards index scan
Tilmann Singer skrev:

> But the subselect is not fast for the user with many relationships and
> matched rows at the beginning of the sorted large_table:
>
> testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
>   WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=5)
>   ORDER BY created_at DESC LIMIT 10;
>
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Limit  (cost=6963.94..6963.96 rows=10 width=621) (actual time=53187.349..53187.424 rows=10 loops=1)
>    ->  Sort  (cost=6963.94..6966.96 rows=1211 width=621) (actual time=53187.341..53187.360 rows=10 loops=1)
>          Sort Key: lt.created_at
>          ->  Nested Loop  (cost=39.52..6901.92 rows=1211 width=621) (actual time=201.728..52673.800 rows=69018 loops=1)
>                ->  HashAggregate  (cost=39.52..39.53 rows=1 width=4) (actual time=178.777..178.966 rows=40 loops=1)
>                      ->  Bitmap Heap Scan on relationships  (cost=4.33..39.49 rows=10 width=4) (actual time=47.049..178.560 rows=40 loops=1)
>                            Recheck Cond: (user_id = 5)
>                            ->  Bitmap Index Scan on relationships_user_id_contact_id_index  (cost=0.00..4.33 rows=10 width=0) (actual time=28.721..28.721 rows=40 loops=1)
>                                  Index Cond: (user_id = 5)
>                ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..6834.04 rows=2268 width=621) (actual time=21.994..1301.375 rows=1725 loops=40)
>                      Index Cond: (lt.user_id = relationships.contact_id)
>  Total runtime: 53188.584 ms
>
> Any ideas? It seems to me the subselect plan would benefit quite a bit
> from not returning all rows, but only the 10 latest for each user.
I believe the problem is similar to what is discussed for UNIONs here:

http://groups.google.dk/group/pgsql.general/browse_thread/thread/4f74d7faa8a5a608/367f5052b1bbf1c8?lnk=stq=postgresql+limit++unionrnum=1hl=en#367f5052b1bbf1c8

which got implemented here:

http://groups.google.dk/group/pgsql.committers/browse_thread/thread/b1ac3c3026db096c/9b3e5bd2d612f565?lnk=stq=postgresql+limit++unionrnum=26hl=en#9b3e5bd2d612f565

It seems to me the planner in this case would actually need to push the
limit into the nested loop, since we want the plan to take advantage of
the index (using a backwards index scan). I am ready to be corrected,
though.

You could try this (quite hackish) attempt at forcing the query planner
to use this plan:

SELECT * FROM large_table lt
WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=5)
  AND created_at IN (SELECT created_at FROM large_table lt2
                     WHERE lt2.user_id = lt.user_id
                     ORDER BY created_at DESC LIMIT 10)
ORDER BY created_at DESC LIMIT 10;

If that doesn't work, you might have reached the point where you need to
use some kind of record-keeping system to keep track of which records to
look at.

/Nis

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Slow query with backwards index scan
* Nis Jørgensen [EMAIL PROTECTED] [20070730 18:33]:
> > It seems to me the subselect plan would benefit quite a bit from not
> > returning all rows, but only the 10 latest for each user.
>
> I believe the problem is similar to what is discussed for UNIONs here:
>
> http://groups.google.dk/group/pgsql.general/browse_thread/thread/4f74d7faa8a5a608/367f5052b1bbf1c8?lnk=stq=postgresql+limit++unionrnum=1hl=en#367f5052b1bbf1c8
>
> which got implemented here:
>
> http://groups.google.dk/group/pgsql.committers/browse_thread/thread/b1ac3c3026db096c/9b3e5bd2d612f565?lnk=stq=postgresql+limit++unionrnum=26hl=en#9b3e5bd2d612f565
>
> It seems to me the planner in this case would actually need to push the
> limit into the nested loop, since we want the plan to take advantage of
> the index (using a backwards index scan). I am ready to be corrected
> though.

If I understand that correctly, this means it would benefit the planning
of something like

SELECT * FROM (q1 UNION ALL q2) ORDER BY ... LIMIT ...

if either q1 or q2 could satisfy the rows requested by the limit early,
instead of planning q1 and q2 without the limit of the outer query
having any influence.

Unfortunately I'm having problems making my q2 reasonably efficient in
the first place, even before UNIONing it.
> You could try this (quite hackish) attempt at forcing the query planner
> to use this plan:
>
> SELECT * FROM large_table lt
> WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=5)
>   AND created_at IN (SELECT created_at FROM large_table lt2
>                      WHERE lt2.user_id = lt.user_id
>                      ORDER BY created_at DESC LIMIT 10)
> ORDER BY created_at DESC LIMIT 10;

No, still slow for the user with many matches at the beginning:

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
  WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=5)
    AND created_at IN (SELECT created_at FROM large_table lt2
                       WHERE lt2.user_id = lt.user_id
                       ORDER BY created_at DESC LIMIT 10)
  ORDER BY created_at DESC LIMIT 10;

                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=4.94..4.97 rows=10 width=621) (actual time=70550.549..70550.616 rows=10 loops=1)
   ->  Sort  (cost=4.94..45557.46 rows=605 width=621) (actual time=70550.542..70550.569 rows=10 loops=1)
         Sort Key: lt.created_at
         ->  Nested Loop  (cost=39.52..45527.99 rows=605 width=621) (actual time=2131.501..70548.313 rows=321 loops=1)
               ->  HashAggregate  (cost=39.52..39.53 rows=1 width=4) (actual time=0.406..0.615 rows=40 loops=1)
                     ->  Bitmap Heap Scan on relationships  (cost=4.33..39.49 rows=10 width=4) (actual time=0.075..0.248 rows=40 loops=1)
                           Recheck Cond: (user_id = 5)
                           ->  Bitmap Index Scan on relationships_user_id_contact_id_index  (cost=0.00..4.33 rows=10 width=0) (actual time=0.052..0.052 rows=40 loops=1)
                                 Index Cond: (user_id = 5)
               ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..45474.29 rows=1134 width=621) (actual time=1762.067..1763.637 rows=8 loops=40)
                     Index Cond: (lt.user_id = relationships.contact_id)
                     Filter: (subplan)
                     SubPlan
                       ->  Limit  (cost=0.00..34.04 rows=10 width=8) (actual time=0.048..0.147 rows=10 loops=69018)
                             ->  Index Scan Backward using large_user_id_created_at_index on large_table lt2  (cost=0.00..7721.24 rows=2268 width=8) (actual time=0.040..0.087 rows=10 loops=69018)
                                   Index Cond: (user_id = $0)
 Total runtime: 70550.847 ms

The same plan is generated for the user with few matches and executes
very fast.

> If that doesn't work, you might have reached the point where you need
> to use some kind of record-keeping system to keep track of which
> records to look at.

Yes, I'm considering that unfortunately.

Seeing however that there are two different queries which result in very
efficient plans for one of the two different cases, but not the other,
makes me hope there is a way to tune the planner into always coming up
with the right plan. I'm not sure if I explained the problem well enough
and will see if I can come up with a reproduction case with generated
data.

Til
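The "record-keeping system" mentioned above could take the form of a
denormalized feed table maintained by the application: one row per
(viewer, item), kept sorted by an index so the top-N query becomes a
simple backward index scan. This is only a sketch; the table and column
names here are hypothetical and do not appear elsewhere in the thread:

```sql
-- Hypothetical materialized "feed" table: the application inserts a row
-- for each viewer who should see an item, at item-creation time.
CREATE TABLE user_feed (
  viewer_id  integer   NOT NULL,
  created_at timestamp NOT NULL,
  item_id    integer   NOT NULL
);

-- A plain composite index suffices; a backward index scan reads the
-- newest created_at values first for a given viewer_id.
CREATE INDEX user_feed_viewer_created_idx
  ON user_feed (viewer_id, created_at);

-- The problematic query then reduces to a cheap bounded scan:
SELECT item_id
FROM user_feed
WHERE viewer_id = 5
ORDER BY created_at DESC
LIMIT 10;
```

The trade-off is write amplification (one feed row per interested
viewer) and the bookkeeping needed when relationships change.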
Re: [PERFORM] Slow query with backwards index scan
Tilmann Singer skrev:

> If I understand that correctly, this means it would benefit the
> planning of something like
>
> SELECT * FROM (q1 UNION ALL q2) ORDER BY ... LIMIT ...
>
> if either q1 or q2 could satisfy the rows requested by the limit
> early, instead of planning q1 and q2 without the limit of the outer
> query having any influence.
>
> Unfortunately I'm having problems making my q2 reasonably efficient in
> the first place, even before UNIONing it.

The second branch of your UNION is really equivalent to the following
pseudocode:

contacts = SELECT contact_id FROM relationships WHERE user_id = $id

sql = SELECT * FROM (
        SELECT * FROM lt WHERE user_id = contacts[0]
        UNION ALL
        SELECT * FROM lt WHERE user_id = contacts[1]
        ...
      ) ORDER BY created_at LIMIT 10;

Currently, it seems the subqueries are fetching all rows. Thus a plan
which makes each of the subqueries aware of the LIMIT might be able to
improve performance. Unlike the UNION case, this seems to mean making
the planner aware that such a plan is valid, not just changing the cost
estimate.

How does this imperative approach perform?
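For concreteness, here is what the generated statement could look like
for two hypothetical contact ids, 7 and 42 (these ids are made up for
illustration). Each branch carries its own ORDER BY/LIMIT inside
parentheses, so each can be satisfied by a short backward scan on the
(user_id, created_at) index:

```sql
SELECT * FROM (
  (SELECT * FROM large_table WHERE user_id = 7
   ORDER BY created_at DESC LIMIT 10)
  UNION ALL
  (SELECT * FROM large_table WHERE user_id = 42
   ORDER BY created_at DESC LIMIT 10)
  -- ... one branch per contact_id ...
) q
ORDER BY created_at DESC
LIMIT 10;
```

At most 10 rows per contact are fetched and merged, so the total work is
bounded by 10 * number_of_contacts rows rather than by how far down the
sorted table the matches happen to sit.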
I realize you probably don't want to use this, but it would give us an
idea whether we would be able to get the speed we need by forcing this
plan on pg.

> > If that doesn't work, you might have reached the point where you
> > need to use some kind of record-keeping system to keep track of
> > which records to look at.
>
> Yes, I'm considering that unfortunately.
>
> Seeing however that there are two different queries which result in
> very efficient plans for one of the two different cases, but not the
> other, makes me hope there is a way to tune the planner into always
> coming up with the right plan. I'm not sure if I explained the problem
> well enough and will see if I can come up with a reproduction case
> with generated data.

I think the problem is that PostgreSQL does not have the necessary
statistics to determine which of the two plans will perform well. There
are basically two unknowns in the query:

- How many uninteresting records we have to scan through before we get
  to the interesting ones (if using plan 1).
- How many matching rows we find in relationships.

It is not surprising that pg cannot estimate the first one. I am a
little surprised that pg is not able to estimate the second one better.
Increasing the statistics settings might help in getting a different
plan.

I am also slightly surprised that the two equivalent formulations of the
query yield such vastly different query plans. In my experience, pg is
quite good at coming up with similar query plans for equivalent queries.
You might want to fiddle with DISTINCT or indexing to make sure that
they are indeed logically equivalent.

Anyway, it seems likely that you will at some point run into the query
for many matches at the end of the table - which none of your plans will
be good at supplying. So perhaps you can just as well prepare for it
now.

Nis
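The "increasing the statistics settings" suggestion above could be
tried per column, for example like this. The target value 1000 is just
an illustrative choice, not a recommendation from the thread:

```sql
-- Raise the per-column statistics target so that ANALYZE samples more
-- distinct values and builds a larger most-common-values list, which
-- can improve the row-count estimate for user_id lookups.
ALTER TABLE relationships ALTER COLUMN user_id SET STATISTICS 1000;
ALTER TABLE large_table   ALTER COLUMN user_id SET STATISTICS 1000;

-- The new target only takes effect after re-analyzing the tables.
ANALYZE relationships;
ANALYZE large_table;
```

After this, re-running EXPLAIN ANALYZE would show whether the better
estimates change the chosen plan.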
Re: [PERFORM] Slow query with backwards index scan
* Nis Jørgensen [EMAIL PROTECTED] [20070727 20:31]:
> How does the obvious UNION query do - ie:
>
> SELECT * FROM (
>   SELECT * FROM large_table lt WHERE lt.user_id = 12345
>   UNION
>   SELECT * FROM large_table lt
>   WHERE user_id IN (SELECT contact_id FROM relationships
>                     WHERE user_id=12345)
> ) q ORDER BY created_at DESC LIMIT 10;

Great for the user with little data:

testdb=# EXPLAIN ANALYZE SELECT * FROM (
  SELECT * FROM large_table lt WHERE lt.user_id = 12345
  UNION
  SELECT * FROM large_table lt
  WHERE user_id IN (SELECT contact_id FROM relationships
                    WHERE user_id=12345)
) q ORDER BY created_at DESC LIMIT 10;

                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=14673.77..14673.80 rows=10 width=3140) (actual time=133.877..133.946 rows=10 loops=1)
   ->  Sort  (cost=14673.77..14682.22 rows=3378 width=3140) (actual time=133.870..133.894 rows=10 loops=1)
         Sort Key: q.created_at
         ->  Unique  (cost=14315.34..14442.01 rows=3378 width=622) (actual time=133.344..133.705 rows=38 loops=1)
               ->  Sort  (cost=14315.34..14323.78 rows=3378 width=622) (actual time=133.337..133.432 rows=38 loops=1)
                     Sort Key: id, user_id, plaze_id, device, started_at, updated_at, status, type, duration, permission, created_at, mac_address, subnet, msc
                     ->  Append  (cost=0.00..14117.35 rows=3378 width=622) (actual time=39.144..133.143 rows=38 loops=1)
                           ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..7243.59 rows=2158 width=622) (actual time=39.138..109.831 rows=34 loops=1)
                                 Index Cond: (user_id = 12345)
                           ->  Nested Loop  (cost=42.78..6839.98 rows=1220 width=622) (actual time=14.859..23.112 rows=4 loops=1)
                                 ->  HashAggregate  (cost=42.78..42.79 rows=1 width=4) (actual time=8.092..8.095 rows=1 loops=1)
                                       ->  Bitmap Heap Scan on relationships  (cost=4.34..42.75 rows=11 width=4) (actual time=8.067..8.070 rows=1 loops=1)
                                             Recheck Cond: (user_id = 12345)
                                             ->  Bitmap Index Scan on relationships_user_id_index  (cost=0.00..4.34 rows=11 width=0) (actual time=8.057..8.057 rows=1 loops=1)
                                                   Index Cond: (user_id = 12345)
                                 ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..6768.48 rows=2297 width=622) (actual time=6.751..14.970 rows=4 loops=1)
                                       Index Cond: (lt.user_id = relationships.contact_id)
 Total runtime: 134.220 ms

Not so great for the user with many early matches:

testdb=# EXPLAIN ANALYZE SELECT * FROM (
  SELECT * FROM large_table lt WHERE lt.user_id = 5
  UNION
  SELECT * FROM large_table lt
  WHERE user_id IN (SELECT contact_id FROM relationships
                    WHERE user_id=5)
) q ORDER BY created_at DESC LIMIT 10;

                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=14673.77..14673.80 rows=10 width=3140) (actual time=3326.304..3326.367 rows=10 loops=1)
   ->  Sort  (cost=14673.77..14682.22 rows=3378 width=3140) (actual time=3326.297..3326.318 rows=10 loops=1)
         Sort Key: q.created_at
         ->  Unique  (cost=14315.34..14442.01 rows=3378 width=622) (actual time=2413.070..3019.385 rows=69757 loops=1)
               ->  Sort  (cost=14315.34..14323.78 rows=3378 width=622) (actual time=2413.062..2590.354 rows=69757 loops=1)
                     Sort Key: id, user_id, plaze_id, device, started_at, updated_at, status, type, duration, permission, created_at, mac_address, subnet, msc
                     ->  Append  (cost=0.00..14117.35 rows=3378 width=622) (actual time=0.067..1911.626 rows=69757 loops=1)
                           ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..7243.59 rows=2158 width=622) (actual time=0.062..3.440 rows=739 loops=1)
                                 Index Cond: (user_id = 5)
                           ->  Nested Loop  (cost=42.78..6839.98 rows=1220 width=622) (actual time=0.451..1557.901 rows=69018 loops=1)
                                 ->  HashAggregate  (cost=42.78..42.79 rows=1 width=4) (actual time=0.404..0.580 rows=40 loops=1)
Re: [PERFORM] Slow query with backwards index scan
Tilmann Singer [EMAIL PROTECTED] wrote ..
> * Nis Jørgensen [EMAIL PROTECTED] [20070727 20:31]:
> > How does the obvious UNION query do - ie:
> >
> > SELECT * FROM (
> >   SELECT * FROM large_table lt WHERE lt.user_id = 12345
> >   UNION
> >   SELECT * FROM large_table lt
> >   WHERE user_id IN (SELECT contact_id FROM relationships
> >                     WHERE user_id=12345)
> > ) q ORDER BY created_at DESC LIMIT 10;

Let's try putting the sort/limit in each piece of the UNION to speed
them up separately.

SELECT * FROM (
  (SELECT * FROM large_table lt WHERE lt.user_id = 12345
   ORDER BY created_at DESC LIMIT 10) AS q1
  UNION
  (SELECT * FROM large_table lt
   WHERE user_id IN (SELECT contact_id FROM relationships
                     WHERE user_id=12345)
   ORDER BY created_at DESC LIMIT 10) AS q2
ORDER BY created_at DESC LIMIT 10;

---(end of broadcast)---
Re: [PERFORM] Slow query with backwards index scan
* [EMAIL PROTECTED] [EMAIL PROTECTED] [20070728 21:05]:
> Let's try putting the sort/limit in each piece of the UNION to speed
> them up separately.
>
> SELECT * FROM (
>   (SELECT * FROM large_table lt WHERE lt.user_id = 12345
>    ORDER BY created_at DESC LIMIT 10) AS q1
>   UNION
>   (SELECT * FROM large_table lt
>    WHERE user_id IN (SELECT contact_id FROM relationships
>                      WHERE user_id=12345)
>    ORDER BY created_at DESC LIMIT 10) AS q2
> ORDER BY created_at DESC LIMIT 10;

It's not possible to use ORDER BY or LIMIT within unioned queries.

http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION

Would that make sense at all given the way the postgresql planner works?
Does that work in other DBs?

Til
Re: [PERFORM] Slow query with backwards index scan
Tilmann Singer wrote:
> * [EMAIL PROTECTED] [EMAIL PROTECTED] [20070728 21:05]:
> > Let's try putting the sort/limit in each piece of the UNION to speed
> > them up separately.
>
> It's not possible to use ORDER BY or LIMIT within unioned queries.
>
> http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION

If I'm reading this documentation correctly, it *is* possible, as long
as they're inside of a sub-select, as in this case.

Craig
Re: [PERFORM] Slow query with backwards index scan
Tilmann Singer wrote:
> * [EMAIL PROTECTED] [EMAIL PROTECTED] [20070728 21:05]:
> > Let's try putting the sort/limit in each piece of the UNION to speed
> > them up separately.
>
> It's not possible to use ORDER BY or LIMIT within unioned queries.
>
> http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION

From that page:

  ORDER BY and LIMIT can be attached to a subexpression if it is
  enclosed in parentheses
Re: [PERFORM] Slow query with backwards index scan
As other posters have pointed out, you can overcome the ORDER BY/LIMIT
restriction on UNIONs with parentheses. I think I misbalanced the
parentheses in my original post, which would have caused an error if you
just copied and pasted. I don't think the limitation has to do with
planning - just parsing the query.
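A balanced version of the earlier suggestion might look like the
following (this is a reconstruction, not the exact query from the
original post). Per the SELECT documentation, ORDER BY and LIMIT attach
to each branch because the branches are parenthesized:

```sql
SELECT * FROM (
  (SELECT * FROM large_table lt
   WHERE lt.user_id = 12345
   ORDER BY created_at DESC LIMIT 10)
  UNION
  (SELECT * FROM large_table lt
   WHERE user_id IN (SELECT contact_id FROM relationships
                     WHERE user_id = 12345)
   ORDER BY created_at DESC LIMIT 10)
) q
ORDER BY created_at DESC
LIMIT 10;
```

Each branch can stop after 10 rows, and the outer sort then only has to
merge at most 20 candidate rows.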
[PERFORM] Slow query with backwards index scan
Dear list,

I am having problems selecting the 10 most recent rows from a large
table (4.5M rows), sorted by a date column of that table. The large
table has a column user_id which either should match a given user_id, or
should match the column contact_id in a correlated table where the
user_id of that correlated table must match the given user_id.

The user_id values in the large table are distributed in such a way that
in the majority of cases, for a given user_id, 10 matching rows can be
found very early when looking at the table sorted by the date - probably
within the first 1%. Sometimes the given user_id however would match
rows only very far towards the end of the sorted large table, or not at
all.

The query works fine for the common cases when matching rows are found
early in the sorted large table, like this:

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
  LEFT JOIN relationships r ON lt.user_id=r.contact_id
  WHERE r.user_id = 5 OR lt.user_id = 5
  ORDER BY lt.created_at DESC LIMIT 10;

                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=0.00..33809.31 rows=10 width=646) (actual time=0.088..69.448 rows=10 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..156983372.66 rows=46432 width=646) (actual time=0.082..69.393 rows=10 loops=1)
         Filter: ((r.user_id = 5) OR (lt.user_id = 5))
         ->  Index Scan Backward using large_created_at_index on large_table lt  (cost=0.00..914814.94 rows=4382838 width=622) (actual time=0.028..0.067 rows=13 loops=1)
         ->  Index Scan using relationships_contact_id_index on relationships r  (cost=0.00..35.33 rows=16 width=24) (actual time=0.017..2.722 rows=775 loops=13)
               Index Cond: (lt.user_id = r.contact_id)
 Total runtime: 69.640 ms

but for the following user_id there are 3M rows in the large table which
are more recent than the 10th matching one. The query then does not
perform so well:

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
  LEFT JOIN relationships r ON lt.user_id=r.contact_id
  WHERE r.user_id = 12345 OR lt.user_id = 12345
  ORDER BY lt.created_at DESC LIMIT 10;

                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=0.00..33809.31 rows=10 width=646) (actual time=203454.353..425978.718 rows=10 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..156983372.66 rows=46432 width=646) (actual time=203454.347..425978.662 rows=10 loops=1)
         Filter: ((r.user_id = 12345) OR (lt.user_id = 12345))
         ->  Index Scan Backward using large_created_at_index on large_table lt  (cost=0.00..914814.94 rows=4382838 width=622) (actual time=0.031..78386.769 rows=3017547 loops=1)
         ->  Index Scan using relationships_contact_id_index on relationships r  (cost=0.00..35.33 rows=16 width=24) (actual time=0.006..0.060 rows=18 loops=3017547)
               Index Cond: (lt.user_id = r.contact_id)
 Total runtime: 425978.903 ms

When split up into the two following queries it performs much better for
that user_id. Since the results of the two could be combined into the
desired result, I'm assuming it could also be done efficiently within
one query, if only a better plan would be used.

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
  WHERE lt.user_id = 12345
  ORDER BY lt.created_at DESC LIMIT 10;

                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=0.00..33.57 rows=10 width=622) (actual time=64.030..71.720 rows=10 loops=1)
   ->  Index Scan Backward using large_user_id_created_at_index on large_table lt  (cost=0.00..7243.59 rows=2158 width=622) (actual time=64.023..71.662 rows=10 loops=1)
         Index Cond: (user_id = 12345)
 Total runtime: 71.826 ms

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
  WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
  ORDER BY created_at DESC LIMIT 10;

                              QUERY PLAN
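For reference, a minimal schema consistent with the table and index
names used throughout this thread might look like the following. The
column types are guesses, and the real large_table has many more columns
(plaze_id, device, started_at, and so on, as visible in the sort keys of
the plans):

```sql
-- Hypothetical reconstruction of the schema discussed in this thread.
CREATE TABLE large_table (
  id         serial PRIMARY KEY,
  user_id    integer   NOT NULL,
  created_at timestamp NOT NULL
  -- ... many more columns in the real table ...
);

-- Indexes named in the EXPLAIN output above.
CREATE INDEX large_created_at_index
  ON large_table (created_at);
CREATE INDEX large_user_id_created_at_index
  ON large_table (user_id, created_at);

CREATE TABLE relationships (
  user_id    integer NOT NULL,
  contact_id integer NOT NULL
);

CREATE INDEX relationships_user_id_contact_id_index
  ON relationships (user_id, contact_id);
CREATE INDEX relationships_contact_id_index
  ON relationships (contact_id);
```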
Re: [PERFORM] Slow query with backwards index scan
Tilmann Singer skrev:

> The query works fine for the common cases when matching rows are found
> early in the sorted large table, like this:
>
> testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
>   LEFT JOIN relationships r ON lt.user_id=r.contact_id
>   WHERE r.user_id = 5 OR lt.user_id = 5
>   ORDER BY lt.created_at DESC LIMIT 10;
>
> but for the following user_id there are 3M rows in the large table
> which are more recent than the 10th matching one. The query then does
> not perform so well:
>
> testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
>   LEFT JOIN relationships r ON lt.user_id=r.contact_id
>   WHERE r.user_id = 12345 OR lt.user_id = 12345
>   ORDER BY lt.created_at DESC LIMIT 10;
>
> When split up into the two following queries it performs much better
> for that user_id. Since the results of the two could be combined into
> the desired result, I'm assuming it could also be done efficiently
> within one query, if only a better plan would be used.
>
> testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
>   WHERE lt.user_id = 12345
>   ORDER BY lt.created_at DESC LIMIT 10;
>
> testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
>   WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
>   ORDER BY created_at DESC LIMIT 10;
>
> I'm not very experienced reading query plans and don't know how to go
> about this from here - is it theoretically possible to have a query
> that performs well with the given data in both cases or is there a
> conceptual problem?

How does the obvious UNION query do - ie:

SELECT * FROM (
  SELECT * FROM large_table lt
  WHERE lt.user_id = 12345
  UNION
  SELECT * FROM large_table lt
  WHERE user_id IN (SELECT contact_id FROM relationships
                    WHERE user_id=12345)
) q
ORDER BY created_at DESC LIMIT 10;

?

How about

SELECT * FROM large_table lt
WHERE lt.user_id = 12345
   OR user_id IN (SELECT contact_id FROM relationships
                  WHERE user_id=12345)
ORDER BY created_at DESC LIMIT 10;

?
I am missing a unique constraint on (user_id, contact_id) - otherwise
the subselect is not equivalent to the join.

You probably also should have foreign key constraints on
relationships.user_id and relationships.contact_id. These are unlikely
to affect performance though, in my experience.

It might be good to know whether contact_id = user_id is possible -
since ruling that out would rule out the possibility of a row
satisfying both branches of the union.

Nis
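Expressed as DDL, the constraints suggested above might look like this.
This is a sketch: it assumes a users table with primary key id, which is
never shown in the thread, and the constraint names are made up:

```sql
-- Unique constraint so each (user_id, contact_id) pair appears once;
-- this is what makes the IN-subselect equivalent to the join.
ALTER TABLE relationships
  ADD CONSTRAINT relationships_user_contact_key
  UNIQUE (user_id, contact_id);

-- Foreign keys for referential integrity (assumes users(id) exists).
ALTER TABLE relationships
  ADD CONSTRAINT relationships_user_id_fkey
  FOREIGN KEY (user_id) REFERENCES users (id);
ALTER TABLE relationships
  ADD CONSTRAINT relationships_contact_id_fkey
  FOREIGN KEY (contact_id) REFERENCES users (id);

-- Optionally forbid self-relationships, so no row can satisfy both
-- branches of the UNION.
ALTER TABLE relationships
  ADD CONSTRAINT relationships_no_self_check
  CHECK (user_id <> contact_id);
```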