Re: [PERFORM] Slow query with backwards index scan

2007-07-30 Thread Nis Jørgensen
Tilmann Singer skrev:

 But the subselect is not fast for the user with many relationships and
 matched rows at the beginning of the sorted large_table:
 
 testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
  WHERE user_id IN (SELECT contact_id FROM relationships WHERE
  user_id=5)
  ORDER BY created_at DESC LIMIT 10;
   
                                QUERY PLAN
  -------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=6963.94..6963.96 rows=10 width=621) (actual time=53187.349..53187.424 rows=10 loops=1)
    ->  Sort  (cost=6963.94..6966.96 rows=1211 width=621) (actual time=53187.341..53187.360 rows=10 loops=1)
          Sort Key: lt.created_at
          ->  Nested Loop  (cost=39.52..6901.92 rows=1211 width=621) (actual time=201.728..52673.800 rows=69018 loops=1)
                ->  HashAggregate  (cost=39.52..39.53 rows=1 width=4) (actual time=178.777..178.966 rows=40 loops=1)
                      ->  Bitmap Heap Scan on relationships  (cost=4.33..39.49 rows=10 width=4) (actual time=47.049..178.560 rows=40 loops=1)
                            Recheck Cond: (user_id = 5)
                            ->  Bitmap Index Scan on relationships_user_id_contact_id_index  (cost=0.00..4.33 rows=10 width=0) (actual time=28.721..28.721 rows=40 loops=1)
                                  Index Cond: (user_id = 5)
                ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..6834.04 rows=2268 width=621) (actual time=21.994..1301.375 rows=1725 loops=40)
                      Index Cond: (lt.user_id = relationships.contact_id)
  Total runtime: 53188.584 ms
 
 
 
 Using a join now the query for mat

 Any ideas?

It seems to me the subselect plan would benefit quite a bit from not
returning all rows, but only the 10 latest for each user. I believe the
problem is similar to what is discussed for UNIONs here:

http://groups.google.dk/group/pgsql.general/browse_thread/thread/4f74d7faa8a5a608/367f5052b1bbf1c8?lnk=stq=postgresql+limit++unionrnum=1hl=en#367f5052b1bbf1c8

which got implemented here:

http://groups.google.dk/group/pgsql.committers/browse_thread/thread/b1ac3c3026db096c/9b3e5bd2d612f565?lnk=stq=postgresql+limit++unionrnum=26hl=en#9b3e5bd2d612f565

It seems to me the planner in this case would actually need to push the
limit into the nested loop, since we want the plan to take advantage of
the index (using a backwards index scan). I am ready to be corrected though.

You could try this (quite hackish) attempt at forcing the query planner
to use this plan:

SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE
 user_id=5) AND created_at in (SELECT created_at FROM large_table
lt2 WHERE lt2.user_id = lt.user_id ORDER BY created_at DESC limit 10)
 ORDER BY created_at DESC LIMIT 10;

If that doesn't work, you might have reached the point where you need to
use some kind of record-keeping system to keep track of which records to
look at.

/Nis


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Slow query with backwards index scan

2007-07-30 Thread Tilmann Singer
* Nis Jørgensen [EMAIL PROTECTED] [20070730 18:33]:
 It seems to me the subselect plan would benefit quite a bit from not
 returning all rows, but only the 10 latest for each user. I believe the
 problem is similar to what is discussed for UNIONs here:
 
 http://groups.google.dk/group/pgsql.general/browse_thread/thread/4f74d7faa8a5a608/367f5052b1bbf1c8?lnk=stq=postgresql+limit++unionrnum=1hl=en#367f5052b1bbf1c8
 
 which got implemented here:
 
 http://groups.google.dk/group/pgsql.committers/browse_thread/thread/b1ac3c3026db096c/9b3e5bd2d612f565?lnk=stq=postgresql+limit++unionrnum=26hl=en#9b3e5bd2d612f565
 
 It seems to me the planner in this case would actually need to push the
 limit into the nested loop, since we want the plan to take advantage of
 the index (using a backwards index scan). I am ready to be corrected though.

If I understand that correctly, then this means it would benefit
the planning of something like

SELECT FROM (q1 UNION ALL q2) ORDER BY ... LIMIT ...

if either q1 or q2 could satisfy the rows requested by the limit
early, instead of planning q1 and q2 without the outer query's limit
having any influence.

Unfortunately I'm having problems making my q2 reasonably efficient in
the first place, even before UNIONing it.

 You could try this (quite hackish) attempt at forcing the query planner
 to use this plan:
 
 SELECT * FROM large_table lt
  WHERE user_id IN (SELECT contact_id FROM relationships WHERE
  user_id=5) AND created_at in (SELECT created_at FROM large_table
 lt2 WHERE lt2.user_id = lt.user_id ORDER BY created_at DESC limit 10)
  ORDER BY created_at DESC LIMIT 10;

No luck for the user with many matches at the beginning:

testdb=# EXPLAIN ANALYZE
SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE
 user_id=5) AND created_at IN (SELECT created_at FROM large_table lt2
   WHERE lt2.user_id = lt.user_id ORDER BY created_at DESC limit 10)
 ORDER BY created_at DESC LIMIT 10;

                                QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4.94..4.97 rows=10 width=621) (actual time=70550.549..70550.616 rows=10 loops=1)
   ->  Sort  (cost=4.94..45557.46 rows=605 width=621) (actual time=70550.542..70550.569 rows=10 loops=1)
         Sort Key: lt.created_at
         ->  Nested Loop  (cost=39.52..45527.99 rows=605 width=621) (actual time=2131.501..70548.313 rows=321 loops=1)
               ->  HashAggregate  (cost=39.52..39.53 rows=1 width=4) (actual time=0.406..0.615 rows=40 loops=1)
                     ->  Bitmap Heap Scan on relationships  (cost=4.33..39.49 rows=10 width=4) (actual time=0.075..0.248 rows=40 loops=1)
                           Recheck Cond: (user_id = 5)
                           ->  Bitmap Index Scan on relationships_user_id_contact_id_index  (cost=0.00..4.33 rows=10 width=0) (actual time=0.052..0.052 rows=40 loops=1)
                                 Index Cond: (user_id = 5)
               ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..45474.29 rows=1134 width=621) (actual time=1762.067..1763.637 rows=8 loops=40)
                     Index Cond: (lt.user_id = relationships.contact_id)
                     Filter: (subplan)
                     SubPlan
                       ->  Limit  (cost=0.00..34.04 rows=10 width=8) (actual time=0.048..0.147 rows=10 loops=69018)
                             ->  Index Scan Backward using large_user_id_created_at_index on large_table lt2  (cost=0.00..7721.24 rows=2268 width=8) (actual time=0.040..0.087 rows=10 loops=69018)
                                   Index Cond: (user_id = $0)
 Total runtime: 70550.847 ms


The same plan is generated for the user with few matches and executes
very fast.


 If that doesn't work, you might have reached the point where you need to
 use some kind of record-keeping system to keep track of which records to
 look at.

Yes, I'm considering that unfortunately.

Seeing however that there are 2 different queries which result in very
efficient plans for one of the 2 different cases, but not the other,
makes me hope there is a way to tune the planner into always coming up
with the right plan. I'm not sure if I explained the problem well
enough and will see if I can come up with a reproduction case with
generated data.


Til



Re: [PERFORM] Slow query with backwards index scan

2007-07-30 Thread Nis Jørgensen
Tilmann Singer skrev:
 * Nis Jørgensen [EMAIL PROTECTED] [20070730 18:33]:
 It seems to me the subselect plan would benefit quite a bit from not
 returning all rows, but only the 10 latest for each user. I believe the
 problem is similar to what is discussed for UNIONs here:

 http://groups.google.dk/group/pgsql.general/browse_thread/thread/4f74d7faa8a5a608/367f5052b1bbf1c8?lnk=stq=postgresql+limit++unionrnum=1hl=en#367f5052b1bbf1c8

 which got implemented here:

 http://groups.google.dk/group/pgsql.committers/browse_thread/thread/b1ac3c3026db096c/9b3e5bd2d612f565?lnk=stq=postgresql+limit++unionrnum=26hl=en#9b3e5bd2d612f565

 It seems to me the planner in this case would actually need to push the
 limit into the nested loop, since we want the plan to take advantage of
 the index (using a backwards index scan). I am ready to be corrected though.
 
 If I understand that correctly, then this means it would benefit
 the planning of something like

 SELECT FROM (q1 UNION ALL q2) ORDER BY ... LIMIT ...

 if either q1 or q2 could satisfy the rows requested by the limit
 early, instead of planning q1 and q2 without the outer query's limit
 having any influence.

 Unfortunately I'm having problems making my q2 reasonably efficient in
 the first place, even before UNIONing it.

The second branch of your UNION is really equivalent to the following
pseudo-code:

contacts = SELECT contact_id FROM relations WHERE user_id = $id

sql = SELECT * FROM (
SELECT * FROM lt WHERE user_id = contacts[0]
UNION ALL
SELECT * FROM lt WHERE user_id = contacts[1]
.
.
.
) ORDER BY created_at DESC LIMIT 10;

Currently, it seems the subqueries are fetching all rows.

Thus a plan which makes each of the subqueries aware of the LIMIT might
be able to improve performance. Unlike the UNION case, it seems this
means making the planner aware that such a plan is valid, not just
changing the cost estimate.


How does this imperative approach perform? I realize you probably
don't want to use this, but it would give us an idea of whether we could
get the speed we need by forcing this plan on pg.
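Since the thread never spells the imperative variant out, here is a minimal sketch of what it could look like: one LIMIT-10 query per contact, merged client-side. The table and column names follow the thread; the in-memory SQLite database and the sample data are only hypothetical stand-ins for the real PostgreSQL instance.

```python
import heapq
import sqlite3

# Hypothetical stand-in schema, mirroring the names used in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relationships (user_id INTEGER, contact_id INTEGER);
CREATE TABLE large_table (user_id INTEGER, created_at TEXT);
CREATE INDEX lt_user_created ON large_table (user_id, created_at);
""")
conn.executemany("INSERT INTO relationships VALUES (?, ?)", [(5, 7), (5, 9)])
conn.executemany("INSERT INTO large_table VALUES (?, ?)",
                 [(7, "2007-07-%02d" % d) for d in range(1, 20)] +
                 [(9, "2007-06-%02d" % d) for d in range(1, 20)])

def latest_for_user(uid, limit=10):
    # Each per-contact query can walk the (user_id, created_at) index
    # backwards and stop after `limit` rows - the plan we would like
    # pg to choose inside the nested loop.
    cur = conn.execute(
        "SELECT user_id, created_at FROM large_table "
        "WHERE user_id = ? ORDER BY created_at DESC LIMIT ?", (uid, limit))
    return cur.fetchall()

contacts = [r[0] for r in conn.execute(
    "SELECT contact_id FROM relationships WHERE user_id = ?", (5,))]

# Merge the per-contact top-10 lists (each already sorted descending)
# and keep only the overall 10 newest rows.
merged = heapq.merge(*(latest_for_user(c) for c in contacts),
                     key=lambda row: row[1], reverse=True)
top10 = list(merged)[:10]
```

With n contacts this reads at most 10*n rows instead of every matching row, which is essentially the plan being discussed.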

 If that doesn't work, you might have reached the point where you need to
 use some kind of record-keeping system to keep track of which records to
 look at.
 
 Yes, I'm considering that unfortunately.
 
 Seeing however that there are 2 different queries which result in very
 efficient plans for one of the 2 different cases, but not the other,
 makes me hope there is a way to tune the planner into always coming up
 with the right plan. I'm not sure if I explained the problem well
 enough and will see if I can come up with a reproduction case with
 generated data.

I think the problem is that PostgreSQL does not have the necessary
statistics to determine which of the two plans will perform well. There
are basically two unknowns in the query:

- How many uninteresting records we have to scan through before getting
to the interesting ones (if using plan 1).
- How many matching rows we find in relationships.

It is not surprising that pg cannot estimate the first one.

I am a little surprised that pg is not able to estimate the second one
better. Increasing the statistics settings might help in getting a
different plan.

I am also slightly surprised that the two equivalent formulations of the
query yield such vastly different query plans. In my experience, pg is
quite good at coming up with similar query plans for equivalent queries.
 You might want to fiddle with DISTINCT or indexing to make sure that
they are indeed logically equivalent.

Anyway, it seems likely that you will at some point run into the query
for many matches at the end of the table - which none of your plans will
be good at supplying. So perhaps you can just as well prepare for it now.
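To make the "record-keeping system" suggestion concrete, here is one hedged sketch of such a scheme: a per-viewer feed table kept up to date by a trigger, so the top-10 query becomes a single backward walk of one index. The table, trigger, and column names are my own invention, and SQLite again only stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relationships (user_id INTEGER, contact_id INTEGER);
CREATE TABLE large_table (id INTEGER PRIMARY KEY, user_id INTEGER,
                          created_at TEXT);
-- The "record-keeping" table: one row per (viewer, item).
CREATE TABLE feed (viewer_id INTEGER, created_at TEXT, item_id INTEGER);
CREATE INDEX feed_viewer_created ON feed (viewer_id, created_at);
-- Fan out each new row to its author and to everyone who has the
-- author as a contact.
CREATE TRIGGER lt_fanout AFTER INSERT ON large_table
BEGIN
  INSERT INTO feed
    SELECT user_id, NEW.created_at, NEW.id
    FROM relationships WHERE contact_id = NEW.user_id;
  INSERT INTO feed VALUES (NEW.user_id, NEW.created_at, NEW.id);
END;
""")
conn.executemany("INSERT INTO relationships VALUES (?, ?)", [(5, 7)])
conn.executemany(
    "INSERT INTO large_table (user_id, created_at) VALUES (?, ?)",
    [(7, "2007-07-%02d" % d) for d in range(1, 15)])

# The hot query is now trivially index-driven, regardless of how the
# viewer's matches are distributed in large_table.
top10 = conn.execute(
    "SELECT item_id, created_at FROM feed WHERE viewer_id = ? "
    "ORDER BY created_at DESC LIMIT 10", (5,)).fetchall()
```

The trade-off is write amplification (one feed row per watcher per insert) in exchange for a read path that no longer depends on the planner's estimates.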


Nis




Re: [PERFORM] Slow query with backwards index scan

2007-07-28 Thread Tilmann Singer
* Nis Jørgensen [EMAIL PROTECTED] [20070727 20:31]:
 How does the obvious UNION query do - ie:
 
 SELECT * FROM (
 SELECT * FROM large_table lt
 WHERE lt.user_id = 12345
 
 UNION
 
 SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
 ) q
 
 ORDER BY created_at DESC LIMIT 10;

Great for the user with little data:

testdb=# EXPLAIN ANALYZE SELECT * FROM (
SELECT * FROM large_table lt
WHERE lt.user_id = 12345
UNION
SELECT * FROM large_table lt
WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
) q
ORDER BY created_at DESC LIMIT 10;

                                QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=14673.77..14673.80 rows=10 width=3140) (actual time=133.877..133.946 rows=10 loops=1)
   ->  Sort  (cost=14673.77..14682.22 rows=3378 width=3140) (actual time=133.870..133.894 rows=10 loops=1)
         Sort Key: q.created_at
         ->  Unique  (cost=14315.34..14442.01 rows=3378 width=622) (actual time=133.344..133.705 rows=38 loops=1)
               ->  Sort  (cost=14315.34..14323.78 rows=3378 width=622) (actual time=133.337..133.432 rows=38 loops=1)
                     Sort Key: id, user_id, plaze_id, device, started_at, updated_at, status, type, duration, permission, created_at, mac_address, subnet, msc
                     ->  Append  (cost=0.00..14117.35 rows=3378 width=622) (actual time=39.144..133.143 rows=38 loops=1)
                           ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..7243.59 rows=2158 width=622) (actual time=39.138..109.831 rows=34 loops=1)
                                 Index Cond: (user_id = 12345)
                           ->  Nested Loop  (cost=42.78..6839.98 rows=1220 width=622) (actual time=14.859..23.112 rows=4 loops=1)
                                 ->  HashAggregate  (cost=42.78..42.79 rows=1 width=4) (actual time=8.092..8.095 rows=1 loops=1)
                                       ->  Bitmap Heap Scan on relationships  (cost=4.34..42.75 rows=11 width=4) (actual time=8.067..8.070 rows=1 loops=1)
                                             Recheck Cond: (user_id = 12345)
                                             ->  Bitmap Index Scan on relationships_user_id_index  (cost=0.00..4.34 rows=11 width=0) (actual time=8.057..8.057 rows=1 loops=1)
                                                   Index Cond: (user_id = 12345)
                                 ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..6768.48 rows=2297 width=622) (actual time=6.751..14.970 rows=4 loops=1)
                                       Index Cond: (lt.user_id = relationships.contact_id)
 Total runtime: 134.220 ms


Not so great for the user with many early matches:

testdb=# EXPLAIN ANALYZE SELECT * FROM (
SELECT * FROM large_table lt
WHERE lt.user_id = 5
UNION
SELECT * FROM large_table lt
WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=5)
) q
ORDER BY created_at DESC LIMIT 10;

                                QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=14673.77..14673.80 rows=10 width=3140) (actual time=3326.304..3326.367 rows=10 loops=1)
   ->  Sort  (cost=14673.77..14682.22 rows=3378 width=3140) (actual time=3326.297..3326.318 rows=10 loops=1)
         Sort Key: q.created_at
         ->  Unique  (cost=14315.34..14442.01 rows=3378 width=622) (actual time=2413.070..3019.385 rows=69757 loops=1)
               ->  Sort  (cost=14315.34..14323.78 rows=3378 width=622) (actual time=2413.062..2590.354 rows=69757 loops=1)
                     Sort Key: id, user_id, plaze_id, device, started_at, updated_at, status, type, duration, permission, created_at, mac_address, subnet, msc
                     ->  Append  (cost=0.00..14117.35 rows=3378 width=622) (actual time=0.067..1911.626 rows=69757 loops=1)
                           ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..7243.59 rows=2158 width=622) (actual time=0.062..3.440 rows=739 loops=1)
                                 Index Cond: (user_id = 5)
                           ->  Nested Loop  (cost=42.78..6839.98 rows=1220 width=622) (actual time=0.451..1557.901 rows=69018 loops=1)
                                 ->  HashAggregate  (cost=42.78..42.79 rows=1 width=4) (actual time=0.404..0.580 rows=40 loops=1)
 

Re: [PERFORM] Slow query with backwards index scan

2007-07-28 Thread andrew
Tilmann Singer [EMAIL PROTECTED] wrote ..
 * Nis Jørgensen [EMAIL PROTECTED] [20070727 20:31]:
  How does the obvious UNION query do - ie:
  
  SELECT * FROM (
  SELECT * FROM large_table lt
  WHERE lt.user_id = 12345
  
  UNION
  
  SELECT * FROM large_table lt
  WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
  ) q
  
  ORDER BY created_at DESC LIMIT 10;

Let's try putting the sort/limit in each piece of the UNION to speed them up 
separately.

SELECT * FROM (
 (SELECT * FROM large_table lt
 WHERE lt.user_id = 12345
 ORDER BY created_at DESC LIMIT 10) AS q1
 UNION
 (SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
 ORDER BY created_at DESC LIMIT 10) AS q2
ORDER BY created_at DESC LIMIT 10;



Re: [PERFORM] Slow query with backwards index scan

2007-07-28 Thread Tilmann Singer
* [EMAIL PROTECTED] [EMAIL PROTECTED] [20070728 21:05]:
 Let's try putting the sort/limit in each piece of the UNION to speed them up 
 separately.
 
 SELECT * FROM (
  (SELECT * FROM large_table lt
  WHERE lt.user_id = 12345
  ORDER BY created_at DESC LIMIT 10) AS q1
  UNION
  (SELECT * FROM large_table lt
  WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
  ORDER BY created_at DESC LIMIT 10) AS q2
 ORDER BY created_at DESC LIMIT 10;

It's not possible to use ORDER BY or LIMIT within unioned queries.

http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION

Would that make sense at all given the way the postgresql planner
works? Does it work in other DBs?


Til



Re: [PERFORM] Slow query with backwards index scan

2007-07-28 Thread Craig James

Tilmann Singer wrote:

* [EMAIL PROTECTED] [EMAIL PROTECTED] [20070728 21:05]:

Let's try putting the sort/limit in each piece of the UNION to speed them up 
separately.

SELECT * FROM (
 (SELECT * FROM large_table lt
 WHERE lt.user_id = 12345
 ORDER BY created_at DESC LIMIT 10) AS q1
 UNION
 (SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
 ORDER BY created_at DESC LIMIT 10) AS q2
ORDER BY created_at DESC LIMIT 10;


It's not possible to use ORDER BY or LIMIT within unioned queries.

http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION


If I'm reading this documentation correctly, it *is* possible, as long as 
they're inside of a sub-select, as in this case.

Craig



Re: [PERFORM] Slow query with backwards index scan

2007-07-28 Thread Jeremy Harris

Tilmann Singer wrote:

* [EMAIL PROTECTED] [EMAIL PROTECTED] [20070728 21:05]:

Let's try putting the sort/limit in each piece of the UNION to speed them up 
separately.

SELECT * FROM (
 (SELECT * FROM large_table lt
 WHERE lt.user_id = 12345
 ORDER BY created_at DESC LIMIT 10) AS q1
 UNION
 (SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
 ORDER BY created_at DESC LIMIT 10) AS q2
ORDER BY created_at DESC LIMIT 10;


It's not possible to use ORDER BY or LIMIT within unioned queries.

http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION


ORDER BY and LIMIT can be attached to a subexpression if it is enclosed in 
parentheses



Re: [PERFORM] Slow query with backwards index scan

2007-07-28 Thread andrew
As other posters have pointed out, you can overcome the ORDER BY/LIMIT 
restriction on UNIONs with parentheses. I think I misbalanced the parentheses 
in my original post, which would have caused an error if you just copied and 
pasted.

I don't think the limitation has to do with planning--just parsing the query.
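For reference, the per-branch rewrite with balanced parentheses can be exercised like this. The data is hypothetical, and SQLite (used here only as a stand-in) requires each limited branch to be wrapped as a FROM subquery, whereas PostgreSQL accepts the parenthesized branches directly, as the docs link earlier in the thread describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relationships (user_id INTEGER, contact_id INTEGER);
CREATE TABLE large_table (user_id INTEGER, created_at TEXT);
""")
conn.executemany("INSERT INTO relationships VALUES (?, ?)", [(1, 2)])
conn.executemany("INSERT INTO large_table VALUES (?, ?)",
                 [(1, "2007-07-%02d" % d) for d in range(1, 6)] +
                 [(2, "2007-07-%02d" % d) for d in range(6, 13)])

# Each branch is sorted and limited on its own, so each can stop early;
# the outer ORDER BY/LIMIT then merges at most 20 candidate rows.
rows = conn.execute("""
SELECT * FROM (SELECT user_id, created_at FROM large_table
               WHERE user_id = 1
               ORDER BY created_at DESC LIMIT 10)
UNION
SELECT * FROM (SELECT user_id, created_at FROM large_table
               WHERE user_id IN (SELECT contact_id FROM relationships
                                 WHERE user_id = 1)
               ORDER BY created_at DESC LIMIT 10)
ORDER BY created_at DESC LIMIT 10
""").fetchall()
```

Whether PostgreSQL 8.2 actually plans each branch with an early-stopping backward index scan is exactly the open question in this thread; the rewrite only makes that plan expressible.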



[PERFORM] Slow query with backwards index scan

2007-07-27 Thread Tilmann Singer
Dear list,


I am having problems selecting the 10 most recent rows from a large
table (4.5M rows), sorted by a date column of that table. The large
table has a column user_id which either should match a given user_id,
or should match the column contact_id in a correlated table where the
user_id of that correlated table must match the given user_id.

The user_id values in the large table are distributed in such a way
that in the majority of cases, for a given user_id, 10 matching rows can
be found very early when scanning the table sorted by date - probably
within the first 1%. Sometimes, however, a given user_id matches rows
only very far towards the end of the sorted large table, or not at all.

The query works fine for the common cases when matching rows are found
early in the sorted large table, like this:

testdb=# EXPLAIN ANALYZE
SELECT * FROM large_table lt
LEFT JOIN relationships r ON lt.user_id=r.contact_id
WHERE r.user_id = 5 OR lt.user_id = 5
ORDER BY lt.created_at DESC LIMIT 10;

                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..33809.31 rows=10 width=646) (actual time=0.088..69.448 rows=10 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..156983372.66 rows=46432 width=646) (actual time=0.082..69.393 rows=10 loops=1)
         Filter: ((r.user_id = 5) OR (lt.user_id = 5))
         ->  Index Scan Backward using large_created_at_index on large_table lt  (cost=0.00..914814.94 rows=4382838 width=622) (actual time=0.028..0.067 rows=13 loops=1)
         ->  Index Scan using relationships_contact_id_index on relationships r  (cost=0.00..35.33 rows=16 width=24) (actual time=0.017..2.722 rows=775 loops=13)
               Index Cond: (lt.user_id = r.contact_id)
 Total runtime: 69.640 ms


but for the following user_id there are 3M rows in the large table
which are more recent than the 10th matching one. The query then does
not perform so well:


testdb=# EXPLAIN ANALYZE
SELECT * FROM large_table lt
LEFT JOIN relationships r ON lt.user_id=r.contact_id
WHERE r.user_id = 12345 OR lt.user_id = 12345
ORDER BY lt.created_at DESC LIMIT 10;

                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..33809.31 rows=10 width=646) (actual time=203454.353..425978.718 rows=10 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..156983372.66 rows=46432 width=646) (actual time=203454.347..425978.662 rows=10 loops=1)
         Filter: ((r.user_id = 12345) OR (lt.user_id = 12345))
         ->  Index Scan Backward using large_created_at_index on large_table lt  (cost=0.00..914814.94 rows=4382838 width=622) (actual time=0.031..78386.769 rows=3017547 loops=1)
         ->  Index Scan using relationships_contact_id_index on relationships r  (cost=0.00..35.33 rows=16 width=24) (actual time=0.006..0.060 rows=18 loops=3017547)
               Index Cond: (lt.user_id = r.contact_id)
 Total runtime: 425978.903 ms



When split up into the two following queries it performs much
better for that user_id. Since the results of the two could be
combined into the desired result, I'm assuming it could also be done
efficiently within one query, if only a better plan were used.


testdb=# EXPLAIN ANALYZE
SELECT * FROM large_table lt
WHERE lt.user_id = 12345
ORDER BY lt.created_at DESC LIMIT 10;

                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..33.57 rows=10 width=622) (actual time=64.030..71.720 rows=10 loops=1)
   ->  Index Scan Backward using large_user_id_created_at_index on large_table lt  (cost=0.00..7243.59 rows=2158 width=622) (actual time=64.023..71.662 rows=10 loops=1)
         Index Cond: (user_id = 12345)
 Total runtime: 71.826 ms


testdb=# EXPLAIN ANALYZE
SELECT * FROM large_table lt
WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
ORDER BY created_at DESC LIMIT 10;

   QUERY PLAN  

Re: [PERFORM] Slow query with backwards index scan

2007-07-27 Thread Nis Jørgensen
Tilmann Singer skrev:

 The query works fine for the common cases when matching rows are found
 early in the sorted large table, like this:
 
 testdb=# EXPLAIN ANALYZE
 SELECT * FROM large_table lt
 LEFT JOIN relationships r ON lt.user_id=r.contact_id
 WHERE r.user_id = 5 OR lt.user_id = 5
 ORDER BY lt.created_at DESC LIMIT 10;
   
   QUERY PLAN 
 but for the following user_id there are 3M rows in the large table
 which are more recent then the 10th matching one. The query then does
 not perform so well:
 
 
 testdb=# EXPLAIN ANALYZE
 SELECT * FROM large_table lt
 LEFT JOIN relationships r ON lt.user_id=r.contact_id
 WHERE r.user_id = 12345 OR lt.user_id = 12345
 ORDER BY lt.created_at DESC LIMIT 10;
   
   QUERY PLAN 
 When split it up into the two following queries it performs much
 better for that user_id. Since the results of the two could be
 combined into the desired result, I'm assuming it could also be done
 efficiently within one query, if only a better plan would be used.
 
 
 testdb=# EXPLAIN ANALYZE
 SELECT * FROM large_table lt
 WHERE lt.user_id = 12345
 ORDER BY lt.created_at DESC LIMIT 10;
   
  QUERY PLAN  
 testdb=# EXPLAIN ANALYZE
 SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
 ORDER BY created_at DESC LIMIT 10;
   
  QUERY PLAN  
 I'm not very experienced reading query plans and don't know how to go
 about this from here - is it theoretically possible to have a query
 that performs well with the given data in both cases or is there a
 conceptual problem?

How does the obvious UNION query do - ie:

SELECT * FROM (
SELECT * FROM large_table lt
WHERE lt.user_id = 12345

UNION

SELECT * FROM large_table lt
WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
) q

ORDER BY created_at DESC LIMIT 10;

?

How about

SELECT * FROM large_table lt
WHERE lt.user_id = 12345 OR user_id IN (SELECT contact_id FROM
relationships WHERE user_id=12345)
ORDER BY created_at DESC LIMIT 10;

?

I am missing a unique constraint on (user_id, contact_id) - otherwise
the subselect is not equivalent to the join.

Probably you also should have foreign key constraints on
relationships.user_id and relationships.contact_id. These are unlikely
to affect performance though, in my experience.

It might be good to know whether contact_id = user_id is possible -
since this would rule out the possibility of a row satisfying both
branches of the union.

Nis

