Re: [GENERAL] massive performance hit when using "Limit 1"

2005-12-06 Thread Rich Doughty

Rich Doughty wrote:

Richard Huxton wrote:


Rich Doughty wrote:


[snip]

Try the same query but with a low retailer_id (100 or something) and 
see if it goes a lot quicker. If that is what the problem is, try 
changing the ORDER BY to something like "_s.retailer_id, _t.value, 
_t.number" and see if that gives the planner a nudge in the right 
direction.



the retailer_id would make no difference as there are only 4000-ish rows in
ta_tokens_stock and they all (for now) have the same retailer_id.


ooops. i (sort of) spoke too soon. i didn't read the second half of the
comment properly. changing the ORDER BY clause does force a more sensible
query plan.

many thanks. so that's one way to give the planner hints...
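
for reference, this is roughly the rewritten query (hypothetical; just the
suggested ORDER BY dropped into the original -- since retailer_id and value
are fixed by the WHERE clause the result ordering is unchanged):

  EXPLAIN SELECT _t.* FROM
   tokens.ta_tokens   _t INNER JOIN
   tokens.ta_tokens_stock _s ON _t.token_id = _s.token_id
  WHERE
   _s.retailer_id = '96599' AND
   _t.value   = '10'
  ORDER BY
   _s.retailer_id, _t.value, _t.number ASC
  LIMIT '1';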


Failing that, a change to your indexes will almost certainly help.



i'm not sure that's the case. the exact same query, but with a LIMIT of 2 or
more, is fine.

I found this in the 8.0.4 relnotes. i reckon it's a good guess that's what the
problem is:

* Fix mis-planning of queries with small LIMIT values due to poorly thought
  out "fuzzy" cost comparison



--

  - Rich Doughty

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] massive performance hit when using "Limit 1"

2005-12-06 Thread Rich Doughty

Richard Huxton wrote:

Rich Doughty wrote:



This one goes nuts and doesn't return. is there any way i can
force a query plan similar to the one above?

  EXPLAIN SELECT _t.* FROM
   tokens.ta_tokens   _t INNER JOIN
   tokens.ta_tokens_stock _s ON _t.token_id = _s.token_id
  WHERE
   _s.retailer_id = '96599' AND
   _t.value   = '10'
  ORDER BY
   _t.number ASC
  LIMIT '1';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..14967.39 rows=1 width=27)
   ->  Nested Loop  (cost=0.00..22316378.56 rows=1491 width=27)
         ->  Index Scan using ta_tokens_number_key on ta_tokens _t  (cost=0.00..15519868.33 rows=1488768 width=27)
               Filter: ((value)::numeric = 10::numeric)
         ->  Index Scan using ta_tokens_stock_pkey on ta_tokens_stock _s  (cost=0.00..4.55 rows=1 width=4)
               Index Cond: (("outer".token_id)::integer = (_s.token_id)::integer)
               Filter: ((retailer_id)::integer = 96599)


I *think* what's happening here is that PG thinks it will use the index 
on _t.number (since you are going to sort by that anyway) and pretty 
soon find a row that will:

  1. have value=10
  2. join to a row in _s with the right retailer_id

It turns out that isn't the case, and so the query takes forever. 
Without knowing what "value" and "number" mean it's difficult to be 
sure, but I'd guess it's the "token_id" join part that's the problem, 
since at a guess a high-numbered retailer will have tokens with high 
"number" values, so the scan has to walk a long way down the index 
before it finds a match.


If you'd posted EXPLAIN ANALYSE then we'd be able to see what actually 
did happen.


no chance. it takes far too long to return (days...).

Try the same query but with a low retailer_id (100 or something) and see 
if it goes a lot quicker. If that is what the problem is, try changing 
the ORDER BY to something like "_s.retailer_id, _t.value, _t.number" and 
see if that gives the planner a nudge in the right direction.


the retailer_id would make no difference as there are only 4000-ish rows in
ta_tokens_stock and they all (for now) have the same retailer_id.


Failing that, a change to your indexes will almost certainly help.


i'm not sure that's the case. the exact same query, but with a LIMIT of 2 or
more, is fine.

I found this in the 8.0.4 relnotes. i reckon it's a good guess that's what the
problem is:

* Fix mis-planning of queries with small LIMIT values due to poorly thought
  out "fuzzy" cost comparison



--

  - Rich Doughty

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] massive performance hit when using "Limit 1"

2005-12-06 Thread Richard Huxton

Rich Doughty wrote:


This one goes nuts and doesn't return. is there any way i can
force a query plan similar to the one above?

  EXPLAIN SELECT _t.* FROM
   tokens.ta_tokens   _t INNER JOIN
   tokens.ta_tokens_stock _s ON _t.token_id = _s.token_id
  WHERE
   _s.retailer_id = '96599' AND
   _t.value   = '10'
  ORDER BY
   _t.number ASC
  LIMIT '1';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..14967.39 rows=1 width=27)
   ->  Nested Loop  (cost=0.00..22316378.56 rows=1491 width=27)
         ->  Index Scan using ta_tokens_number_key on ta_tokens _t  (cost=0.00..15519868.33 rows=1488768 width=27)
               Filter: ((value)::numeric = 10::numeric)
         ->  Index Scan using ta_tokens_stock_pkey on ta_tokens_stock _s  (cost=0.00..4.55 rows=1 width=4)
               Index Cond: (("outer".token_id)::integer = (_s.token_id)::integer)
               Filter: ((retailer_id)::integer = 96599)

I *think* what's happening here is that PG thinks it will use the index 
on _t.number (since you are going to sort by that anyway) and pretty 
soon find a row that will:

  1. have value=10
  2. join to a row in _s with the right retailer_id

It turns out that isn't the case, and so the query takes forever. 
Without knowing what "value" and "number" mean it's difficult to be 
sure, but I'd guess it's the "token_id" join part that's the problem, 
since at a guess a high-numbered retailer will have tokens with high 
"number" values, so the scan has to walk a long way down the index 
before it finds a match.
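
A rough back-of-the-envelope from the estimates, assuming the planner 
expects the 1491 matching rows to be spread evenly through the 
number-ordered scan:

  22316378.56 / 1491 ~= 14967.39  (the cost on the Limit node above)
  2 * 14967.39       ~= 29934.78  (worse than the ~22757 sort-based plan
                                   from your earlier post)

which would explain why LIMIT 1 picks this plan and LIMIT 2 doesn't.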


If you'd posted EXPLAIN ANALYSE then we'd be able to see what actually 
did happen.


Try the same query but with a low retailer_id (100 or something) and see 
if it goes a lot quicker. If that is what the problem is, try changing 
the ORDER BY to something like "_s.retailer_id, _t.value, _t.number" and 
see if that gives the planner a nudge in the right direction.


Failing that, a change to your indexes will almost certainly help.
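
For example, something like this (an untested sketch -- the index name is 
made up, and I'm assuming "value" and "number" are plain columns on 
ta_tokens) would let the planner pull value = '10' rows already sorted 
by number:

  CREATE INDEX ta_tokens_value_number_idx
  ON tokens.ta_tokens (value, number);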
--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] massive performance hit when using "Limit 1"

2005-12-06 Thread Rich Doughty

Rich Doughty wrote:

can anyone explain the reason for the difference in the
following 2 query plans, or offer any advice? the two queries
are identical apart from the limit clause.


[snip]

fwiw, join order makes no difference here either. i get a slightly
different plan, but with LIMIT 1 postgres makes a really strange
planner choice.

As before, with LIMIT > 1 the choice is logical and performance is fine.



  - Rich Doughty

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] massive performance hit when using "Limit 1"

2005-12-05 Thread Rich Doughty

can anyone explain the reason for the difference in the
following 2 query plans, or offer any advice? the two queries
are identical apart from the limit clause.

the performance here is fine and is the same for LIMIT >= 2

  EXPLAIN SELECT _t.* FROM
tokens.ta_tokens   _t INNER JOIN
tokens.ta_tokens_stock _s ON _t.token_id = _s.token_id
  WHERE
_s.retailer_id = '96599' AND
_t.value   = '10'
  ORDER BY
_t.number ASC
  LIMIT '2';

  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit  (cost=22757.15..22757.15 rows=2 width=27)
   ->  Sort  (cost=22757.15..22760.88 rows=1491 width=27)
         Sort Key: _t.number
         ->  Nested Loop  (cost=0.00..22678.56 rows=1491 width=27)
               ->  Seq Scan on ta_tokens_stock _s  (cost=0.00..75.72 rows=4058 width=4)
                     Filter: ((retailer_id)::integer = 96599)
               ->  Index Scan using ta_tokens_pkey on ta_tokens _t  (cost=0.00..5.56 rows=1 width=27)
                     Index Cond: ((_t.token_id)::integer = ("outer".token_id)::integer)
                     Filter: ((value)::numeric = 10::numeric)
(9 rows)

This one goes nuts and doesn't return. is there any way i can
force a query plan similar to the one above?

  EXPLAIN SELECT _t.* FROM
   tokens.ta_tokens   _t INNER JOIN
   tokens.ta_tokens_stock _s ON _t.token_id = _s.token_id
  WHERE
   _s.retailer_id = '96599' AND
   _t.value   = '10'
  ORDER BY
   _t.number ASC
  LIMIT '1';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..14967.39 rows=1 width=27)
   ->  Nested Loop  (cost=0.00..22316378.56 rows=1491 width=27)
         ->  Index Scan using ta_tokens_number_key on ta_tokens _t  (cost=0.00..15519868.33 rows=1488768 width=27)
               Filter: ((value)::numeric = 10::numeric)
         ->  Index Scan using ta_tokens_stock_pkey on ta_tokens_stock _s  (cost=0.00..4.55 rows=1 width=4)
               Index Cond: (("outer".token_id)::integer = (_s.token_id)::integer)
               Filter: ((retailer_id)::integer = 96599)
(7 rows)


All tables are vacuumed and analysed. the row estimates in the
plans are accurate.


select version();
   version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.0.3 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)



Thanks a lot,


  - Rich Doughty

---(end of broadcast)---
TIP 6: explain analyze is your friend