[PERFORM] Performs WAY better with enable_seqscan = off

2006-05-21 Thread Brendan Duddridge
Hi,I have a query that performs WAY better when I have enable_seqscan = off:explain analyze select ac.attribute_id, la.name, ac.sort_order from attribute_category ac, localized_attribute la where ac.category_id = 1001402 and la.locale_id = 101 and ac.is_browsable = 'true' and la.attribute_id = ac.attribute_id and exists ( select 'x' from product_attribute_value pav, category_product cp where (pav.product_id || '.' || pav.attribute_id) = (cp.product_id || '.' || ac.attribute_id) and pav.status_code is null and (cp.category_id || '.' || cp.is_visible) = '1001402.true') order by (ac.sort_order is null), ac.sort_order, la.name asc;                                                                                                          QUERY PLAN                                                                                                          -- Sort  (cost=47.97..47.98 rows=7 width=34) (actual time=33368.721..33368.721 rows=2 loops=1)   Sort Key: (ac.sort_order IS NULL), ac.sort_order, la.name   -  Nested Loop  (cost=2.00..47.87 rows=7 width=34) (actual time=13563.049..33368.679 rows=2 loops=1)         -  Index Scan using attribute_category__category_id_fk_idx on attribute_category ac  (cost=0.00..26.73 rows=7 width=8) (actual time=13562.918..33368.370 rows=2 loops=1)               Index Cond: (category_id = 1001402)               Filter: (((is_browsable)::text = 'true'::text) AND (subplan))               SubPlan                 -  Nested Loop  (cost=0.02..278217503.21 rows=354763400 width=0) (actual time=4766.821..4766.821 rows=0 loops=7)                       -  Seq Scan on category_product cp  (cost=0.00..158150.26 rows=18807 width=4) (actual time=113.595..4585.461 rows=12363 loops=7)                             Filter: category_id)::text || '.'::text) || (is_visible)::text) = '1001402.true'::text)                       -  Index Scan using product_attribute_value__prod_id_att_id_status_is_null_ids on product_attribute_value pav  (cost=0.02..14171.84 rows=18863 width=8) (actual time=0.012..0.012 rows=0 loops=86538)                             Index Cond: pav.product_id)::text || '.'::text) || (pav.attribute_id)::text) = ((("outer".product_id)::text || '.'::text) || ($0)::text))         -  Bitmap Heap Scan on localized_attribute la  (cost=2.00..3.01 rows=1 width=30) (actual time=0.129..0.129 rows=1 loops=2)               Recheck Cond: (la.attribute_id = "outer".attribute_id)               Filter: (locale_id = 101)               -  Bitmap Index Scan on localized_attribute__attribute_id_fk_idx  (cost=0.00..2.00 rows=1 width=0) (actual time=0.091..0.091 rows=1 loops=2)                     Index Cond: (la.attribute_id = "outer".attribute_id) Total runtime: 33369.105 msNow when I disable sequential scans:set enable_seqscan = off;explain analyze select ac.attribute_id, la.name, ac.sort_order from attribute_category ac, localized_attribute la where ac.category_id = 1001402 and la.locale_id = 101 and ac.is_browsable = 'true' and la.attribute_id = ac.attribute_id and exists ( select 'x' from product_attribute_value pav, category_product cp where (pav.product_id || '.' || pav.attribute_id) = (cp.product_id || '.' || ac.attribute_id) and pav.status_code is null and (cp.category_id || '.' || cp.is_visible) = '1001402.true') order by (ac.sort_order is null), ac.sort_order, la.name asc;                                                                                                          QUERY PLAN                                                                                                          -- Sort  (cost=48.09..48.11 rows=7 width=34) (actual time=1675.944..1675.945 rows=2 loops=1)   Sort Key: (ac.sort_order IS NULL), ac.sort_order, la.name   -  Nested Loop  (cost=2.00..48.00 rows=7 width=34) (actual time=687.600..1675.831 rows=2 loops=1)         -  Index Scan using attribute_category__category_id_fk_idx on attribute_category ac  (cost=0.00..26.86 rows=7 width=8) (actual time=687.441..1675.584 rows=2 loops=1)               Index Cond: (category_id = 1001402)               Filter: (((is_browsable)::text = 'true'::text) AND (subplan))               SubPlan                 -  Nested Loop  (cost=0.03..278076992.97 rows=354763400 width=0) (actual time=239.299..239.299 rows=0 loops=7)                       -  Index Scan using category_product__cat_id_is_visible_idx on category_product cp  (cost=0.01..17640.02 rows=18807 width=4) (actual time=0.036..30.205 rows=12363 loops=7)                             Index Cond: category_id)::text || '.'::text) || 

Re: [PERFORM] Performs WAY better with enable_seqscan = off

2006-05-21 Thread Ragnar
On sun, 2006-05-21 at 02:21 -0600, Brendan Duddridge wrote:
 Hi,
 
 
 I have a query that performs WAY better when I have enable_seqscan =
 off:
 
 
 explain analyze select ac.attribute_id, la.name, ac.sort_order from
 attribute_category ac, localized_attribute la where ac.category_id =
 1001402 and la.locale_id = 101 and ac.is_browsable = 'true' and
 la.attribute_id = ac.attribute_id and exists ( select 'x' from
 product_attribute_value pav, category_product cp where (pav.product_id
 || '.' || pav.attribute_id) = (cp.product_id || '.' ||
 ac.attribute_id) and pav.status_code is null and (cp.category_id ||
 '.' || cp.is_visible) = '1001402.true') order by (ac.sort_order is
 null), ac.sort_order, la.name asc;

is there some reason for the complicated form of the
join conditions in the subselect?

would this not be clearer:

explain analyze 
  select ac.attribute_id,
 la.name, 
 ac.sort_order
  from attribute_category ac,
   localized_attribute la
  where ac.category_id = 1001402 
and la.locale_id = 101 
and ac.is_browsable = 'true' 
and la.attribute_id = ac.attribute_id 
and exists 
  (select 'x' from product_attribute_value pav,
   category_product cp 
   where pav.product_id = cp.product_id
 and pav.attribute_id = ac.attribute_id
 and pav.status_code is null
 and cp.category_id= '1001402'
 and cp.is_visible = 'true'
  ) 
   order by (ac.sort_order is null), 
 ac.sort_order, 
 la.name asc;


possibly the planner would have a better time
figuring out if any indexes are usable or estimating
the subselect rowcount

gnari



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Benchmarking Function

2006-05-21 Thread Josh Berkus
DW,

 The idea of the control is, of course, to subtract the overhead of  
 the benchmarking function from the code actually being tested. So I  
 guess my question is, how important is it to have the control there,  
 and, if it is important, how should it actually work?

Well, per our conversation the approach doesn't really work.   EXECUTE 
'string' + generate_series seems to carry a substantial and somewhat random 
overhead, between 100ms and 200ms -- enough to wipe out any differences 
between queries.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performs WAY better with enable_seqscan = off

2006-05-21 Thread Brendan Duddridge

is there some reason for the complicated form of the
join conditions in the subselect?



Yes, the simpler form query definitely works, but it's not always as  
fast as the index version with the complicated join syntax. Although  
even that query varies significantly with different category_id  
values. Not sure why. Sometimes it finishes in 150 ms, other times it  
takes over a second.


Here's the explain plan from your query:

explain analyze select ac.attribute_id, la.name, ac.sort_order from  
attribute_category ac, localized_attribute la where ac.category_id =  
1001402 and la.locale_id = 101 and ac.is_browsable = 'true' and  
la.attribute_id = ac.attribute_id and exists (select 'x' from  
product_attribute_value pav, category_product cp where pav.product_id  
= cp.product_id and pav.attribute_id = ac.attribute_id and  
pav.status_code is null and cp.category_id= '1001402' and  
cp.is_visible = 'true') order by ac.sort_order, la.name asc;

   QUERY PLAN
 
 
---
Sort  (cost=6343.18..6343.20 rows=7 width=34) (actual  
time=2244.241..2244.242 rows=2 loops=1)

   Sort Key: ac.sort_order, la.name
   -  Nested Loop  (cost=2.00..6343.08 rows=7 width=34) (actual  
time=1831.970..2244.209 rows=2 loops=1)
 -  Index Scan using attribute_category__category_id_fk_idx  
on attribute_category ac  (cost=0.00..6321.95 rows=7 width=8) (actual  
time=1831.938..2244.142 rows=2 loops=1)

   Index Cond: (category_id = 1001402)
   Filter: (((is_browsable)::text = 'true'::text) AND  
(subplan))

   SubPlan
 -  Nested Loop  (cost=2.00..10458.04 rows=30  
width=0) (actual time=320.572..320.572 rows=0 loops=7)
   -  Index Scan using  
product_attribute_value__attribute_id_fk_idx on  
product_attribute_value pav  (cost=0.00..2661.39 rows=2572 width=4)  
(actual time=0.020..33.589 rows=18468 loops=7)

 Index Cond: (attribute_id = $0)
 Filter: (status_code IS NULL)
   -  Bitmap Heap Scan on category_product cp   
(cost=2.00..3.02 rows=1 width=4) (actual time=0.011..0.011 rows=0  
loops=129274)
 Recheck Cond: (outer.product_id =  
cp.product_id)
 Filter: ((category_id = 1001402) AND  
((is_visible)::text = 'true'::text))
 -  Bitmap Index Scan on  
x_category_product__product_id_fk_idx  (cost=0.00..2.00 rows=1  
width=0) (actual time=0.008..0.008 rows=1 loops=129274)
   Index Cond: (outer.product_id =  
cp.product_id)
 -  Bitmap Heap Scan on localized_attribute la   
(cost=2.00..3.01 rows=1 width=30) (actual time=0.019..0.019 rows=1  
loops=2)

   Recheck Cond: (la.attribute_id = outer.attribute_id)
   Filter: (locale_id = 101)
   -  Bitmap Index Scan on  
localized_attribute__attribute_id_fk_idx  (cost=0.00..2.00 rows=1  
width=0) (actual time=0.015..0.015 rows=1 loops=2)
 Index Cond: (la.attribute_id =  
outer.attribute_id)

Total runtime: 2244.542 ms


Here's the schema for the two tables involved with the sub-select:

 \d category_product;
 Table public.category_product
   Column|  Type  | Modifiers
-++---
category_id | integer| not null
product_id  | integer| not null
en_name_sort_order  | integer|
fr_name_sort_order  | integer|
merchant_sort_order | integer|
price_sort_order| integer|
merchant_count  | integer|
is_active   | character varying(5)   |
product_is_active   | character varying(5)   |
product_status_code | character varying(32)  |
product_name_en | character varying(512) |
product_name_fr | character varying(512) |
product_click_count | integer|
is_visible  | character varying(5)   |
is_pending_visible  | character varying(5)   |
min_price_cad   | numeric(12,4)  |
max_price_cad   | numeric(12,4)  |
Indexes:
x_category_product_pk PRIMARY KEY, btree (category_id,  
product_id)
category_product__cat_id_is_visible_idx btree  
(((category_id::text || '.'::text) || is_visible::text))
category_product__cat_id_prod_is_act_status_idx btree  
(category_id, product_is_active, product_status_code)
category_product__category_id_is_active_and_status_idx btree  
(category_id, product_is_active, product_status_code)

category_product__is_active_idx btree (is_active)

Re: [PERFORM] Benchmarking Function

2006-05-21 Thread David Wheeler

On May 21, 2006, at 12:23, Josh Berkus wrote:


Well, per our conversation the approach doesn't really work.   EXECUTE
'string' + generate_series seems to carry a substantial and  
somewhat random
overhead, between 100ms and 200ms -- enough to wipe out any  
differences

between queries.


Per our conversation I eliminated the EXECUTE 'string' +  
generate_series. Check it out.


  http://theory.kineticode.com/computers/databases/postgresql/ 
benchmarking_functions.html


(Temporary URL; justatheory.com seems to have disappeared from DNS...

Best,

David

---(end of broadcast)---
TIP 6: explain analyze is your friend