Re: [PERFORM] Performs WAY better with enable_seqscan = off

2006-05-22 Thread Jim C. Nasby
The problem is that the planner is guessing horribly at what the nodes
will return, and I'm betting the reason for that is your join criteria.
Why are you joining on fields that are concatenated together, instead of
just joining on the fields themselves? That's a sure-fire way to confuse
the planner, and greatly limit your options.

On Sun, May 21, 2006 at 02:21:55AM -0600, Brendan Duddridge wrote:
> Hi,
> 
> I have a query that performs WAY better when I have enable_seqscan =  
> off:
> 
> explain analyze select ac.attribute_id, la.name, ac.sort_order from  
> attribute_category ac, localized_attribute la where ac.category_id =  
> 1001402 and la.locale_id = 101 and ac.is_browsable = 'true' and  
> la.attribute_id = ac.attribute_id and exists ( select 'x' from  
> product_attribute_value pav, category_product cp where  
> (pav.product_id || '.' || pav.attribute_id) = (cp.product_id || '.'  
> || ac.attribute_id) and pav.status_code is null and (cp.category_id  
> || '.' || cp.is_visible) = '1001402.true') order by (ac.sort_order is  
> null), ac.sort_order, la.name asc;
> 
>QUERY PLAN
>  
>  
>  
> --
> Sort  (cost=47.97..47.98 rows=7 width=34) (actual  
> time=33368.721..33368.721 rows=2 loops=1)
>Sort Key: (ac.sort_order IS NULL), ac.sort_order, la.name
>->  Nested Loop  (cost=2.00..47.87 rows=7 width=34) (actual  
> time=13563.049..33368.679 rows=2 loops=1)
>  ->  Index Scan using attribute_category__category_id_fk_idx  
> on attribute_category ac  (cost=0.00..26.73 rows=7 width=8) (actual  
> time=13562.918..33368.370 rows=2 loops=1)
>Index Cond: (category_id = 1001402)
>Filter: (((is_browsable)::text = 'true'::text) AND  
> (subplan))
>SubPlan
>  ->  Nested Loop  (cost=0.02..278217503.21  
> rows=354763400 width=0) (actual time=4766.821..4766.821 rows=0 loops=7)
>->  Seq Scan on category_product cp   
> (cost=0.00..158150.26 rows=18807 width=4) (actual  
> time=113.595..4585.461 rows=12363 loops=7)
>  Filter: category_id)::text ||  
> '.'::text) || (is_visible)::text) = '1001402.true'::text)
>->  Index Scan using  
> product_attribute_value__prod_id_att_id_status_is_null_ids on  
> product_attribute_value pav  (cost=0.02..14171.84 rows=18863 width=8)  
> (actual time=0.012..0.012 rows=0 loops=86538)
>  Index Cond: pav.product_id)::text  
> || '.'::text) || (pav.attribute_id)::text) =  
> ((("outer".product_id)::text || '.'::text) || ($0)::text))
>  ->  Bitmap Heap Scan on localized_attribute la   
> (cost=2.00..3.01 rows=1 width=30) (actual time=0.129..0.129 rows=1  
> loops=2)
>Recheck Cond: (la.attribute_id = "outer".attribute_id)
>Filter: (locale_id = 101)
>->  Bitmap Index Scan on  
> localized_attribute__attribute_id_fk_idx  (cost=0.00..2.00 rows=1  
> width=0) (actual time=0.091..0.091 rows=1 loops=2)
>  Index Cond: (la.attribute_id =  
> "outer".attribute_id)
> Total runtime: 33369.105 ms
> 
> Now when I disable sequential scans:
> 
> set enable_seqscan = off;
> 
> explain analyze select ac.attribute_id, la.name, ac.sort_order from  
> attribute_category ac, localized_attribute la where ac.category_id =  
> 1001402 and la.locale_id = 101 and ac.is_browsable = 'true' and  
> la.attribute_id = ac.attribute_id and exists ( select 'x' from  
> product_attribute_value pav, category_product cp where  
> (pav.product_id || '.' || pav.attribute_id) = (cp.product_id || '.'  
> || ac.attribute_id) and pav.status_code is null and (cp.category_id  
> || '.' || cp.is_visible) = '1001402.true') order by (ac.sort_order is  
> null), ac.sort_order, la.name asc;
> 
>QUERY PLAN
>  
>  
>  
> --
> Sort  (cost=48.09..48.11 rows=7 width=34) (actual  
> time=1675.944..1675.945 rows=2 loops=1)
>Sort Key: (ac.sort_order IS NULL), ac.sort_order, la.name
>->  Nested Loop  (cost=2.00..48.00 rows=7 width=34) (actual  
> time=687.600..1675.831 rows=2 loops=1)
>  ->  Index Scan using attribute_category__category_id_fk_idx  
> on attribute_category ac  (cost=0.00..26.86 rows=7 width=8) (actual  
> time=687.441..1675.584 rows=2 loops=1)
>Index Cond: (category_id = 1001402)
> 

Re: [PERFORM] Performs WAY better with enable_seqscan = off

2006-05-22 Thread Jim C. Nasby
On Sun, May 21, 2006 at 02:01:14PM -0600, Brendan Duddridge wrote:
> When the query planner uses the indexes with the concatenated values  
> and the where clause, the query can be sub-second response times (but  
> not always depending on the category_id value). By just doing a  
> regular join as you suggested, it's always slower. The trick is  
> getting Postgres to use the proper index all the time. And so far the  
> only way I can do that is by turning off sequential scans, but that's  
> something I didn't want to do because I don't know how it would  
> affect the performance of the rest of my application.
 
You can always disable them for just that query...
BEGIN;
SET LOCAL enable_seqscan=off;
SELECT ...
COMMIT;

> Just a note, I have random_page_cost set to 1 to try and get it to  
> favour index scans. The database machine has 8GB of RAM and I have  
> effective_cache_size set to 2/3 of that.

That's rather low for that much memory; I'd set it to 7GB.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Performs WAY better with enable_seqscan = off

2006-05-21 Thread Brendan Duddridge

is there some reason for the complicated form of the
join conditions in the subselect?



Yes, the simpler form query definitely works, but it's not always as  
fast as the index version with the complicated join syntax. Although  
even that query varies significantly with different category_id  
values. Not sure why. Sometimes it finishes in 150 ms, other times it  
takes over a second.


Here's the explain plan from your query:

explain analyze select ac.attribute_id, la.name, ac.sort_order from  
attribute_category ac, localized_attribute la where ac.category_id =  
1001402 and la.locale_id = 101 and ac.is_browsable = 'true' and  
la.attribute_id = ac.attribute_id and exists (select 'x' from  
product_attribute_value pav, category_product cp where pav.product_id  
= cp.product_id and pav.attribute_id = ac.attribute_id and  
pav.status_code is null and cp.category_id= '1001402' and  
cp.is_visible = 'true') order by ac.sort_order, la.name asc;

   QUERY PLAN
 
 
---
Sort  (cost=6343.18..6343.20 rows=7 width=34) (actual  
time=2244.241..2244.242 rows=2 loops=1)

   Sort Key: ac.sort_order, la.name
   ->  Nested Loop  (cost=2.00..6343.08 rows=7 width=34) (actual  
time=1831.970..2244.209 rows=2 loops=1)
 ->  Index Scan using attribute_category__category_id_fk_idx  
on attribute_category ac  (cost=0.00..6321.95 rows=7 width=8) (actual  
time=1831.938..2244.142 rows=2 loops=1)

   Index Cond: (category_id = 1001402)
   Filter: (((is_browsable)::text = 'true'::text) AND  
(subplan))

   SubPlan
 ->  Nested Loop  (cost=2.00..10458.04 rows=30  
width=0) (actual time=320.572..320.572 rows=0 loops=7)
   ->  Index Scan using  
product_attribute_value__attribute_id_fk_idx on  
product_attribute_value pav  (cost=0.00..2661.39 rows=2572 width=4)  
(actual time=0.020..33.589 rows=18468 loops=7)

 Index Cond: (attribute_id = $0)
 Filter: (status_code IS NULL)
   ->  Bitmap Heap Scan on category_product cp   
(cost=2.00..3.02 rows=1 width=4) (actual time=0.011..0.011 rows=0  
loops=129274)
 Recheck Cond: ("outer".product_id =  
cp.product_id)
 Filter: ((category_id = 1001402) AND  
((is_visible)::text = 'true'::text))
 ->  Bitmap Index Scan on  
x_category_product__product_id_fk_idx  (cost=0.00..2.00 rows=1  
width=0) (actual time=0.008..0.008 rows=1 loops=129274)
   Index Cond: ("outer".product_id =  
cp.product_id)
 ->  Bitmap Heap Scan on localized_attribute la   
(cost=2.00..3.01 rows=1 width=30) (actual time=0.019..0.019 rows=1  
loops=2)

   Recheck Cond: (la.attribute_id = "outer".attribute_id)
   Filter: (locale_id = 101)
   ->  Bitmap Index Scan on  
localized_attribute__attribute_id_fk_idx  (cost=0.00..2.00 rows=1  
width=0) (actual time=0.015..0.015 rows=1 loops=2)
 Index Cond: (la.attribute_id =  
"outer".attribute_id)

Total runtime: 2244.542 ms


Here's the schema for the two tables involved with the sub-select:

 \d category_product;
 Table "public.category_product"
   Column|  Type  | Modifiers
-++---
category_id | integer| not null
product_id  | integer| not null
en_name_sort_order  | integer|
fr_name_sort_order  | integer|
merchant_sort_order | integer|
price_sort_order| integer|
merchant_count  | integer|
is_active   | character varying(5)   |
product_is_active   | character varying(5)   |
product_status_code | character varying(32)  |
product_name_en | character varying(512) |
product_name_fr | character varying(512) |
product_click_count | integer|
is_visible  | character varying(5)   |
is_pending_visible  | character varying(5)   |
min_price_cad   | numeric(12,4)  |
max_price_cad   | numeric(12,4)  |
Indexes:
"x_category_product_pk" PRIMARY KEY, btree (category_id,  
product_id)
"category_product__cat_id_is_visible_idx" btree  
(((category_id::text || '.'::text) || is_visible::text))
"category_product__cat_id_prod_is_act_status_idx" btree  
(category_id, product_is_active, product_status_code)
"category_product__category_id_is_active_and_status_idx" btree  
(category_id, product_is_active, product_status_code)

"category_product__is_active_idx" btree (is_activ

Re: [PERFORM] Performs WAY better with enable_seqscan = off

2006-05-21 Thread Ragnar
On sun, 2006-05-21 at 02:21 -0600, Brendan Duddridge wrote:
> Hi,
> 
> 
> I have a query that performs WAY better when I have enable_seqscan =
> off:
> 
> 
> explain analyze select ac.attribute_id, la.name, ac.sort_order from
> attribute_category ac, localized_attribute la where ac.category_id =
> 1001402 and la.locale_id = 101 and ac.is_browsable = 'true' and
> la.attribute_id = ac.attribute_id and exists ( select 'x' from
> product_attribute_value pav, category_product cp where (pav.product_id
> || '.' || pav.attribute_id) = (cp.product_id || '.' ||
> ac.attribute_id) and pav.status_code is null and (cp.category_id ||
> '.' || cp.is_visible) = '1001402.true') order by (ac.sort_order is
> null), ac.sort_order, la.name asc;

is there some reason for the complicated form of the
join conditions in the subselect?

would this not be clearer:

explain analyze 
  select ac.attribute_id,
 la.name, 
 ac.sort_order
  from attribute_category ac,
   localized_attribute la
  where ac.category_id = 1001402 
and la.locale_id = 101 
and ac.is_browsable = 'true' 
and la.attribute_id = ac.attribute_id 
and exists 
  (select 'x' from product_attribute_value pav,
   category_product cp 
   where pav.product_id = cp.product_id
 and pav.attribute_id = ac.attribute_id
 and pav.status_code is null
 and cp.category_id= '1001402'
 and cp.is_visible = 'true'
  ) 
   order by (ac.sort_order is null), 
 ac.sort_order, 
 la.name asc;


possibly the planner would have a better time
figuring out if any indexes are usable or estimating
the subselect rowcount

gnari



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings