[PERFORM] index not used again

2006-03-31 Thread Jan Kesten

Hi folks!

I have an issue again with unused indexes. I have a database with a
couple of tables and I have to run a sync job on them. To mark which
rows have to be transferred I added a new column token (integer; I
will need some more tokens in the near future) to every table.

Before determining which rows to mark I first had a simple

update <table> set token=0;

Okay, this uses a seq scan of course. To speed things up, I created a
partial index on every table like this:

create index <table>_idx_token on <table> (token) where token=1;

After that I ran vacuum analyse to update the statistics and changed my
statement to:

update <table> set token=0 where token=1;
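Spelled out for a hypothetical table t (a sketch only; the real table names were placeholders), the whole sequence is:

```sql
-- partial index covering only the rows still marked for transfer
CREATE INDEX t_idx_token ON t (token) WHERE token = 1;

-- refresh planner statistics so the new index gets realistic estimates
VACUUM ANALYSE t;

-- touch only the marked rows; the predicate matches the index's WHERE clause
UPDATE t SET token = 0 WHERE token = 1;
```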

I think this should be able to use my index, and indeed on one table
this works quite fine:

transfer=> explain analyse update ku set token=0 where token=1;

QUERY PLAN

 Index Scan using ku_idx_token on ku  (cost=0.00..1.01 rows=1
width=1871) (actual time=0.169..0.169 rows=0 loops=1)
   Index Cond: (token = 1)
 Total runtime: 3.816 ms
(3 rows)

But on most of the other tables a seq scan is still used:

transfer=> explain analyse update fak6 set token=0 where token=1;

QUERY PLAN

 Seq Scan on fak6  (cost=0.00..301618.71 rows=24217 width=1895) (actual
time=96987.417..127020.919 rows=24251 loops=1)
   Filter: (token = 1)
 Total runtime: 181828.281 ms
(3 rows)

So I tried to force using an index with setting enable_seqscan to off,
here are the results:

transfer=> set enable_seqscan to off;
SET
transfer=> explain analyse update fak6 set token=0 where token=1;

QUERY PLAN

 Index Scan using fak6_idx_token on fak6  (cost=0.00..301697.93
rows=24217 width=1895) (actual time=1271.273..1271.273 rows=0 loops=1)
   Index Cond: (token = 1)
 Total runtime: 1272.572 ms
(3 rows)

transfer=> set enable_seqscan to on;
SET
transfer=> explain analyse update fak6 set token=0 where token=1;

QUERY PLAN

 Seq Scan on fak6  (cost=0.00..301618.71 rows=24217 width=1895) (actual
time=93903.379..93903.379 rows=0 loops=1)
   Filter: (token = 1)
 Total runtime: 93904.679 ms
(3 rows)

transfer=> set enable_seqscan to off;
SET
transfer=> explain analyse update fak6 set token=0 where token=1;

QUERY PLAN

 Index Scan using fak6_idx_token on fak6  (cost=0.00..301697.93
rows=24217 width=1895) (actual time=223.721..223.721 rows=0 loops=1)
   Index Cond: (token = 1)
 Total runtime: 226.851 ms
(3 rows)

Now I'm a bit confused. The estimated costs are nearly the same whether
the index is used or not, but the runtime is about 70 times faster with
it. Any idea how I can fix this issue? I thought a partial index would
be the right way.

Cheers,
Jan





Re: [PERFORM] index not used again

2006-03-31 Thread Stephan Szabo
On Fri, 31 Mar 2006, Jan Kesten wrote:

>
> Hi folks!
>
> I have an issue again with unused indexes. I have a database with a
> couple of tables and I have to run a sync job on them. To mark which
> rows have to be transferred I added a new column token (integer; I
> will need some more tokens in the near future) to every table.
>
> Before determining which rows to mark I first had a simple
>
> update <table> set token=0;
>
> Okay, this uses a seq scan of course. To speed things up, I created a
> partial index on every table like this:
>
> create index <table>_idx_token on <table> (token) where token=1;
>
> After that I ran vacuum analyse to update the statistics and changed my
> statement to:
>
> update <table> set token=0 where token=1;
>
> I think this should be able to use my index, and indeed on one table
> this works quite fine:
>
> transfer=> explain analyse update ku set token=0 where token=1;
>
> QUERY PLAN
> 
>  Index Scan using ku_idx_token on ku  (cost=0.00..1.01 rows=1
> width=1871) (actual time=0.169..0.169 rows=0 loops=1)
>Index Cond: (token = 1)
>  Total runtime: 3.816 ms
> (3 rows)
>
> But on most of the other tables a seq scan is still used:
>
> transfer=> explain analyse update fak6 set token=0 where token=1;
>
> QUERY PLAN
> 
>  Seq Scan on fak6  (cost=0.00..301618.71 rows=24217 width=1895) (actual
> time=96987.417..127020.919 rows=24251 loops=1)
>Filter: (token = 1)
>  Total runtime: 181828.281 ms
> (3 rows)
>
> So I tried to force using an index with setting enable_seqscan to off,
> here are the results:
>
> transfer=> set enable_seqscan to off;
> SET
> transfer=> explain analyse update fak6 set token=0 where token=1;
>
> QUERY PLAN
> 
>  Index Scan using fak6_idx_token on fak6  (cost=0.00..301697.93
> rows=24217 width=1895) (actual time=1271.273..1271.273 rows=0 loops=1)
>Index Cond: (token = 1)
>  Total runtime: 1272.572 ms
> (3 rows)

Did you reset the table contents between these two runs (remember that
explain analyze actually runs the query)? Judging from the output, the
second run changes no rows.
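One way to repeat such a timing test without the first run consuming all the matching rows is to wrap the statement in a transaction and roll it back (a sketch, using the table from the post; note that EXPLAIN ANALYZE executes the statement for real):

```sql
BEGIN;
-- EXPLAIN ANALYZE actually performs the UPDATE while timing it
EXPLAIN ANALYZE UPDATE fak6 SET token = 0 WHERE token = 1;
-- undo the change so the next run sees the same matching rows
ROLLBACK;
```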


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] simple join uses indexes, very slow

2006-03-31 Thread Jim C. Nasby
On Wed, Mar 29, 2006 at 01:08:15AM -0500, stef wrote:
> 
> If you're looking for suggestions, I would suggest updating the 8.1.x you 
> have installed to the latest version, which as of this writing is 8.1.3 ;) Most 
> notable are some of the -bug- fixes that have gone in since 8.1.0, for example:
> 
> * Fix incorrect optimizations of outer-join conditions (Tom)
> 
> You know, minor point releases aren't adding new features or changing 
> basic functionality; they are pure and simple bugfixes. If I were in 
> -your- position, I would run (don't walk ;) and install up to 8.1.3

More important, there are data loss bugfixes between 8.1.0 and 8.1.3.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Indexes with descending date columns

2006-03-31 Thread Jim C. Nasby
On Wed, Mar 29, 2006 at 12:52:31PM +0200, Theo Kramer wrote:
> On Fri, 2006-03-24 at 12:21, Jim C. Nasby wrote:
> > On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote:
> > > ii If no to i, is it feasible to extend PostgreSQL to allow traversing
> > >an index in column descending and column ascending order - assuming
> > >an order by on more than one column with column order not 
> > >in the same direction and indexes existing? ... if that makes sense.
> > 
> > Yes.
> > 
> > stats=# explain select * from email_contrib order by project_id desc, id 
> > desc, date desc limit 10;
> >                              QUERY PLAN
> > ---------------------------------------------------------------------
> >  Limit  (cost=0.00..31.76 rows=10 width=24)
> >->  Index Scan Backward using email_contrib_pkey on email_contrib  
> > (cost=0.00..427716532.18 rows=134656656 width=24)
> > (2 rows)
> 
> Not quite what I mean - redo the above as follows and then see what
> explain returns
> 
> explain select * from email_contrib order by project_id, id, date desc
> limit 10;

Ahh. There's a hack to do that by defining a new opclass that reverses <
and >, and then doing ORDER BY project_id, id, date USING new_opclass.

I think there's a TODO about this, but I'm not sure...
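For the record, that hack usually looks something like the following sketch (untested here; the names int4_revcmp, int4_revorder, and email_contrib_mixed_idx are made up for illustration, the SQL comparison ignores integer overflow, and a real date column would need the opclass defined over its own type rather than int4):

```sql
-- btree comparison function that sorts int4 values in descending order
CREATE FUNCTION int4_revcmp(int4, int4) RETURNS int4
    AS 'SELECT $2 - $1' LANGUAGE sql IMMUTABLE;

-- opclass with the operators swapped, so "ascending" order is really descending
CREATE OPERATOR CLASS int4_revorder FOR TYPE int4 USING btree AS
    OPERATOR 1 > ,
    OPERATOR 2 >= ,
    OPERATOR 3 = ,
    OPERATOR 4 <= ,
    OPERATOR 5 < ,
    FUNCTION 1 int4_revcmp(int4, int4);

-- mixed-direction index: date effectively descends within (project_id, id)
CREATE INDEX email_contrib_mixed_idx
    ON email_contrib (project_id, id, date int4_revorder);
```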
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-03-31 Thread Jim C. Nasby
What's the correlation of category_id? The current index scan cost
estimator places a heavy penalty on anything with a correlation much
below about 90%.
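The correlation the planner sees can be read straight out of pg_stats (a quick check, assuming statistics have been gathered with ANALYZE):

```sql
-- correlation near +/-1 means the column's values are stored on disk
-- nearly in index order; values near 0 make index scans look expensive
SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'category_product'
  AND attname   = 'category_id';
```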

On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:
> Hi,
> 
> I have a query that is using a sequential scan instead of an index  
> scan. I've turned off sequential scans and it is in fact faster with  
> the index scan.
> 
> Here's my before and after.
> 
> Before:
> 
> ssdev=# SET enable_seqscan TO DEFAULT;
> ssdev=# explain analyze select cp.product_id
>   from category_product cp, product_attribute_value pav
>   where cp.category_id = 1001082 and cp.product_id = 
>   pav.product_id;
> 
> 
>                                QUERY PLAN
> --------------------------------------------------------------------
> Hash Join  (cost=25.52..52140.59 rows=5139 width=4) (actual  
> time=4.521..2580.520 rows=19695 loops=1)
>Hash Cond: ("outer".product_id = "inner".product_id)
>->  Seq Scan on product_attribute_value pav  (cost=0.00..40127.12  
> rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 loops=1)
>->  Hash  (cost=23.10..23.10 rows=970 width=4) (actual  
> time=2.267..2.267 rows=1140 loops=1)
>  ->  Index Scan using x_category_product__category_id_fk_idx  
> on category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual  
> time=0.122..1.395 rows=1140 loops=1)
>Index Cond: (category_id = 1001082)
> Total runtime: 2584.221 ms
> (7 rows)
> 
> 
> After:
> 
> ssdev=# SET enable_seqscan TO false;
> ssdev=# explain analyze select cp.product_id
>   from category_product cp, product_attribute_value pav
>   where cp.category_id = 1001082 and cp.product_id = 
>   pav.product_id;
> 
> 
>                                QUERY PLAN
> -------------------------------------------------------------------
> Nested Loop  (cost=0.00..157425.22 rows=5139 width=4) (actual  
> time=0.373..71.177 rows=19695 loops=1)
>->  Index Scan using x_category_product__category_id_fk_idx on  
> category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual  
> time=0.129..1.438 rows=1140 loops=1)
>  Index Cond: (category_id = 1001082)
>->  Index Scan using product_attribute_value__product_id_fk_idx  
> on product_attribute_value pav  (cost=0.00..161.51 rows=61 width=4)  
> (actual time=0.016..0.053 rows=17 loops=1140)
>  Index Cond: ("outer".product_id = pav.product_id)
> Total runtime: 74.747 ms
> (6 rows)
> 
> There's quite a big difference in speed there. 2584.221 ms vs. 74.747  
> ms.
> 
> Any ideas what I can do to improve this without turning sequential  
> scanning off?
> 
> Thanks,
> 
> 
> Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]
> 
> ClickSpace Interactive Inc.
> Suite L100, 239 - 10th Ave. SE
> Calgary, AB  T2G 0V9
> 
> http://www.clickspace.com
> 



-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



[PERFORM] un-'vacuum analyse'

2006-03-31 Thread Frederic Back
Hello,

I would like to see how my application performs before
and after the statistics from VACUUM ANALYSE are available.

Is there a way to de-'vacuum analyse' a database for
testing purposes?

Thank you,
Fred



Re: [PERFORM] un-'vacuum analyse'

2006-03-31 Thread Tom Lane
Frederic Back <[EMAIL PROTECTED]> writes:
> Is there a way to de-'vacuum analyse' a database for
> testing purposes?

"DELETE FROM pg_statistic" will get you most of the way there.
It doesn't get rid of the accurate relpages/reltuples entries
in pg_class, but since CREATE INDEX also updates those counts,
I think it's reasonable to consider that a freshly loaded database
would normally have correct counts.
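A test cycle built on that might look like this sketch (superuser required, and only on a test database):

```sql
-- forget everything ANALYSE has learned about the data distribution
DELETE FROM pg_statistic;

-- ... time the application's queries in their "un-analysed" state ...

-- regather statistics for the "after" measurement
ANALYZE;
```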

regards, tom lane



Re: [PERFORM] [Solved] Slow performance on Windows .NET and OleDb

2006-03-31 Thread Antoine
This is a blatant thread steal... but here we go...
Do people have any opinions on the pgsql driver? How does it compare
with the ODBC driver in terms of performance? Is it fully production ready?
The boss wants to go .net (instead of Java, which is my preference...)
- will I have to spend my time defending postgres against
mysql/postgres/sqlserver?
Cheers
Antoine
ps. I try my best not to steal threads but sometimes... :-)

On 30/03/06, Dave Dutcher <[EMAIL PROTECTED]> wrote:
> I use Npgsql, and the connection string I use is real simple:
>
> Server=192.168.0.36;Database=mydb;User Id=myuserid;Password=123456
>
> Hope that helps,
>
> Dave
>
> > -Original Message-
> > From: [EMAIL PROTECTED]
> [mailto:pgsql-performance-
> > [EMAIL PROTECTED] On Behalf Of Greg Quinn
> > Sent: Wednesday, March 29, 2006 11:57 PM
> > To: pgsql-performance@postgresql.org
> > Subject: [PERFORM] [Solved] Slow performance on Windows .NET and OleDb
> >
> > This problem was caused by the OleDb driver. I used a 3rd party .NET
> > provider and it worked, 8000 rows in just over 100ms!
> >
> > Can somebody send me a sample connection string for the PostGreSql
> native
> > .net driver please? I'm battling to find a valid connection string.
> >
> > Thanks
> >
> >
> >
>


--
This is where I should put some witty comment.



Re: [PERFORM] [Solved] Slow performance on Windows .NET and OleDb

2006-03-31 Thread Magnus Hagander
> This is a blatant thread steal... but here we go...
> Do people have any opinions on the pgsql driver?

It's very nice.

> How does it compare with the odbc in terms of performance?

I haven't measured specifically, but if you're talking .net it should be
better. It's all in managed code, so you won't pay the repeated penalty
of switching down to unmanaged and back all the time (the
.net-ODBC-bridge is known not to be very fast). As a bonus, your program
can even run in environments where the CAS policy disallows native code.
And I've never had any performance problems with it myself.


> Is it fully production ready?

I believe so. I've been using it for a long time with zero problems.
While I don't use many of the exotic features in it, I doubt most people
do ;-) Don't get scared by the claim that it's in beta - IIRC there's an
RC due out any day now, and it has been stable since long before 1.0.
But it's always a good idea to browse through the list of known bugs and
see if one will likely hit you...


//Magnus



Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-03-31 Thread Brendan Duddridge

Hi Jim,

I'm not quite sure what you mean by the correlation of category_id?
The category_id is part of a compound primary key in the
category_product table. The primary key on category_product is
(category_id, product_id).

Here's the definitions of the two tables involved in the join:

Table "public.category_product"
       Column        |         Type         | Modifiers
---------------------+----------------------+-----------
category_id | integer  | not null
product_id  | integer  | not null
en_name_sort_order  | integer  |
fr_name_sort_order  | integer  |
merchant_sort_order | integer  |
price_sort_order| integer  |
merchant_count  | integer  |
is_active   | character varying(5) |
Indexes:
    "x_category_product_pk" PRIMARY KEY, btree (category_id, product_id)
    "category_product__is_active_idx" btree (is_active)
    "category_product__merchant_sort_order_idx" btree (merchant_sort_order)
    "x_category_product__category_id_fk_idx" btree (category_id) CLUSTER
    "x_category_product__product_id_fk_idx" btree (product_id)
Foreign-key constraints:
    "x_category_product_category_fk" FOREIGN KEY (category_id) REFERENCES category(category_id) DEFERRABLE INITIALLY DEFERRED
    "x_category_product_product_fk" FOREIGN KEY (product_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED




 Table "public.product_attribute_value"
           Column           |         Type          | Modifiers
----------------------------+-----------------------+-----------
attribute_id   | integer   | not null
attribute_unit_id  | integer   |
attribute_value_id | integer   |
boolean_value  | character varying(5)  |
decimal_value  | numeric(30,10)|
product_attribute_value_id | integer   | not null
product_id | integer   | not null
product_reference_id   | integer   |
status_code| character varying(32) |
Indexes:
    "product_attribute_value_pk" PRIMARY KEY, btree (product_attribute_value_id)
    "product_attribute_value__attribute_id_fk_idx" btree (attribute_id)
    "product_attribute_value__attribute_unit_id_fk_idx" btree (attribute_unit_id)
    "product_attribute_value__attribute_value_id_fk_idx" btree (attribute_value_id)
    "product_attribute_value__product_id_fk_idx" btree (product_id)
    "product_attribute_value__product_reference_id_fk_idx" btree (product_reference_id)

Foreign-key constraints:
    "product_attribute_value_attribute_fk" FOREIGN KEY (attribute_id) REFERENCES attribute(attribute_id) DEFERRABLE INITIALLY DEFERRED
    "product_attribute_value_attributeunit_fk" FOREIGN KEY (attribute_unit_id) REFERENCES attribute_unit(attribute_unit_id) DEFERRABLE INITIALLY DEFERRED
    "product_attribute_value_attributevalue_fk" FOREIGN KEY (attribute_value_id) REFERENCES attribute_value(attribute_value_id) DEFERRABLE INITIALLY DEFERRED
    "product_attribute_value_product_fk" FOREIGN KEY (product_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED
    "product_attribute_value_productreference_fk" FOREIGN KEY (product_reference_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED



Not sure if that helps answer your question, but the query is pretty  
slow. Sometimes it takes 5 - 15 seconds depending on the category_id  
specified.


Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote:


> What's the correlation of category_id? The current index scan cost
> estimator places a heavy penalty on anything with a correlation much
> below about 90%.
>
> [earlier quoted message snipped]

Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-03-31 Thread chris smith
On 4/1/06, Brendan Duddridge <[EMAIL PROTECTED]> wrote:
> Hi Jim,
>
> I'm not quite sure what you mean by the correlation of category_id?

It means how many distinct values it has (at least that's my
understanding of it ;) ).

select category_id, count(*) from category_product group by category_id;

will show you how many category_id's there are and how many products
are in each category.

Having a lot of products in one category (or having a small number of
categories) can slow things down because the db can't use the index
effectively, which might be what you're seeing (hence why it's fast
for some categories and slow for others).


> [rest of quoted message snipped]


--
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-03-31 Thread Brendan Duddridge

Ah I see. Ok, well we have a very wide variety here...

category_id | count
-+---
 1000521 | 31145
 1001211 | 22991
 1001490 | 22019
 1001628 | 12472
 146 | 10480
 187 | 10338
 1001223 | 10020
 1001560 |  9532
 1000954 |  8633
 1001314 |  8191
 1001482 |  8140
 1001556 |  7959
 1001481 |  7850
[snip...]
 1001133 | 1
 1000532 | 1
 1000691 | 1
 1000817 | 1
 1000783 | 1
 1000689 | 1

(1157 rows)

So what's the best kind of query to handle this kind of data to make
it fast in all cases? I'd like to get down to sub-second response times.


currently we have:

select cp.product_id
 from category_product cp, product_attribute_value pav
 where cp.category_id = 1001082 and cp.product_id =
 pav.product_id;


Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Mar 31, 2006, at 6:23 PM, chris smith wrote:


On 4/1/06, Brendan Duddridge <[EMAIL PROTECTED]> wrote:

Hi Jim,

I'm not quite sure what you mean by the correlation of category_id?


It means how many distinct values does it have (at least that's my
understanding of it ;) ).

select category_id, count(*) from category_product group by  
category_id;


will show you how many category_id's there are and how many products
are in each category.

Having a lot of products in one category (or having a small amount of
categories) can slow things down because the db can't use the index
effectively.. which might be what you're seeing (hence why it's fast
for some categories, slow for others).



On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote:


What's the correlation of category_id? The current index scan cost
estimator places a heavy penalty on anything with a correlation much
below about 90%.

On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:

Hi,

I have a query that is using a sequential scan instead of an index
scan. I've turned off sequential scans and it is in fact faster  
with

the index scan.

Here's my before and after.

Before:

ssdev=# SET enable_seqscan TO DEFAULT;
ssdev=# explain analyze select cp.product_id
 from category_product cp, product_attribute_value pav
 where cp.category_id = 1001082 and cp.product_id =
 pav.product_id;


   QUERY PLAN
--- 
--

---
--- 
--

---
--
Hash Join  (cost=25.52..52140.59 rows=5139 width=4) (actual
time=4.521..2580.520 rows=19695 loops=1)
   Hash Cond: ("outer".product_id = "inner".product_id)
   ->  Seq Scan on product_attribute_value pav   
(cost=0.00..40127.12

rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846
loops=1)
   ->  Hash  (cost=23.10..23.10 rows=970 width=4) (actual
time=2.267..2.267 rows=1140 loops=1)
 ->  Index Scan using  
x_category_product__category_id_fk_idx

on category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual
time=0.122..1.395 rows=1140 loops=1)
   Index Cond: (category_id = 1001082)
Total runtime: 2584.221 ms
(7 rows)


After:

ssdev=# SET enable_seqscan TO false;
ssdev=# explain analyze select cp.product_id
 from category_product cp, product_attribute_value pav
 where cp.category_id = 1001082 and cp.product_id =
 pav.product_id;


  QUERY PLAN
--- 
--

---
--- 
--

---
-
Nested Loop  (cost=0.00..157425.22 rows=5139 width=4) (actual
time=0.373..71.177 rows=19695 loops=1)
   ->  Index Scan using x_category_product__category_id_fk_idx on
category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual
time=0.129..1.438 rows=1140 loops=1)
 Index Cond: (category_id = 1001082)
   ->  Index Scan using product_attribute_value__product_id_fk_idx
on product_attribute_value pav  (cost=0.00..161.51 rows=61 width=4)
(actual time=0.016..0.053 rows=17 loops=1140)
 Index Cond: ("outer".product_id = pav.product_id)
Total runtime: 74.747 ms
(6 rows)

There's quite a big difference in speed there. 2584.221 ms vs.  
74.747

ms.

Any ideas what I can do to improve this without turning sequential
scanning off?

Thanks,

___ 
_
Brendan Duddridge | CTO | 403-277-5591 x24 |   
[EMAIL PROTECTED]


ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com





--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comw