Hi Matthew,

Thanks very much for the analysis. It does take 17 seconds to execute when the 
data is not in cache. I cannot use "distinct" because I have aggregate operators 
in the select clause of the original query. What I would like to ask is: would 
partitioning on workspaceid help? Or would some sort of selective index help me?

Thanks.




________________________________
From: Matthew Wakeling <matt...@flymine.org>
To: niraj patel <npa...@gridsolv.com>
Cc: pgsql-performance@postgresql.org
Sent: Tue, 8 December, 2009 7:33:38 PM
Subject: Re: [PERFORM] Optimizing Bitmap Heap Scan.

On Tue, 8 Dec 2009, niraj patel wrote:
>  Group  (cost=509989.19..511518.30 rows=9 width=10) (actual time=1783.102..2362.587 rows=261 loops=1)
>    ->  Sort  (cost=509989.19..510753.74 rows=305821 width=10) (actual time=1783.097..2121.378 rows=272211 loops=1)
>          Sort Key: topfamilyid
>          ->  Bitmap Heap Scan on cmrules r  (cost=14501.36..476896.34 rows=305821 width=10) (actual time=51.507..351.487 rows=272211 loops=1)
>                Recheck Cond: (workspaceid = 18512::numeric)
>                ->  Bitmap Index Scan on pk_ws_fea_fam_cmrules  (cost=0.00..14424.90 rows=305821 width=0) (actual time=48.097..48.097 rows=272211 loops=1)
>                      Index Cond: (workspaceid = 18512::numeric)
>  Total runtime: 2373.008 ms
> (8 rows)

> select count(*) from  cmrules;
> 
> Gives me 17 643 532 Rows

Looks good from here. Think about what you're asking the database to do. It has 
to select 272211 rows out of a large table with 17643532 rows. That in itself 
could take a very long time. It is clear that in your EXPLAIN this data is 
already cached, otherwise it would have to perform nigh on 270000 seeks over 
the discs, which would take (depending on the disc system) something on the 
order of twenty minutes. Those 272211 rows then have to be sorted, which takes 
a couple of seconds, which again is pretty good. The rows are then uniqued, 
which is really quick, before returning the results.
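The seek-time estimate above is simple arithmetic; assuming roughly 5 ms per random seek on a single spinning disc (an illustrative figure, not a measurement of your system):

```sql
-- 272211 random seeks at an assumed ~5 ms each, expressed in minutes.
SELECT 272211 * 5 / 1000.0 / 60 AS minutes;  -- roughly 22.7 minutes
```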

It's hard to think how you would expect the database to do this any faster, 
really.

> Indexes:
>     "pk_ws_fea_fam_cmrules" PRIMARY KEY, btree (workspaceid, featureid, topfamilyid, ruleenddate, gid)
>     "idx_cmrules" btree (topfamilyid)
>     "idx_gid_ws_cmrules" btree (gid, workspaceid)

You may perhaps benefit from an index on just the workspaceid column, but the 
benefit may be minor.
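A sketch of that single-column index (the index name here is illustrative):

```sql
-- A narrower index on workspaceid alone is smaller than the five-column
-- primary key, so the bitmap index scan has fewer index pages to read.
CREATE INDEX idx_cmrules_workspaceid ON cmrules (workspaceid);
ANALYZE cmrules;
```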

You may think of clustering the table on the index, but that will only be of 
benefit if the data is not in the cache.
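If you try clustering, it would look something like this (note that CLUSTER takes an exclusive lock while it rewrites the table, and has to be re-run periodically as the data churns):

```sql
-- Physically reorder the table by the leading index column (workspaceid),
-- so the ~272000 matching rows sit on adjacent pages instead of being
-- scattered across the whole 17-million-row table.
CLUSTER cmrules USING pk_ws_fea_fam_cmrules;
ANALYZE cmrules;
```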

The statistics seem to be pretty accurate, predicting 305821 instead of 272211 
rows. The database is not going to easily predict the number of unique results 
(9 instead of 261), but that doesn't affect the query plan much, so I wouldn't 
worry about it.

I would consider upgrading to Postgres 8.4 if possible, as it does have some 
considerable performance improvements, especially for bitmap index scans if you 
are using a RAID array. I'd also try using "SELECT DISTINCT" rather than "GROUP 
BY" and seeing if that helps.

Matthew

-- Now the reason people powdered their faces back then was to change the values
"s" and "n" in this equation here.                 - Computer science lecturer