Re: [GENERAL] Index not being used unless enable_seqscan=false

2005-08-10 Thread Sven Willenberger
On Wed, 2005-08-10 at 12:01 -0700, Shane wrote:
 Hello all,
 
 I am working with a simple table and query abut cannot seem
 to get it to use the index I have created.  However, if I
 set enable_seqscan=false, the index is used and the query
 is much faster.  I have tried a vacuum analyze but to no
 avail.
 
 Table layout:
   Table public.seen
   Column  |  Type  | Modifiers
 --++---
  group_id | integer| not null
  msgid| text   | not null
  msgtime  | timestamp(0) without time zone | not null
 Indexes:
 seen_group_id_key unique, btree (group_id, msgid)
 seen_msgtime btree (msgtime)
 Foreign-key constraints:
 $1 FOREIGN KEY (group_id) REFERENCES groups(id) ON UPDATE CASCADE ON 
 DELETE CASCADE
 
 explain analyze with enable_seqscan=true
 explain analyze select msgid from seen where msgtime  cast(now() - interval 
 '6 months' as timestamp(0) without time zone);
  QUERY PLAN   
   
 -
  Seq Scan on seen  (cost=0.00..107879.45 rows=1081044 width=46) (actual 
 time=7597.387..27000.777 rows=28907 loops=1)
Filter: (msgtime  ((now() - '6 mons'::interval))::timestamp(0) without 
 time zone)
  Total runtime: 27096.337 ms
 (3 rows)
 
 Same query with enable_seqscan=false
   QUERY PLAN  
   
 --
  Index Scan using seen_msgtime on seen  (cost=0.00..3818325.78 rows=1081044 
 width=46) (actual time=0.140..156.222 rows=28907 loops=1)
Index Cond: (msgtime  ((now() - '6 mons'::interval))::timestamp(0) 
 without time zone)
  Total runtime: 248.737 ms
 (3 rows)
 
 Any ideas on how I can fix this.  I get this problem now
 and again with other databases but a vacuum usually fixes
 it.

Right off the bat (if I am interpreting the results of your explain
analyze correctly) it looks like the planner is basing its decision to
seqscan as it thinks that it needs to filter over 1 million rows (versus
the 29,000 rows that actually are pulled). Perhaps increasing stats on
msgtime and then analyzing the table may help. Depending on your
hardware, decreasing random_page_cost in your postgresql.conf just a
touch may help too.

Sven


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Index not being used unless enable_seqscan=false

2005-08-10 Thread Ragnar HafstaĆ°
On Wed, 2005-08-10 at 12:01 -0700, Shane wrote:
 Hello all,
 
 I am working with a simple table and query abut cannot seem
 to get it to use the index I have created.  However, if I
 set enable_seqscan=false, the index is used and the query
 is much faster.  I have tried a vacuum analyze but to no
 avail.

[snip]
 explain analyze select msgid from seen where msgtime  cast(now() - interval 
 '6 months' as timestamp(0) without time zone);
  QUERY PLAN   
   
 -
  Seq Scan on seen  (cost=0.00..107879.45 rows=1081044 width=46) (actual 
 time=7597.387..27000.777 rows=28907 loops=1)
Filter: (msgtime  ((now() - '6 mons'::interval))::timestamp(0) without 
 time zone)
  Total runtime: 27096.337 ms
 (3 rows)

 Same query with enable_seqscan=false
[snip faster plan]

 
 Any ideas on how I can fix this.  I get this problem now
 and again with other databases but a vacuum usually fixes
 it.

The planner is not very good at estimating selectivity of
single unequalities. If you can specify a range in the
where clause, you might possibly have better luck.
...WHERE  msgtime  cast(now() - interval '6 months' 
   as timestamp(0) without time zone
 AND msgtime = '2000-01-01'

Also, you might want to try to increase the STATISTICS target
of msgtime.

Sometimes an ORDER BY clause can help the planner on choosing indexscan,
although in this case the difference in estimated cost is so high that
I doubt it.



gnari



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Index not being used unless enable_seqscan=false

2005-08-10 Thread Sven Willenberger
On Wed, 2005-08-10 at 12:58 -0700, Shane wrote:
 On Wed, Aug 10, 2005 at 03:31:27PM -0400, Sven Willenberger wrote:
  Right off the bat (if I am interpreting the results of your explain
  analyze correctly) it looks like the planner is basing its decision to
  seqscan as it thinks that it needs to filter over 1 million rows (versus
  the 29,000 rows that actually are pulled). Perhaps increasing stats on
  msgtime and then analyzing the table may help. Depending on your
  hardware, decreasing random_page_cost in your postgresql.conf just a
  touch may help too.
 
 Thanks for the pointers.
 
 I tried increasing the stats from the default of 10 to 25
 with no change.  How high would you bring it?  Also, I've
 never played with the various cost variables.  The database
 sits on a raid5 partition composed of 4 15k u320 SCSI
 drives, dual xeon 2.8(ht enabled) 2gb ram.  I suppose this
 might actually increase the cost of fetching a random disk
 page as it may well be on another physical disk and
 wouldn't be in the readahead cache.  Any idea as to what it
 should be on this sort of system?
 
 
 ---(end of broadcast)---

Try increasing stats to 100 on just the msgtime column, not the default
(changing the default will only have an effect on newly created columns
-- you may want to change the default back to 10):

ALTER TABLE seen ALTER msgtime SET STATISTICS 100;

After running that command, analyze the table again and see if that
helps.

I am assuming the culprit is this particular column as your index and
search criteria is based on that one. The default random_page_cost I
believe is 4.0; on your system you could probably easily drop it to 3,
possibly lower, and see how that performs.

Sven


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Index not being used unless enable_seqscan=false

2005-08-10 Thread Tom Lane
Shane [EMAIL PROTECTED] writes:
 I am working with a simple table and query abut cannot seem
 to get it to use the index I have created.
 ...
 explain analyze select msgid from seen where msgtime  cast(now() - interval 
 '6 months' as timestamp(0) without time zone);

As some other people already pointed out, the problem is the horrible
misestimate of the number of matching rows.  You did not say your
Postgres version, but I'm betting it's pre-8.0.  Versions before 8.0
would not assume that they could get any useful statistical info from
an expression involving now() (or in general, any non-immutable
function).  The default assumption in such cases is that a lot of
rows are retrieved --- too many for an indexscan.

If you cannot update to 8.0.* at the moment, a workaround is to do the
timestamp calculation on the client side so that you can send over
a query that's just a comparison to a constant:

... where msgtime  '2005-02-14 ...'::timestamp;

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster