Re: [PERFORM] Timestamp-based indexing

2004-07-26 Thread Harmon S. Nine




THAT WAS IT!!

Thank you very much.
Is there a way to change the type of "CURRENT_TIMESTAMP" to "timestamp
without time zone" so that casting isn't needed?


BTW, isn't this a bug?

-- Harmon


Stephan Szabo wrote:

  On Mon, 26 Jul 2004, Harmon S. Nine wrote:

However, we can't get the planner to do a timestamp-based index scan.

Anyone know what to do?
I'd wonder if the type conversion is causing you problems.
CURRENT_TIMESTAMP - INTERVAL '10 minutes' is a timestamp with time zone,
while the column is timestamp without time zone.  Casting
CURRENT_TIMESTAMP to timestamp without time zone seemed to let it
choose an index scan on 7.4.
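
For illustration, the rewritten query would look something like this (a
sketch of the cast Stephan describes; LOCALTIMESTAMP is the standard
function that returns timestamp without time zone directly, which also
answers the casting question above):

-- Sketch: cast the comparison value to the column's type so the
-- btree index on "timestamp" can be considered.
SELECT * FROM "eventtable"
WHERE "timestamp" > CAST(CURRENT_TIMESTAMP AS timestamp without time zone)
                    - INTERVAL '10 minutes';

-- Equivalent without the explicit cast: LOCALTIMESTAMP already returns
-- timestamp without time zone.
SELECT * FROM "eventtable"
WHERE "timestamp" > LOCALTIMESTAMP - INTERVAL '10 minutes';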

  






Re: [PERFORM] Timestamp-based indexing

2004-07-26 Thread Harmon S. Nine
We were getting a little desperate, so we engaged in overkill to rule 
out a lack of ANALYZE as the cause of the slow queries.

Thanks for your advice :)
-- Harmon
Matthew T. O'Connor wrote:
VACUUM FULL ANALYZE every 3 hours seems a little severe.  You will 
probably be served just as well by VACUUM ANALYZE, and you probably 
don't need the VACUUM part most of the time.  You might try doing an 
ANALYZE on just the specific tables you are having issues with.  Since 
ANALYZE should be much quicker and not have the performance impact of a 
VACUUM, you could do it every hour, or even every 15 minutes.
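
A minimal sketch of that lighter-weight schedule, using the table name
from the thread:

-- Statistics refresh only: ANALYZE takes no exclusive lock and does not
-- rewrite the table, so it is cheap enough to run every 15 minutes.
ANALYZE "eventtable";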

Good luck...
Harmon S. Nine wrote:
Hello --
To increase query (i.e. select) performance, we're trying to get 
postgres to use an index based on a timestamp column in a given table.

Event-based data is put into this table several times a minute, with 
the timestamp indicating when a particular row was placed in the table.

The table is purged daily, retaining only the rows that are less than 
7 days old.  That is, any row within the table is less than 1 week 
old (+ 1 day, since the purge is daily).

A typical number of rows in the table is around 400,000.
A "VACUUM FULL ANALYZE"  is performed every 3 hours.
The problem:
We often query the table to extract those rows that are, say, 10 
minutes old or less.

Given there are 10080 minutes per week, the planner could, properly 
configured, estimate the number of rows returned by such a query to be:

10 min / 10080 min * 400,000 ≈ 0.001 * 400,000 = 400

That would make an index scan, with the timestamp field as the index, far 
faster than a sequential scan.
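
As a sanity check, the planner's current view of the table can be
inspected directly (a sketch using the standard statistics catalogs,
pg_class and pg_stats):

-- Row and page counts the planner bases its estimates on:
SELECT reltuples, relpages FROM pg_class WHERE relname = 'eventtable';

-- Per-column statistics gathered by ANALYZE:
SELECT attname, n_distinct, correlation
FROM pg_stats WHERE tablename = 'eventtable';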

However, we can't get the planner to do a timestamp-based index scan.
Anyone know what to do?
Here's the table specs:
monitor=# \d "eventtable"
                                   Table "public.eventtable"
  Column   |            Type             |                           Modifiers
-----------+-----------------------------+----------------------------------------------------------------
 timestamp | timestamp without time zone | not null default ('now'::text)::timestamp(6) with time zone
 key       | bigint                      | not null default nextval('public."eventtable_key_seq"'::text)
 propagate | boolean                     |
 facility  | character(10)               |
 priority  | character(10)               |
 host      | character varying(128)      | not null
 message   | text                        | not null
Indexes:
    "eventtable_pkey" primary key, btree ("timestamp", "key")
    "eventtable_host" btree (host)
    "eventtable_timestamp" btree ("timestamp")

Here's a query (with "explain analyze"):
monitor=# explain analyze select * from "eventtable" where timestamp
> CURRENT_TIMESTAMP - INTERVAL '10 minutes';
                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on "eventtable"  (cost=0.00..19009.97 rows=136444 width=155) (actual time=11071.073..11432.522 rows=821 loops=1)
   Filter: (("timestamp")::timestamp with time zone > (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
 Total runtime: 11433.384 ms
(3 rows)

Here's something strange.  We tried to disable sequential scans, but to 
no avail; the planner still chooses one, though its estimated cost 
skyrockets:

monitor=# set enable_seqscan = false;
SET
monitor=# explain analyze select * from "eventtable" where timestamp
> CURRENT_TIMESTAMP - INTERVAL '10 minutes';
                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on "eventtable"  (cost=100000000.00..100019009.97 rows=136444 width=155) (actual time=9909.847..9932.438 rows=1763 loops=1)
   Filter: (("timestamp")::timestamp with time zone > (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
 Total runtime: 9934.353 ms
(3 rows)

monitor=# set enable_seqscan = true;
SET
monitor=#

Any help is greatly appreciated :)
-- Harmon



Re: [PERFORM] Timestamp-based indexing

2004-07-26 Thread Harmon S. Nine
Thank you for your response :)
This improves the row estimation, but it is still using a sequential scan.
It really seems like the query would go faster if an index scan were 
used, given that the number of rows fetched (both estimated and actual) 
is significantly less than the number of rows in the table.

Is there some way to get the planner to use the timestamp as an index on 
these queries?

monitor=# explain analyze select * from "eventtable" where timestamp
between (CURRENT_TIMESTAMP - INTERVAL '10 min') AND CURRENT_TIMESTAMP;
                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on "eventtable"  (cost=0.00..23103.29 rows=2047 width=155) (actual time=10227.253..10276.944 rows=1662 loops=1)
   Filter: ((("timestamp")::timestamp with time zone >= (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval)) AND (("timestamp")::timestamp with time zone <= ('now'::text)::timestamp(6) with time zone))
 Total runtime: 10278.628 ms
(3 rows)

monitor=# SELECT COUNT(*) FROM "eventtable";
 count
--------
 425602
(1 row)
monitor=#
-- Harmon
Kevin Barnard wrote:

Harmon S. Nine wrote:
monitor=# explain analyze select * from "eventtable" where timestamp 
> CURRENT_TIMESTAMP - INTERVAL '10 minutes';

Try
SELECT * FROM eventtable WHERE timestamp BETWEEN (CURRENT_TIMESTAMP - 
INTERVAL '10 minutes') AND CURRENT_TIMESTAMP;

This will use a range of valid times.  What your query is 
doing is looking from 10 minutes ago to an infinite future.  Statistically 
speaking, that should encompass most of the table, because you have an 
infinite range.  No index will be used.  If you assign a range, the 
planner can figure out what you are looking for.




[PERFORM] Timestamp-based indexing

2004-07-26 Thread Harmon S. Nine
Hello --
To increase query (i.e. select) performance, we're trying to get 
postgres to use an index based on a timestamp column in a given table.

Event-based data is put into this table several times a minute, with the 
timestamp indicating when a particular row was placed in the table.

The table is purged daily, retaining only the rows that are less than 7 
days old.  That is, any row within the table is less than 1 week old (+ 
1 day, since the purge is daily).

A typical number of rows in the table is around 400,000.
A "VACUUM FULL ANALYZE"  is performed every 3 hours.
The problem:
We often query the table to extract those rows that are, say, 10 minutes 
old or less.

Given there are 10080 minutes per week, the planner could, properly 
configured, estimate the number of rows returned by such a query to be:

10 min / 10080 min * 400,000 ≈ 0.001 * 400,000 = 400

That would make an index scan, with the timestamp field as the index, far 
faster than a sequential scan.

However, we can't get the planner to do a timestamp-based index scan.
Anyone know what to do?
Here's the table specs:
monitor=# \d "eventtable"
                                   Table "public.eventtable"
  Column   |            Type             |                           Modifiers
-----------+-----------------------------+----------------------------------------------------------------
 timestamp | timestamp without time zone | not null default ('now'::text)::timestamp(6) with time zone
 key       | bigint                      | not null default nextval('public."eventtable_key_seq"'::text)
 propagate | boolean                     |
 facility  | character(10)               |
 priority  | character(10)               |
 host      | character varying(128)      | not null
 message   | text                        | not null
Indexes:
    "eventtable_pkey" primary key, btree ("timestamp", "key")
    "eventtable_host" btree (host)
    "eventtable_timestamp" btree ("timestamp")

Here's a query (with "explain analyze"):
monitor=# explain analyze select * from "eventtable" where timestamp
> CURRENT_TIMESTAMP - INTERVAL '10 minutes';
                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on "eventtable"  (cost=0.00..19009.97 rows=136444 width=155) (actual time=11071.073..11432.522 rows=821 loops=1)
   Filter: (("timestamp")::timestamp with time zone > (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
 Total runtime: 11433.384 ms
(3 rows)

Here's something strange.  We tried to disable sequential scans, but to 
no avail; the planner still chooses one, though its estimated cost 
skyrockets:

monitor=# set enable_seqscan = false;
SET
monitor=# explain analyze select * from "eventtable" where timestamp
> CURRENT_TIMESTAMP - INTERVAL '10 minutes';
                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on "eventtable"  (cost=100000000.00..100019009.97 rows=136444 width=155) (actual time=9909.847..9932.438 rows=1763 loops=1)
   Filter: (("timestamp")::timestamp with time zone > (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
 Total runtime: 9934.353 ms
(3 rows)

monitor=# set enable_seqscan = true;
SET
monitor=#
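
One diagnostic worth noting (a sketch, with a made-up literal date): a
constant that is already of type timestamp without time zone needs no
conversion at all, so it isolates whether the type mismatch is what
blocks the index:

-- Hypothetical literal; any constant of the column's exact type works.
EXPLAIN SELECT * FROM "eventtable"
WHERE "timestamp" > '2004-07-26 12:00:00'::timestamp without time zone;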

Any help is greatly appreciated :)
-- Harmon
