subject:"\[PERFORM\] Performace comparison of indexes over timestamp fields"

Re: [PERFORM] Performace comparison of indexes over timestamp fields

2007-05-22 Thread Tom Lane

Arnau <[EMAIL PROTECTED]> writes:
> As you can see the time difference are very big
>Timestamp:318.328 ms
>int8 index:   120.804 ms
>double precision: 57.065 ms

As already suggested elsewhere, you probably weren't sufficiently
careful in taking your measurements.

A look at the code says that int8 comparison ought to be the fastest
of these.  If timestamps are implemented as floats (which you didn't
say) their comparison speed ought to be *exactly* the same as floats,
because the comparison functions are line-for-line the same.  If
timestamps are implemented as int8 then they should be similar to
int8 comparisons, maybe a tad slower due to an extra level of function
call.  But in any case it seems likely that the actual comparison
function calls would be just a small fraction of the runtime.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [PERFORM] Performace comparison of indexes over timestamp fields

2007-05-22 Thread Alexander Staubo

On 5/22/07, Steinar H. Gunderson <[EMAIL PROTECTED]> wrote:

On Tue, May 22, 2007 at 02:39:33PM +0200, Alexander Staubo wrote:
> PostgreSQL uses B-tree indexes for scalar values. For an expression
> such as "t between a and b", I believe it's going to match both sides
> of the table independently (ie., t >= a and t <= b) and intersect
> these subsets. This is inefficient.

A B-tree index can satisfy range queries such as this.

You're right, and I'm wrong -- my head is not in the right place
today. B-trees are inefficient for intervals, but quite satisfactory
for range searches.

Alexander.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] Performace comparison of indexes over timestamp fields

2007-05-22 Thread Steinar H. Gunderson

On Tue, May 22, 2007 at 02:39:33PM +0200, Alexander Staubo wrote:
> PostgreSQL uses B-tree indexes for scalar values. For an expression
> such as "t between a and b", I believe it's going to match both sides
> of the table independently (ie., t >= a and t <= b) and intersect
> these subsets. This is inefficient.

A B-tree index can satisfy range queries such as this.

> You should get better performance by mapping timestamps to a
> one-dimensional plane and indexing them using GiST. GiST implements an
> R-tree-like structure that supports bounding-box searches.

You may be thinking of interval overlaps?

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] Performace comparison of indexes over timestamp fields

2007-05-22 Thread Alexander Staubo


On 5/22/07, Arnau <[EMAIL PROTECTED]> wrote:

   On older versions of PostgreSQL, at least in my experience, queries
on timestamps fields even having indexes where performing quite bad
mainly sequential scans where performed.


PostgreSQL uses B-tree indexes for scalar values. For an expression
such as "t between a and b", I believe it's going to match both sides
of the table independently (ie., t >= a and t <= b) and intersect
these subsets. This is inefficient.

You should get better performance by mapping timestamps to a
one-dimensional plane and indexing them using GiST. GiST implements an
R-tree-like structure that supports bounding-box searches.

This involves setting up a functional index:

 create index ... on payment_transactions using gist (
   box(point(extract(epoch from time), 0), point(extract(epoch from
time), 0)) box_ops)

I'm using box() here because GiST doesn't have a concept of points.

Then insert as usual, and then query with something like:

 select ... from payment_transactions
 where box(
   point(extract(epoch from '2006-04-01'::date), 0),
   point(extract(epoch from '2006-08-01'::date), 0)) && box(
   point(extract(epoch from time), 0),
   point(extract(epoch from time), 0));

PostgreSQL should be able to exploit the GiST index by recognizing
that the result of box() expression operand is already computed in the
index.

This much less inconvenient and portable -- I would love for
PostgreSQL to be provide syntactic sugar and special-casing to make
this transparent -- but worth it if you are dealing with a lot of
range searches.


   Now I have a newer version of PostgreSQL and I've done some tests
comparing the performance of an index over a timestamp field with a
numeric field. To do so, I have the following table:

  Table "public.payment_transactions"
 Column |Type |Modifiers
+-+-
transaction_id | character varying(32)   | not null
timestamp_in   | timestamp without time zone | default now()
credits| integer |
epoch_in   | bigint  |
epoch_in2  | double precision|

[snip]

A timestamp is stored internally as an 8-byte double-precision float.
Therefore, timestamp_in and epoch_in2 should behave identically.


While doing the tests this table has about 100.000 entries.


Make sure PostgreSQL is able to keep the entire table in memory by
setting shared_buffers; you don't want to be hitting to the disk.

Make sure you run "analyze" on the table before you execute the test.


To test the diferent indexes I have executed the following:


Your query plans are roughly identical. The difference in the timings
implies that you only ran the queries once. I suggest you run each
query at least 10 times, and report the individual numbers (the "total
runtime" parts of the output) you get. Arithmetic means are not that
interesting.

Alexander.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

[PERFORM] Performace comparison of indexes over timestamp fields

2007-05-22 Thread Arnau


Hi all,

  I have some tables where all the queries that will be executed are 
timestamps driven, so it'd be nice to have an index over those fields.


  On older versions of PostgreSQL, at least in my experience, queries 
on timestamps fields even having indexes where performing quite bad 
mainly sequential scans where performed.


  Now I have a newer version of PostgreSQL and I've done some tests 
comparing the performance of an index over a timestamp field with a 
numeric field. To do so, I have the following table:


 Table "public.payment_transactions"
Column |Type |Modifiers
+-+-
transaction_id | character varying(32)   | not null
timestamp_in   | timestamp without time zone | default now()
credits| integer |
epoch_in   | bigint  |
epoch_in2  | double precision|
Indexes:
   "pk_paytrans_transid" PRIMARY KEY, btree (transaction_id)
   "idx_paytrans_epochin" btree (epoch_in)
   "idx_paytrans_epochin2" btree (epoch_in2)
   "idx_paytrans_timestamp" btree (timestamp_in)

timestamp_in it's the timestamp, epoch_in and epoch_in2 are the epoch 
equivalent to timestamp to test how the indexes perform. We have three 
different indexes (testing purposes) one over a timestamp field, one 
over an int8 and one over a double precision field.


While doing the tests this table has about 100.000 entries.



To test the diferent indexes I have executed the following:

Index over timestamp_in (timestamp)

# explain analyze select * from payment_transactions where timestamp_in 
between '2007-02-13'::timestamp and '2007-02-15'::timestamp;


  QUERY PLAN
---
Index Scan using idx_paytrans_timestamp on payment_transactions 
(cost=0.00..1480.24 rows=1698 width=138) (actual time=11.693..310.402 
rows=1587 loops=1)
  Index Cond: ((timestamp_in >= '2007-02-13 00:00:00'::timestamp 
without time zone) AND (timestamp_in <= '2007-02-15 00:00:00'::timestamp 
without time zone))

Total runtime: 318.328 ms
(3 rows)


Index over epoch_in (int8)

# explain analyze select * from payment_transactions where epoch_in 
between extract( epoch from '2007-02-13'::date )::int8 and extract( 
epoch from '2007-02-15'::date )::int8;


QUERY PLAN
-
Index Scan using idx_paytrans_epochin on payment_transactions 
(cost=0.00..1483.24 rows=1698 width=138) (actual time=34.369..114.943 
rows=1587 loops=1)
  Index Cond: ((epoch_in >= 1171321200::bigint) AND (epoch_in <= 
1171494000::bigint))

Total runtime: 120.804 ms
(3 rows)


Index over epoch_in (double precision)

# explain analyze select * from payment_transactions where epoch_in2 
between extract( epoch from '2007-02-13'::date ) and extract( epoch from 
'2007-02-15'::date );


QUERY PLAN
-
Index Scan using idx_paytrans_epochin2 on payment_transactions 
(cost=0.00..1479.24 rows=1698 width=138) (actual time=26.115..51.357 
rows=1587 loops=1)
  Index Cond: ((epoch_in2 >= 1171321200::double precision) AND 
(epoch_in2 <= 1171494000::double precision))

Total runtime: 57.065 ms
(3 rows)



As you can see the time difference are very big
  Timestamp:318.328 ms
  int8 index:   120.804 ms
  double precision: 57.065 ms

is this normal? am I doing anything wrong?

As rule of thumb is better to store epochs than timestamps?

Thank you very much
--
Arnau

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [PERFORM] Performace comparison of indexes over timestamp fields

Re: [PERFORM] Performace comparison of indexes over timestamp fields

Re: [PERFORM] Performace comparison of indexes over timestamp fields

Re: [PERFORM] Performace comparison of indexes over timestamp fields

[PERFORM] Performace comparison of indexes over timestamp fields

5 matches

Site Navigation

Mail list logo

Footer information