subject:"\[SQL\] index for group by"

[SQL] index for group by

2008-07-22 Thread Patrick Scharrenberg

Hi,

is there a way to speedup group by queries with an index?

In particular if I have a table like this:

CREATE TABLE data
(
   id1 integer,
   id2 integer,
   somedata character varying,
   ts timestamp with time zone
);

where continously data is logged about id1 and id2 into somedata,
together with the timestamp when it was logged.

So I have multiple rows with the same id1 and id2 but different
timestamp (and data maybe).

At the moment I have ~40.000.000 rows in that table so doing a

SELECT id1, id2 FROM data GROUP BY id1, id2;

takes some time (~10 minutes)
and return about 1.000.000 rows.

I created an index on both colums id1 and id2 (together) which takes
about 800 MB but doesn't speedup things.
In fact it even doesn't seem to be used.

Is there any way to speedup this group by or does it seem more likely
that I have a conceptional flaw?

regards
Patrick

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] index for group by

2008-07-22 Thread A. Kretschmer

am  Tue, dem 22.07.2008, um 13:18:30 +0200 mailte Patrick Scharrenberg 
folgendes:
 Hi,
 
 is there a way to speedup group by queries with an index?
 
 In particular if I have a table like this:
 
 CREATE TABLE data
 (
id1 integer,
id2 integer,
somedata character varying,
ts timestamp with time zone
 );
 
 where continously data is logged about id1 and id2 into somedata,
 together with the timestamp when it was logged.
 
 So I have multiple rows with the same id1 and id2 but different
 timestamp (and data maybe).
 
 At the moment I have ~40.000.000 rows in that table so doing a
 
   SELECT id1, id2 FROM data GROUP BY id1, id2;


without a where-clause every select forces a seq-scan.


 
 takes some time (~10 minutes)
 and return about 1.000.000 rows.
 
 I created an index on both colums id1 and id2 (together) which takes
 about 800 MB but doesn't speedup things.
 In fact it even doesn't seem to be used.

The database has to read all rows, an index can't help in this case.


 
 Is there any way to speedup this group by or does it seem more likely
 that I have a conceptional flaw?

Hard to say...


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] index for group by

2008-07-22 Thread Steve Midgley

At 09:20 AM 7/22/2008, [EMAIL PROTECTED] wrote:

Date: Tue, 22 Jul 2008 13:27:24 +0200
From: A. Kretschmer [EMAIL PROTECTED]
To: pgsql-sql@postgresql.org
Subject: Re: index for group by
Message-ID: [EMAIL PROTECTED]

am  Tue, dem 22.07.2008, um 13:18:30 +0200 mailte Patrick Scharrenberg 
folgendes:

 Hi,

 is there a way to speedup group by queries with an index?

 In particular if I have a table like this:

 CREATE TABLE data
 (
id1 integer,
id2 integer,
somedata character varying,
ts timestamp with time zone
 );

 where continously data is logged about id1 and id2 into 
somedata,

 together with the timestamp when it was logged.

 So I have multiple rows with the same id1 and id2 but different
 timestamp (and data maybe).

 At the moment I have ~40.000.000 rows in that table so doing a

   SELECT id1, id2 FROM data GROUP BY id1, id2;

without a where-clause every select forces a seq-scan.

First, why are you doing a group by when you aren't doing an 
aggregation (like COUNT, SUM, etc)? It seems like you can get way 
better performance by doing this:

SELECT DISTINCT ON (id1, id2) id1, id2 FROM data ORDER BY id1, id2

(Assuming your compound index is in id1,id2 order). Am I missing 
something?

A different more cumbersome idea I have for you (if you really do need 
a GROUP BY) is to build a warehouse table that precalculates the data 
you want. You can build some recurring process that runs every NN 
minutes or hours and fires off a stored procedure which grabs all the 
data from this data table, aggregates it and saves it to warehouse 
table. You could aggregate against your datetime stamp by N hours or 
days as well. If this idea is of interest you can write back to the 
list or off-list to me for more info.

Steve

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

[SQL] index for group by

Re: [SQL] index for group by

Re: [SQL] index for group by

3 matches

Site Navigation

Mail list logo

Footer information