At 09:20 AM 7/22/2008, [EMAIL PROTECTED] wrote:
Date: Tue, 22 Jul 2008 13:27:24 +0200
From: "A. Kretschmer" <[EMAIL PROTECTED]>
To: pgsql-sql@postgresql.org
Subject: Re: index for group by
Message-ID: <[EMAIL PROTECTED]>

On Tue, 22.07.2008 at 13:18:30 +0200, Patrick Scharrenberg wrote the following:
> Hi,
>
> is there a way to speedup "group by" queries with an index?
>
> In particular if I have a table like this:
>
> CREATE TABLE data
> (
>    id1 integer,
>    id2 integer,
>    somedata character varying,
>    ts timestamp with time zone
> );
>
> where data about "id1" and "id2" is continuously logged into "somedata",
> together with the timestamp of when it was logged.
>
> So I have multiple rows with the same id1 and id2 but different
> timestamp (and data maybe).
>
> At the moment I have ~40,000,000 rows in that table so doing a
>
>       SELECT id1, id2 FROM data GROUP BY id1, id2;


Without a WHERE clause, every such SELECT forces a sequential scan.

First, why are you doing a GROUP BY when you aren't doing any aggregation (like COUNT, SUM, etc.)? It seems you could get much better performance by doing this:

SELECT DISTINCT ON (id1, id2) id1, id2 FROM data ORDER BY id1, id2;

(Assuming your compound index is in "id1,id2" order). Am I missing something?
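For reference, such a compound index could look like this (the index name is my own invention, not from the original thread):

```sql
-- Hypothetical compound index matching the DISTINCT ON / ORDER BY clause
CREATE INDEX idx_data_id1_id2 ON data (id1, id2);
```

With the index in place, the planner can satisfy both the ORDER BY and the duplicate elimination from the index order instead of sorting the whole table.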

A different, more cumbersome idea (if you really do need a GROUP BY) is to build a warehouse table that precalculates the data you want. Set up a recurring process that runs every NN minutes or hours and fires a stored procedure which reads the "data" table, aggregates it, and saves the result to the warehouse table. You could also aggregate against your timestamp column by N hours or days. If this idea is of interest, write back to the list, or off-list to me, for more info.
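The warehouse idea above might be sketched like this (table and column names, and the hourly bucketing via date_trunc, are my assumptions, not from the original thread):

```sql
-- Hypothetical summary table; refreshed by a scheduled job
CREATE TABLE data_warehouse
(
    id1    integer,
    id2    integer,
    bucket timestamp with time zone,  -- hour the source rows fall into
    n_rows bigint
);

-- Run from the recurring job (e.g. cron) every NN minutes/hours:
INSERT INTO data_warehouse (id1, id2, bucket, n_rows)
SELECT id1, id2, date_trunc('hour', ts), count(*)
FROM data
GROUP BY id1, id2, date_trunc('hour', ts);
```

Queries then hit the much smaller data_warehouse table instead of scanning ~40 million rows. A real implementation would also need to clear or incrementally update previously aggregated buckets rather than blindly re-inserting.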

Steve


--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql