On 17/12/2010, at 9:20 PM, Pierre C wrote:

> 
>> fc=# explain  analyse select collection, period, tariff, sum(bytesSent), 
>> sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600 as 
>> startchunk from sample_20101001 where starttime between 1287493200 and 
>> 1290171599  and collection=128    and ip = '10.9.125.207' group by 
>> startchunk, tariff, collection, period;
> 
> If CLUSTER locks bother you, and you don't do UPDATEs, you might consider
> doing something like this:
> 
> - accumulate the rows in a "recent" table
> - every hour, INSERT INTO archive SELECT * FROM recent ORDER BY (your cluster 
> fields)
> - DELETE FROM recent the rows you just inserted
> - VACUUM recent
> 
> The clustering in your archive table will not be perfect, but at least all
> rows from one source in one hour will be stored close together. Clustering
> doesn't need to be perfect either; if you get 100x better locality, that's
> already good!

That's a really decent idea and can slot in perfectly well with how the
application already works!  We have existing DBAO code that handles monthly
tables; it'll happily pop data into a recent table... in fact we can probably
tolerate having a "today" table. A rough sketch of that hourly move is below.
Thanks!
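
For the archives, here's a minimal sketch of what that hourly job might look
like. The table and column names ("recent", "archive", and the cluster fields
collection/ip/starttime) are assumptions based on the query above, not our
actual schema:

    -- Run at REPEATABLE READ so the INSERT and the DELETE see the same
    -- snapshot; rows arriving mid-transaction stay in "recent" for the
    -- next run.  now() is fixed at transaction start, so both statements
    -- use the same cutoff.
    BEGIN ISOLATION LEVEL REPEATABLE READ;

    -- Copy everything up to the top of the current hour into the archive,
    -- pre-sorted on the would-be cluster key so the rows land close
    -- together on disk.
    INSERT INTO archive
    SELECT * FROM recent
    WHERE starttime < extract(epoch FROM date_trunc('hour', now()))
    ORDER BY collection, ip, starttime;

    -- Remove the rows we just copied.
    DELETE FROM recent
    WHERE starttime < extract(epoch FROM date_trunc('hour', now()));

    COMMIT;

    -- Reclaim the dead tuples so "recent" stays small (VACUUM can't run
    -- inside a transaction block).
    VACUUM recent;

Since the recent table is insert-only in Pierre's scheme, the two statements
can't trip over concurrent UPDATEs; if that ever changes this would need more
care.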

--Royce


