Asher wrote:
Once loaded into the database, the data will never be deleted or modified, and it will typically be accessed over a particular date range for a particular channel (e.g. "sample_time >= X AND sample_time <= Y AND channel=Z"). A typical query won't return more than a few million rows, and speed is not desperately important (as long as the time is measured in minutes rather than hours).

Are there any recommended ways to organise this? Should I partition my big table into multiple smaller ones that will always fit in memory (this would result in several hundred or even thousands of sub-tables)? Are there any ways to keep the index size to a minimum? At the moment I have a few weeks of data, about 180GB, loaded into a single table indexed on sample_time and channel, and the index itself takes up another 180GB.
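
(For reference, a minimal sketch of what that single-table layout might look like; the table name "samples", the value column, and the exact types are assumptions for illustration, not details from the original post:)

CREATE TABLE samples (
    sample_time  timestamptz       NOT NULL,
    channel      integer           NOT NULL,
    value        double precision            -- assumed payload column
);

-- The composite index described above, roughly:
CREATE INDEX samples_time_channel_idx ON samples (sample_time, channel);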

One approach to consider is partitioning by sample_time and not even including the channel number in the index. You've got tiny records; there are going to be hundreds of channels of data on each data page pulled in, right? Why not minimize physical I/O by reducing the index size: just read that whole section of time into memory (the rows should be pretty closely clustered and therefore mostly sequential I/O), and then push the filtering by channel onto the CPU instead? If you've got billions of rows, you're going to end up disk-bound anyway; minimizing physical I/O and random seeking at the expense of CPU time could be a big win.
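
Roughly, something like the sketch below, using inheritance partitioning with CHECK constraints; the hypothetical "samples" table from above and the monthly ranges are just illustrative assumptions, so size the ranges to whatever keeps each child's index comfortably in RAM:

-- One child table per time range, constrained to that range.
CREATE TABLE samples_2010_01 (
    CHECK (sample_time >= '2010-01-01' AND sample_time < '2010-02-01')
) INHERITS (samples);

-- Index on sample_time only; filtering by channel is left to the CPU.
CREATE INDEX samples_2010_01_time_idx ON samples_2010_01 (sample_time);

-- With constraint_exclusion enabled, a range query such as
--   SELECT * FROM samples
--   WHERE sample_time >= '2010-01-05' AND sample_time < '2010-01-12'
--     AND channel = 42;
-- only scans the children whose CHECK constraints overlap the requested range.

Rows would be loaded directly into the appropriate child table, or routed there by an insert trigger on the parent.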

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


