Hi Rob,
I have default B-tree indexes created for each of the indexed columns and
primary key columns (no multi-column indexes, and no NULLS FIRST or
ASC/DESC options). I am using PostgreSQL 8.3 with the autovacuum daemon on. I
assume ANALYZE will be run automatically to collect statistics for use by
the planner, and that there is no maintenance needed for a B-tree index once
it is created. (Please correct me if I am wrong about this.)
I will probably try to partition the status table to group more recent
status records together to minimize the dataset I am querying.
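For the record, the rough shape I have in mind is 8.3 table inheritance with
CHECK constraints on the datetime column (the partition bound below is a
made-up epoch-millisecond value, and the table and index names are just
illustrative):

CREATE TABLE statusrecord_recent (
    CHECK (datetime >= 1246406400000)   -- made-up cutoff
) INHERITS (statusrecord);

CREATE TABLE statusrecord_old (
    CHECK (datetime < 1246406400000)
) INHERITS (statusrecord);

CREATE INDEX statusrecord_recent_obj_dt
    ON statusrecord_recent (objectId, datetime);

-- needed so the planner can skip partitions whose CHECK excludes the query range
SET constraint_exclusion = on;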
Thx
John
On Jul 31, 2009 1:16am, Rob Sargent <robjsarg...@gmail.com> wrote:
I would be curious to know the performance curve for, let's say, 20K, 40K,
60K, 80K, and 100K records, and what sort of indexing you have, whether or
not it's clustered, rebuilt, and so on.
One could envision partitioning the status table such that recent records
were grouped together (on the assumption that they will be most
frequently "reported").
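By clustered/rebuilt I mean something along these lines (the index name and
the choice of a composite index are just an illustration):

CREATE INDEX statusrecord_obj_dt ON statusrecord (objectId, datetime);
-- rewrite the table in index order (8.3 syntax); this is a one-off,
-- not maintained automatically as new rows arrive
CLUSTER statusrecord USING statusrecord_obj_dt;
ANALYZE statusrecord;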
wkipj...@gmail.com wrote:
I have the following scenario.
I have a tracking system. The system records the status of each object it
tracks regularly, and all the status records are stored in one table. It
keeps a history of at most 1000 status records per object. The system will
track at most 100,000 objects, which means I will potentially have a table
of 100 million records.
I have to generate a report on the latest status of all tracked objects at a
particular point in time, and I also have to allow users to sort and filter
on different columns of the status records displayed in the report.
The following is a brief description of the two record types (this is not
actual code):
ObjectRecord(
objectId bigint PrimaryKey
desc varchar
)
StatusRecord (
id bigint PrimaryKey
objectId bigint indexed
datetime bigint indexed
capacity double
reliability double
efficiency double
)
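In actual DDL the above would be roughly the following ("desc" is a reserved
word in SQL, so I renamed that column; the indexes are plain single-column
B-trees):

CREATE TABLE objectrecord (
    objectId bigint PRIMARY KEY,
    descr    varchar                  -- "desc" is reserved, renamed
);

CREATE TABLE statusrecord (
    id          bigint PRIMARY KEY,
    objectId    bigint,
    datetime    bigint,               -- epoch milliseconds
    capacity    double precision,
    reliability double precision,
    efficiency  double precision
);

CREATE INDEX statusrecord_objectid_idx ON statusrecord (objectId);
CREATE INDEX statusrecord_datetime_idx ON statusrecord (datetime);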
I have tried the following; it works very well with around 20,000 objects
(the query returns in less than 10 seconds), but with 100,000 objects it
becomes very, very slow. (I did not have the patience to wait for it to
return; I killed it after 30 minutes.)
select * from statusrecord s1
INNER JOIN ( SELECT objectId, MAX(datetime) AS msdt
             FROM statusrecord
             WHERE datetime <= pitvalue
             GROUP BY objectId ) s2
ON s1.objectId = s2.objectId AND s1.datetime = s2.msdt;
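(One alternative I have seen suggested for this kind of "latest row per
object" query is PostgreSQL's DISTINCT ON; pitvalue again stands for the
point-in-time parameter:)

SELECT DISTINCT ON (objectId) *
FROM statusrecord
WHERE datetime <= pitvalue
ORDER BY objectId, datetime DESC;

With an index on (objectId, datetime) this can pick each object's newest
qualifying row in one ordered pass instead of joining against a GROUP BY.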
I did try to write a stored procedure like the one below; for 100,000
objects and 1000 status records per object, it returns in around 30 minutes.
CREATE OR REPLACE FUNCTION getStatus(pitvalue BIGINT) RETURNS SETOF
statusrecord AS $BODY$
DECLARE
    object objectrecord%ROWTYPE;
    status statusrecord%ROWTYPE;
BEGIN
    FOR object IN SELECT * FROM objectrecord
    LOOP
        -- static SQL rather than EXECUTE: SELECT ... INTO sets FOUND,
        -- which EXECUTE ... INTO does not
        SELECT * INTO status
        FROM statusrecord
        WHERE objectId = object.objectId
          AND datetime <= pitvalue
        ORDER BY datetime DESC
        LIMIT 1;
        IF FOUND THEN
            RETURN NEXT status;
        END IF;
    END LOOP;
    RETURN;
END
$BODY$ LANGUAGE plpgsql;
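To see where the time goes, the per-object lookup can be checked with
EXPLAIN ANALYZE (the objectId and cutoff values here are made up):

EXPLAIN ANALYZE
SELECT * FROM statusrecord
WHERE objectId = 42 AND datetime <= 1248998400000
ORDER BY datetime DESC
LIMIT 1;

If the plan shows a sequential scan rather than an index scan, the indexes
are not helping the inner query.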
Just wanted to know if anyone has a different approach to my scenario.
Thanks a lot.
John