[SQL] SQL report

2009-07-29 Thread wkipjohn

I have the following scenario.

I have a tracking system. The system records the status of each object
regularly, and all the status records are stored in one table. It keeps
a history of at most 1000 status records for each object it tracks. The
maximum number of objects the system will track is 100,000, which means I
will potentially have a table of 100 million records.


I have to generate a report on the latest status of all objects being
tracked at a particular point in time, and I also have to allow the user to
sort and filter on different columns of the status records displayed in the
report.


The following is a brief description of the object and status records (this
is not actual code):


ObjectRecord(
objectId bigint PrimaryKey
desc varchar
)

StatusRecord (
id bigint PrimaryKey
objectId bigint indexed
datetime bigint indexed
capacity double
reliability double
efficiency double
)
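
For anyone who wants to reproduce the setup, the description above might translate into PostgreSQL DDL roughly as follows. This is only a sketch: the NOT NULL and foreign key constraints, the index names, and the reading of datetime as epoch milliseconds are assumptions, not part of the original description.

```sql
-- Sketch only. "desc" is quoted because DESC is a reserved word in PostgreSQL,
-- and "double" is spelled "double precision" there.
CREATE TABLE objectrecord (
    objectId bigint PRIMARY KEY,
    "desc"   varchar
);

CREATE TABLE statusrecord (
    id          bigint PRIMARY KEY,
    objectId    bigint NOT NULL REFERENCES objectrecord (objectId),  -- assumed constraint
    datetime    bigint NOT NULL,   -- assumed: milliseconds since the Unix epoch
    capacity    double precision,
    reliability double precision,
    efficiency  double precision
);

-- The two single-column indexes mentioned in the description (names are made up).
CREATE INDEX statusrecord_objectid_idx ON statusrecord (objectId);
CREATE INDEX statusrecord_datetime_idx ON statusrecord (datetime);
```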

I have tried the following query. It works very well with around 20,000
objects (the query returns in less than 10 s), but with 100,000
objects it becomes very slow. (I did not have the patience to wait for
it to return; I killed it after 30 minutes.)


SELECT * FROM statusrecord s1
INNER JOIN (
    SELECT objectId, MAX(datetime) AS msdt FROM statusrecord
    WHERE startDatetime <= 1233897527657
    GROUP BY objectId
) AS s2 ON (s1.objectId = s2.objectId AND s1.datetime = s2.msdt)
WHERE (capacity < 10.0)
ORDER BY s1.datetime DESC, s1.objectId DESC;


I also tried writing a stored procedure like the one below. For 100,000
objects and 1000 status records per object, it returns in around 30 minutes.


CREATE OR REPLACE FUNCTION getStatus(pitvalue BIGINT) RETURNS SETOF
statusrecord AS $BODY$
DECLARE
    obj RECORD;                    -- loop variable: one row per tracked object
    status statusrecord%ROWTYPE;   -- latest status row found for that object
BEGIN
    FOR obj IN SELECT objectId FROM objectRecord
    LOOP
        -- Static SQL is used because EXECUTE does not set FOUND;
        -- LIMIT 1 fetches only the newest row at or before pitvalue.
        SELECT * INTO status FROM statusrecord
        WHERE objectId = obj.objectId AND datetime <= pitvalue
        ORDER BY datetime DESC LIMIT 1;
        IF FOUND THEN
            RETURN NEXT status;
        END IF;
    END LOOP;
    RETURN;
END
$BODY$ LANGUAGE plpgsql;

I would like to know if anyone has a different approach to my scenario.
Thanks a lot.


John


Re: [SQL] SQL report

2009-08-02 Thread wkipjohn

Hi Steve,

Thanks for your suggestions. In my scenario, what is current depends on
the user. If the user wants a status report as of 00:00 1st Jan 2009, then
00:00 1st Jan 2009 is current. So it is not possible to flag any record as
current until the user tells us what "current" is.
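
To make the point concrete, here is a rough sketch of a report parameterized on a user-supplied cutoff, written with PostgreSQL's DISTINCT ON instead of the MAX() self-join. The cutoff literal assumes datetime holds epoch milliseconds in UTC, which is only a guess from the sample values. Unlike the self-join, this returns exactly one row per object even if two records share the same datetime.

```sql
-- Latest status per object as of a user-chosen instant (assumed epoch ms, UTC).
SELECT *
FROM (
    SELECT DISTINCT ON (objectId) *
    FROM statusrecord
    WHERE datetime <= 1230768000000        -- 00:00 1 Jan 2009 UTC, as an example
    ORDER BY objectId, datetime DESC       -- keeps the newest row per objectId
) AS latest
WHERE capacity < 10.0                      -- user filter, applied after picking the latest row
ORDER BY datetime DESC, objectId DESC;     -- user sort
```

The filter has to sit in the outer query; moving it inside would instead return the latest record that happens to satisfy the filter.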


cheers
John

On Jul 31, 2009 2:41am, Steve Crawford wrote:

Not knowing all the details of your system, here are some things you
could experiment with:

1. Add a "latest record id" field in your object table (automatically
updated with a trigger) that would allow you to do a simple join with the
tracking table. I suspect that such a join will be far faster than
calculating "max" 100,000 times, at the expense of a slightly larger main
table.

2. Add a "current record flag" in the status table that simply flags the
most recent record for each object (again, use triggers to keep the flag
appropriately updated). This would also eliminate the need for the "max"
subquery. You could even create a partial index filtering on the "current
record flag", which could speed things up if the reporting query is
written correctly.

3. Partition the table into a "current status table" and "historical
status table" (each inheriting from the main table). Use a trigger so
that anytime a new status record is added, the old "current" record is
moved from the "current" to the "historical" table and the new one added
to the "current" table. The latest status report will only need a simple
join on the "current" table with a max size of 100,000 rather than a more
complex query over a 100,000,000 record table.
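
A minimal sketch of suggestion 1 might look like the following. The column name latestStatusId, the function name, and the trigger name are all made up for illustration, and the sketch assumes status rows are inserted in chronological order for each object. It tracks only the newest record overall, so it does not answer "latest as of an arbitrary past instant".

```sql
-- Hypothetical column holding the id of each object's newest status row.
ALTER TABLE objectrecord ADD COLUMN latestStatusId bigint;

CREATE OR REPLACE FUNCTION track_latest_status() RETURNS trigger AS $$
BEGIN
    -- Assumes rows arrive in time order, so the row just inserted is the latest.
    UPDATE objectrecord
       SET latestStatusId = NEW.id
     WHERE objectId = NEW.objectId;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER statusrecord_track_latest
    AFTER INSERT ON statusrecord
    FOR EACH ROW EXECUTE PROCEDURE track_latest_status();

-- The "latest status" report then becomes a plain primary-key join.
SELECT s.*
FROM objectrecord o
JOIN statusrecord s ON s.id = o.latestStatusId;
```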





Cheers,
Steve






Re: [SQL] SQL report

2009-08-02 Thread wkipjohn

Hi Rob,

I have default B-tree indexes created for each of the indexed columns and
primary key columns. (No multi-column indexes, NULLS FIRST or
DESC/ASC.) I am using PostgreSQL 8.3 with the autovacuum daemon on. I
assume ANALYZE will be run automatically to collect statistics for use by
the planner and that there is no maintenance for a B-tree index once it is
created. (Please correct me if I am wrong about this.)
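
As a sketch of how these assumptions could be checked, the statements below look up when the table was last analyzed, refresh the statistics by hand, and add a multi-column index of the kind not yet tried. The index name is hypothetical, and whether the planner would actually use such an index for this report is untested here.

```sql
-- When did a manual ANALYZE or the autovacuum daemon last gather statistics?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'statusrecord';

-- Refresh planner statistics by hand rather than waiting for autovacuum.
ANALYZE statusrecord;

-- One index covering both the per-object lookup and the time ordering
-- (hypothetical name; DESC in index definitions is available from 8.3).
CREATE INDEX statusrecord_object_time_idx
    ON statusrecord (objectId, datetime DESC);
```

B-tree indexes normally need no routine maintenance, although REINDEX exists for rebuilding one that has become bloated.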


I will probably try to partition the status table to group more recent  
status records together to minimize the dataset I am querying.


Thx
John


On Jul 31, 2009 1:16am, Rob Sargent wrote:

I would be curious to know the performance curve for, let's say, 20K, 40K,
60K, 80K, 100K records. And what sort of indexing you have, whether or
not it's clustered, re-built and so on.

One could envision partitioning the status table such that recent records
were grouped together (on the assumption that they will be most
frequently "reported").
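
On PostgreSQL 8.3, partitioning of this kind is usually built from table inheritance plus CHECK constraints. The following is only a sketch: the child table names and the cutoff value are placeholders, and inserts would still need to be routed to the right child (by the application or a trigger). A query whose cutoff falls before the boundary would still have to read the archive table.

```sql
-- Child tables inherit all columns of statusrecord; the cutoff is a placeholder.
CREATE TABLE statusrecord_recent (
    CHECK (datetime >= 1246406400000)
) INHERITS (statusrecord);

CREATE TABLE statusrecord_archive (
    CHECK (datetime < 1246406400000)
) INHERITS (statusrecord);

-- Indexes are not inherited, so each child needs its own.
CREATE INDEX statusrecord_recent_object_time_idx
    ON statusrecord_recent (objectId, datetime);

-- Let the planner skip children whose CHECK constraint rules them out.
SET constraint_exclusion = on;

-- Only statusrecord_recent (and the empty parent) should need scanning here.
SELECT objectId, MAX(datetime) AS msdt
FROM statusrecord
WHERE datetime >= 1246406400000
GROUP BY objectId;
```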




