[SQL] SELECT max() group by problem

2009-08-02 Thread Heigo Niilop
Hi,

I have a table:

CREATE TABLE table
(
  id integer NOT NULL,
  timest timestamp with time zone NOT NULL,
  db_time timestamp with time zone NOT NULL DEFAULT now(),
  "values" text[],
  CONSTRAINT table_pkey PRIMARY KEY (id, timest)
)



"id" has a foreign key referencing table1.

When I run

SELECT MAX(table.timest) FROM table, table1
WHERE table.id = table1.id
  AND table1.id IN (1, 2, 3)
GROUP BY table.id



it is terribly slow, but when I use this stranger syntax



SELECT table.timest FROM table, table1
WHERE table.id = table1.id
  AND table1.id IN (1, 2, 3)
  AND table.timest = (SELECT max(timest) FROM table WHERE table.id = table1.id)



I receive all needed data very fast.



My questions are:

1) Why is the first query slow, and what can I do to make it faster
(some more indexes?)?

2) What dangers does the second query carry (so far I have gotten
correct data)?





I have Postgres 8.3 and the table has over a million rows.



Regards,

Heigo
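
On question 1: the primary key (id, timest) already provides a suitable
two-column index, but the 8.3 planner does not rewrite MAX() ... GROUP BY
into per-id index probes, while the correlated-subquery form does probe
the index once per id. A hedged sketch of that per-id form, reusing the
names from the post (untested):

```sql
-- Per-id maximum written so the (id, timest) primary-key index can be
-- probed once per id; essentially what the fast second query does.
-- Table and column names are taken from the post above.
SELECT t1.id,
       (SELECT max(t.timest)
          FROM "table" t
         WHERE t.id = t1.id) AS max_timest
FROM table1 t1
WHERE t1.id IN (1, 2, 3);
```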


Re: [SQL] Tweak sql result set... ?

2009-08-02 Thread Axe

> I think Jasen is thinking of manipulating the result set in your programming
> environment, not in the database.
> BTW, from the point of view of "clean programming" it's a bad idea to
> integrate HTML elements directly into your database queries. You're mixing
> the data layer and the design layer.
>
> What do you mean with "sometimes"?
> 2009/7/29 Axe 
>
>

I am aware of this "clean programming", and it is part of why I am
trying to keep my black box of scripts intact: I want to keep a good
standard in those scripts. I need it "sometimes" because I want to use
the set of production scripts, intact, for test purposes. When I am
running tests, I would benefit from manipulating the result set I get
out of Postgres simply by prepending my test workbench on top of the
production script.

/ Axe

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] SQL report

2009-08-02 Thread wkipjohn

Hi Steve,

Thanks for your suggestions. In my scenario, what is "current" depends on
the user: if the user wants a status report as of 00:00 on 1 Jan 2009, then
00:00 on 1 Jan 2009 is current. So it is not possible to flag any records as
current unless the user tells us what is current.


cheers
John

On Jul 31, 2009 2:41am, Steve Crawford   
wrote:

wkipj...@gmail.com wrote:




I have the following scenario.

I have a tracking system. The system records the status of an object
regularly; all the status records are stored in one table, and it keeps
a history of at most 1000 status records for each object it tracks. The
maximum number of objects the system will track is 100,000, which means I
will potentially have a table of 100 million records.

I have to generate a report on the latest status of all objects being
tracked at a particular point in time, and I also have to allow the user
to sort and filter on different columns of the status records displayed
in the report.

...

Just wanted to know if anyone has a different approach to my scenario.
Thanks a lot.





Not knowing all the details of your system, here are some things you
could experiment with:

1. Add a "latest record id" field in your object table (automatically
updated with a trigger) that would allow you to do a simple join with the
tracking table. I suspect that such a join will be far faster than
calculating "max" 100,000 times, at the expense of a slightly larger main
table.
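
A minimal sketch of this, with hypothetical names (objectrecord.latest_status_id
is not in the original schema; untested):

```sql
-- Hypothetical column and trigger names; a sketch, not tested code.
ALTER TABLE objectrecord ADD COLUMN latest_status_id bigint;

CREATE OR REPLACE FUNCTION set_latest_status() RETURNS trigger AS $$
BEGIN
  -- Remember the id of the newest status row on the object itself.
  UPDATE objectrecord
     SET latest_status_id = NEW.id
   WHERE objectid = NEW.objectid;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER statusrecord_latest
AFTER INSERT ON statusrecord
FOR EACH ROW EXECUTE PROCEDURE set_latest_status();

-- The latest-status report is then a plain join:
SELECT s.*
FROM objectrecord o
JOIN statusrecord s ON s.id = o.latest_status_id;
```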




2. Add a "current record flag" in the status table that simply flags the
most recent record for each object (again, use triggers to keep the flag
appropriately updated). This would also eliminate the need for the "max"
subquery. You could even create a partial index filtering on the "current
record flag", which could speed things up if the reporting query is
written correctly.
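
That might look like this (the is_current column and index name are made
up for the sketch):

```sql
-- A sketch only; is_current is a hypothetical column.
ALTER TABLE statusrecord ADD COLUMN is_current boolean NOT NULL DEFAULT false;

-- Partial index containing only the flagged rows (at most one per object):
CREATE INDEX statusrecord_current_idx ON statusrecord (objectid)
  WHERE is_current;

-- A maintaining trigger would clear the flag on the object's previous
-- row before setting it on the new one. The report then reduces to:
SELECT * FROM statusrecord WHERE is_current;
```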




3. Partition the table into a "current status table" and a "historical
status table" (each inheriting from the main table). Use a trigger so
that any time a new status record is added, the old "current" record is
moved from the "current" to the "historical" table and the new one is
added to the "current" table. The latest-status report will only need a
simple join on the "current" table, with a maximum size of 100,000 rows,
rather than a more complex query over a 100,000,000-record table.
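
A sketch of that layout with inheritance (table, function, and trigger
names assumed; untested):

```sql
-- Sketch; the child tables inherit the columns of statusrecord.
CREATE TABLE statusrecord_current () INHERITS (statusrecord);
CREATE TABLE statusrecord_history () INHERITS (statusrecord);

CREATE OR REPLACE FUNCTION archive_old_status() RETURNS trigger AS $$
BEGIN
  -- Move the object's previous "current" row into the history table.
  INSERT INTO statusrecord_history
    SELECT * FROM statusrecord_current WHERE objectid = NEW.objectid;
  DELETE FROM statusrecord_current WHERE objectid = NEW.objectid;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER statusrecord_archive
BEFORE INSERT ON statusrecord_current
FOR EACH ROW EXECUTE PROCEDURE archive_old_status();
```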





Cheers,
Steve






Re: [SQL] SQL report

2009-08-02 Thread wkipjohn

Hi Rob,

I have default B-tree indexes created for each of the indexed columns and
the primary key columns (no multi-column indexes, NULLS FIRST, or
DESC/ASC). I am using PostgreSQL 8.3 with the autovacuum daemon on. I
assume ANALYZE will be run automatically to collect statistics for use by
the planner, and that there is no maintenance needed for a B-tree index
once it is created. (Please correct me if I am wrong about this.)


I will probably try to partition the status table to group more recent
status records together, to minimize the dataset I am querying.


Thx
John


On Jul 31, 2009 1:16am, Rob Sargent  wrote:
I would be curious to know the performance curve for, let's say, 20K, 40K,
60K, 80K, 100K records; and what sort of indexing you have, whether or
not it's clustered, re-built and so on.

One could envision partitioning the status table such that recent records
were grouped together (on the assumption that they will be most
frequently "reported").





wkipj...@gmail.com wrote:




I have the following scenario.

I have a tracking system. The system records the status of an object
regularly; all the status records are stored in one table, and it keeps
a history of at most 1000 status records for each object it tracks. The
maximum number of objects the system will track is 100,000, which means I
will potentially have a table of 100 million records.

I have to generate a report on the latest status of all objects being
tracked at a particular point in time, and I also have to allow the user
to sort and filter on different columns of the status records displayed
in the report.




The following is a brief description of the status records (this is not
actual code):

ObjectRecord (
  objectId bigint PrimaryKey
  desc varchar
)

StatusRecord (
  id bigint PrimaryKey
  objectId bigint indexed
  datetime bigint indexed
  capacity double
  reliability double
  efficiency double
)




I have tried the following; it works very well with around 20,000
objects (the query returns in less than 10s), but when I have 100,000
objects it becomes very, very slow. (I don't even have the patience to
wait for it to return; I killed it after 30 mins.)




select * from statusrecord s1 INNER JOIN ( SELECT objectId ,  
MAX(datetime) AS msdt FROM statusrecord WHERE startDatetime



I did try to write a stored procedure like the one below; for 100,000
objects and 1000 status records per object, it returns in around 30 mins.




CREATE OR REPLACE FUNCTION getStatus(pitvalue BIGINT) RETURNS SETOF
statusrecord AS $BODY$
DECLARE
  id VARCHAR;
  object objectrecord%ROWTYPE;
  status statusrecord%ROWTYPE;
BEGIN
  FOR object IN SELECT * FROM objectRecord
  LOOP
    EXECUTE 'SELECT * FROM statusrecord WHERE objectId = ' ||
      quote_literal(object.objectId) ||
      ' AND datetime <= ' || pitvalue ||
      ' ORDER BY datetime DESC LIMIT 1'
    INTO status;
    IF FOUND THEN
      RETURN NEXT status;
    END IF;
  END LOOP;
  RETURN;
END
$BODY$ LANGUAGE plpgsql;
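
For comparison, the same per-object point-in-time lookup can be written
set-based with PostgreSQL's DISTINCT ON, avoiding the per-object loop
entirely (a sketch against the statusrecord schema above; the index is a
suggestion, not part of the original post):

```sql
-- One row per objectid: the newest status at or before the given point
-- in time. An index on (objectid, datetime DESC) lets this run without
-- a full sort of the table.
CREATE INDEX statusrecord_obj_dt_idx ON statusrecord (objectid, datetime DESC);

SELECT DISTINCT ON (objectid) *
FROM statusrecord
WHERE datetime <= 1230768000  -- pitvalue, the point in time being reported
ORDER BY objectid, datetime DESC;
```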




Just wanted to know if anyone has a different approach to my scenario.
Thanks a lot.





John