Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?

2005-06-21 Thread Alex Stapleton


On 20 Jun 2005, at 18:46, Josh Berkus wrote:



Alex,




Hi, I'm trying to optimise our autovacuum configuration so that it
vacuums / analyzes some of our larger tables better. It has been set
to the default settings for quite some time. We never delete
anything  (well not often, and not much) from the tables, so I am not
so worried about the VACUUM status, but I am wary of XID wraparound
nuking us at some point if we don't sort vacuuming out so we VACUUM
at least once every year ;)




I personally don't use autovaccuum on very large databases.   For DW,
vacuuming is far better tied to ETL operations or a clock schedule of
downtime.



Downtime is something I'd rather avoid if possible. Do you think we  
will need to run VACUUM FULL occasionally? I'd rather not lock tables  
up unless I can't avoid it. We can probably squeeze an automated
vacuum tied to our data inserters every now and then though.




XID wraparound may be further away than you think.   Try checking
pg_controldata, which will give you the current XID, and you can calculate
how long you are away from wraparound.  I just tested a 200G data warehouse
and figured out that we are 800 months away from wraparound, despite hourly
ETL.
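
A rough sketch of that check (the NextXID value below is made up for
illustration; the wraparound horizon is on the order of 2^31, i.e. ~2.1
billion, transactions):

$ pg_controldata $PGDATA | grep NextXID
Latest checkpoint's NextXID:          1830467

months_left ~= (2^31 - NextXID) / XIDs_consumed_per_month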



Is this an 8.0 thing? I don't have a pg_controldata from what I can
see. That's nice to hear though.








However not running ANALYZE for such huge
periods of time is probably impacting the statistics accuracy
somewhat, and I have seen some unusually slow queries at times.
Anyway, does anyone think we might benefit from a more aggressive
autovacuum configuration?




Hmmm, good point, you could use autovacuum for ANALYZE only.  Just set the
VACUUM settings preposterously high (like 10x) so it never runs.  Then it'll
run ANALYZE only.   I generally use a threshold of 200 and a multiple of
0.1x for analyze; that is, re-analyze after 200 + 10% of rows have changed.
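
With the contrib/pg_autovacuum daemon, that works out to roughly the
following (a sketch only -- double-check the option letters against the
README that ships with your version; the huge vacuum numbers are just
there so VACUUM effectively never triggers):

pg_autovacuum -D -v 1000000 -V 10.0 -a 200 -A 0.1
    # -v/-V: vacuum base threshold / scale factor (set absurdly high)
    # -a/-A: analyze base threshold / scale factor (200 rows + 10% of table)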



I will try those settings out, that sounds good to me though.



--
Josh Berkus
Aglio Database Solutions
San Francisco







---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] Do Views execute underlying query everytime ??

2005-06-21 Thread Amit V Shah
Hi all,

I have a repository table which is very huge, with at least a few hundred
million rows, maybe more than that. The information is stored in the form of
rows in these tables. I need to make that information wide based on some
grouping and display them as columns on the screen.

I am thinking of having a solution where I create views for each screen,
which are just read only.

However, I do not know if the query that defines the view is executed
every time I select something from the view. Because if that is the case,
then I think my queries will again be slow. But if that is the way views
work, then what would be the point in creating them?

Any suggestions, helps --

(Please pardon if this question should not be on performance forum)

Thanks,
Amit


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] Limit clause not using index

2005-06-21 Thread Yves Vindevogel
Hi,

I have a very simple query on a big table.  When I issue a limit and/or offset clause, the query is not using the index.
Can anyone explain this to me?

rvponp=# explain select * from tblprintjobs order by loginuser, desceventdate, desceventtime offset 25 limit 25 ;
QUERY PLAN 
---
Limit  (cost=349860.62..349860.68 rows=25 width=206)
->  Sort  (cost=349860.56..351416.15 rows=622236 width=206)
Sort Key: loginuser, desceventdate, desceventtime
->  Seq Scan on tblprintjobs  (cost=0.00..25589.36 rows=622236 width=206)
(4 rows)

rvponp=# explain select * from tblprintjobs order by loginuser, desceventdate, desceventtime ;
QUERY PLAN  
-
Sort  (cost=349860.56..351416.15 rows=622236 width=206)
Sort Key: loginuser, desceventdate, desceventtime
->  Seq Scan on tblprintjobs  (cost=0.00..25589.36 rows=622236 width=206)
(3 rows)

Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements




Mail: [EMAIL PROTECTED]  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be
First they ignore you.  Then they laugh at you.  Then they fight you.  Then you win.
Mahatma Gandhi.
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Limit clause not using index

2005-06-21 Thread Bricklen Anderson
Yves Vindevogel wrote:
 Hi, 
 
 rvponp=# explain select * from tblprintjobs order by loginuser,
 desceventdate, desceventtime offset 25 limit 25 ;

 I have a very simple query on a big table.  When I issue a limit
 and/or offset clause, the query is not using the index. 
 Can anyone explain me this ? 

Do you have an index on (loginuser,desceventdate,desceventtime)?

-- 
___

This e-mail may be privileged and/or confidential, and the sender does
not waive any related rights and obligations. Any distribution, use or
copying of this e-mail or the information it contains by other than an
intended recipient is unauthorized. If you received this e-mail in
error, please advise me (by return e-mail or otherwise) immediately.
___

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Limit clause not using index

2005-06-21 Thread John A Meinel

Yves Vindevogel wrote:


Hi,

I have a very simple query on a big table. When I issue a limit 
and/or offset clause, the query is not using the index.

Can anyone explain me this ?


You didn't give enough information. What does your index look like that
you are expecting it to use?

Generally, you want to have matching columns. So you would want
CREATE INDEX blah ON tblprintjobs(loginuser, desceventdate, desceventtime);

Next, you should post EXPLAIN ANALYZE instead of regular explain, so we 
can have an idea if the planner is actually making correct estimations.


John
=:->



rvponp=# explain select * from tblprintjobs order by loginuser, 
desceventdate, desceventtime offset 25 limit 25 ;

QUERY PLAN
--- 


Limit  (cost=349860.62..349860.68 rows=25 width=206)
  ->  Sort  (cost=349860.56..351416.15 rows=622236 width=206)
        Sort Key: loginuser, desceventdate, desceventtime
        ->  Seq Scan on tblprintjobs  (cost=0.00..25589.36 rows=622236 width=206)
(4 rows)

rvponp=# explain select * from tblprintjobs order by loginuser, 
desceventdate, desceventtime ;

QUERY PLAN
- 


Sort  (cost=349860.56..351416.15 rows=622236 width=206)
  Sort Key: loginuser, desceventdate, desceventtime
  ->  Seq Scan on tblprintjobs  (cost=0.00..25589.36 rows=622236 width=206)
(3 rows)

Met vriendelijke groeten,
Bien à vous,
Kind regards,

*Yves Vindevogel*
*Implements*








Re: [PERFORM] Limit clause not using index

2005-06-21 Thread Tom Lane
Yves Vindevogel [EMAIL PROTECTED] writes:
 Can anyone explain me this ?

 rvponp=# explain select * from tblprintjobs order by loginuser,  
 desceventdate, desceventtime offset 25 limit 25 ;
  QUERY PLAN
  
 ---
   Limit  (cost=349860.62..349860.68 rows=25 width=206)
 ->  Sort  (cost=349860.56..351416.15 rows=622236 width=206)
       Sort Key: loginuser, desceventdate, desceventtime
       ->  Seq Scan on tblprintjobs  (cost=0.00..25589.36 rows=622236 width=206)
 (4 rows)


Do you have an index matching that sort key?  I'd certainly expect the
above to use it if it were there.  For the full table case it's not so
clear --- an indexscan isn't always better.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Do Views execute underlying query everytime ??

2005-06-21 Thread Matthew Nuzum
On 6/21/05, Amit V Shah [EMAIL PROTECTED] wrote:
 Hi all,
...
 I am thinking of having a solution where I create views for each screen,
 which are just read only.
 
 However, I do not know if the query that defines the view is executed
 every time I select something from the view. Because if that is the case,
 then I think my queries will again be slow. But if that is the way views
 work, then what would be the point in creating them?
 
 Any suggestions, helps --

They do get executed every time. I have a similar issue, but my data
does not change very frequently, so instead of using a view, I create
lookup tables to hold the data. So once a day I do something like
this:
drop table lookup_table_1;
create table lookup_table_1 as SELECT ...;

In my case, rows are not deleted or updated, so I don't actually do a
drop table... I merely add new records to the existing table, but if
your data changes, the drop table technique can be faster than doing a
delete or update.

-- 
Matthew Nuzum
www.bearfruit.org

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Do Views execute underlying query everytime ??

2005-06-21 Thread Richard Huxton

Amit V Shah wrote:

Hi all,

I have a repository table which is very huge, with at least a few hundred
million rows, maybe more than that. The information is stored in the form of
rows in these tables. I need to make that information wide based on some
grouping and display them as columns on the screen.

I am thinking of having a solution where I create views for each screen,
which are just read only.

However, I do not know if the query that defines the view is executed
every time I select something from the view. Because if that is the case,
then I think my queries will again be slow. But if that is the way views
work, then what would be the point in creating them?


That's exactly how they work. You'd still want them because they let you 
simplify access control (user A can only see some rows, user B can see 
all rows) or just make your queries simpler.


Sounds like you want what is known as a materialised view which is 
basically a summary table that is kept up to date by triggers. You query 
the table instead of actually recalculating every time. Perhaps google 
for postgresql materialized view (you might want a z or s in 
materialised).


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Do Views execute underlying query everytime ??

2005-06-21 Thread PFC




However, I do not know if the query that defines the view is executed
every time I select something from the view. Because if that is the case,
then I think my queries will again be slow. But if that is the way views
work, then what would be the point in creating them?


	Views are more for when you have a query which keeps coming up a zillion
times in your application, like:


SELECT p.*, pd.* FROM products p, products_names pd WHERE p.id=pd.id AND  
pd.language=...


You create a view like :

CREATE VIEW products_with_name AS SELECT p.*, pd.* FROM products p,  
products_names pd WHERE p.id=pd.id


And then you :

SELECT * FROM products_with_name WHERE id=... AND language=...

	It saves a lot of headache and typing over and over again the same thing,  
and you can tell your ORM library to use them, too.


	But for your application, they're useless. You should create a
materialized view, which is just a table, and update it from a CRON job.
	You can still use a view to fill your table, and as a way to hold your
query, so the cron job doesn't have to issue real queries, just filling
tables from views:


CREATE VIEW cached_stuff_view AS ...

And once in while :

BEGIN;
DROP TABLE cached_stuff;
CREATE TABLE cached_stuff AS SELECT * FROM cached_stuff_view;
CREATE INDEX ... ON cached_stuff( ... )
COMMIT;
ANALYZE cached_stuff;

Or :
BEGIN;
TRUNCATE cached_stuff;
INSERT INTO cached_stuff SELECT * FROM cached_stuff_view;
COMMIT;
ANALYZE cached_stuff;

If you update your entire table it's faster to just junk it or truncate it
and then recreate it. Maybe you'd prefer TRUNCATE, which saves you from
having to re-create the indexes... but it'll be faster if you drop the
indexes and re-create them afterwards anyway, instead of them being updated
for each row inserted. So I'd say DROP TABLE.
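
	A minimal sketch of the TRUNCATE variant with the index handled that way
(the index name and column are made up for the example):

BEGIN;
TRUNCATE cached_stuff;
DROP INDEX cached_stuff_name_idx;
INSERT INTO cached_stuff SELECT * FROM cached_stuff_view;
CREATE INDEX cached_stuff_name_idx ON cached_stuff( name );
COMMIT;
ANALYZE cached_stuff;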













---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Limit clause not using index

2005-06-21 Thread Yves Vindevogel
These are my indexes

create index ixprintjobsapplicationtype on tblPrintjobs (applicationtype);
create index ixprintjobsdesceventdate on tblPrintjobs (desceventdate);
create index ixprintjobsdesceventtime on tblPrintjobs (desceventtime);
create index ixprintjobsdescpages on tblPrintjobs (descpages);   
create index ixprintjobsdocumentname on tblPrintjobs (documentname) ;   
create index ixprintjobseventcomputer on tblPrintjobs (eventcomputer); 
create index ixprintjobseventdate on tblPrintjobs (eventdate);
create index ixprintjobseventtime on tblPrintjobs (eventtime);  
create index ixprintjobseventuser on tblPrintjobs (eventuser);   
create index ixprintjobshostname on tblPrintjobs (hostname) ; 
create index ixprintjobsipaddress on tblPrintjobs (ipaddress) ;
create index ixprintjobsloginuser on tblPrintjobs (loginuser) ;  
create index ixprintjobspages on tblPrintjobs (pages) ;  
create index ixprintjobsprintport on tblPrintjobs (printport) ; 
create index ixprintjobsprintqueue on tblPrintjobs (printqueue) ;   
create index ixprintjobsrecordnumber on tblPrintjobs (recordnumber) ;  
create index ixprintjobssize on tblPrintjobs (size) ;  
create index ixprintjobsusertype on tblPrintjobs (usertype) ;   
create index ixPrintjobsDescpagesDocumentname on tblPrintjobs (descpages, documentname) ;
create index ixPrintjobsHostnamePrintqueueDesceventdateDesceventtime on tblPrintjobs (hostname, printqueue, desceventdate, desceventtime) ;
create index ixPrintjobsLoginDescEventdateDesceventtime on tblPrintjobs (loginuser, desceventdate, desceventtime) ;

On 21 Jun 2005, at 16:42, Tom Lane wrote:

Yves Vindevogel [EMAIL PROTECTED]> writes:
Can anyone explain me this ?

rvponp=# explain select * from tblprintjobs order by loginuser,  
desceventdate, desceventtime offset 25 limit 25 ;
QUERY PLAN
 
---
Limit  (cost=349860.62..349860.68 rows=25 width=206)
->  Sort  (cost=349860.56..351416.15 rows=622236 width=206)
Sort Key: loginuser, desceventdate, desceventtime
->  Seq Scan on tblprintjobs  (cost=0.00..25589.36 rows=622236 width=206)
(4 rows)


Do you have an index matching that sort key?  I'd certainly expect the
above to use it if it were there.  For the full table case it's not so
clear --- an indexscan isn't always better.

regards, tom lane


Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements




Mail: [EMAIL PROTECTED]  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be
First they ignore you.  Then they laugh at you.  Then they fight you.  Then you win.
Mahatma Gandhi.

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Do Views execute underlying query everytime ??

2005-06-21 Thread John A Meinel

Amit V Shah wrote:


After I sent out this email, I found this article from google

http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

Looks like we can control as to when the views refresh... I am still kind of
confused, and would appreciate help !!

The create/drop table does sound a solution that can work, but the thing is
I want to get manual intervention out, and besides, my work flow is very
complex so this might not be an option for me :-(

Thanks,
Amit



Just to make it clear, a view is not the same as a materialized view.
A view is just a set of rules to the planner so that it can simplify
interactions with the database. A materialized view is a query which has
been saved into a table.

To set it up properly, really depends on what your needs are.

  1. How much time can elapse between an update to the system, and an
 update to the materialized views?
  2. How many updates / (sec, min, hour, month) do you expect. Is
 insert performance critical, or secondary.

For instance, if you get a lot of updates, but you can have a 1 hour lag
between the time a new row is inserted and the view is updated, you can
just create a cron job that runs every hour to regenerate the
materialized view.

If you don't get many updates, but you need them to show up right away,
then you can add triggers to the affected tables, such that
inserting/updating to a specific table causes an update to the
materialized view.

There are quite a few potential tradeoffs. Rather than doing a
materialized view, you could just improve your filters. If you are doing
a query to show people the results, you generally have some sort of
upper bound on how much data you can display. Humans don't like reading
more than 100 or 1000 rows. So create your normal query, and just tack
on a LIMIT 100 at the end. If you structure your query properly, and
have appropriate indexes, you should be able to make the LIMIT count,
and allow you to save a lot of overhead of generating rows that you
don't use.
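
Something like the following is the shape of what I mean (a sketch with
names loosely based on your repository table; with an index that matches
both the WHERE and the ORDER BY, the planner can stop after 100 rows):

CREATE INDEX repository_meta_stat_idx ON repository (metadata, statname);

SELECT statname, statvalue
FROM repository
WHERE metadata = 'some_group'
ORDER BY statname
LIMIT 100;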

I would probably start by posting the queries you are currently using,
along with an EXPLAIN ANALYZE, and a description of what you actually
need from the query. Then this list can be quite helpful in
restructuring your query to make it faster.

John
=:->





Re: [PERFORM] slow growing table

2005-06-21 Thread Jone C
 On second thought... Does a VACUUM FULL help? If so, you might want to
 increase your FSM settings.

Thank you for the reply, sorry for delay I was on holiday.

I tried that; it had no effect. I benchmarked twice before, performed
VACUUM FULL on the table in question after the inserts, then benchmarked
twice again. Same results...

Should I try your suggestion on deleting the indexes? This table needs
to be accessible for reads at all times, however...

thank you kindly


On 6/6/05, Steinar H. Gunderson [EMAIL PROTECTED] wrote:
 On Mon, Jun 06, 2005 at 07:00:37PM +0200, Steinar H. Gunderson wrote:
  You might have a problem with index bloat. Could you try REINDEXing the
  indexes on the table and see if that makes a difference?
 

 
 /* Steinar */
 -- 
 Homepage: http://www.sesse.net/
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Limit clause not using index

2005-06-21 Thread Yves Vindevogel
Never mind, guys.
There's an error in a function that is creating these indexes.
The function never completed successfully, so the index is not there.

Very sorry about this !!


On 21 Jun 2005, at 16:57, Yves Vindevogel wrote:

These are my indexes

create index ixprintjobsapplicationtype on tblPrintjobs (applicationtype);
create index ixprintjobsdesceventdate on tblPrintjobs (desceventdate);
create index ixprintjobsdesceventtime on tblPrintjobs (desceventtime);
create index ixprintjobsdescpages on tblPrintjobs (descpages);   
create index ixprintjobsdocumentname on tblPrintjobs (documentname) ;   
create index ixprintjobseventcomputer on tblPrintjobs (eventcomputer); 
create index ixprintjobseventdate on tblPrintjobs (eventdate);
create index ixprintjobseventtime on tblPrintjobs (eventtime);  
create index ixprintjobseventuser on tblPrintjobs (eventuser);   
create index ixprintjobshostname on tblPrintjobs (hostname) ; 
create index ixprintjobsipaddress on tblPrintjobs (ipaddress) ;
create index ixprintjobsloginuser on tblPrintjobs (loginuser) ;  
create index ixprintjobspages on tblPrintjobs (pages) ;  
create index ixprintjobsprintport on tblPrintjobs (printport) ; 
create index ixprintjobsprintqueue on tblPrintjobs (printqueue) ;   
create index ixprintjobsrecordnumber on tblPrintjobs (recordnumber) ;  
create index ixprintjobssize on tblPrintjobs (size) ;  
create index ixprintjobsusertype on tblPrintjobs (usertype) ;   
create index ixPrintjobsDescpagesDocumentname on tblPrintjobs (descpages, documentname) ;
create index ixPrintjobsHostnamePrintqueueDesceventdateDesceventtime on tblPrintjobs (hostname, printqueue, desceventdate, desceventtime) ;
create index ixPrintjobsLoginDescEventdateDesceventtime on tblPrintjobs (loginuser, desceventdate, desceventtime) ;

On 21 Jun 2005, at 16:42, Tom Lane wrote:

Yves Vindevogel [EMAIL PROTECTED]> writes:
Can anyone explain me this ?

rvponp=# explain select * from tblprintjobs order by loginuser,  
desceventdate, desceventtime offset 25 limit 25 ;
QUERY PLAN
 
---
Limit  (cost=349860.62..349860.68 rows=25 width=206)
->  Sort  (cost=349860.56..351416.15 rows=622236 width=206)
Sort Key: loginuser, desceventdate, desceventtime
->  Seq Scan on tblprintjobs  (cost=0.00..25589.36 rows=622236 width=206)
(4 rows)


Do you have an index matching that sort key?  I'd certainly expect the
above to use it if it were there.  For the full table case it's not so
clear --- an indexscan isn't always better.

regards, tom lane


Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements


Mail: [EMAIL PROTECTED]  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be
First they ignore you.  Then they laugh at you.  Then they fight you.  Then you win.
Mahatma Gandhi.

---(end of broadcast)---
TIP 8: explain analyze is your friend

Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements




Mail: [EMAIL PROTECTED]  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be
First they ignore you.  Then they laugh at you.  Then they fight you.  Then you win.
Mahatma Gandhi.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Limit clause not using index

2005-06-21 Thread Yves Vindevogel
rvponp=# explain analyze select * from tblPrintjobs order by loginuser, desceventdate, desceventtime ; 
QUERY PLAN   

Sort  (cost=345699.06..347256.49 rows=622972 width=203) (actual time=259438.952..268885.586 rows=622972 loops=1)
Sort Key: loginuser, desceventdate, desceventtime
->  Seq Scan on tblprintjobs  (cost=0.00..25596.72 rows=622972 width=203) (actual time=21.155..8713.810 rows=622972 loops=1)
Total runtime: 271583.422 ms
(4 rows)

On 21 Jun 2005, at 16:42, John A Meinel wrote:

Yves Vindevogel wrote:

Hi,

I have a very simple query on a big table. When I issue a limit and/or offset clause, the query is not using the index.
Can anyone explain me this ?

You didn't give enough information. What does you index look like that you are expecting it to use?
Generally, you want to have matching columns. So you would want
CREATE INDEX blah ON tblprintjobs(loginuser, desceventdate, desceventtime);

Next, you should post EXPLAIN ANALYZE instead of regular explain, so we can have an idea if the planner is actually making correct estimations.

John
=:->

rvponp=# explain select * from tblprintjobs order by loginuser, desceventdate, desceventtime offset 25 limit 25 ;
QUERY PLAN
--- 
Limit (cost=349860.62..349860.68 rows=25 width=206)
-> Sort (cost=349860.56..351416.15 rows=622236 width=206)
Sort Key: loginuser, desceventdate, desceventtime
-> Seq Scan on tblprintjobs (cost=0.00..25589.36 rows=622236 width=206)
(4 rows)

rvponp=# explain select * from tblprintjobs order by loginuser, desceventdate, desceventtime ;
QUERY PLAN
- 
Sort (cost=349860.56..351416.15 rows=622236 width=206)
Sort Key: loginuser, desceventdate, desceventtime
-> Seq Scan on tblprintjobs (cost=0.00..25589.36 rows=622236 width=206)
(3 rows)

Met vriendelijke groeten,
Bien à vous,
Kind regards,

*Yves Vindevogel*
*Implements*




Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements




Mail: [EMAIL PROTECTED]  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be
First they ignore you.  Then they laugh at you.  Then they fight you.  Then you win.
Mahatma Gandhi.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Limit clause not using index

2005-06-21 Thread John A Meinel

Yves Vindevogel wrote:


rvponp=# explain analyze select * from tblPrintjobs order by
loginuser, desceventdate, desceventtime ;
QUERY PLAN


Sort (cost=345699.06..347256.49 rows=622972 width=203) (actual
time=259438.952..268885.586 rows=622972 loops=1)
Sort Key: loginuser, desceventdate, desceventtime
->  Seq Scan on tblprintjobs (cost=0.00..25596.72 rows=622972
width=203) (actual time=21.155..8713.810 rows=622972 loops=1)
Total runtime: 271583.422 ms
(4 rows)



Can you post it with the limit? I realize the query takes a long time,
but that is the more important query to look at.

Also, just as a test, if you can, try dropping most of the indexes
except for the important one. It might be that the planner is having a
hard time because there are too many permutations to try.
I believe if you drop the indexes inside a transaction, they will still
be there for other queries, and if you rollback instead of commit, you
won't lose anything.

BEGIN;
DROP INDEX ...
EXPLAIN ANALYZE SELECT *...
ROLLBACK;

John
=:->





[PERFORM] Another question on indexes (drop and recreate)

2005-06-21 Thread Yves Vindevogel
Hi,

I have another question regarding indexes.

I have a table with a lot of indexes on it.  Those are needed to perform my searches.
Once a day, a bunch of records is inserted in my table. 

Say, my table has 1.000.000 records and I add 10.000 records (1% new).
What would be faster?

1) Dropping my indexes and recreating them after the inserts
2) Just inserting it and have PG manage the indexes

Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements




Mail: [EMAIL PROTECTED]  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be
First they ignore you.  Then they laugh at you.  Then they fight you.  Then you win.
Mahatma Gandhi.
---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] slow growing table

2005-06-21 Thread John A Meinel

Jone C wrote:


On second thought... Does a VACUUM FULL help? If so, you might want to
increase your FSM settings.




Thank you for the reply, sorry for delay I was on holiday.

I tried that; it had no effect. I benchmarked twice before, performed
VACUUM FULL on the table in question after the inserts, then benchmarked
twice again. Same results...

Should I try your suggestion on deleting the indexes? This table needs
to be accessible for reads at all times, however...

thank you kindly




I believe dropping an index inside a transaction is only visible to that
transaction. (Can someone back me up on this?)
Which means if you did:

BEGIN;
DROP INDEX index in question;
CREATE INDEX same index ON same stuff;
COMMIT;

The only problem is that if you are using a unique or primary key index,
a foreign key which is referencing that index would have to be dropped
and re-created as well. So you could have a pretty major cascade effect.

A better thing to do if your table only has one (or at least only a few)
indexes, would be to CLUSTER, which is effectively a VACUUM FULL + a
REINDEX (plus sorting the rows so that they are in index order). It
holds a full lock on the table, and takes a while, but when you are
done, things are cleaned up quite a bit.
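
For example (names are hypothetical; in 8.0 the syntax is CLUSTER indexname
ON tablename, and the table is locked for the duration):

CLUSTER my_table_pkey ON my_table;
ANALYZE my_table;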

You might also try just a REINDEX on the indexes in question, but this
also holds a full lock on the table. (My DROP + CREATE might as well,
I'm not really sure; I just think of it as a way to recreate the index
without losing it for other transactions.)

John
=:->





Re: [PERFORM] Another question on indexes (drop and recreate)

2005-06-21 Thread John A Meinel

Yves Vindevogel wrote:


Hi,

I have another question regarding indexes.

I have a table with a lot of indexes on it. Those are needed to 
perform my searches.

Once a day, a bunch of records is inserted in my table.

Say, my table has 1.000.000 records and I add 10.000 records (1% new)
What would be faster.

1) Dropping my indexes and recreating them after the inserts
2) Just inserting it and have PG manage the indexes

Met vriendelijke groeten,
Bien à vous,
Kind regards,

*Yves Vindevogel*
*Implements*



I'm guessing for 1% new that (2) would be faster.
John
=:->






Re: [PERFORM] Limit clause not using index

2005-06-21 Thread Tom Lane
Yves Vindevogel [EMAIL PROTECTED] writes:
   create index ixPrintjobsLoginDescEventdateDesceventtime on 
 tblPrintjobs (loginuser, desceventdate, desceventtime) ;

Hmm, that certainly looks like it should match the query.  What happens
to the EXPLAIN output if you do set enable_sort = false?
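
That is, roughly:

set enable_sort = false;
explain select * from tblprintjobs order by loginuser,
desceventdate, desceventtime offset 25 limit 25 ;
reset enable_sort;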

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Another question on indexes (drop and recreate)

2005-06-21 Thread Yves Vindevogel
And, after let's say a week, would that index still be optimal, or would it be a good idea to drop it at the weekend and recreate it?

On 21 Jun 2005, at 17:22, John A Meinel wrote:

Yves Vindevogel wrote:

Hi,

I have another question regarding indexes.

I have a table with a lot of indexes on it. Those are needed to perform my searches.
Once a day, a bunch of records is inserted in my table.

Say, my table has 1.000.000 records and I add 10.000 records (1% new)
What would be faster.

1) Dropping my indexes and recreating them after the inserts
2) Just inserting it and have PG manage the indexes

Met vriendelijke groeten,
Bien à vous,
Kind regards,

*Yves Vindevogel*
*Implements*


I'm guessing for 1% new that (2) would be faster.
John
=:->



Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements




Mail: [EMAIL PROTECTED]  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be
First they ignore you.  Then they laugh at you.  Then they fight you.  Then you win.
Mahatma Gandhi.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Another question on indexes (drop and recreate)

2005-06-21 Thread John A Meinel

Yves Vindevogel wrote:


And, after let's say a week, would that index still be optimal or
would it be a good idea to drop it in the weekend and recreate it.



It depends a little bit on the postgres version you are using. If you
are only ever adding to the table, and you are not updating it or
deleting from it, I think the index is always optimal.
Once you start deleting from it there are a few cases where older
versions would not properly re-use the empty entries, requiring a
REINDEX. (Deleting low numbers and always adding high numbers was one of
the cases)

However, I believe that as long as you vacuum often enough, so that the
system knows where the unused entries are, you don't ever have to drop
and re-create the index.

John
=:->





Re: [PERFORM] Do Views execute underlying query everytime ??

2005-06-21 Thread Amit V Shah
First of all, thanks to everyone for helping me !

Looks like materialized views will be my answer.

Let me explain my situation a little better. 

The repository table looks like this -

create table repository (statName varchar(45), statValue varchar(45),
metaData varchar(45));

MetaData is a foreign key to other tables. 

The screens show something like following -

Screen 1 -
Stat1  Stat2Stat3
Value  ValueValue
Value  ValueValue



Screen 2 -
Stat3  Stat1Stat5
Value  ValueValue
Value  ValueValue


etc. etc.

The data is grouped based on metaData. 

Updates will only occur nightly and can be controlled. But selects occur
9-5.

One of the compelling reasons, I feel, is that creating such tables out of
the repository table would require a very complicated query. If I have a
materialized view, I think the information will be cached.

Another concern I have is load. If I have a lot of simultaneous users creating
such wide tables out of one long table, that would generate substantial
load on the servers, right?

I like the materialized view solution better than having other tables for
each screen. (Would be nice if someone can comment on that)

So that is my situation. 

Again, thanks everyone for helping
Amit

-Original Message-
From: John A Meinel [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 21, 2005 11:01 AM
To: Amit V Shah
Cc: '[EMAIL PROTECTED]'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Do Views execute underlying query everytime ??


Amit V Shah wrote:

After I sent out this email, I found this article from google

http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

Looks like we can control as to when the views refresh... I am still kind
of
confused, and would appreciate help !!

The create/drop table does sound a solution that can work, but the thing is
I want to get manual intervention out, and besides, my work flow is very
complex so this might not be an option for me :-(

Thanks,
Amit


Just to make it clear, a view is not the same as a materialized view.
A view is just a set of rules to the planner so that it can simplify
interactions with the database. A materialized view is a query which has
been saved into a table.

To set it up properly, really depends on what your needs are.

   1. How much time can elapse between an update to the system, and an
  update to the materialized views?
   2. How many updates / (sec, min, hour, month) do you expect. Is
  insert performance critical, or secondary.

For instance, if you get a lot of updates, but you can have a 1 hour lag
between the time a new row is inserted and the view is updated, you can
just create a cron job that runs every hour to regenerate the
materialized view.

If you don't get many updates, but you need them to show up right away,
then you can add triggers to the affected tables, such that
inserting/updating to a specific table causes an update to the
materialized view.

There are quite a few potential tradeoffs. Rather than doing a
materialized view, you could just improve your filters. If you are doing
a query to show people the results, you generally have some sort of
upper bound on how much data you can display. Humans don't like reading
more than 100 or 1000 rows. So create your normal query, and just tack
on a LIMIT 100 at the end. If you structure your query properly, and
have appropriate indexes, you should be able to make the LIMIT count,
and allow you to save a lot of overhead of generating rows that you
don't use.

I would probably start by posting the queries you are currently using,
along with an EXPLAIN ANALYZE, and a description of what you actually
need from the query. Then this list can be quite helpful in
restructuring your query to make it faster.

John
=:->


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[PERFORM] unsubscribe

2005-06-21 Thread Don Vaillancourt





-- 


  

  
Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com

This email message is intended only for the addressee(s) and contains
information that may be confidential and/or copyright. If you are not the
intended recipient please notify the sender by reply email and immediately
delete this email. Use, disclosure or reproduction of this email by anyone
other than the intended recipient(s) is strictly prohibited. No
representation is made that this email or any attachments are free of
viruses. Virus scanning is recommended and is the responsibility of the
recipient.






Re: [PERFORM] Trying to figure out pgbench

2005-06-21 Thread Mohan, Ross
I had a similar experience.

Regardless of scaling, etc., I got the same results, almost as if the flags
were not active.

did

pgbench -I template1
and
pgbench -c 10 -t 50 -v -d 1 

and played around from there

This is on IBM pSeries, AIX5.3, PG8.0.2

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of William Yu
Sent: Tuesday, June 21, 2005 12:05 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Trying to figure out pgbench


My Dual Core Opteron server came in last week. I tried to do some 
benchmarks with pgbench to get some numbers on the difference between 
1x1 - 2x1 - 2x2 but no matter what I did, I kept getting the same TPS 
on all systems. Any hints on what the pgbench parameters I should be using?

In terms of production use, it definitely can handle more load. 
Previously, Apache/Perl had to run on a separate server to avoid a ~50% 
penalty. Now, the numbers are +15% performance even with Apache/Perl 
running on the same box as PostgreSQL. How much more load of course is 
what I'd like to quantify.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] Trying to figure out pgbench

2005-06-21 Thread William Yu
My Dual Core Opteron server came in last week. I tried to do some 
benchmarks with pgbench to get some numbers on the difference between 
1x1 - 2x1 - 2x2 but no matter what I did, I kept getting the same TPS 
on all systems. Any hints on what the pgbench parameters I should be using?


In terms of production use, it definitely can handle more load. 
Previously, Apache/Perl had to run on a separate server to avoid a ~50% 
penalty. Now, the numbers are +15% performance even with Apache/Perl 
running on the same box as PostgreSQL. How much more load of course is 
what I'd like to quantify.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Another question on indexes (drop and recreate)

2005-06-21 Thread Yves Vindevogel
I only add records, and most of the values are random
Except the columns for dates, 

On 21 Jun 2005, at 17:49, John A Meinel wrote:

Yves Vindevogel wrote:

And, after let's say a week, would that index still be optimal or
would it be a good idea to drop it in the weekend and recreate it.


It depends a little bit on the postgres version you are using. If you
are only ever adding to the table, and you are not updating it or
deleting from it, I think the index is always optimal.
Once you start deleting from it there are a few cases where older
versions would not properly re-use the empty entries, requiring a
REINDEX. (Deleting low numbers and always adding high numbers was one of
the cases)

However, I believe that as long as you vacuum often enough, so that the
system knows where the unused entries are, you don't ever have to drop
and re-create the index.

John
=:->


Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements




Mail: [EMAIL PROTECTED]  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be
First they ignore you.  Then they laugh at you.  Then they fight you.  Then you win.
Mahatma Gandhi.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Another question on indexes (drop and recreate)

2005-06-21 Thread John A Meinel

Yves Vindevogel wrote:


I only add records, and most of the values are random
Except the columns for dates, 


I doubt that you would need to recreate indexes. That really only needs
to be done in pathological cases, most of which have been fixed in the
latest postgres.

If you are only inserting (never updating or deleting), the index can
never bloat, since you are only adding new stuff.
(You cannot get dead items to bloat your index if you never delete
anything.)

John
=:->





Re: [PERFORM] Do Views execute underlying query everytime ??

2005-06-21 Thread PFC


From what you say I understand that you have a huge table like this :

( name, value, id )

And you want to make statistics on (value) according to (name,id).

***

First of all, a materialized view doesn't exist in postgres; it's just a
word for a table that is automatically updated by triggers.

An example would be like this :

table orders (order_id, ...)
table ordered_products (order_id, product_id, quantity, ...)

If you want to optimize the slow request :
SELECT product_id, sum(quantity) as total_quantity_ordered
FROM ordered_products GROUP BY product_id

You would create a cache table like this :
table ordered_products_cache (product_id, quantity)

And add triggers ON UPDATE/INSERT/DELETE on table ordered_products to  
update ordered_products_cache accordingly.


Of course in this case, every time someone touches ordered_products, an
update is issued to ordered_products_cache.
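
	A rough sketch of the insert-side trigger (update/delete need the same
kind of handling, and all the names here are only illustrative):

-- assumes plpgsql is installed (createlang plpgsql yourdb)
CREATE FUNCTION ordered_products_cache_ins() RETURNS trigger AS '
BEGIN
    -- try to add the new quantity to an existing cache row
    UPDATE ordered_products_cache
       SET quantity = quantity + NEW.quantity
     WHERE product_id = NEW.product_id;
    -- no cache row yet for this product, so create one
    IF NOT FOUND THEN
        INSERT INTO ordered_products_cache (product_id, quantity)
        VALUES (NEW.product_id, NEW.quantity);
    END IF;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER ordered_products_cache_trig
    AFTER INSERT ON ordered_products
    FOR EACH ROW EXECUTE PROCEDURE ordered_products_cache_ins();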


***

In your case I don't think that is the solution, because you do big  
updates. With triggers this would mean issuing one update of your  
materialized view per row in your big update. This could be slow.


In this case you might want to update the cache table in one request  
rather than doing an awful lot of updates.


So you have two solutions :

1- Junk it all and rebuild it from scratch (this can be faster than it  
seems)
2- Put the rows to be added in a temporary table, update the cache table  
considering the difference between this temporary table and your big  
table, then insert the rows in the big table.


This is the fastest solution but it requires a bit more coding (not THAT  
much though).
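
	A sketch of solution 2, with made-up names and assuming the value being
summed is numeric:

BEGIN;
CREATE TEMP TABLE new_rows (name text, id integer, value numeric);
-- load the incoming batch into new_rows (COPY, INSERT ... SELECT, etc.)

-- fold the batch into the existing summary rows
UPDATE summary s
   SET value = s.value + d.delta
  FROM (SELECT name, id, sum(value) AS delta
          FROM new_rows GROUP BY name, id) d
 WHERE s.name = d.name AND s.id = d.id;

-- and create summary rows for groups we have never seen before
INSERT INTO summary (name, id, value)
SELECT name, id, sum(value)
  FROM new_rows n
 WHERE NOT EXISTS (SELECT 1 FROM summary s
                    WHERE s.name = n.name AND s.id = n.id)
 GROUP BY name, id;

-- finally move the batch into the big table
INSERT INTO big_table SELECT * FROM new_rows;
COMMIT;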


***

As for the structure of your cache table, you want :


Screen 1 -
Stat1  Stat2Stat3
Value  ValueValue
Value  ValueValue



Screen 2 -
Stat3  Stat1Stat5
Value  ValueValue
Value  ValueValue

You have several lines, so what is that? Is it grouped by date? I'll
presume it is.


So your screens basically show a subset of :

SELECT date, name, sum(value) FROM table GROUP BY name, date

This is what you should put in your summary table.
Then index it on (date,name) and build your screens with :

SELECT * FROM summary WHERE (date BETWEEN .. AND ..) AND name IN ('Stat3',
'Stat1', 'Stat5')


That should be pretty easy ; you get a list of (name,date,value) that you  
just have to format accordingly on your screen.

























---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?

2005-06-21 Thread Josh Berkus
Alex,

 Downtime is something I'd rather avoid if possible. Do you think we
 will need to run VACUUM FULL occasionally? I'd rather not lock tables
 up unless I cant avoid it. We can probably squeeze an automated
 vacuum tied to our data inserters every now and then though.

As long as your update/deletes are less than 10% of the table for all time, 
you should never have to vacuum, pending XID wraparound.

 Is this an 8.0 thing? I don't have a pg_controldata from what I can
 see. That's nice to hear though.

'fraid so, yes.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Do Views execute underlying query everytime ??

2005-06-21 Thread Matthew Nuzum
On 6/21/05, PFC [EMAIL PROTECTED] wrote:
...
 In your case I don't think that is the solution, because you do big
 updates. With triggers this would mean issuing one update of your
 materialized view per row in your big update. This could be slow.
 
 In this case you might want to update the cache table in one request
 rather than doing an awful lot of updates.
 
 So you have two solutions :
 
 1- Junk it all and rebuild it from scratch (this can be faster than it
 seems)
 2- Put the rows to be added in a temporary table, update the cache table
 considering the difference between this temporary table and your big
 table, then insert the rows in the big table.
 
 This is the fastest solution but it requires a bit more coding (not THAT
 much though).
 
Amit,

I understand your desire to not need any manual intervention...

I don't know what OS you use, but here are two practical techniques
you can use to achieve the above solution suggested by PFC:

a: If you are on a Unix-like OS such as Linux or FreeBSD you have the
beautiful cron program that will run commands nightly.

b: If you are on Windows you have to do something else. The simplest
solution I've found is called pycron (easily locatable by google)
and is a service that emulates Unix cron on Windows (bypassing a lot
of the Windows scheduler hassle).

Now, using either of those solutions, let's say at 6:00 am you want to
do your batch query.

1. Put the queries you want into a text file EXACTLY as you would type
them using psql and save the text file. For example, the file may be
named create_mat_view.txt.
2. Test them by doing this from a command prompt: psql dbname <
create_mat_view.txt
3. Create a cron entry to run the command once a day, it might look like this:
0 6 * * * /usr/bin/psql dbname < /home/admin/create_mat_view.txt
or maybe like this:
0 6 * * * "C:\Program Files\PostgreSQL\8.0\psql.exe" dbname <
C:\create_mat_view.txt

I hope this helps,
-- 
Matthew Nuzum
www.bearfruit.org

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] Configurator project launched

2005-06-21 Thread Josh Berkus
Folks,

OK, I've checked in my first code module and the configurator project is 
officially launched.  Come join us at 
www.pgfoundry.org/projects/configurator

Further communications will be on the Configurator mailing list only.

from the spec:

What is the Configurator, and Why do We Need It?
-

The Configurator is a set of Perl scripts and modules which allow users and
installation programs to write a reasonable postgresql.conf for PostgreSQL
performance based on the answers to some relatively simple questions.  Its
purpose is to provide an option between the poor-performing default
configuration, and the in-depth knowledge required for hand-tuning.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] Querying 19million records very slowly

2005-06-21 Thread Kjell Tore Fossbakk
Hello!

I use FreeBSD 4.11 with PostGreSQL 7.3.8.

I got a huge database with roughly 19 million records. There is just one
table, with a time field, a few ints and a few strings.

table test
fields time (timestamp), source (string), destination (string), p1 (int),
p2 (int)

I have run VACUUM ANALYZE ;

I have created indexes on every field, but for some reason my postgres
server wants to use a seqscan, even though I know an indexed scan would be
much faster.

create index test_time_idx on test (time) ;
create index test_source_idx on test (source) ;
create index test_destination_idx on test (destination) ;
create index test_p1_idx on test (p1) ;
create index test_p2_idx on test (p2) ;

What is really strange is that when I query a count(*) on one of the int
fields (p1), which has a very low count, postgres uses a seqscan. In another
count on the same int field (p1), I know he is giving about 2.2 million
hits, but then he suddenly uses a seqscan instead of an indexed one. Isn't
the whole idea of indexing to increase performance in large queries.. To
make a sort of phonebook for the values, to make it faster to look up what
ever you need... This just seems the opposite..

Here is an EXPLAIN of my query

database=> explain select date_trunc('hour', time), count(*) as total from
test where p1=53 and time > now() - interval '24 hours' group by
date_trunc order by date_trunc ;
QUERY PLAN
--
Aggregate  (cost=727622.61..733143.23 rows=73608 width=8)
  ->  Group  (cost=727622.61..731303.02 rows=736083 width=8)
        ->  Sort  (cost=727622.61..729462.81 rows=736083 width=8)
              Sort Key: date_trunc('hour'::text, time)
              ->  Seq Scan on test  (cost=0.00..631133.12 rows=736083 width=8)
                    Filter: ((p1 = 53) AND (time > (now() - '1 day'::interval)))
(6 rows)

database=> drop INDEX test_<TABULATOR>
test_source_idx   test_destination_idx   test_p1_idx
test_p2_idx   test_time_idx

After all this, I tried to set enable_seqscan to off and
enable_nestedloops to on. This didn't help much either. The time to run the
query is still in minutes. My results are the number of elements for each
hour, and it gives about 1000-2000 hits per hour. I have read somewhere,
about PostGreSQL, that it can easily handle 100-200 million records, and
with the right tuned system, have great performance.. I would like to
learn how :)

I also found an article on a page
(http://techdocs.postgresql.org/techdocs/pgsqladventuresep3.php):

Tip #11: Don't bother indexing columns with huge numbers of records and a
small range of values, such as BOOLEAN columns.

This tip, regretfully, is perhaps the only tip where I cannot provide a
good, real-world example from my work. So I'll give you a hypothetical
situation instead:

Imagine that you have a database table with a list of every establishment
vending ice cream in the US. A simple example might look like:

Where there were almost 1 million rows, but due to simplistic data entry,
only three possible values for type (1-SUPERMARKET, 2-BOUTIQUE, and
3-OTHER) which are relatively evenly distributed. In this hypothetical
situation, you might find (with testing using EXPLAIN) that an index on
type is ignored and the parser uses a seq scan (or table scan) instead.
This is because a table scan can actually be faster than an index scan in
this situation. Thus, any index on type should be dropped.

Certainly, the boolean column (active) requires no indexing as it has only
two possible values and no index will be faster than a table scan.

Then I ask, what is useful with indexing, when I can't use it on a VERY
large database? On my 15 million record database it takes forever
to do seqscans over and over again... This is probably why, as I mentioned
earlier (read the quote), he chooses a full scan and not an
indexed one...

So what do I do? :confused:

I've used SQL for years, but never on such a big scale, thus not having to
learn how to deal with large numbers of records. Usually a maximum of 1000
records. Now, with millions, I need to learn a way to make my sucky
queries better.

I'm trying to learn more about tuning my system, making better queries and
such. I've found some documents on the Internet, but far from the best.

Feedback most appreciated!

Regards,
a learning PostGreSQL user


Re: [PERFORM] Another question on indexes (drop and recreate)

2005-06-21 Thread Yves Vindevogel
Ok, tnx !!

On 21 Jun 2005, at 18:54, John A Meinel wrote:

Yves Vindevogel wrote:

I only add records, and most of the values are random
Except the columns for dates, 

I doubt that you would need to recreate indexes. That really only needs
to be done in pathological cases, most of which have been fixed in the
latest postgres.

If you are only inserting (never updating or deleting), the index can
never bloat, since you are only adding new stuff.
(You cannot get dead items to bloat your index if you never delete
anything.)

John
=:->


Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements




Mail: [EMAIL PROTECTED]  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be
First they ignore you.  Then they laugh at you.  Then they fight you.  Then you win.
Mahatma Gandhi.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Limit clause not using index

2005-06-21 Thread Tobias Brox
[John A Meinel - Tue at 10:14:24AM -0500]
 I believe if you drop the indexes inside a transaction, they will still
 be there for other queries, and if you rollback instead of commit, you
 won't lose anything.

Has anyone tested this?

(sorry, I only have the production database to play with at the moment,
and I don't think I should play with it ;-)

-- 
Tobias Brox, Beijing


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] Prepared statements vs. Stored Procedures

2005-06-21 Thread Oliver Crosby
I'm hoping someone can offer some advice here.

I have a large perl script that employs prepared statements to do all its queries. I'm looking at using stored procedures to improve performance times for the script. Would making a stored procedure to replace each prepared statement be worthwhile? If not, when could I use stored procedures to improve performance?


Thanks in advance.


Re: [PERFORM] Querying 19million records very slowly

2005-06-21 Thread Paul Ramsey

Some tips:

- EXPLAIN ANALYZE provides a more useful analysis of a slow query, 
because it gives both the estimate and actual times/rows for each step 
in the plan.


- The documentation is right: rows with little variation are pretty 
useless to index. Indexing is about selectivity, reducing the amount 
of stuff the database has to read off the the disk.


- You only have two things in your WHERE clause, so that is where the 
most important indexes reside. How many of your rows have p1=53? How 
many of your rows have happened in the last day? If your answer is a 
lot then the indexes are not going to help: PostgreSQL will be more 
efficient scanning every tuple than it will be jumping around the index 
structure for a large number of tuples.


- If neither time nor p1 are particularly selective individually, but 
they are selective when taken together, try a multi-key index on them both.
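
Something along these lines (a sketch; with an equality on p1 and a range on
time, p1 should come first in the index):

create index test_p1_time_idx on test (p1, time) ;

explain analyze
select date_trunc('hour', time), count(*) as total
from test
where p1 = 53 and time > now() - interval '24 hours'
group by date_trunc('hour', time)
order by date_trunc('hour', time) ;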


Paul

Kjell Tore Fossbakk wrote:


Hello!

I use FreeBSD 4.11 with PostGreSQL 7.3.8.

I got a huge database with roughly 19 million records. There is just one
table, with a time field, a few ints and a few strings.


table test
fields time (timestamp), source (string), destination (string), p1 (int),
p2 (int)


I have run VACUUM ANALYZE ;

I have created indexes on every field, but for some reason my postgre
server wants to use a seqscan, even tho i know a indexed scan would be
much faster.


create index test_time_idx on test (time) ;
create index test_source_idx on test (source) ;
create index test_destination_idx on test (destination) ;
create index test_p1_idx on test (p1) ;
create index test_p2_idx on test (p2) ;



What is really strange, is that when i query a count(*) on one of the int
fields (p1), which has a very low count, postgre uses seqscan. In another
count on the same int field (p1), i know he is giving about 2.2 million
hits, but then he suddenly uses seqscan, instead of a indexed one. Isn't
the whole idea of indexing to increase performance in large queries.. To
make sort of a phonebook for the values, to make it faster to look up what
ever you need... This just seems opposite..

Here is a EXPLAIN of my query

database=> explain select date_trunc('hour', time), count(*) as total from
test where p1=53 and time > now() - interval '24 hours' group by
date_trunc order by date_trunc ;
QUERY PLAN
--
Aggregate  (cost=727622.61..733143.23 rows=73608 width=8)
  ->  Group  (cost=727622.61..731303.02 rows=736083 width=8)
        ->  Sort  (cost=727622.61..729462.81 rows=736083 width=8)
              Sort Key: date_trunc('hour'::text, time)
              ->  Seq Scan on test  (cost=0.00..631133.12 rows=736083 width=8)
                    Filter: ((p1 = 53) AND (time > (now() - '1 day'::interval)))
(6 rows)




database=> drop INDEX test_<TAB>
test_source_idx        test_destination_idx   test_p1_idx
test_p2_idx            test_time_idx



After all this, I tried to set enable_seqscan to off and
enable_nestloop to on. This didn't help much either; the time to run the
query is still measured in minutes. My results are the number of elements for
each hour, and there are about 1000-2000 hits per hour. I have read somewhere
that PostgreSQL can easily handle 100-200 million records, and that with a
properly tuned system it can perform very well. I would like to learn how :)

I also found an article on a page
( http://techdocs.postgresql.org/techdocs/pgsqladventuresep3.php):
Tip #11:  Don't bother indexing columns with huge numbers of records and a
small range of values, such as BOOLEAN columns.

This tip, regretfully, is perhaps the only tip where I cannot provide a
good, real-world example from my work.  So I'll give you a hypothetical
situation instead:

Imagine that you have a database table with a list of every establishment
vending ice cream in the US.  A simple example might look like:

Where there were almost 1 million rows, but due to simplistic data entry,
only three possible values for type (1-SUPERMARKET, 2-BOUTIQUE, and
3-OTHER) which are relatively evenly distributed.  In this hypothetical
situation, you might find (with testing using EXPLAIN) that an index on
type is ignored and the parser uses a seq scan (or table scan) instead.
This is because a table scan can actually be faster than an index scan in
this situation.  Thus, any index on type should be dropped.

Certainly, the boolean column (active) requires no indexing as it has only
two possible values and no index will be faster than a table scan.


Then I ask: what is the use of indexing, when I can't use it on a VERY
large database? On my 15 million record database it takes forever
to do seqscans over and over again... This is probably, as I mentioned
earlier, the reason (read the quote) why it chooses a full scan and not an
indexed one...

So what do I do? :confused:

I've used SQL for years, but never in 

Re: [PERFORM] Prepared statements vs. Stored Procedures

2005-06-21 Thread Tobias Brox
[Oliver Crosby - Tue at 03:46:03PM -0400]
> I'm hoping someone can offer some advice here.
> I have a large perl script that employs prepared statements to do all its
> queries. I'm looking at using stored procedures to improve performance times
> for the script. Would making a stored procedure to replace each prepared
> statement be worthwhile? If not, when could I use stored procedures to
> improve performance?
> Thanks in advance.

My gut feeling says that if you are only doing read operations there are
few or no benefits from stored procedures.

One argument we used for not looking much into stored procedures was
that we expect the database to become the bottleneck if we get too much
activity.  On the application side we can always expand by adding more
boxes, but the database, being the hub of the system, cannot easily be
expanded (we can tweak, tune and upgrade the whole box, and eventually,
at some point, we believe we will need to put old data in a separate
database, and also make a replica for heavy report queries).

If you have loads of data going from the database to the application, a
little bit of light processing done on the data, and then data going
back to the database server, then I guess stored procedures would be
better.
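
To make the comparison concrete, a minimal sketch at the SQL level (with a
hypothetical users table; the perl script would reach the same things
through its driver):

-- prepared statement (hypothetical users table)
prepare get_user(int) as
    select * from users where id = $1;
execute get_user(42);

-- roughly equivalent stored procedure (a simple SQL function)
create function get_user_fn(int) returns setof users as
    'select * from users where id = $1'
    language sql;
select * from get_user_fn(42);

For a single read query like this the function mostly just adds a layer;
the win comes when several statements plus the logic between them can be
moved into one function, saving round trips.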

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Limit clause not using index

2005-06-21 Thread Michael Fuhr
On Tue, Jun 21, 2005 at 09:46:39PM +0200, Tobias Brox wrote:
> [John A Meinel - Tue at 10:14:24AM -0500]
> > I believe if you drop the indexes inside a transaction, they will still
> > be there for other queries, and if you rollback instead of commit, you
> > won't lose anything.
> 
> Has anyone tested this?

Observations from tests with 8.0.3:

DROP INDEX acquires an AccessExclusiveLock on the table and on the
index.  This will cause the transaction executing the DROP INDEX
to block until no other transaction holds any kind of lock on either,
and once the locks are acquired, no other transaction will be able
to access the table or the index until the transaction doing the
DROP INDEX commits or rolls back.  Rolling back leaves the index
in place.
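
A sketch of how to see this from a second session while the DROP INDEX
transaction is still open (hypothetical table t and index t_idx):

-- hypothetical names: t, t_idx
SELECT relation::regclass AS object, mode, granted
FROM pg_locks
WHERE relation IN ('t'::regclass, 't_idx'::regclass);

The session doing the DROP INDEX shows AccessExclusiveLock entries for both
objects, and anything else trying to touch the table shows up as a lock
request with granted = false until that transaction commits or rolls back.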

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Limit clause not using index

2005-06-21 Thread Tom Lane
Tobias Brox [EMAIL PROTECTED] writes:
> [John A Meinel - Tue at 10:14:24AM -0500]
>> I believe if you drop the indexes inside a transaction, they will still
>> be there for other queries, and if you rollback instead of commit, you
>> won't lose anything.

> Has anyone tested this?

Certainly.  Bear in mind though that DROP INDEX will acquire exclusive
lock on the index's table, so until you roll back, no other transaction
will be able to touch the table at all.  So the whole thing may be a
nonstarter in a production database anyway :-(.  You can probably get
away with
BEGIN;
DROP INDEX ...
EXPLAIN ...
ROLLBACK;
if you fire it from a script rather than by hand --- but EXPLAIN
ANALYZE might be a bad idea ...

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Querying 19million records very slowly

2005-06-21 Thread PFC


use CURRENT_TIME which is a constant instead of now() which is not  
considered constant... (I think)
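
A related trick, if the volatile function call really is what keeps the
planner away from the index: compute the cutoff in the application and pass
it as a literal, so the comparison value is unquestionably constant. A
sketch with a made-up timestamp:

select date_trunc('hour', time), count(*) as total
from test
where p1 = 53
  and time > '2005-06-20 12:00:00'  -- hypothetical cutoff computed by the application
group by 1
order by 1;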


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?

2005-06-21 Thread Alex Stapleton


On 21 Jun 2005, at 18:13, Josh Berkus wrote:


Alex,



Downtime is something I'd rather avoid if possible. Do you think we
will need to run VACUUM FULL occasionally? I'd rather not lock tables
up unless I cant avoid it. We can probably squeeze an automated
vacuum tied to our data inserters every now and then though.



As long as your update/deletes are less than 10% of the table for all time,
you should never have to vacuum, pending XID wraparound.



Hmm, I guess since we have hundreds of millions of rows, and when we do
delete things it's only a few thousand, and rarely, VACUUMing shouldn't
need to happen too often. That's good. Thanks a lot for the advice.



Is this an 8.0 thing? I don't have a pg_controldata from what I can
see. Thats nice to hear though.



'fraid so, yes.


Bloody Debian stable. I might have to experiment with building from  
source or using alien on debian to convert the rpms. Fun. Oh well.



--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco







---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?

2005-06-21 Thread Steinar H. Gunderson
On Tue, Jun 21, 2005 at 11:08:43PM +0100, Alex Stapleton wrote:
> Bloody Debian stable. I might have to experiment with building from
> source or using alien on debian to convert the rpms. Fun. Oh well.

Or just pull in postgresql-8.0 from unstable; sid is close enough to sarge
for it to work quite well in practice, AFAIK.

You'll lose the security support, though, but you will with building from
source or using alien anyhow :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Configurator project launched

2005-06-21 Thread Joshua D. Drake

Josh Berkus wrote:

Greg,



Not sure how far along you are, but I've been writing some really nifty
extensions to DBD::Pg that allow easy querying of all the current
run-time settings. Could be very useful to this project, seems to me. If
you're interested in possibly using it, let me know, I can bump it up on
my todo list.



Um, can't we just get that from pg_settings?

Anyway, I'll be deriving settings from the .conf file, since most of the 
time the Configurator will be run on a new installation.


Aren't most of the settings all kept in the SHOW variables anyway?
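
For what it's worth, both are easy to read back from SQL on a running
server; a quick sketch:

SELECT name, setting FROM pg_settings ORDER BY name;
-- or, interactively in psql:
SHOW ALL;

Though for a fresh installation the .conf file is all there is to work
from, as Josh says.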

Sincerely,

Joshua D. Drake







--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Configurator project launched

2005-06-21 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


>> Um, can't we just get that from pg_settings?
>
>> Anyway, I'll be deriving settings from the .conf file, since most of the
>> time the Configurator will be run on a new installation.
>
> Aren't most of the settings all kept in the SHOW variables anyway?

As I said, it may not be applicable to this project, but thought I would
offer. One gotcha the module makes transparent is that in older versions of
PG, the variables are returned in a different way (via RAISE). My module
will allow you to get the configuration for any connected database, for
any configuration file, and the defaults for any known version, and do
quick comparisons between them all. So you could use it to see what has
changed between a particular server and its conf file, or the differences
between two conf files, or the differences between two databases, or even show
what has changed in the default conf file from 7.4.7 to 8.0.1. It will also
allow you to rewrite the conf files in a standard way.

I'm hoping to roll this into 1.44 or 1.45 of DBD::Pg.

- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200506212046
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iD8DBQFCuLWDvJuQZxSWSsgRAjUVAJ42oeveZBuutFo1G3Cs/3dRZWjKggCfS1Yf
Tv5RWiG9s8Ucv/t/2HZ4/R8=
=1eap
-END PGP SIGNATURE-



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Limit clause not using index

2005-06-21 Thread Tobias Brox
[Tom Lane - Tue at 05:20:07PM -0400]
 
> Certainly.  Bear in mind though that DROP INDEX will acquire exclusive
> lock on the index's table, so until you roll back, no other transaction
> will be able to touch the table at all.  So the whole thing may be a
> nonstarter in a production database anyway :-(.

That's what I was afraid of.  I was running psql against the production DB
without starting a transaction (bad habit, I know) and tried to drop an
index there, but I had to cancel it; it took forever and was meanwhile
blocking all the revenue-generating activity.

-- 
Tobias Brox, +86-13521622905
Nordicbet, IT dept

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match