[PERFORM] please help with the explain analyze plan

2009-02-11 Thread Rajesh Kumar Mallah
Dear friends,

I have explain analyze of two queries

explain analyze SELECT count(*) from general.rfis where  1=1   and
inquiry_type = 'BUY'and receiver_uid=1320721;
(7 ms)
http://pastebin.com/m5297e03c

explain analyze SELECT count(*) from general.rfis where  1=1   and
inquiry_type = 'BUY'and receiver_uid=1320721 generated_date =
2251  and ;
(80 secs)
http://pastebin.com/d1e4bdea7


The table general.rfis is partitioned on generated_date and the
condition generated_date = 2251
was added with the intention to limit the number of (date based)
partitions that would be searched
by the query using the constraint exclusion facility. however as
evident the query has become very
slow as a result of this condition (even on primed caches).

can anyone kindly explain why the result was so counter intuitive ?

In particular where is most of the (80828.438 ms) spent on the plan
http://pastebin.com/d1e4bdea7 (reference to actual line number is appreciated)



structure of a typical partition (abridged)

 Table rfi_partitions.rfis_part_2009_01
Column |  Type  |
 Modifiers
---++---
 rfi_id| integer| not null default
nextval('general.rfis_rfi_id_seq'::regclass)
 sender_uid| integer| not null
 receiver_uid  | integer| not null
 subject   | character varying(100) | not null
 message   | text   | not null
 inquiry_type  | character varying(50)  | default
'BUY'::character varying
 inquiry_source| character varying(30)  | not null
 generated_date| integer| not null default
general.current_date_id()
Indexes:
rfis_part_2009_01_pkey PRIMARY KEY, btree (rfi_id)
rfis_part_2009_01_generated_date btree (generated_date)
rfis_part_2009_01_receiver_uid btree (receiver_uid) CLUSTER
rfis_part_2009_01_sender_uid btree (sender_uid)
Check constraints:
rfis_part_2009_01_generated_date_check CHECK (generated_date =
3289 AND generated_date = 3319)
rfis_part_2009_01_rfi_id_check CHECK (rfi_id = 12344252 AND
rfi_id = 12681399)
Inherits: rfis

regds
rajesh kumar mallah.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] please help with the explain analyze plan

2009-02-11 Thread Glyn Astill
Both queries are using your uid index on each of the partitions not 
generated_date, it's doing the generated_date with a filter on most of the 
partitions.

This is except for on partition part_2006_02 in the second query where it uses 
your generated date index - and that takes the 80 secs.

 -  Index Scan using rfis_part_2006_02_generated_date on 
rfis_part_2006_02 rfis  (cost=0.00..6.45 rows=1 width=0) (actual 
time=80827.207..80827.207 rows=0 loops=1)

Also the second query appears to go through a few more partitions than the 
first, i.e. part_2001_2004 and part_1005


--- On Wed, 11/2/09, Rajesh Kumar Mallah mallah.raj...@gmail.com wrote:

 From: Rajesh Kumar Mallah mallah.raj...@gmail.com
 Subject: [PERFORM] please help with the explain analyze plan
 To: pgsql-performance@postgresql.org
 Date: Wednesday, 11 February, 2009, 10:58 AM
 Dear friends,
 
 I have explain analyze of two queries
 
 explain analyze SELECT count(*) from general.rfis where 
 1=1   and
 inquiry_type = 'BUY'and receiver_uid=1320721;
 (7 ms)
 http://pastebin.com/m5297e03c
 
 explain analyze SELECT count(*) from general.rfis where 
 1=1   and
 inquiry_type = 'BUY'and receiver_uid=1320721
 generated_date =
 2251  and ;
 (80 secs)
 http://pastebin.com/d1e4bdea7
 
 
 The table general.rfis is partitioned on generated_date and
 the
 condition generated_date = 2251
 was added with the intention to limit the number of (date
 based)
 partitions that would be searched
 by the query using the constraint exclusion facility.
 however as
 evident the query has become very
 slow as a result of this condition (even on primed caches).
 
 can anyone kindly explain why the result was so counter
 intuitive ?
 
 In particular where is most of the (80828.438 ms) spent on
 the plan
 http://pastebin.com/d1e4bdea7 (reference to actual line
 number is appreciated)
 
 
 
 structure of a typical partition (abridged)
 
  Table rfi_partitions.rfis_part_2009_01
 Column |  Type  |
  Modifiers
 ---++---
  rfi_id| integer| not null
 default
 nextval('general.rfis_rfi_id_seq'::regclass)
  sender_uid| integer| not null
  receiver_uid  | integer| not null
  subject   | character varying(100) | not null
  message   | text   | not null
  inquiry_type  | character varying(50)  | default
 'BUY'::character varying
  inquiry_source| character varying(30)  | not null
  generated_date| integer| not null
 default
 general.current_date_id()
 Indexes:
 rfis_part_2009_01_pkey PRIMARY KEY, btree
 (rfi_id)
 rfis_part_2009_01_generated_date btree
 (generated_date)
 rfis_part_2009_01_receiver_uid btree
 (receiver_uid) CLUSTER
 rfis_part_2009_01_sender_uid btree
 (sender_uid)
 Check constraints:
 rfis_part_2009_01_generated_date_check
 CHECK (generated_date =
 3289 AND generated_date = 3319)
 rfis_part_2009_01_rfi_id_check CHECK
 (rfi_id = 12344252 AND
 rfi_id = 12681399)
 Inherits: rfis
 
 regds
 rajesh kumar mallah.
 
 -- 
 Sent via pgsql-performance mailing list
 (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] please help with the explain analyze plan

2009-02-11 Thread Rajesh Kumar Mallah
thanks for the hint,

now the peak hour is over and the same scan is taking 71 ms in place of 8 ms
and the total query time is also acceptable. But it is surprising that
the scan was
taking so long consistently at that point of time. I shall test again
under similar
circumstance tomorrow.

Is it possible to enable block level statistics from the psql prompt
for a particular query
and see the results on the psql prompt ?

explain  analyze SELECT count(*) from
rfi_partitions.rfis_part_2006_02 where  generated_date = 2251 and
receiver_uid=1320721 ;

  QUERY PLAN

 Aggregate  (cost=6.44..6.45 rows=1 width=0) (actual
time=71.513..71.513 rows=1 loops=1)
   -  Index Scan using rfis_part_2006_02_generated_date on
rfis_part_2006_02  (cost=0.00..6.43 rows=1 width=0) (actual
time=71.508..71.508 rows=0 loops=1)
 Index Cond: (generated_date = 2251)
 Filter: (receiver_uid = 1320721)
 Total runtime: 71.553 ms
(5 rows)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] please help with the explain analyze plan

2009-02-11 Thread Rajesh Kumar Mallah
hurray!
ANALYZING changed the plan

I was not expecting the plan to change because
the partition of 2006_02 is supposed to be
dormant. maybe the partition was never analyzed.

But still question remains, why the time taken was
in orders of magnitude higher in loaded condition.



tradein_clients= explain   SELECT count(*) from
rfi_partitions.rfis_part_2006_02 where  generated_date = 2251 and
receiver_uid=1320721 ;
   QUERY PLAN

 Aggregate  (cost=6.44..6.45 rows=1 width=0)
   -  Index Scan using rfis_part_2006_02_generated_date on
rfis_part_2006_02  (cost=0.00..6.43 rows=1 width=0)
 Index Cond: (generated_date = 2251)
 Filter: (receiver_uid = 1320721)
(4 rows)
tradein_clients= ANALYZE rfi_partitions.rfis_part_2006_02;
ANALYZE
tradein_clients= explain   SELECT count(*) from
rfi_partitions.rfis_part_2006_02 where  generated_date = 2251 and
receiver_uid=1320721 ;
  QUERY PLAN
--
 Aggregate  (cost=8.78..8.79 rows=1 width=0)
   -  Index Scan using rfis_part_2006_02_receiver_uid on
rfis_part_2006_02  (cost=0.00..8.77 rows=1 width=0)
 Index Cond: (receiver_uid = 1320721)
 Filter: (generated_date = 2251)
(4 rows)

tradein_clients= explain analyze  SELECT count(*) from
rfi_partitions.rfis_part_2006_02 where  generated_date = 2251 and
receiver_uid=1320721 ;

QUERY PLAN

 Aggregate  (cost=8.78..8.79 rows=1 width=0) (actual time=0.045..0.045
rows=1 loops=1)
   -  Index Scan using rfis_part_2006_02_receiver_uid on
rfis_part_2006_02  (cost=0.00..8.77 rows=1 width=0) (actual
time=0.042..0.042 rows=0 loops=1)
 Index Cond: (receiver_uid = 1320721)
 Filter: (generated_date = 2251)
 Total runtime: 0.082 ms
(5 rows)

tradein_clients=

On Wed, Feb 11, 2009 at 6:07 PM, Rajesh Kumar Mallah
mallah.raj...@gmail.com wrote:
 thanks for the hint,

 now the peak hour is over and the same scan is taking 71 ms in place of 8 
 ms
 and the total query time is also acceptable. But it is surprising that
 the scan was
 taking so long consistently at that point of time. I shall test again
 under similar
 circumstance tomorrow.

 Is it possible to enable block level statistics from the psql prompt
 for a particular query
 and see the results on the psql prompt ?

 explain  analyze SELECT count(*) from
 rfi_partitions.rfis_part_2006_02 where  generated_date = 2251 and
 receiver_uid=1320721 ;

  QUERY PLAN
 
  Aggregate  (cost=6.44..6.45 rows=1 width=0) (actual
 time=71.513..71.513 rows=1 loops=1)
   -  Index Scan using rfis_part_2006_02_generated_date on
 rfis_part_2006_02  (cost=0.00..6.43 rows=1 width=0) (actual
 time=71.508..71.508 rows=0 loops=1)
 Index Cond: (generated_date = 2251)
 Filter: (receiver_uid = 1320721)
  Total runtime: 71.553 ms
 (5 rows)


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Rajesh Kumar Mallah
Hi,

Is it possible to configure autovacuum to run only
during certain hours ? We are forced to keep
it off because it pops up during the peak
query hours.

Regds
rajesh kumar mallah.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Glyn Astill

 From: Rajesh Kumar Mallah mallah.raj...@gmail.com

 Is it possible to configure autovacuum to run only
 during certain hours ? We are forced to keep
 it off because it pops up during the peak
 query hours.

AFAIK not directly within the conf.

However you could probably set up a shell script to turn it on and off as and 
when required via cron; just change the setting to off in the conf file and 
reload.




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Guillaume Cottenceau
Rajesh Kumar Mallah mallah.rajesh 'at' gmail.com writes:

 Hi,

 Is it possible to configure autovacuum to run only
 during certain hours ? We are forced to keep
 it off because it pops up during the peak
 query hours.

You'd rather configure the delaying process to not alter too much
performance. Autovacuum is really not targeted at running once a
day - I think it is partly because the old vacuuming was too weak
(because too seldom in many cases) that autovaccum was added.

A delaying configuration that works nicely for us without
impacting performance much (tested at the time of 8.2 to cause a
+40% response time during autovacuuming, compared to +300% with
more default values):

vacuum_cost_delay = 150
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 1000
autovacuum_vacuum_cost_delay = 300

(Disclaimer: IIRC, Alvaro doesn't like these figures at all)

Of course, these are good for us (bloat is very, very low and
performance impact is not experienced in production), not
necessarily for you. You should conduct your own tests.

Be sure to also consider http://developer.postgresql.org/~wieck/vacuum_cost/

-- 
Guillaume Cottenceau

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Rajesh Kumar Mallah
On Wed, Feb 11, 2009 at 7:11 PM, Guillaume Cottenceau g...@mnc.ch wrote:
 Rajesh Kumar Mallah mallah.rajesh 'at' gmail.com writes:

 Hi,

 Is it possible to configure autovacuum to run only
 during certain hours ? We are forced to keep
 it off because it pops up during the peak
 query hours.

 You'd rather configure the delaying process to not alter too much
 performance. Autovacuum is really not targeted at running once a
 day - I think it is partly because the old vacuuming was too weak
 (because too seldom in many cases) that autovaccum was added.

 A delaying configuration that works nicely for us without
 impacting performance much (tested at the time of 8.2 to cause a
 +40% response time during autovacuuming, compared to +300% with
 more default values):

 vacuum_cost_delay = 150
 vacuum_cost_page_hit = 1
 vacuum_cost_page_miss = 10
 vacuum_cost_page_dirty = 20
 vacuum_cost_limit = 1000
 autovacuum_vacuum_cost_delay = 300

why is it not a good idea to give end users control over when they
want to run it ?



 (Disclaimer: IIRC, Alvaro doesn't like these figures at all)

 Of course, these are good for us (bloat is very, very low and
 performance impact is not experienced in production), not
 necessarily for you. You should conduct your own tests.

 Be sure to also consider http://developer.postgresql.org/~wieck/vacuum_cost/

 --
 Guillaume Cottenceau


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Alvaro Herrera
Rajesh Kumar Mallah escribió:

 why is it not a good idea to give end users control over when they
 want to run it ?

It has never been said that we don't want to give the users control.
It's a matter of writing the code.  If you want to propose a patch to
add the feature, feel free.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Grzegorz Jaśkiewicz
On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah
mallah.raj...@gmail.com wrote:

 vacuum_cost_delay = 150
 vacuum_cost_page_hit = 1
 vacuum_cost_page_miss = 10
 vacuum_cost_page_dirty = 20
 vacuum_cost_limit = 1000
 autovacuum_vacuum_cost_delay = 300

 why is it not a good idea to give end users control over when they
 want to run it ?

Effectively, you have control over autovacuum via these params.
You have to remember, that autovacuum doesn't cost much, and it makes
planner know more about data.
It's not there to clean up databases, as you might imagine - it is
there to update stats, and mark pages as free.

So make sure you tweak that config fist, because I have a funny
feeling that you just think that vacuuming bogs down your machine, and
_can_ be turned off without any bad consequences, which is simply not
true.


-- 
GJ

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Ben

On Feb 11, 2009, at 6:57 AM, Rajesh Kumar Mallah wrote:


why is it not a good idea to give end users control over when they
want to run it ?


There's nothing stopping you from just turning off autovacuum and  
running vacuum manually. The point of autovacuum is to vacuum as  
needed.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Scott Carey
On a large partitioned database, ordinary vacuum is a very very difficult 
option.

Most of the time on such a system, most tables are dormant with respect to 
writes and never need to be vacuumed.  A 'daily vacuum' is likely to take a 
full day to run on larger systems. Since ordinary vacuum can't be run on 
subsets of tables without explicitly naming them one at a time (can't just 
vacuum a schema, tablespace, or use a regex to match table names), good luck 
using it effectively if you have a few thousand tables in partitions.  You'll 
have to have application code or a script with knowledge of all the partition 
names and which are in need of an analyze/vacuum.

Autovacuum is good enough in recent versions to be tuned to have very low 
impact though.  If you have to, rather than stop and start it, just turn the 
delay or cost settings up and down during different parts of the day.  More 
than likely however, it will be able to keep up with a single set of settings.
In particular, rather than making the delay longer, make the appropriate cost 
larger -- page miss or page dirty affect how much I/O it will do, and page hit 
will mostly affect how much CPU it uses.

Perhaps a feature request is to have a version of the manual vacuum command 
that doesn't bother running on tables that autovacuum wouldn't touch due to 
insufficient data churn.  This way, at lean hours one can manually vacuum to 
help an autovacuum that was tuned for very low impact 'catch up'.
Also, if there was some way to make vacuum not stink so badly on tables that 
were just loaded with pg_load, where it causes huge write activity for tables 
that clearly have no bloat (I believe this is a hint bits thing?).

From: pgsql-performance-ow...@postgresql.org 
[pgsql-performance-ow...@postgresql.org] On Behalf Of Ben 
[be...@silentmedia.com]
Sent: Wednesday, February 11, 2009 8:46 AM
To: Rajesh Kumar Mallah
Cc: PostgreSQL Performance
Subject: Re: [PERFORM] scheduling autovacuum at lean hours only.

On Feb 11, 2009, at 6:57 AM, Rajesh Kumar Mallah wrote:

 why is it not a good idea to give end users control over when they
 want to run it ?

There's nothing stopping you from just turning off autovacuum and
running vacuum manually. The point of autovacuum is to vacuum as
needed.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Rajesh Kumar Mallah
On Wed, Feb 11, 2009 at 10:03 PM, Grzegorz Jaśkiewicz gryz...@gmail.com wrote:
 On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah
 mallah.raj...@gmail.com wrote:

 vacuum_cost_delay = 150
 vacuum_cost_page_hit = 1
 vacuum_cost_page_miss = 10
 vacuum_cost_page_dirty = 20
 vacuum_cost_limit = 1000
 autovacuum_vacuum_cost_delay = 300

 why is it not a good idea to give end users control over when they
 want to run it ?

 Effectively, you have control over autovacuum via these params.
 You have to remember, that autovacuum doesn't cost much, and it makes
 planner know more about data.
 It's not there to clean up databases, as you might imagine - it is
 there to update stats, and mark pages as free.

 So make sure you tweak that config fist, because I have a funny
 feeling that you just think that vacuuming bogs down your machine, and
 _can_ be turned off without any bad consequences, which is simply not
 true.

our usage pattern is such that peak activity (indicated by load average)
during day time is 10 times during night hours. Autovacuum just puts
more pressure to the system. If less stressing version is used then
it shall take longer to complete one cycle,  which would mean  less
performance for longer time . Less performance queues up queries
and encourages people to re submit their queries which again
adds to bogging up the system.

In our case i feel the hardware is bit underscaled as compared to
load thats why i think running in lean hours is best of both worlds
no performance sacrifices and intelligent vacuuming.

regds
-- mallah.




 --
 GJ


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Ben Chobot

On Wed, 11 Feb 2009, Scott Carey wrote:


On a large partitioned database, ordinary vacuum is a very very difficult 
option.

Most of the time on such a system, most tables are dormant with respect to 
writes and never need to be vacuumed.  A 'daily vacuum' is likely to take a 
full day to run on larger systems. Since ordinary vacuum can't be run on 
subsets of tables without explicitly naming them one at a time (can't just 
vacuum a schema, tablespace, or use a regex to match table names), good luck 
using it effectively if you have a few thousand tables in partitions.  You'll 
have to have application code or a script with knowledge of all the partition 
names and which are in need of an analyze/vacuum.

Autovacuum is good enough in recent versions to be tuned to have very low 
impact though.  If you have to, rather than stop and start it, just turn the 
delay or cost settings up and down during different parts of the day.  More 
than likely however, it will be able to keep up with a single set of settings.
In particular, rather than making the delay longer, make the appropriate cost 
larger -- page miss or page dirty affect how much I/O it will do, and page hit 
will mostly affect how much CPU it uses.

Perhaps a feature request is to have a version of the manual vacuum command 
that doesn't bother running on tables that autovacuum wouldn't touch due to 
insufficient data churn.  This way, at lean hours one can manually vacuum to 
help an autovacuum that was tuned for very low impact 'catch up'.
Also, if there was some way to make vacuum not stink so badly on tables that 
were just loaded with pg_load, where it causes huge write activity for tables 
that clearly have no bloat (I believe this is a hint bits thing?).


Oh, I agree with everything you say. I'm just pointing out that if you 
really do want control over when things get vacuumed (e.g. you have a 
mostly-read database 20 hours a day and then 4 hours of heavy churn at 
night) then you can still do that if you want.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Brad Nicholson
On Wed, 2009-02-11 at 22:57 +0530, Rajesh Kumar Mallah wrote:
 On Wed, Feb 11, 2009 at 10:03 PM, Grzegorz Jaśkiewicz gryz...@gmail.com 
 wrote:
  On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah
  mallah.raj...@gmail.com wrote:
 
  vacuum_cost_delay = 150
  vacuum_cost_page_hit = 1
  vacuum_cost_page_miss = 10
  vacuum_cost_page_dirty = 20
  vacuum_cost_limit = 1000
  autovacuum_vacuum_cost_delay = 300
 
  why is it not a good idea to give end users control over when they
  want to run it ?
 
  Effectively, you have control over autovacuum via these params.
  You have to remember, that autovacuum doesn't cost much, and it makes
  planner know more about data.
  It's not there to clean up databases, as you might imagine - it is
  there to update stats, and mark pages as free.
 
  So make sure you tweak that config fist, because I have a funny
  feeling that you just think that vacuuming bogs down your machine, and
  _can_ be turned off without any bad consequences, which is simply not
  true.
 
 our usage pattern is such that peak activity (indicated by load average)
 during day time is 10 times during night hours. Autovacuum just puts
 more pressure to the system. If less stressing version is used then
 it shall take longer to complete one cycle,  which would mean  less
 performance for longer time . Less performance queues up queries
 and encourages people to re submit their queries which again
 adds to bogging up the system.

That's not exactly how it works in practise, if tuned properly.  It may
take longer, but it is less intensive while running.

We had one system that had spikes happening due to the exact case you
described - there were noticeably high IO wait times while certain
tables were being vacuumed.  We set the cost delay and the wait times
dropped to the point where it was non-issue.  Vacuums take twice as
long, but there is no measurable impact to the performance.

 In our case i feel the hardware is bit underscaled as compared to
 load thats why i think running in lean hours is best of both worlds
 no performance sacrifices and intelligent vacuuming.

That is a different issue altogether. 

Not vacuuming a running system at all during peak hours is not
considered intelligent vacuuming IMHO.  There are plenty of use cases
where small, frequent vacuums keep tables under control at a very low
cost.  Letting them go for extended periods of time without vacuuming
causes bloat and eventual slowdowns to table access which manifest in
higher IO usage across the board.

If you really are dead set on vacuuming only at night, you may want to
do a careful analysis of which tables need to be vacuumed and when, and
trigger manual vacuums from cron.

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Rajesh Kumar Mallah
On Wed, Feb 11, 2009 at 11:30 PM, Brad Nicholson
bnich...@ca.afilias.info wrote:
 On Wed, 2009-02-11 at 22:57 +0530, Rajesh Kumar Mallah wrote:
 On Wed, Feb 11, 2009 at 10:03 PM, Grzegorz Jaśkiewicz gryz...@gmail.com 
 wrote:
  On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah
  mallah.raj...@gmail.com wrote:
 
  vacuum_cost_delay = 150
  vacuum_cost_page_hit = 1
  vacuum_cost_page_miss = 10
  vacuum_cost_page_dirty = 20
  vacuum_cost_limit = 1000
  autovacuum_vacuum_cost_delay = 300
 
  why is it not a good idea to give end users control over when they
  want to run it ?
 
  Effectively, you have control over autovacuum via these params.
  You have to remember, that autovacuum doesn't cost much, and it makes
  planner know more about data.
  It's not there to clean up databases, as you might imagine - it is
  there to update stats, and mark pages as free.
 
  So make sure you tweak that config fist, because I have a funny
  feeling that you just think that vacuuming bogs down your machine, and
  _can_ be turned off without any bad consequences, which is simply not
  true.

 our usage pattern is such that peak activity (indicated by load average)
 during day time is 10 times during night hours. Autovacuum just puts
 more pressure to the system. If less stressing version is used then
 it shall take longer to complete one cycle,  which would mean  less
 performance for longer time . Less performance queues up queries
 and encourages people to re submit their queries which again
 adds to bogging up the system.

 That's not exactly how it works in practise, if tuned properly.  It may
 take longer, but it is less intensive while running.

 We had one system that had spikes happening due to the exact case you
 described - there were noticeably high IO wait times while certain
 tables were being vacuumed.  We set the cost delay and the wait times
 dropped to the point where it was non-issue.


I think i can take this route and monitor the io activity
during vacuums. thanks everyone for their suggestions.

-- mallah.


Vacuums take twice as
 long, but there is no measurable impact to the performance.

 In our case i feel the hardware is bit underscaled as compared to
 load thats why i think running in lean hours is best of both worlds
 no performance sacrifices and intelligent vacuuming.

 That is a different issue altogether.

 Not vacuuming a running system at all during peak hours is not
 considered intelligent vacuuming IMHO.  There are plenty of use cases
 where small, frequent vacuums keep tables under control at a very low
 cost.  Letting them go for extended periods of time without vacuuming
 causes bloat and eventual slowdowns to table access which manifest in
 higher IO usage across the board.

 If you really are dead set on vacuuming only at night, you may want to
 do a careful analysis of which tables need to be vacuumed and when, and
 trigger manual vacuums from cron.

 --
 Brad Nicholson  416-673-4106
 Database Administrator, Afilias Canada Corp.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance