Re: [PERFORM] Background vacuum

2007-05-19 Thread Ron Mayer
Greg Smith wrote:
 
 Let's break this down into individual parts:

Great summary.

 4) Is vacuuming a challenging I/O demand?  Quite.
 
 Add all this up, and the fact that you're satisfied with how nice has
 worked successfully for you doesn't have to conflict with an opinion
 that it's not the best approach for controlling vacuuming.  I just
 wouldn't extrapolate your experience too far here.

I wasn't claiming it's the best approach for vacuuming.

From my first posting in this thread I've been agreeing that
vacuum_cost_delay is the better tool for handling vacuum.  Just
that the original poster also asked for a way of setting priorities,
so I pointed him to one.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Background vacuum

2007-05-18 Thread Ron Mayer
Tom Lane wrote:
 Ron Mayer [EMAIL PROTECTED] writes:
 Greg Smith wrote:
 Count me on the side that agrees adjusting the vacuuming parameters is
 the more straightforward way to cope with this problem.
 
 Agreed for vacuum; but it still seems interesting to me that
 across databases and workloads high priority transactions
 tended to get through faster than low priority ones.  Is there
 any reason to believe that the drawbacks of priority inversion
 outweigh the benefits of setting priorities?
 
 Well, it's unclear, and anecdotal evidence is unlikely to convince
 anybody.  I had put some stock in the CMU paper, but if it's based
 on PG 7.3 then you've got to **seriously** question its relevance
 to the current code.

I was thinking the paper's results might apply more generally
to RDBMS-like applications since they did test 3 of them with
different locking behavior and different bottlenecks.

But true, I should stop bringing up 7.3 examples.


Anecdotally ;-) I've found renice-ing reports to help; especially
in the (probably not too uncommon) case where slow-running
batch reporting queries hit different tables than interactive
reporting queries.   I guess that's why I keep defending
priorities as a useful technique.   It seems even more useful
considering the existence of schedulers that have priority
inheritance features.

I'll admit there's still the theoretical possibility that
it's a foot-gun, so I don't mind people having to write
their own stored procedure to enable it - but I'd be
surprised if anyone could find a real world case where
priorities would do more harm than good.

Though, yeah, it'd be easy to construct an artificial
case that'd demonstrate priority inversion (i.e. have
a low priority process that takes a lock and sits
and spins on some CPU-intensive stored procedure
without doing any I/O).
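Ron's artificial case can be modeled with a toy single-CPU scheduler. This is a hypothetical sketch, not PostgreSQL's or any OS's real scheduler: each tick the CPU runs the runnable task with the highest (effective) priority. A low-priority task `L` grabs the only lock and spins, an unrelated CPU-bound task `M` arrives, and a high-priority task `H` then blocks on the lock. The task names, priorities, and tick counts are made up for illustration. Passing `inherit=True` lets the lock holder borrow the priority of its most important waiter, which is the priority-inheritance feature mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    name: str
    priority: int     # larger number = higher priority
    arrival: int      # tick at which the task becomes runnable
    work: int         # CPU ticks the task needs in total
    needs_lock: bool  # must hold the single shared lock while it runs
    remaining: int = 0
    done_at: Optional[int] = None


def simulate(inherit: bool) -> Dict[str, Optional[int]]:
    """Run the toy scenario and return the completion tick of each task."""
    tasks: List[Task] = [
        Task("L", priority=1, arrival=0, work=5, needs_lock=True),    # low priority, spins holding the lock
        Task("M", priority=2, arrival=1, work=10, needs_lock=False),  # unrelated CPU-bound work
        Task("H", priority=3, arrival=2, work=2, needs_lock=True),    # high priority, needs the same lock
    ]
    for task in tasks:
        task.remaining = task.work

    holder: Optional[Task] = None  # task currently holding the lock
    tick = 0
    while any(task.remaining > 0 for task in tasks):
        active = [t for t in tasks if t.arrival <= tick and t.remaining > 0]

        def is_blocked(t: Task) -> bool:
            # A task that needs the lock cannot run while someone else holds it.
            return t.needs_lock and holder is not None and holder is not t

        blocked = [t for t in active if is_blocked(t)]
        runnable = [t for t in active if not is_blocked(t)]

        def effective(t: Task) -> int:
            # With inheritance, the lock holder runs at the priority of the
            # most important task waiting on its lock.
            if inherit and t is holder and blocked:
                return max([t.priority] + [w.priority for w in blocked])
            return t.priority

        if runnable:
            current = max(runnable, key=effective)
            if current.needs_lock and holder is None:
                holder = current            # acquire the lock on first run
            current.remaining -= 1          # run for one tick
            if current.remaining == 0:
                current.done_at = tick + 1
                if holder is current:
                    holder = None           # release the lock when finished
        tick += 1
    return {t.name: t.done_at for t in tasks}
```

In this toy run, `simulate(False)` returns `{'L': 15, 'M': 11, 'H': 17}`: `H` cannot finish until `M` has used up its CPU time and `L` has released the lock. `simulate(True)` returns `{'L': 6, 'M': 17, 'H': 8}`, because `L` runs at `H`'s priority as soon as `H` starts waiting.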

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Background vacuum

2007-05-18 Thread Greg Smith

On Fri, 18 May 2007, Ron Mayer wrote:


Anecdotally ;-) I've found renice-ing reports to help


Let's break this down into individual parts:

1) Is there enough CPU-intensive activity in some database tasks that they 
can usefully be controlled by tools like nice?  Sure.


2) Is it so likely that you'll fall victim to a priority inversion problem 
that you shouldn't ever consider that technique?  No.


3) Does the I/O scheduler in modern OSes deal with a lot more things than 
just the CPU?  You bet.


4) Is vacuuming a challenging I/O demand?  Quite.

Add all this up, and the fact that you're satisfied with how nice has 
worked successfully for you doesn't have to conflict with an opinion that 
it's not the best approach for controlling vacuuming.  I just wouldn't 
extrapolate your experience too far here.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Background vacuum

2007-05-17 Thread Andrew Sullivan
On Thu, May 10, 2007 at 05:10:56PM -0700, Ron Mayer wrote:
 One way is to write a stored procedure that sets its own priority.
 An example is here:
 http://weblog.bignerdranch.com/?p=11

Do you have evidence to show this will actually work consistently?
The problem with doing this is that if your process is holding a lock
that prevents some other process from doing something, then your
lowered priority actually causes that _other_ process to go slower
too.  This is part of the reason people object to the suggestion that
renicing a single back end will help anything.

 This paper studied both CPU and lock priorities on a variety
 of databases including PostgreSQL.
 
 http://www.cs.cmu.edu/~bianca/icde04.pdf
 
  By contrast, for PostgreSQL, lock scheduling is not as
   effective as CPU scheduling (see Figure 4(c)).

It is likely that in _some_ cases, you can get this benefit, because
you don't have contention issues.  The explanation for the good lock
performance by Postgres on the TPC-C tests they were using is
PostgreSQL's MVCC: Postgres locks less.  The problem comes when you
have contention, and in that case, CPU scheduling will really hurt. 

This means that, to use CPU scheduling safely, you have to be really
sure that you know what the other transactions are doing. 

A 

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Information security isn't a technological problem.  It's an economics
problem.
--Bruce Schneier

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Background vacuum

2007-05-17 Thread Ron Mayer
Andrew Sullivan wrote:
 On Thu, May 10, 2007 at 05:10:56PM -0700, Ron Mayer wrote:
 One way is to write a stored procedure that sets its own priority.
 An example is here:
 http://weblog.bignerdranch.com/?p=11
 
 Do you have evidence to show this will actually work consistently?

The paper referenced below gives a better explanation than I can.

Their conclusion was that on many real-life workloads (including
TPC-C- and TPC-H-like workloads) on many databases (including DB2
and PostgreSQL), the benefits vastly outweighed the disadvantages.

 The problem with doing this is that if your process is holding a lock
 that prevents some other process from doing something, then your
 lowered priority actually causes that _other_ process to go slower
 too.  This is part of the reason people object to the suggestion that
 renicing a single back end will help anything.

Sure.  And in the paper they discussed the effect and found that
if you do have an OS scheduler that supports priority inheritance,
the benefits are even bigger than without it.  But even
for OS and scheduler combinations without it, the benefits
were very significant.

 
 This paper studied both CPU and lock priorities on a variety
 of databases including PostgreSQL.

 http://www.cs.cmu.edu/~bianca/icde04.pdf

  By contrast, for PostgreSQL, lock scheduling is not as
   effective as CPU scheduling (see Figure 4(c)).
 
 It is likely that in _some_ cases, you can get this benefit, because
 you don't have contention issues.  The explanation for the good lock
 performance by Postgres on the TPC-C tests they were using is
 PostgreSQL's MVCC: Postgres locks less.  The problem comes when you
 have contention, and in that case, CPU scheduling will really hurt. 
 
 This means that, to use CPU scheduling safely, you have to be really
 sure that you know what the other transactions are doing. 

Not necessarily.  From the wide range of conditions the paper tested,
I'd say it's more like quicksort: you need to be sure you avoid
theoretical pathological conditions that no one (that I can find)
has encountered in practice.

If you do know of such a workload, I (and, I imagine, the authors
of that paper) would be quite interested.

Since they showed that the benefits are very real for both
TPC-C and TPC-H like workloads I think the burden of proof
is now more on the people warning of the (so far theoretical)
drawbacks.


Re: [PERFORM] Background vacuum

2007-05-17 Thread Greg Smith
Ah, glad this came up again 'cause a problem here caused my original reply 
to bounce.


On Thu, 10 May 2007, Ron Mayer wrote:

Actually, CPU priorities _are_ an effective way of indirectly scheduling 
I/O priorities. This paper studied both CPU and lock priorities on a 
variety of databases including PostgreSQL. 
http://www.cs.cmu.edu/~bianca/icde04.pdf


I spent a fair amount of time analyzing that paper recently, and found it 
hard to draw any strong current conclusions from it.  Locking and related 
scalability issues are much better now than in the PG 7.3 they tested. 
For example, from the paper:


We find almost all lightweight locking in PostgreSQL functions to 
serialize the I/O buffer pool and WAL activity...as a result, we attribute 
all the lightweight lock waiting time for the above-listed locks to I/O.


Well, sure, if you classify those as I/O waits it's no surprise you can 
darn near directly control them via CPU scheduling; I question the current 
relevancy of this historical observation about the old code.  I think it's 
much easier to get into an honest I/O-bound situation now with a TPC-C 
like workload (they kind of cheated on that part too, which is a whole 
'nother discussion), especially with the even faster speeds of modern 
processors, and then you're in a situation where CPU scheduling is not so 
effective for indirectly controlling I/O prioritization.


Count me on the side that agrees adjusting the vacuuming parameters is the 
more straightforward way to cope with this problem.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Background vacuum

2007-05-17 Thread Ron Mayer
Greg Smith wrote:
 
 Count me on the side that agrees adjusting the vacuuming parameters is
 the more straightforward way to cope with this problem.


Agreed for vacuum; but it still seems interesting to me that
across databases and workloads high priority transactions
tended to get through faster than low priority ones.  Is there
any reason to believe that the drawbacks of priority inversion
outweigh the benefits of setting priorities?

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Background vacuum

2007-05-17 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes:
 Greg Smith wrote:
 Count me on the side that agrees adjusting the vacuuming parameters is
 the more straightforward way to cope with this problem.

 Agreed for vacuum; but it still seems interesting to me that
 across databases and workloads high priority transactions
 tended to get through faster than low priority ones.  Is there
 any reason to believe that the drawbacks of priority inversion
 outweigh the benefits of setting priorities?

Well, it's unclear, and anecdotal evidence is unlikely to convince
anybody.  I had put some stock in the CMU paper, but if it's based
on PG 7.3 then you've got to **seriously** question its relevance
to the current code.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Background vacuum

2007-05-10 Thread Ron Mayer
Dan Harris wrote:
 Daniel Haensse wrote:
 Does anybody have a nice
 solution for changing process priority? A shell script, maybe even for Java?

One way is to write a stored procedure that sets its own priority.
An example is here:
http://weblog.bignerdranch.com/?p=11
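The linked post's code is not reproduced here. As a hedged sketch of what such a procedure boils down to, the backend renices itself with the `setpriority()` system call, shown below through Python's `os` module (Unix, Python 3.3+). Running this inside the database would need an untrusted procedural language such as PL/Python, which is an assumption about the setup rather than something the thread specifies; the function name `lower_own_priority` is hypothetical.

```python
import os


def lower_own_priority(increment: int = 10) -> int:
    """Raise the calling process's nice value by `increment`; return the new value.

    Hypothetical helper, Unix only.  An unprivileged process may make itself
    less important (a higher nice value) but normally cannot undo that
    without root.
    """
    if increment < 0:
        raise ValueError("raising priority normally requires root; use increment >= 0")
    current = os.getpriority(os.PRIO_PROCESS, 0)  # 0 means "the calling process"
    target = min(current + increment, 19)         # 19 is the weakest nice level on Linux
    os.setpriority(os.PRIO_PROCESS, 0, target)
    return os.getpriority(os.PRIO_PROCESS, 0)
```

Because the call only affects the process that makes it, a backend that calls it lowers its own CPU priority for the rest of its life, which is exactly the situation Andrew Sullivan's lock-holding objection elsewhere in this thread is about.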


 While this may technically work, I think it misses a key point.  'nice'
 (at least the versions I'm familiar with) does not adjust I/O priority. 
 VACUUM is bogging things down because of the extra strain on I/O.  CPU
 usage shouldn't really be much of a factor.

Actually, CPU priorities _are_ an effective way of indirectly scheduling
I/O priorities.

This paper studied both CPU and lock priorities on a variety
of databases including PostgreSQL.

http://www.cs.cmu.edu/~bianca/icde04.pdf

 By contrast, for PostgreSQL, lock scheduling is not as
  effective as CPU scheduling (see Figure 4(c)).
  ...
  The effectiveness of CPU-Prio for TPC-C on
  PostgreSQL is surprising, given that I/O (I/O-related
  lightweight locks) is its bottleneck. Due to CPU prioritization,
  high-priority transactions are able to request I/O resources
  before low-priority transactions can. As a result,
  high-priority transactions wait fewer times (52% fewer) for
  I/O, and when they do wait, they wait behind fewer transactions
  (43% fewer). The fact that simple CPU prioritization
  is able to improve performance so significantly suggests that
  more complicated I/O scheduling is not always necessary.
  ...
  For TPC-C on MVCC DBMS, and in particular PostgreSQL,
  CPU scheduling is most effective, due to its ability
  to indirectly schedule the I/O bottleneck.
  ...
  For TPC-C running on PostgreSQL, the simplest CPU scheduling
  policy (CPU-Prio) provides a factor of 2 improvement
  for high-priority transactions, while adding priority inheritance
  (CPU-Prio-Inherit) provides a factor of 6 improvement
  while hardly penalizing low-priority transactions.
  Preemption (P-CPU) provides no appreciable benefit
  over CPU-Prio-Inherit
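The excerpt's mechanism (CPU priority lets high-priority transactions reach the I/O queue first) can be illustrated with a toy model. This is a hypothetical sketch, not the paper's experimental setup: every transaction needs a few ticks of CPU and then one request to a single disk that serves requests in FIFO order. The function name, transaction names, priorities, and tick counts are made up. Whether real PostgreSQL workloads behave this way is exactly what the later replies in this thread dispute.

```python
from collections import deque
from typing import Dict, List, Tuple


def io_finish_times(txns: List[Tuple[str, int]], cpu_ticks: int, io_ticks: int,
                    policy: str) -> Dict[str, int]:
    """Toy model: each transaction needs `cpu_ticks` of CPU, then one I/O of
    `io_ticks` on a single FIFO disk.

    `txns` is a list of (name, priority) in arrival order, all arriving at
    tick 0; `policy` is "round_robin" or "priority" (larger = higher priority).
    Returns the tick at which each transaction's I/O completes.
    """
    if policy not in ("round_robin", "priority"):
        raise ValueError("policy must be 'round_robin' or 'priority'")
    remaining = {name: cpu_ticks for name, _ in txns}
    prio = dict(txns)
    submitted: List[Tuple[int, str]] = []   # (submission tick, name) in order
    ready = deque(name for name, _ in txns)
    tick = 0
    while ready:
        if policy == "priority":
            # Highest priority first; max() keeps the earliest entry on ties.
            name = max(ready, key=lambda n: prio[n])
            ready.remove(name)
        else:
            name = ready.popleft()
        remaining[name] -= 1                # run one CPU tick
        tick += 1
        if remaining[name] == 0:
            submitted.append((tick, name))  # CPU work done: queue the I/O
        elif policy == "priority":
            ready.appendleft(name)          # keep its place at the front
        else:
            ready.append(name)              # round robin: back of the line
    finish: Dict[str, int] = {}
    disk_free = 0
    for at, name in submitted:              # FIFO disk: serve in submission order
        start = max(at, disk_free)
        disk_free = start + io_ticks
        finish[name] = disk_free
    return finish
```

With `txns = [("low1", 1), ("low2", 1), ("low3", 1), ("high", 2)]`, 3 CPU ticks and 5 disk ticks each, the high-priority transaction's I/O completes at tick 29 under round robin but at tick 8 under priority scheduling, because it reaches the disk queue before the others. No I/O scheduler is involved; the ordering comes entirely from the CPU policy.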
  

 Instead, I would recommend looking at vacuum_cost_delay and the related
 settings to make vacuum lower priority than the queries you care about. 
 This should be a cleaner solution for you.

Yeah, that's still true.


[PERFORM] Background vacuum

2007-05-09 Thread Daniel Haensse
Dear list,

I'm running postgres on a Tomcat server. VACUUM is run every hour
(cron job), which leads to a performance drop in the Tomcat applications.
I played around with the renice command, and I think it is possible to reduce
this effect with a renice. The problem is: how can I figure out the PID
of the postgres process performing the (automated) vacuum? Does anybody have a nice
solution for changing process priority? A shell script, maybe even for Java?

best regards

Dani
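For the PID question, one option is to inspect process titles. This sketch assumes that the VACUUM started from cron runs in an ordinary backend whose title, as shown by `ps`, follows PostgreSQL's `postgres: user database host activity` format with the current command (here `VACUUM`) as the activity; check that against your own `ps` output first. The helper names `vacuum_pids` and `running_vacuum_pids` are hypothetical.

```python
import subprocess
from typing import Iterable, List


def vacuum_pids(ps_lines: Iterable[str]) -> List[int]:
    """Return PIDs whose process title looks like a backend running VACUUM.

    Expects `ps -eo pid,args` style lines and assumes backend titles of
    the form "postgres: user database host activity".
    """
    pids: List[int] = []
    for line in ps_lines:
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, title = int(parts[0]), parts[1]
        if title.startswith("postgres:") and "VACUUM" in title.split():
            pids.append(pid)
    return pids


def running_vacuum_pids() -> List[int]:
    """Ask ps for the current process list and filter it (Unix only)."""
    out = subprocess.run(["ps", "-eo", "pid,args"],
                         capture_output=True, text=True, check=True).stdout
    return vacuum_pids(out.splitlines())
```

Renicing a returned PID from outside, for example with `os.setpriority(os.PRIO_PROCESS, pid, 10)`, generally requires running as the same user as the server or as root.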



Re: [PERFORM] Background vacuum

2007-05-09 Thread Dan Harris

Daniel Haensse wrote:

Dear list,

I'm running postgres on a Tomcat server. VACUUM is run every hour
(cron job), which leads to a performance drop in the Tomcat applications.
I played around with the renice command, and I think it is possible to reduce
this effect with a renice. The problem is: how can I figure out the PID
of the postgres process performing the (automated) vacuum? Does anybody have a nice
solution for changing process priority? A shell script, maybe even for Java?



While this may technically work, I think it misses a key point.  'nice' (at 
least the versions I'm familiar with) does not adjust I/O priority.  VACUUM is 
bogging things down because of the extra strain on I/O.  CPU usage shouldn't 
really be much of a factor.


Instead, I would recommend looking at vacuum_cost_delay and the related settings 
to make vacuum lower priority than the queries you care about.  This should be a 
cleaner solution for you.


-Dan
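Dan's vacuum_cost_delay suggestion can be made concrete with some arithmetic. With cost-based vacuum delay, VACUUM adds `vacuum_cost_page_hit`, `vacuum_cost_page_miss`, or `vacuum_cost_page_dirty` to a running total for the pages it touches and sleeps for `vacuum_cost_delay` milliseconds whenever the total reaches `vacuum_cost_limit`. The sketch below estimates the resulting ceiling on pages per second. The page mix, the 10 ms delay, and the helper names are assumptions for illustration, and the estimate ignores the time the work itself takes.

```python
from dataclasses import dataclass

BLOCK_SIZE = 8192  # bytes; PostgreSQL's default block size (assumed unchanged)


@dataclass
class VacuumCostSettings:
    # Defaults mirror PostgreSQL 8.x postgresql.conf values, except
    # vacuum_cost_delay, whose shipped default is 0 (throttling off);
    # 10 ms here is only an example.
    vacuum_cost_delay_ms: float = 10.0
    vacuum_cost_limit: int = 200
    vacuum_cost_page_hit: int = 1
    vacuum_cost_page_miss: int = 10
    vacuum_cost_page_dirty: int = 20


def max_pages_per_second(s: VacuumCostSettings, miss_fraction: float,
                         dirty_fraction: float) -> float:
    """Rough upper bound on pages VACUUM can process per second when throttled.

    miss_fraction:  share of pages that must be read in (the rest are cache hits)
    dirty_fraction: share of pages VACUUM dirties (charged on top of hit/miss)
    """
    if s.vacuum_cost_delay_ms <= 0:
        return float("inf")  # a delay of 0 turns cost-based throttling off
    avg_cost = ((1 - miss_fraction) * s.vacuum_cost_page_hit
                + miss_fraction * s.vacuum_cost_page_miss
                + dirty_fraction * s.vacuum_cost_page_dirty)
    pages_per_nap = s.vacuum_cost_limit / avg_cost  # pages until the limit is hit
    naps_per_second = 1000.0 / s.vacuum_cost_delay_ms
    return pages_per_nap * naps_per_second


def mb_per_second(pages_per_second: float) -> float:
    """Convert a page rate into decimal megabytes per second."""
    return pages_per_second * BLOCK_SIZE / 1e6
```

With these settings and every page a cache miss, the cap works out to 2000 pages/s (about 16 MB/s of 8 KB blocks); if VACUUM also dirties every page, it drops to roughly 667 pages/s. The PostgreSQL documentation notes that on many systems the effective sleep resolution is 10 ms, so smaller delays may behave like 10 ms.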

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq