On 3/26/10 4:57 PM, Richard Yen wrote:
Hi everyone,

We've recently encountered some swapping issues on our CentOS 64GB Nehalem 
machine, running postgres 8.4.2.  Unfortunately, I was foolish enough to set 
shared_buffers to 40GB.  I was wondering if anyone would have any insight into 
why the swapping suddenly starts, but never recovers?

[image: http://richyen.com/i/swap.png]

Note, the machine has been up and running since mid-December 2009.  It was only 
on March 8 that this swapping began, and it has never recovered.

If we look at dstat, we find the following:

[image: http://richyen.com/i/dstat.png]

Note that it is constantly paging in, but never paging out.

This happens when you have too many processes whose combined memory doesn't fit in 
real memory, but none of them are changing their memory image.  If the system 
swaps a process in, but that process doesn't change anything in memory, then 
there are no dirty pages and the kernel can just kick the process out of memory 
without writing anything back to the swap disk -- the data in the swap are 
still valid.
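To make the page-in/page-out asymmetry concrete, here is a toy Python model. It is not how the Linux kernel actually tracks swap, and `Page`, `ToySwap` and `simulate` are hypothetical names. It assumes that a page which was swapped in and never written still has a valid copy in swap, so evicting it again costs no write.

```python
from dataclasses import dataclass


@dataclass
class Page:
    in_ram: bool = True
    dirty: bool = True           # fresh anonymous memory has no copy in swap yet
    has_swap_copy: bool = False


class ToySwap:
    """Hypothetical toy model that only counts page-ins and page-outs."""

    def __init__(self):
        self.page_ins = 0
        self.page_outs = 0

    def evict(self, page):
        if not page.in_ram:
            return
        if page.dirty or not page.has_swap_copy:
            # The swap copy is missing or stale: the page must be written out.
            self.page_outs += 1
            page.has_swap_copy = True
            page.dirty = False
        # A clean page with a valid swap copy is simply dropped, no write.
        page.in_ram = False

    def touch(self, page, write=False):
        if not page.in_ram:
            self.page_ins += 1   # read the page back from swap
            page.in_ram = True
        if write:
            page.dirty = True    # the copy in swap is now stale


def simulate(cycles, write):
    swap = ToySwap()
    page = Page()
    swap.evict(page)             # initial swap-out
    for _ in range(cycles):
        swap.touch(page, write=write)
        swap.evict(page)
    return swap.page_ins, swap.page_outs


if __name__ == "__main__":
    print("read-only:", simulate(1000, write=False))
    print("writing:  ", simulate(1000, write=True))
```

In the read-only case the counters end at 1000 page-ins and a single page-out, which is the "constantly paging in, never paging out" pattern; once the process writes to the page, every eviction needs a page-out again.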

It's a classic problem when processes are running round-robin. Say you have 
space for 100 processes, but you're running 101 processes.  When you get to 
#101, #1 is the oldest so it swaps out.  Then #1 runs, and #2 is the oldest, so 
it gets kicked out.  Then #2 runs and kicks out #3 ... and so forth.  Going 
from 100 to 101 processes brings the system nearly to a halt.
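The round-robin case can be reproduced with a small simulation. It assumes strict LRU eviction of whole processes; real kernels evict individual pages and only approximate LRU, so treat this as a sketch. `count_swap_ins` is a hypothetical helper.

```python
from collections import OrderedDict


def count_swap_ins(slots, procs, rounds):
    """Run `procs` processes round-robin in `slots` memory slots, strict LRU."""
    resident = OrderedDict()     # insertion order = LRU order, oldest first
    swap_ins = 0
    for _ in range(rounds):
        for p in range(procs):
            if p in resident:
                resident.move_to_end(p)           # mark as most recently used
            else:
                swap_ins += 1                     # process must be swapped in
                if len(resident) >= slots:
                    resident.popitem(last=False)  # evict least recently used
                resident[p] = True
    return swap_ins


if __name__ == "__main__":
    print(count_swap_ins(100, 100, 10))
    print(count_swap_ins(100, 101, 10))
```

With 100 slots, `count_swap_ins(100, 100, 10)` only pays the 100 cold-start swap-ins, while `count_swap_ins(100, 101, 10)` misses on every one of its 1010 runs: the process about to run is always the one that was just evicted.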

Some operating systems try to use tricks to keep this from happening, but it's 
a hard problem to solve.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
