Geo Carncross wrote:
On Mon, 2006-03-06 at 13:26 -0500, Matthew T. O'Connor wrote:
The OOM Killer should be killed itself. I keep up with the PostgreSQL
lists and they *HIGHLY* recommend disabling the OOM killer. See section
16.4.3 for details:
http://www.postgresql.org/docs/current/static/kernel-resources.html#AEN18105
Their view is that the OOM killer is a very bad idea for a server that
you want to be reliable. If you tell the kernel not to overcommit
memory, the OOM killer becomes moot, but you had better have enough
memory and swap space to handle your needs.
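
For reference, on Linux the overcommit policy is the integer in
/proc/sys/vm/overcommit_memory (0 = heuristic, 1 = always overcommit,
2 = strict accounting), and "sysctl -w vm.overcommit_memory=2" is the
usual way to turn overcommit off. A minimal sketch for checking the
current mode follows; the function names are hypothetical and are not
taken from PostgreSQL or the kernel.

```c
#include <stdio.h>

/* Map the value found in /proc/sys/vm/overcommit_memory to a label. */
const char *describe_overcommit(int mode)
{
    switch (mode) {
    case 0:  return "heuristic overcommit (kernel default)";
    case 1:  return "always overcommit";
    case 2:  return "strict accounting, no overcommit";
    default: return "unknown";
    }
}

/* Read the overcommit mode from path.
 * Returns the mode, or -1 if the file is missing or unparsable. */
int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

A caller would pass "/proc/sys/vm/overcommit_memory" as the path and
print describe_overcommit() of the result. The path is a parameter only
so that the sketch can be tried on systems without /proc.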
The OOMK isn't _causing_ the problem. The problem is already there! You're
simply out of memory! It doesn't matter if malloc() fails or not.
The OOMK is simply one way of fixing it.
I disagree here. A program can deal intelligently with a malloc()
failure; PostgreSQL does this very well, in fact. A program cannot deal
intelligently with getting killed at random moments. So I disagree that
the OOM killer is a way of fixing the problem.
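
To make "dealing with a malloc() failure" concrete, here is a minimal
sketch of the pattern: check the return value, fail only the current
request, and keep the process alive. The helper names copy_query and
handle_request are hypothetical and are not PostgreSQL's actual code.
Note that malloc() is only likely to return NULL this way when
overcommit is disabled or a resource limit is hit.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a query string into a new buffer.
 * Returns NULL on allocation failure instead of crashing. */
char *copy_query(const char *query)
{
    size_t len = strlen(query) + 1;
    char *buf = malloc(len);

    if (buf == NULL)
        return NULL;          /* let the caller decide what to do */
    memcpy(buf, query, len);
    return buf;
}

/* Hypothetical request handler.
 * Returns 0 on success, -1 if this one request ran out of memory. */
int handle_request(const char *query)
{
    char *copy = copy_query(query);

    if (copy == NULL) {
        fprintf(stderr, "out of memory; rejecting this query only\n");
        return -1;            /* the server itself keeps running */
    }
    /* ... the real work on the query would go here ... */
    free(copy);
    return 0;
}
```

The point of the design is that the failure is reported at a known
place in the code, so cleanup can happen there, which is not possible
when the process is killed from outside.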
This is the nugget of truth. If you disable overcommit, then malloc() or
mmap() can fail. If you use overcommit, then the process gets killed
someplace else.
Same comment as above: a well-written program can deal with a malloc()
failure.
Pg has to be resistant to failure anyway; it has to protect against the
power cord getting yanked.
And it is.
OOMK or not, you run out of memory, and Pg stops doing its job.
No. PG allocates most of what it needs at startup. A particular query
may fail due to a malloc() failure, but the postmaster will keep
running. Also, the OOM killer often does not kill the program that is
causing the OOM problem.
Daemons should be resistant to accidental death, whether it be from the
OOMK, a signal, or a bug in the software.
Right, and PG is. However, there is a stability loss when relying on the
OOM killer: you can't predict when PG will get killed, and it might not
even be PG's fault; some other program might be using all the memory.
Daemons cannot be resistant on their own. The only way to "make" a
program stay running is to run it from init, and that works only
because init (pid 1) cannot be killed.
If you run postmaster from init (as I do), then if Pg dies, it gets
restarted. This can be because Pg has a bug in it, a signal gets sent
to the wrong process group, the OOMK goes nuts, or any number of other
reasons.
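
The respawn behaviour described above can be sketched as a small
supervisor loop built on fork(), execvp() and waitpid(). This only
illustrates the idea and is not how init itself is implemented; the
names run_and_wait and supervise, the start limit, and the choice to
stop after a clean exit are assumptions of the sketch.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] as a child process and wait for it to end.
 * Returns its exit status, 128 + signal number if a signal killed it,
 * or -1 if fork() or waitpid() failed. */
int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                      /* reached only if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);   /* e.g. SIGKILL from the OOMK */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Restart the command whenever it ends abnormally, at most max_starts
 * times. Returns how many times the command was started. */
int supervise(char *const argv[], int max_starts)
{
    int starts = 0;

    while (starts < max_starts) {
        int rc = run_and_wait(argv);

        starts++;
        fprintf(stderr, "start %d ended with status %d\n", starts, rc);
        if (rc == 0)
            break;                       /* clean exit: stop restarting */
    }
    return starts;
}
```

A real supervisor would also pause between restarts so that a daemon
which dies immediately does not spin in a tight loop.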
That's fine, but the only programs I run from init are ones I don't
trust. I have never run PG from init, and I have never had a problem
with it dying.
So you say: "turn off the OOMK so Pg stays running"
I say: "Turn off init.d so Pg stays running"
No, PG does not stay running from init any better than from init.d; it
just gets back up faster (assuming it can be restarted). I don't like
this, since it may mask a larger problem. The bottom line is that PG
shouldn't die very often, and doesn't if it's set up properly.
Matt