Geo Carncross wrote:
On Mon, 2006-03-06 at 13:26 -0500, Matthew T. O'Connor wrote:
The OOM Killer should be killed itself. I keep up with the PostgreSQL lists and they *HIGHLY* recommend disabling the OOM killer. See section 16.4.3 for details:

http://www.postgresql.org/docs/current/static/kernel-resources.html#AEN18105

Basically, their view is that the OOM killer is a very bad idea for a server that you want to be reliable. If you tell the kernel not to overcommit memory, the OOM killer becomes moot, but you had better have enough memory and swap space to handle your needs.
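A minimal sketch of checking which overcommit policy a Linux box is using, assuming the standard procfs file /proc/sys/vm/overcommit_memory (0 = heuristic overcommit, the default; 1 = always overcommit; 2 = strict accounting, i.e. "don't overcommit"). The function names are hypothetical; the setting itself is normally changed by root with `sysctl -w vm.overcommit_memory=2`.

```c
#include <assert.h>
#include <stdio.h>

/* Human-readable meaning of the Linux vm.overcommit_memory setting. */
const char *overcommit_mode_name(int mode) {
    switch (mode) {
    case 0:
        return "heuristic overcommit (default)";
    case 1:
        return "always overcommit";
    case 2:
        return "strict accounting: allocations beyond the commit limit fail";
    default:
        return "unknown";
    }
}

/* Read the current mode from a procfs-style file such as
   /proc/sys/vm/overcommit_memory. Returns -1 if it cannot be read,
   e.g. on a non-Linux system. */
int read_overcommit_mode(const char *path) {
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

In mode 2 the kernel refuses allocations beyond its commit limit, so running out of memory typically shows up as malloc() returning NULL rather than as an OOM kill later on.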

The OOMK isn't _causing_ the problem. The problem is already there! You're
simply out of memory! It doesn't matter whether malloc() fails or not.

The OOMK is simply one way of fixing it.

I disagree here. A program can deal intelligently with a malloc() failure; PostgreSQL does this very well, in fact. A program cannot deal intelligently with getting killed at random moments. So I disagree that the OOM killer is a way of fixing the problem.
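To make the malloc() point concrete, here is a hedged sketch (not PostgreSQL code; PostgreSQL uses its own memory contexts and error reporting) of a per-request handler that treats an allocation failure as a failed request rather than a dead process. The allocator is passed in through a hypothetical `alloc_fn` hook purely so the failure can be simulated; in real code it would just be malloc().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook so an out-of-memory condition can be
   simulated. It must return memory that free() can release (or NULL). */
typedef void *(*alloc_fn)(size_t);

enum { REQ_OK = 0, REQ_OUT_OF_MEMORY = 1 };

/* Handle one request. If the allocation fails, report an error for this
   request only and return; the caller's main loop keeps running. */
int handle_request(const char *input, alloc_fn alloc) {
    assert(input != NULL && alloc != NULL);
    size_t len = strlen(input);
    char *work = alloc(len + 1);
    if (work == NULL) {
        fprintf(stderr, "request failed: out of memory\n");
        return REQ_OUT_OF_MEMORY;
    }
    memcpy(work, input, len + 1);
    /* ... the real work on `work` would happen here ... */
    free(work);
    return REQ_OK;
}

/* Simulated allocator that always fails, standing in for malloc() under
   memory pressure. */
void *failing_alloc(size_t n) {
    (void)n;
    return NULL;
}
```

Note that with overcommit enabled, malloc() may succeed anyway and the process may be killed later when the memory is actually touched, which is exactly the trade-off argued over in this thread.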

This is the nugget of truth. If you disable overcommit, then malloc() or
mmap() can fail. If you use overcommit, then the process gets killed
someplace else.

Same comment as above: a well-written program can deal with a malloc() failure.

Pg has to be resistant to failure anyway; it has to protect against the
power cord getting yanked.

And it is.

OOMK or not, you run out of memory and Pg stops doing its job.

No. PG allocates most of what it needs at startup; a particular query may fail due to a malloc() failure, but the postmaster will keep running. Also, the OOM killer often does not kill the program that is causing the out-of-memory condition.
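Allocating up front can be sketched with a hypothetical fixed-size slot pool: all memory is obtained (and touched) once at startup, so memory problems tend to surface then rather than mid-query, and later a request that finds the pool exhausted fails on its own while the process carries on. This is only loosely analogous to PostgreSQL's startup allocation of shared memory; it is not how PostgreSQL implements it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size slot pool, allocated once at startup. */
typedef struct {
    unsigned char *memory; /* one block holding slot_count slots */
    unsigned char *in_use; /* in_use[i] != 0 means slot i is taken */
    size_t slot_size;
    size_t slot_count;
} slot_pool;

/* Allocate everything up front. Returns 0 on success, -1 if the sizes are
   invalid or the allocation fails (a server would refuse to start). */
int pool_init(slot_pool *p, size_t slot_size, size_t slot_count) {
    assert(p != NULL);
    if (slot_size == 0 || slot_count == 0 || slot_size > SIZE_MAX / slot_count)
        return -1;
    p->memory = malloc(slot_size * slot_count);
    p->in_use = calloc(slot_count, 1);
    if (p->memory == NULL || p->in_use == NULL) {
        free(p->memory);
        free(p->in_use);
        return -1;
    }
    /* Touch every page now so memory problems tend to show up at startup. */
    memset(p->memory, 0, slot_size * slot_count);
    p->slot_size = slot_size;
    p->slot_count = slot_count;
    return 0;
}

/* Hand out a free slot, or NULL if the pool is exhausted: that one request
   fails, but nothing else is affected. */
void *pool_get(slot_pool *p) {
    for (size_t i = 0; i < p->slot_count; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return p->memory + i * p->slot_size;
        }
    }
    return NULL;
}

/* Return a slot previously obtained from pool_get(). */
void pool_put(slot_pool *p, void *slot) {
    size_t i = (size_t)((unsigned char *)slot - p->memory) / p->slot_size;
    assert(i < p->slot_count && p->in_use[i]);
    p->in_use[i] = 0;
}

/* Release the pool's memory at shutdown. */
void pool_destroy(slot_pool *p) {
    free(p->memory);
    free(p->in_use);
}
```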

Daemons should be resistant to accidental death, whether from the OOMK,
a signal, or a bug in the software.

Right, and PG is. However, there is a loss of stability when using the OOM killer: you can't predict when PG will get killed, and it might not even be PG's fault; some other program might be using all the memory.

Daemons cannot be resistant on their own. The only way to "make" a
program stay running is via init, and that's only because init (PID 1)
cannot be killed.

If you run the postmaster from init (as I do), then if Pg dies, it gets
restarted. Pg might die because it has a bug, because a signal gets sent
to the wrong process group, because the OOMK goes nuts, or for any number
of other reasons.
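What init does for a respawned daemon (for example, the `respawn` action in a SysV-style /etc/inittab) boils down to a fork/exec/wait loop. A minimal sketch under that assumption, with a hypothetical `supervise()` function and a cap on the number of launches:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical minimal supervisor: launch argv[0] (searched on PATH) and
   launch it again every time it exits, up to max_starts launches in total.
   Returns the number of launches, or -1 if fork() or waitpid() fails. */
int supervise(char *const argv[], int max_starts) {
    assert(argv != NULL && argv[0] != NULL);
    int starts = 0;
    while (starts < max_starts) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            /* Child: become the daemon. */
            execvp(argv[0], argv);
            _exit(127); /* only reached if exec failed */
        }
        starts++;
        int status;
        if (waitpid(pid, &status, 0) < 0) {
            perror("waitpid");
            return -1;
        }
        /* Log every death so restarts don't silently hide a problem. */
        if (WIFSIGNALED(status))
            fprintf(stderr, "%s killed by signal %d\n", argv[0], WTERMSIG(status));
        else if (WIFEXITED(status))
            fprintf(stderr, "%s exited with status %d\n", argv[0], WEXITSTATUS(status));
    }
    return starts;
}
```

Logging every exit, and capping the number of launches, speaks to the concern raised later in the thread that automatic restarts can mask a larger problem.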

That's fine, but the only programs I run from init are ones I don't trust. I have never run PG from init, and I have never had a problem with it dying.

So you say: "Turn off the OOMK so Pg stays running."
I say: "Turn off init.d so Pg stays running."

No, PG does not stay running from init any better than from init.d; it just gets back up faster (assuming it can be restarted). I don't like this, since it may mask a larger problem. The bottom line is that PG shouldn't die very often, and it doesn't if it's set up properly.

Matt
