Re: [HACKERS] PG Killed by OOM Condition

2005-10-25 Thread Jeff Davis
daveg wrote:
> When this happens the machine runs out of memory and swap. Without the oom
> killer it simply hangs the machine which is inconvenient as it is at a remote
> location. The oom killer usually lets the machine recover and postgres restart
> without a hard reboot.

If vm.overcommit_memory is set to 2, wouldn't postgres get a memory
allocation error, rather than a hung machine?

By the way, what does FreeBSD do? I've never had any memory allocation
related headaches on that platform (although I'm fairly new to FreeBSD).

Regards,
Jeff Davis

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] PG Killed by OOM Condition

2005-10-25 Thread Tom Lane
daveg [EMAIL PROTECTED] writes:
> I work with a client that runs 16 GB memory with 16 GB of swap on dual Opterons
> dedicated to postgres. They have large tables and like hash joins as they are
> often the fastest way to a result, so work_mem is set fairly large. Sometimes
> postgres is very inaccurate predicting real memory use versus work_mem and
> will grow very much larger than expected.

FWIW, 8.1 should be a lot better at this --- it can dynamically readjust
the hash join parameters to keep memory usage under the work_mem limit.

> When this happens the machine runs out of memory and swap. Without the oom
> killer it simply hangs the machine which is inconvenient as it is at a remote
> location.

It shouldn't hang in any case ... something wrong there.  I can
believe that the machine would go to its knees as it thrashes more
and more while approaching the totally-out-of-swap point, but it
shouldn't hang up.  You might have a kernel bug to deal with.

regards, tom lane



Re: [HACKERS] PG Killed by OOM Condition

2005-10-24 Thread Bruno Wolff III
On Mon, Oct 03, 2005 at 23:03:06 +1000,
  John Hansen [EMAIL PROTECTED] wrote:
> Good people,
>
> Just had a thought!
>
> Might it be worth while protecting the postmaster from an OOM Kill on
> Linux by setting /proc/{pid}/oom_adj to -17 ?
> (Described vaguely in mm/oom_kill.c)

Wouldn't it be better to use sysctl to tell the kernel not to overcommit
memory in the first place?



Re: [HACKERS] PG Killed by OOM Condition

2005-10-24 Thread mark
On Mon, Oct 24, 2005 at 10:20:39PM -0500, Bruno Wolff III wrote:
> On Mon, Oct 03, 2005 at 23:03:06 +1000,
>   John Hansen [EMAIL PROTECTED] wrote:
> > Good people,
> > Just had a thought!
> > Might it be worth while protecting the postmaster from an OOM Kill on
> > Linux by setting /proc/{pid}/oom_adj to -17 ?
> > (Described vaguely in mm/oom_kill.c)
> Wouldn't it be better to use sysctl to tell the kernel not to over commit
> memory in the first place?

Only if you don't have large processes in your system that fork()
frequently, pushing the reserved memory over the limit, preventing
PostgreSQL from allocating memory when it does need it, even though
copy-on-write allows plenty of memory to continue to be available -
it is just reserved... :-)
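
To put a number on "pushing the reserved memory over the limit": with overcommit disabled (mode 2), Linux caps the total committed address space at roughly swap plus a percentage of RAM, where the percentage is vm.overcommit_ratio (default 50). A rough Python sketch of that arithmetic follows, using the 16 GB RAM / 16 GB swap machine described elsewhere in this thread as example input. The function name is made up for illustration, and huge pages and other kernel-version details are ignored.

```python
def commit_limit_gb(ram_gb, swap_gb, overcommit_ratio=50):
    """Approximate commit limit under vm.overcommit_memory=2.

    Simplified on purpose: ignores huge pages and kernel-specific
    adjustments. overcommit_ratio is a percentage of RAM.
    """
    return swap_gb + ram_gb * overcommit_ratio / 100.0


# Example input: 16 GB RAM, 16 GB swap, default ratio of 50.
limit = commit_limit_gb(16, 16)
print(f"commit limit is about {limit:.0f} GB")
```

Every fork() of a large process counts that process's writable memory against this limit again, even when copy-on-write means the pages are never actually duplicated.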

There isn't a perfect answer.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
Neighbourhood Coder
Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/




Re: [HACKERS] PG Killed by OOM Condition

2005-10-24 Thread Bruno Wolff III
On Mon, Oct 24, 2005 at 23:55:07 -0400,
  [EMAIL PROTECTED] wrote:
> On Mon, Oct 24, 2005 at 10:20:39PM -0500, Bruno Wolff III wrote:
> > On Mon, Oct 03, 2005 at 23:03:06 +1000,
> >   John Hansen [EMAIL PROTECTED] wrote:
> > > Good people,
> > > Just had a thought!
> > > Might it be worth while protecting the postmaster from an OOM Kill on
> > > Linux by setting /proc/{pid}/oom_adj to -17 ?
> > > (Described vaguely in mm/oom_kill.c)
> > Wouldn't it be better to use sysctl to tell the kernel not to over commit
> > memory in the first place?
>
> Only if you don't have large processes in your system that fork()
> frequently, pushing the reserved memory over the limit, preventing
> PostgreSQL from allocating memory when it does need it, even though
> copy-on-write allows plenty of memory to continue to be available -
> it is just reserved... :-)
>
> There isn't a perfect answer.

No, but I would think tying up some disk space as swap space would be a
better solution. The linux oom killer is really dangerous.



Re: [HACKERS] PG Killed by OOM Condition

2005-10-24 Thread daveg
On Mon, Oct 24, 2005 at 11:26:52PM -0500, Bruno Wolff III wrote:
> On Mon, Oct 24, 2005 at 23:55:07 -0400,
>   [EMAIL PROTECTED] wrote:
> > On Mon, Oct 24, 2005 at 10:20:39PM -0500, Bruno Wolff III wrote:
> > > On Mon, Oct 03, 2005 at 23:03:06 +1000,
> > >   John Hansen [EMAIL PROTECTED] wrote:
> > > > Good people,
> > > > Just had a thought!
> > > > Might it be worth while protecting the postmaster from an OOM Kill on
> > > > Linux by setting /proc/{pid}/oom_adj to -17 ?
> > > > (Described vaguely in mm/oom_kill.c)
> > > Wouldn't it be better to use sysctl to tell the kernel not to over commit
> > > memory in the first place?
> >
> > Only if you don't have large processes in your system that fork()
> > frequently, pushing the reserved memory over the limit, preventing
> > PostgreSQL from allocating memory when it does need it, even though
> > copy-on-write allows plenty of memory to continue to be available -
> > it is just reserved... :-)
> >
> > There isn't a perfect answer.
>
> No, but I would think tying up some disk space as swap space would be a
> better solution. The linux oom killer is really dangerous.

I work with a client that runs 16 GB memory with 16 GB of swap on dual Opterons
dedicated to postgres. They have large tables and like hash joins as they are
often the fastest way to a result, so work_mem is set fairly large. Sometimes
postgres is very inaccurate predicting real memory use versus work_mem and
will grow very much larger than expected. This can result in two or more
postgres processes with over 10 GB of virtual memory each, along with the
usual 60 or so normal sized ones.

When this happens the machine runs out of memory and swap. Without the oom
killer it simply hangs the machine which is inconvenient as it is at a remote
location. The oom killer usually lets the machine recover and postgres restart
without a hard reboot.

A solution is to use ulimit to set the maximum memory available to a
process. Ideally this would be a pg_ctl or postmaster option so that all the
forked postgresql processes would inherit the ulimit. The advantage over the
oom killer is that only the overly large process fails, and it fails with an
out of memory error and exits cleanly as opposed to having the whole set
of backends restarted.
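
As an illustration of the ulimit idea, here is a minimal Python sketch that launches a child process with its address space capped through RLIMIT_AS, which is what `ulimit -v` sets in a shell. The helper name and the 1 GiB figure are assumptions for the example; neither pg_ctl nor the postmaster offers such an option according to this thread. The limit is inherited by everything the child later forks, which is the property wanted for backends.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(argv, max_bytes):
    """Run argv as a child process whose address space is capped.

    Only the soft RLIMIT_AS limit is lowered, so no privilege is needed.
    The limit is inherited by every process the child later forks.
    """
    def apply_limit():
        # Runs in the child after fork() and before exec().
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = max_bytes
        if hard != resource.RLIM_INFINITY:
            cap = min(cap, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    return subprocess.run(argv, preexec_fn=apply_limit,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # A child asking for 2 GiB under a 1 GiB cap should get an
    # allocation failure instead of pushing the machine into swap.
    result = run_with_memory_cap(
        [sys.executable, "-c", "bytearray(2 << 30)"], 1 << 30)
    print(result.returncode, result.stderr.strip().splitlines()[-1:])
```

Only the oversized child sees the failure; other processes started the same way are unaffected.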

-dg

-- 
David Gould  [EMAIL PROTECTED]
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] PG Killed by OOM Condition

2005-10-04 Thread Jeff Davis
> It's not an easy decision. Linux isn't wrong. Solaris isn't wrong.
> Most people never hit these problems, and the people that do are
> just as likely to hit one problem or the other. The grass is always
> greener on the side of the fence that isn't hurting me right now,
> and all that.
>
> Cheers,
> mark

Thanks, a very informative reply.

Do you have some references where I can learn more?

I think that I've run into the OOM killer without a fork() being
involved, but I could be wrong. Is it possible to be hit by the OOM
killer if no applications use fork()?

Regards,
Jeff Davis



Re: [HACKERS] PG Killed by OOM Condition

2005-10-04 Thread Dennis Bjorklund
On Mon, 3 Oct 2005, Jeff Davis wrote:

> involved, but I could be wrong. Is it possible to be hit by the OOM
> killer if no applications use fork()?

Sure, whenever the system is out of memory and the OS can't find a free
page, it kills a process. If you check the kernel log you can see whether
the OOM killer has been doing some work.
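
A small Python sketch of that log check follows. The exact wording of OOM killer messages differs between kernel versions, so the two patterns used here ("out of memory" and "oom-killer", matched case-insensitively) are assumptions rather than a complete list, and the function name is made up for illustration.

```python
import re

# Assumed patterns; adjust them to what your kernel actually logs.
OOM_PATTERN = re.compile(r"out of memory|oom-killer", re.IGNORECASE)


def find_oom_events(log_lines):
    """Return the log lines that look like OOM killer activity."""
    return [line.rstrip("\n") for line in log_lines
            if OOM_PATTERN.search(line)]
```

It can be fed the lines of a kernel log file or the captured output of `dmesg`.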

-- 
/Dennis Björklund




Re: [HACKERS] PG Killed by OOM Condition

2005-10-04 Thread Martijn van Oosterhout
On Mon, Oct 03, 2005 at 11:47:57PM -0700, Jeff Davis wrote:
> I think that I've run into the OOM killer without a fork() being
> involved, but I could be wrong. Is it possible to be hit by the OOM
> killer if no applications use fork()?

fork() is the obvious overcommitter. If Netscape wants to spawn a new
process, it first has to fork 50MB of memory, then probably frees most
of it because it execs some little plugin. The same happens when a process
mmap()s a large block and doesn't use it until later. Similar idea with
brk(). If you run out of swap at the wrong moment... Recent versions are
more clever about who to kill. Sometimes you just get unlucky...
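
The mmap() case can be shown in a few lines of Python. This is only a sketch: the helper name and the 64 MiB size are arbitrary choices for the example. The mapping call reserves address space, and the kernel only has to find a real page when the memory is first written, which is the moment an overcommitted system can come up short.

```python
import mmap


def reserve_untouched(size_bytes):
    """Map anonymous private memory without touching any page of it."""
    return mmap.mmap(-1, size_bytes,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)


region = reserve_untouched(64 * 1024 * 1024)  # address space only so far
region[0] = 1  # first write: the kernel must now supply a real page
region.close()
```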

It's always killed the right process for me (Mozilla derivative leaked
masses of memory over long period).
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




[HACKERS] PG Killed by OOM Condition

2005-10-03 Thread John Hansen
Good people,

Just had a thought!

Might it be worth while protecting the postmaster from an OOM Kill on
Linux by setting /proc/{pid}/oom_adj to -17 ?
(Described vaguely in mm/oom_kill.c)

Kind Regards,

John Hansen




Re: [HACKERS] PG Killed by OOM Condition

2005-10-03 Thread Martijn van Oosterhout
On Mon, Oct 03, 2005 at 11:03:06PM +1000, John Hansen wrote:
> Might it be worth while protecting the postmaster from an OOM Kill on
> Linux by setting /proc/{pid}/oom_adj to -17 ?
> (Described vaguely in mm/oom_kill.c)

Has it actually happened to you? PostgreSQL is pretty good about its
memory usage. Besides, seems to me it should be a system administrator
decision.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] PG Killed by OOM Condition

2005-10-03 Thread Tom Lane
John Hansen [EMAIL PROTECTED] writes:
> Might it be worth while protecting the postmaster from an OOM Kill on
> Linux by setting /proc/{pid}/oom_adj to -17 ?
> (Described vaguely in mm/oom_kill.c)

(a) wouldn't that require root privilege?  (b) how would we determine
whether we are on a system to which this applies?  (c) is it actually
documented in a way that makes you think it'll be a permanently
supported feature (ie, somewhere outside the source code)?

regards, tom lane



Re: [HACKERS] PG Killed by OOM Condition

2005-10-03 Thread John Hansen
Martijn van Oosterhout Wrote:
 
> Has it actually happened to you? PostgreSQL is pretty good
> about its memory usage. Besides, seems to me it should be a
> system administrator decision.

No, I just came across this by chance and thought it might be a good
idea, perhaps as a postgresql.conf setting.

... John




Re: [HACKERS] PG Killed by OOM Condition

2005-10-03 Thread John Hansen
Tom Lane Wrote:

> (a) wouldn't that require root privilege?  (b) how would we
> determine whether we are on a system to which this applies?
> (c) is it actually documented in a way that makes you think
> it'll be a permanently supported feature (ie, somewhere
> outside the source code)?

(a) No, /proc/{pid}/* is owned by the process.
(b) Check whether /proc/{pid}/oom_adj exists?
(c) No. From the source: '(not docbooked, we don't want this one
cluttering up the manual)'.

... John



Re: [HACKERS] PG Killed by OOM Condition

2005-10-03 Thread Jeff Davis
Martijn van Oosterhout wrote:
> On Mon, Oct 03, 2005 at 11:03:06PM +1000, John Hansen wrote:
> > Might it be worth while protecting the postmaster from an OOM Kill on
> > Linux by setting /proc/{pid}/oom_adj to -17 ?
> > (Described vaguely in mm/oom_kill.c)
>
> Has it actually happened to you? PostgreSQL is pretty good about its
> memory usage. Besides, seems to me it should be a system administrator
> decision.

It's happened to me...

Usually it's when there's some other runaway process, and the kernel
decides to kill PostgreSQL because it can't tell the difference.

I really don't like that feature in Linux. Nobody has been able to
explain to me why Linux is the only OS with an OOM killer. If someone
here has more information, I'd like to know.

When using Linux I always set vm.overcommit_memory=2.
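
For anyone wanting to check what a given machine is doing, the setting is exposed as /proc/sys/vm/overcommit_memory. Below is a minimal Python sketch that reads it and names the mode (0 is the heuristic default, 1 always overcommits, 2 refuses to overcommit). The function name and the short mode labels are made up for illustration; the path is a parameter only so the function can be tried against an ordinary file.

```python
# Short labels chosen for this example, not kernel terminology.
OVERCOMMIT_MODES = {0: "heuristic", 1: "always", 2: "never"}


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return (mode number, short label) for the overcommit policy."""
    with open(path) as f:
        mode = int(f.read().strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown")
```

Changing the value is normally done as root with `sysctl -w vm.overcommit_memory=2`, plus an entry in /etc/sysctl.conf to make it survive a reboot.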

Regards,
Jeff Davis



Re: [HACKERS] PG Killed by OOM Condition

2005-10-03 Thread Alvaro Herrera
On Mon, Oct 03, 2005 at 01:25:00PM -0700, Jeff Davis wrote:
> Martijn van Oosterhout wrote:
> > On Mon, Oct 03, 2005 at 11:03:06PM +1000, John Hansen wrote:
> > > Might it be worth while protecting the postmaster from an OOM Kill on
> > > Linux by setting /proc/{pid}/oom_adj to -17 ?
> > > (Described vaguely in mm/oom_kill.c)
> >
> > Has it actually happened to you? PostgreSQL is pretty good about its
> > memory usage. Besides, seems to me it should be a system administrator
> > decision.

Maybe what we could do is put a line to change the setting in the
contrib/start-script/linux script, and perhaps lobby the packagers of
Linux distributions to do the same.

ISTM it's trivial to test whether the file exists, and useful to
activate the feature if available.
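
A sketch of that "test whether the file exists, then activate" logic, written in Python rather than as a start-script fragment. The function name and the proc_root parameter are made up for illustration (proc_root exists only so the function can be exercised against a scratch directory). Two caveats that are assumptions about the target system: the write may be refused without sufficient privilege, and newer kernels expose oom_score_adj in place of oom_adj, so a missing file is treated as "feature not available".

```python
import os


def protect_from_oom_killer(pid, value=-17, proc_root="/proc"):
    """Try to write value to <proc_root>/<pid>/oom_adj.

    Returns True if the value was written, False if the file does not
    exist or the write was refused (for example, for lack of privilege).
    """
    path = os.path.join(proc_root, str(pid), "oom_adj")
    if not os.path.exists(path):
        return False  # kernel does not offer this knob
    try:
        with open(path, "w") as f:
            f.write(f"{value}\n")
    except OSError:
        return False
    return True
```

A start script would call this with the postmaster's PID right after launch and carry on regardless of the result.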

-- 
Alvaro Herrera    http://www.PlanetPostgreSQL.org
In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle.  (Larry Wall, Apocalypse 6)



Re: [HACKERS] PG Killed by OOM Condition

2005-10-03 Thread mark
On Mon, Oct 03, 2005 at 01:25:00PM -0700, Jeff Davis wrote:
> It's happened to me...
> Usually it's when there's some other runaway process, and the kernel
> decides to kill PostgreSQL because it can't tell the difference.
> I really don't like that feature in linux. Nobody has been able to
> explain to me why linux is the only OS with an OOM Killer. If someone
> here has more information, I'd like to know.
> When using linux I always set vm.overcommit_memory=2.

I don't think it's the only one. Perhaps the only one with a default
setting of on?

I believe Solaris can be configured to over-commit memory.

The problem really comes down to the definition of fork(). UNIX fork()
requires that the system splits a process into two separate copies.
For an application that is currently using 500 Mbytes of virtual
memory, this would require that the system accept that each process
may use its own complete copy of this 500 Mbytes, for a total of 1
Gbyte in active use. fork() a few more times, and we hit 2 Gbytes, 4
Gbytes, whatever. Even if only for an instant, and even if the pages
are copy-on-write, the system has to consider the possibility that
each process may choose to modify all pages, resulting in complete
copies.
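
The doubling in that paragraph can be written out as a tiny Python sketch. It assumes the simplest possible model, in which every existing process forks once per round and every copy eventually dirties all of its pages; the function name is made up for illustration.

```python
def worst_case_commit_mb(process_size_mb, fork_rounds):
    """Memory that must be reserved if no page ends up being shared.

    Each round, every existing process forks once, so the number of
    full-size copies doubles.
    """
    return process_size_mb * (2 ** fork_rounds)


for rounds in range(4):
    print(rounds, worst_case_commit_mb(500, rounds), "MB")
```

With copy-on-write the amount actually used is normally far below this figure; the question is only whether the kernel is willing to promise the worst case up front.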

Solaris defaults to not overcommitting. This means that such an
application, as defined above, would fail at one of the invocations
of fork(). Even though the memory isn't being used, Solaris, by default,
isn't willing to 'overcommit' to having the memory allocated as a result
of fork(). Some very large applications don't work under Solaris as a
result, unless this setting is changed.

Linux takes the opposite extreme. It assumes that copy-on-write will
get us through. The fork() would be allowed - but if available virtual
memory actually does become low, it's faced with a hard decision. It
either fails an application of its choice in a controlled OOM
manner, by trying to guess which application is misbehaving, and
deciding to kill that one - or it waits until memory really is gone,
at which point MANY applications may start to fail, as their page
fault fails to allocate a page, and the process dies a horrible death.

It's not an easy decision. Linux isn't wrong. Solaris isn't wrong.
Most people never hit these problems, and the people that do are
just as likely to hit one problem or the other. The grass is always
greener on the side of the fence that isn't hurting me right now,
and all that.

Cheers,
mark


