Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Robert Haas
On Tue, Dec 15, 2009 at 12:09 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 On 15/12/2009 12:35 PM, Mark Williamson wrote:

 So what happened is, the above update never completed and the Postgresql
 service consumed all available memory.  We had to forcefully reboot the
 machine

 That means your server is misconfigured. PostgreSQL should never consume all
 available memory. If it does, you have work_mem and/or maintenance_work_mem
 set way too high, and you have VM overcommit enabled in the kernel. You also
 have too much swap.

 http://www.postgresql.org/docs/current/interactive/kernel-resources.html

 http://www.network-theory.co.uk/docs/postgresql/vol3/LinuxMemoryOvercommit.html

 I wouldn't be surprised if you had shared_buffers set too high as well, and
 you have no ulimit set on postgresql's memory usage. All those things add up
 to fatal.

 A properly configured machine should be able to survive memory exhaustion
 caused by a user process fine. Disable VM overcommit, set a ulimit on
 postgresql so it can't consume all memory, use a sane amount of swap, and
 set sane values for work_mem and maintenance_work_mem.

I am skeptical that this is the real cause of the problem.  Yeah, OK,
ulimit might have stopped it, but I don't think any of the rest of
this would have mattered.  Unfortunately, we don't know how large the
table was that the OP attempted to update, or the details of how the
trigger was set up, but my guess is that it was the pending-trigger
list that sucked up all the available memory on the box.  This is an
issue that other people have run into in the past, and I don't think
we have a good solution.  I wonder if we should put some kind of a
limit in place so that queries like this will at least fail relatively
gracefully with an error message rather than taking down the box.

There is a feature forthcoming in 8.5 which will make it easier to
avoid these types of problems, by allowing you to test a condition
before the trigger gets added to the pending trigger list.  So if your
trigger is designed to update some side table whenever a certain
column is updated, you can skip queuing the trigger altogether for rows
where that column isn't changed, without incurring any memory
overhead.  But there will still be people who do
it the old way, and it would be nice if we could at least mitigate the
impact a little.
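
As a rough sketch of that forthcoming feature (table, column, and
function names here are made up), the idea is a WHEN clause attached to
the trigger itself:

    -- Only enqueue the trigger for rows where the interesting column
    -- actually changed; untouched rows never enter the pending list.
    CREATE TRIGGER products_price_audit
        AFTER UPDATE ON products
        FOR EACH ROW
        WHEN (OLD.price IS DISTINCT FROM NEW.price)
        EXECUTE PROCEDURE record_price_change();

With a predicate like that, a bulk UPDATE that never touches the price
column queues nothing at all, instead of one trigger event per row.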

...Robert



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Greg Stark
On Tue, Dec 15, 2009 at 3:44 PM, Robert Haas robertmh...@gmail.com wrote:
  This is an
 issue that other people have run into in the past, and I don't think
 we have a good solution.  I wonder if we should put some kind of a
 limit in place so that queries like this will at least fail relatively
 gracefully with an error message rather than taking down the box.

Eh? That's exactly what's supposed to happen now. When malloc returns
0 you're supposed to get a graceful error message and transaction
abort.

-- 
greg



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Robert Haas
On Tue, Dec 15, 2009 at 11:02 AM, Greg Stark gsst...@mit.edu wrote:
 On Tue, Dec 15, 2009 at 3:44 PM, Robert Haas robertmh...@gmail.com wrote:
  This is an
 issue that other people have run into in the past, and I don't think
 we have a good solution.  I wonder if we should put some kind of a
 limit in place so that queries like this will at least fail relatively
 gracefully with an error message rather than taking down the box.

 Eh? That's exactly what's supposed to happen now. When malloc returns
 0 you're supposed to get a graceful error message and transaction
 abort.

I didn't know that, but I think by the time malloc returns 0
usually other bad things are happening.  I don't think that's really
an answer.

...Robert



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Greg Stark
On Tue, Dec 15, 2009 at 4:16 PM, Robert Haas robertmh...@gmail.com wrote:
 I didn't know that, but I think by the time malloc returns 0
 usually other bad things are happening.  I don't think that's really
 an answer.

Only if, as Craig said and you disputed, you have overcommit enabled
or lots of swap.

There is a problem, though: if you have overcommit disabled you kind of
need lots of swap to avoid running out of virtual memory long before
you're actually short on physical memory. And it's true that Linux has
had trouble dealing with low-memory situations in the past few years,
at least in my experience. But that's not true on all OSes, and we can
hope it won't be true for Linux forever, either.

-- 
greg



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Robert Haas
On Tue, Dec 15, 2009 at 12:07 PM, Greg Stark gsst...@mit.edu wrote:
 On Tue, Dec 15, 2009 at 4:16 PM, Robert Haas robertmh...@gmail.com wrote:
 I didn't know that, but I think by the time malloc returns 0
 usually other bad things are happening.  I don't think that's really
 an answer.

 Only if, as Craig said and you disputed, you have overcommit enabled
 or lots of swap.

I definitely dispute that.  :-)

I mean, typically what happens when you allocate a lot of memory is
that everything else on the system that's not in active use gets
pushed out of RAM to make room for the memory hog.  So all of your
file cache goes away and not-recently-accessed portions of other
processes' code and data segments get swapped out.  The system becomes
crushingly slow and unresponsive.  On my laptop, for example,
eventually as memory consumption increases you can't use Xwindows any
more because all of those processes have been pushed out to disk to
make room for the memory hog.  Every time you move the mouse it faults
all those pages back in to try to redraw the screen.  But as soon as
you stop, it pushes them all back out again.  It's difficult to reach
the point where malloc actually fails because, as swapping increases,
the activity of the system (including the runaway process) grinds
almost to a halt.  I'm not willing to wait that long for my
transaction to fail.

I suppose that I could fix this by getting rid of my swap partition
altogether, but that seems a rather extreme solution, and it's
certainly not the way most UNIX/Linux systems I run across are
configured, if for no other reason than that the operating system
configurator usually recommends creating one.

Am I all wet here?  I thought waiting for malloc to fail was usually
considered a poor error recovery mechanism.

...Robert



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Greg Stark
On Tue, Dec 15, 2009 at 5:18 PM, Robert Haas robertmh...@gmail.com wrote:
 I suppose that I could fix this by getting rid of my swap partition
 altogether, but that seems a rather extreme solution, and it's
 certainly not the way most UNIX/Linux systems I run across are
 configured, if for no other reason than that the operating system
 configurator usually recommends creating one.

Well, I suppose it's a question of degree. The more swap you have, the
more you can do before you run out of memory. But nobody said swap
performs as well as RAM.  What's the kernel going to do, though,
refuse to allocate memory when it has swap available?

The problem is that the OS gives no feedback on when you're running
low on RAM but haven't run out yet. There were experiments in the 90s
with a SIGDANGER signal, but I think it never worked great and I don't
think it's widespread.

If Postgres imposed a limit itself, it would (a) be hard to make it a
server-wide limit, and (b) it wouldn't take into account other things
running on the system. So that doesn't really help either.


-- 
greg



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Alvaro Herrera
Robert Haas wrote:
 On Tue, Dec 15, 2009 at 11:02 AM, Greg Stark gsst...@mit.edu wrote:
  On Tue, Dec 15, 2009 at 3:44 PM, Robert Haas robertmh...@gmail.com wrote:
   This is an
  issue that other people have run into in the past, and I don't think
  we have a good solution.  I wonder if we should put some kind of a
  limit in place so that queries like this will at least fail relatively
  gracefully with an error message rather than taking down the box.
 
  Eh? That's exactly what's supposed to happen now. When malloc returns
  0 you're supposed to get a graceful error message and transaction
  abort.
 
 I didn't know that, but I think by the time malloc returns 0
 usually other bad things are happening.  I don't think that's really
 an answer.

Actually the real answer is that we can't spill the deferred trigger
queue to disk, so it consumes all memory.  *That* is very much our own
shortcoming.  The fact that this drove the operating system into
thrashing is just an expected (if undesirable) side effect.  Greg is
right that if malloc returns 0 we act sanely, i.e. abort the transaction
and release all the memory; Craig is right that failing that way is a
good thing.  Whether or not this causes a lot of thrashing depends on
the situation, but that's an OS-level problem, not Postgres'.

If we're to do anything about this, it is spilling the trigger queue so
it doesn't eat an unbounded amount of memory.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 If we're to do anything about this, it is spilling the trigger queue so
 it doesn't eat an unbounded amount of memory.

Of course, the reason nothing much has been done about that is that
by the time your trigger queue is long enough to cause such an issue,
you're screwed anyway --- actually executing all those triggers would
take longer than you'll want to wait.

I tend to agree with the upthread opinion that if you're on a platform
that responds so ungracefully to memory hogs, you'd be best off using
ulimit to confine backends to a moderate amount of memory space.
But this *is* a platform problem; it's just one more example of how
poorly designed and implemented Linux's memory handling is.

regards, tom lane



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Robert Haas
On Tue, Dec 15, 2009 at 1:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 If we're to do anything about this, it is spilling the trigger queue so
 it doesn't eat an unbounded amount of memory.

 Of course, the reason nothing much has been done about that is that
 by the time your trigger queue is long enough to cause such an issue,
 you're screwed anyway --- actually executing all those triggers would
 take longer than you'll want to wait.

What is the best way to go about doing that, anyway?

...Robert



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Robert Haas
On Tue, Dec 15, 2009 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Dec 15, 2009 at 1:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 If we're to do anything about this, it is spilling the trigger queue so
 it doesn't eat an unbounded amount of memory.

 Of course, the reason nothing much has been done about that is that
 by the time your trigger queue is long enough to cause such an issue,
 you're screwed anyway --- actually executing all those triggers would
 take longer than you'll want to wait.

 What is the best way to go about doing that, anyway?

 Well, we added conditional triggers, which provide a partial fix.  The
 only other idea I've heard that sounds like it'd really help is having
 some sort of lossy storage for foreign-key triggers, where we'd fall
 back to per-block or whole-table rechecking of the constraint instead of
 trying to track the exact rows that were modified.  Not sure how you
 apply that to non-FK triggers though.

Err, sorry, I quoted the wrong part.  I meant, how would you rlimit
the server memory usage?

...Robert



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Err, sorry, I quoted the wrong part.  I meant, how would you rlimit
 the server memory usage?

Put a ulimit command in the server start script?  Depending on the
details of the start script you might need to put it in the postgres
user's .profile instead, but it's certainly doable.
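
Something along these lines ought to work; the 4GB cap and the paths
below are purely illustrative, and note that in bash ulimit -v is in
kilobytes:

    # Cap per-process address space before the postmaster starts;
    # each backend inherits the limit from its parent.
    ulimit -v 4194304        # roughly 4GB
    pg_ctl -D /usr/local/pgsql/data -l /usr/local/pgsql/server.log start

Once a backend hits the cap, malloc starts failing and the offending
query should die with an out-of-memory ERROR rather than dragging the
whole machine into swap.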

regards, tom lane



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Robert Haas
On Tue, Dec 15, 2009 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Err, sorry, I quoted the wrong part.  I meant, how would you rlimit
 the server memory usage?

 Put a ulimit command in the server start script?  Depending on the
 details of the start script you might need to put it in the postgres
 user's .profile instead, but it's certainly doable.

This may be a stupid question, but when you hit the limit, will it
result in an ERROR or a PANIC?

...Robert



Re: [BUGS] statement_timeout is not cancelling query

2009-12-15 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Dec 15, 2009 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Put a ulimit command in the server start script?  Depending on the
 details of the start script you might need to put it in the postgres
 user's .profile instead, but it's certainly doable.

 This may be a stupid question, but when you hit the limit, will it
 result in an ERROR or a PANIC?

Should be an ERROR ... if it isn't, that's probably a bug.  The design
intention is that malloc failure is just an ERROR.

regards, tom lane



Re: [BUGS] statement_timeout is not cancelling query

2009-12-14 Thread Craig Ringer

On 15/12/2009 12:35 PM, Mark Williamson wrote:


So what happened is, the above update never completed and the Postgresql
service consumed all available memory.  We had to forcefully reboot the
machine


That means your server is misconfigured. PostgreSQL should never consume 
all available memory. If it does, you have work_mem and/or 
maintenance_work_mem set way too high, and you have VM overcommit 
enabled in the kernel. You also have too much swap.


http://www.postgresql.org/docs/current/interactive/kernel-resources.html

http://www.network-theory.co.uk/docs/postgresql/vol3/LinuxMemoryOvercommit.html

I wouldn't be surprised if you had shared_buffers set too high as well, 
and you have no ulimit set on postgresql's memory usage. All those 
things add up to fatal.


A properly configured machine should be able to survive memory 
exhaustion caused by a user process fine. Disable VM overcommit, set a 
ulimit on postgresql so it can't consume all memory, use a sane amount 
of swap, and set sane values for work_mem and maintenance_work_mem.
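
For instance (numbers purely illustrative; tune them to the RAM you
actually have):

    # Linux kernel side: strict overcommit accounting, so an oversized
    # allocation fails up front instead of inviting the OOM killer later.
    sysctl -w vm.overcommit_memory=2

    # postgresql.conf side. Remember work_mem is per sort/hash, and a
    # single complex query can use several of them at once.
    shared_buffers = 512MB
    work_mem = 16MB
    maintenance_work_mem = 256MB

Add a ulimit on the postgres user on top of that and a single runaway
backend can't take the rest of the box down with it.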



Why does Postgresql NOT have a maximum memory allowed setting?  We want
to allocate resources efficiently and cannot allow one customer to
impact others.


It does. man ulimit.

The operating system can enforce it much better than PostgreSQL can. If 
a Pg bug were to cause Pg to run away or try to allocate insane amounts
of RAM, the ulimit would catch it.


I *do* think it'd be nice to have ulimit values settable via 
postgresql.conf so that you didn't have to faff about editing init 
scripts, though.


( TODO item? )

--
Craig Ringer
