Re: [BUGS] statement_timeout is not cancelling query
On Tue, Dec 15, 2009 at 12:09 AM, Craig Ringer cr...@postnewspapers.com.au wrote:
> On 15/12/2009 12:35 PM, Mark Williamson wrote:
>> So what happened is, the above update never completed and the
>> Postgresql service consumed all available memory. We had to forcefully
>> reboot the machine
>
> That means your server is misconfigured. PostgreSQL should never
> consume all available memory. If it does, you have work_mem and/or
> maintenance_work_mem set way too high, you have VM overcommit enabled
> in the kernel, and you have too much swap.
>
> http://www.postgresql.org/docs/current/interactive/kernel-resources.html
> http://www.network-theory.co.uk/docs/postgresql/vol3/LinuxMemoryOvercommit.html
>
> I wouldn't be surprised if you had shared_buffers set too high as well,
> and no ulimit set on postgresql's memory usage. All those things add up
> to fatal. A properly configured machine should be able to survive
> memory exhaustion caused by a user process just fine. Disable VM
> overcommit, set a ulimit on postgresql so it can't consume all memory,
> use a sane amount of swap, and set sane values for work_mem and
> maintenance_work_mem.

I am skeptical that this is the real cause of the problem. Yeah, OK, a ulimit might have stopped it, but I don't think any of the rest of this would have mattered. Unfortunately, we don't know how large the table was that the OP attempted to update, or the details of how the trigger was set up, but my guess is that it was the pending-trigger list that sucked up all the available memory on the box.

This is an issue that other people have run into in the past, and I don't think we have a good solution. I wonder if we should put some kind of a limit in place so that queries like this will at least fail relatively gracefully with an error message rather than taking down the box.

There is a feature forthcoming in 8.5 which will make it easier to avoid these types of problems, by allowing you to test a condition before the trigger gets added to the pending trigger list. So if your trigger is designed to update some side table whenever a certain column is updated, you can skip the trigger altogether without incurring any memory overhead. But there will still be people who do it the old way, and it would be nice if we could at least mitigate the impact a little.

...Robert

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
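[Editor's note: the 8.5 feature Robert describes is the WHEN clause on CREATE TRIGGER (8.5 was released as 9.0). A minimal sketch of the "skip the trigger when the column didn't change" case; the database, table, column, and trigger-function names here are made up for illustration:]

```shell
# Hedged illustration: assumes a database "mydb" with an "items" table
# and an existing trigger function log_price_change(). The WHEN clause
# is evaluated before the event is queued, so rows whose price is
# unchanged never enter the pending-trigger list and cost no memory.
psql -d mydb <<'SQL'
CREATE TRIGGER items_price_audit
    AFTER UPDATE ON items
    FOR EACH ROW
    WHEN (OLD.price IS DISTINCT FROM NEW.price)
    EXECUTE PROCEDURE log_price_change();
SQL
```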
Re: [BUGS] statement_timeout is not cancelling query
On Tue, Dec 15, 2009 at 3:44 PM, Robert Haas robertmh...@gmail.com wrote:
> This is an issue that other people have run into in the past, and I
> don't think we have a good solution. I wonder if we should put some
> kind of a limit in place so that queries like this will at least fail
> relatively gracefully with an error message rather than taking down
> the box.

Eh? That's exactly what's supposed to happen now. When malloc returns 0 you're supposed to get a graceful error message and transaction abort.

--
greg
Re: [BUGS] statement_timeout is not cancelling query
On Tue, Dec 15, 2009 at 11:02 AM, Greg Stark gsst...@mit.edu wrote:
> On Tue, Dec 15, 2009 at 3:44 PM, Robert Haas robertmh...@gmail.com wrote:
>> This is an issue that other people have run into in the past, and I
>> don't think we have a good solution. I wonder if we should put some
>> kind of a limit in place so that queries like this will at least fail
>> relatively gracefully with an error message rather than taking down
>> the box.
>
> Eh? That's exactly what's supposed to happen now. When malloc returns 0
> you're supposed to get a graceful error message and transaction abort.

I didn't know that, but I think by the time malloc returns 0, other bad things are usually already happening. I don't think that's really an answer.

...Robert
Re: [BUGS] statement_timeout is not cancelling query
On Tue, Dec 15, 2009 at 4:16 PM, Robert Haas robertmh...@gmail.com wrote:
> I didn't know that, but I think by the time malloc returns 0, other bad
> things are usually already happening. I don't think that's really an
> answer.

Only if, as Craig said and you disputed, you have overcommit enabled or lots of swap.

There is a problem, though: if you have overcommit disabled, you kind of need lots of swap to avoid running out of virtual memory long before you're actually short on physical memory. And it's true that Linux has had trouble dealing with low-memory situations in the past few years, at least in my experience. But that's not true on all OSes, and we can hope it won't be true for Linux forever either.

--
greg
Re: [BUGS] statement_timeout is not cancelling query
On Tue, Dec 15, 2009 at 12:07 PM, Greg Stark gsst...@mit.edu wrote:
> On Tue, Dec 15, 2009 at 4:16 PM, Robert Haas robertmh...@gmail.com wrote:
>> I didn't know that, but I think by the time malloc returns 0, other
>> bad things are usually already happening. I don't think that's really
>> an answer.
>
> Only if, as Craig said and you disputed, you have overcommit enabled
> or lots of swap.

I definitely dispute that. :-)

I mean, typically what happens when you allocate a lot of memory is that everything else on the system that's not in active use gets pushed out of RAM to make room for the memory hog. So all of your file cache goes away, and not-recently-accessed portions of other processes' code and data segments get swapped out. The system becomes crushingly slow and unresponsive. On my laptop, for example, as memory consumption increases you eventually can't use X Windows any more, because all of those processes have been pushed out to disk to make room for the memory hog. Every time you move the mouse it faults all those pages back in to try to redraw the screen, but as soon as you stop, it pushes them all back out again.

It's difficult to reach the point where malloc actually fails, because as swapping increases, the activity of the system (including the runaway process) grinds almost to a halt. I'm not willing to wait that long for my transaction to fail.

I suppose that I could fix this by getting rid of my swap partition altogether, but that seems a rather extreme solution, and it's certainly not the way most UNIX/Linux systems I run across are configured, if for no other reason than that the operating system installer usually recommends creating one.

Am I all wet here? I thought waiting for malloc to fail was usually considered a poor error-recovery mechanism.

...Robert
Re: [BUGS] statement_timeout is not cancelling query
On Tue, Dec 15, 2009 at 5:18 PM, Robert Haas robertmh...@gmail.com wrote:
> I suppose that I could fix this by getting rid of my swap partition
> altogether, but that seems a rather extreme solution, and it's
> certainly not the way most UNIX/Linux systems I run across are
> configured, if for no other reason than that the operating system
> installer usually recommends creating one.

Well, I suppose it's a question of degree. The more swap you have, the more you can do before you run out of memory. But nobody said swap performs as well as RAM.

What's the kernel going to do, though, refuse to allocate memory when it has swap available? The problem is that the OS gives no feedback on when you're running low on RAM but haven't run out yet. There were experiments in the 90s with SIGDANGER, but I think it never worked very well, and I don't think it's widespread.

If Postgres imposed a limit itself, it would (a) be hard to make it a server-wide limit and (b) not take into account other things running on the system. So that doesn't really help either.

--
greg
Re: [BUGS] statement_timeout is not cancelling query
Robert Haas wrote:
> On Tue, Dec 15, 2009 at 11:02 AM, Greg Stark gsst...@mit.edu wrote:
>> On Tue, Dec 15, 2009 at 3:44 PM, Robert Haas robertmh...@gmail.com wrote:
>>> This is an issue that other people have run into in the past, and I
>>> don't think we have a good solution. I wonder if we should put some
>>> kind of a limit in place so that queries like this will at least fail
>>> relatively gracefully with an error message rather than taking down
>>> the box.
>> Eh? That's exactly what's supposed to happen now. When malloc returns
>> 0 you're supposed to get a graceful error message and transaction
>> abort.
> I didn't know that, but I think by the time malloc returns 0, other bad
> things are usually already happening. I don't think that's really an
> answer.

Actually, the real answer is that we can't spill the deferred trigger queue to disk, so it consumes all memory. *That* is very much our own shortcoming. The fact that this drove the operating system into thrashing is just an expected (if undesirable) side effect.

Greg is right that if malloc returns 0 we act sanely, i.e. abort the transaction and release all the memory; Craig is right that having this happen is a good thing. Whether or not this causes a lot of thrashing depends on the situation, but that's an OS-level problem, not Postgres'. If we're to do anything about this, it is spilling the trigger queue to disk so that it doesn't eat an unbounded amount of memory.

--
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [BUGS] statement_timeout is not cancelling query
Alvaro Herrera alvhe...@commandprompt.com writes:
> If we're to do anything about this, it is spilling the trigger queue
> so it doesn't eat an unbounded amount of memory.

Of course, the reason nothing much has been done about that is that by the time your trigger queue is long enough to cause such an issue, you're screwed anyway --- actually executing all those triggers would take longer than you'll want to wait.

I tend to agree with the upthread opinion that if you're on a platform that responds so ungracefully to memory hogs, you'd be best off using ulimit to confine backends to a moderate amount of memory space. But this *is* a platform problem; it's just one more example of how poorly designed and implemented Linux's memory handling is.

regards, tom lane
Re: [BUGS] statement_timeout is not cancelling query
On Tue, Dec 15, 2009 at 1:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Alvaro Herrera alvhe...@commandprompt.com writes:
>> If we're to do anything about this, it is spilling the trigger queue
>> so it doesn't eat an unbounded amount of memory.
>
> Of course, the reason nothing much has been done about that is that by
> the time your trigger queue is long enough to cause such an issue,
> you're screwed anyway --- actually executing all those triggers would
> take longer than you'll want to wait.

What is the best way to go about doing that, anyway?

...Robert
Re: [BUGS] statement_timeout is not cancelling query
On Tue, Dec 15, 2009 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Tue, Dec 15, 2009 at 1:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Alvaro Herrera alvhe...@commandprompt.com writes:
>>>> If we're to do anything about this, it is spilling the trigger queue
>>>> so it doesn't eat an unbounded amount of memory.
>>> Of course, the reason nothing much has been done about that is that
>>> by the time your trigger queue is long enough to cause such an issue,
>>> you're screwed anyway --- actually executing all those triggers would
>>> take longer than you'll want to wait.
>> What is the best way to go about doing that, anyway?
>
> Well, we added conditional triggers, which provides a partial fix. The
> only other idea I've heard that sounds like it'd really help is having
> some sort of lossy storage for foreign-key triggers, where we'd fall
> back to per-block or whole-table rechecking of the constraint instead
> of trying to track the exact rows that were modified. Not sure how you
> apply that to non-FK triggers, though.

Err, sorry, I quoted the wrong part. I meant, how would you rlimit the server's memory usage?

...Robert
Re: [BUGS] statement_timeout is not cancelling query
Robert Haas robertmh...@gmail.com writes:
> Err, sorry, I quoted the wrong part. I meant, how would you rlimit the
> server's memory usage?

Put a ulimit command in the server start script? Depending on the details of the start script, you might need to put it in the postgres user's .profile instead, but it's certainly doable.

regards, tom lane
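[Editor's note: Tom's suggestion can be sketched as below. The limit value and paths are illustrative, not from this thread. `ulimit -v` takes kilobytes and is inherited by child processes, so a cap set in the start script applies to the postmaster and to every backend it forks:]

```shell
# Hypothetical start-script fragment: cap virtual address space before
# launching the server. The subshell confines the limit to this demo.
(
    ulimit -v 2097152        # ~2 GB per process, in kB; tune for your box
    # A real init script would now start the postmaster, e.g.:
    #   su - postgres -c 'pg_ctl start -D /var/lib/pgsql/data'
    sh -c 'ulimit -v'        # a child process reports the inherited cap
)
```

Running the subshell prints 2097152, confirming that children inherit the limit; a backend that then exceeds it gets a failed malloc rather than driving the box into swap.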
Re: [BUGS] statement_timeout is not cancelling query
On Tue, Dec 15, 2009 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> Err, sorry, I quoted the wrong part. I meant, how would you rlimit the
>> server's memory usage?
>
> Put a ulimit command in the server start script? Depending on the
> details of the start script, you might need to put it in the postgres
> user's .profile instead, but it's certainly doable.

This may be a stupid question, but when you hit the limit, will it result in an ERROR or a PANIC?

...Robert
Re: [BUGS] statement_timeout is not cancelling query
Robert Haas robertmh...@gmail.com writes:
> On Tue, Dec 15, 2009 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Put a ulimit command in the server start script? Depending on the
>> details of the start script, you might need to put it in the postgres
>> user's .profile instead, but it's certainly doable.
>
> This may be a stupid question, but when you hit the limit, will it
> result in an ERROR or a PANIC?

Should be an ERROR ... if it isn't, that's probably a bug. The design intention is that malloc failure is just an ERROR.

regards, tom lane
Re: [BUGS] statement_timeout is not cancelling query
On 15/12/2009 12:35 PM, Mark Williamson wrote:
> So what happened is, the above update never completed and the
> Postgresql service consumed all available memory. We had to forcefully
> reboot the machine

That means your server is misconfigured. PostgreSQL should never consume all available memory. If it does, you have work_mem and/or maintenance_work_mem set way too high, you have VM overcommit enabled in the kernel, and you have too much swap.

http://www.postgresql.org/docs/current/interactive/kernel-resources.html
http://www.network-theory.co.uk/docs/postgresql/vol3/LinuxMemoryOvercommit.html

I wouldn't be surprised if you had shared_buffers set too high as well, and no ulimit set on postgresql's memory usage. All those things add up to fatal. A properly configured machine should be able to survive memory exhaustion caused by a user process just fine. Disable VM overcommit, set a ulimit on postgresql so it can't consume all memory, use a sane amount of swap, and set sane values for work_mem and maintenance_work_mem.

> Why does Postgresql NOT have a maximum memory allowed setting? We want
> to allocate resources efficiently and cannot allow one customer to
> impact others.

It does. man ulimit. The operating system can enforce it much better than PostgreSQL can. If a Pg bug were to cause Pg to go runaway or try to allocate insane amounts of RAM, the ulimit would catch it.

I *do* think it'd be nice to have ulimit values settable via postgresql.conf, so that you didn't have to faff about editing init scripts, though. (TODO item?)

--
Craig Ringer
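[Editor's note: on Linux, Craig's "disable VM overcommit" advice corresponds to the vm.overcommit_memory sysctl. A hedged sketch, to be run as root; mode 2 is the kernel's documented strict-accounting setting, but the ratio value below is an assumption for illustration, not something specified in this thread:]

```shell
# Strict overcommit accounting: allocations fail up front with a NULL
# from malloc, instead of the OOM killer choosing a victim later.
# Committable memory becomes swap + (overcommit_ratio% of RAM).
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80    # assumed value; the kernel default is 50

# To persist across reboots, add to /etc/sysctl.conf:
#   vm.overcommit_memory = 2
#   vm.overcommit_ratio = 80
```

With strict accounting in place, a runaway pending-trigger list hits a clean malloc failure, which PostgreSQL turns into a transaction abort rather than a thrashing or rebooted box.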