Re: [PERFORM] Kill a session
Hi, Tom, Tom Lane wrote: Our expectation is that all or at least most queries should respond to SIGINT or SIGTERM interrupts pretty rapidly, say on a less-than-a-second timescale. However there are various loops in the backend that fail to execute CHECK_FOR_INTERRUPTS sufficiently often :-(. The same is true for user-defined C funtions. The PostGIS GEOS geometry functions come to mind, for complex geometries, they can need hours to complete. And as GEOS is a 3rd-Party library, I don't see an easy way to make them CHECK_FOR_INTERRUPTS. Does anybody know how this is for plpgsql, pljava and plpython? HTH, Markus -- Markus Schaber | Logical TrackingTracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Kill a session
Tom Lane wrote: Craig A. James [EMAIL PROTECTED] writes: Bottom line is that I was expecting instant death with SIGTERM, but instead got an agonizing, drawn out -- but safe -- death of the query. What was the query exactly? Our expectation is that all or at least most queries should respond to SIGINT or SIGTERM interrupts pretty rapidly, say on a less-than-a-second timescale. However there are various loops in the backend that fail to execute CHECK_FOR_INTERRUPTS sufficiently often :-(. We've been gradually finding and fixing these, and will be glad to fix your case if you provide enough details to pin it down. You might be interested in this current thread about a similar problem: http://archives.postgresql.org/pgsql-patches/2006-07/msg00039.php Thanks, this is good information. The qsort is a distinct possibility. The query is a big insert into some_hitlist (select id from another_hitlist join data_table on (...)) where the hitlists are unindexed. So it may be using a merge-join with qsort. When I have a few minutes, I'll turn on logging in the app and find the exact SQL, and run an EXPLAIN ANALYZE and see what's really happening. It's also possible that the INSERT itself is the problem, or adds to the problem. The SIGINT may come after a few million rows have been inserted, so it would have to clean that up, right? Thanks, Craig ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Kill a session
Craig A. James [EMAIL PROTECTED] writes: It's also possible that the INSERT itself is the problem, or adds to the problem. The SIGINT may come after a few million rows have been inserted, so it would have to clean that up, right? No, because we don't use UNDO. The next VACUUM would have a bit of a mess to clean up, but you wouldn't pay for it at the time of the abort. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Kill a session
Craig A. James wrote: I'm back to my original question: How do you kill a runaway query without bringing down the whole database? Is there really no answer to this? As others have mentioned, pg_cancel_backend(pid) will stop query execution by backend process id 'pid'. While this is often enough, if you actually want to disconnect a backend process then there is nothing to let you do this remotely. I recently did a patch for Bizgres that just implements the pg_terminate_backend(pid) function (currently #ifdef'ed out of the codebase) as a contrib so it can be easily installed. See http://pgfoundry.org/pipermail/bizgres-general/2006-May/000484.html If you want to try it out, please read the README (it discusses possible dangers associated with sending SIGTERM to backends). And I would certainly be interested in hearing what level of success (or otherwise) you have with it! Best wishes Mark ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Kill a session
Thanks for your reply, Mark: I'm back to my original question: How do you kill a runaway query without bringing down the whole database? Is there really no answer to this? ... if you actually want to disconnect a backend process then there is nothing to let you do this remotely. I recently did a patch for Bizgres that just implements the pg_terminate_backend(pid) function (currently #ifdef'ed out of the codebase) as a contrib so it can be easily installed. See http://pgfoundry.org/pipermail/bizgres-general/2006-May/000484.html This answers my question. I've finally got a statement in concrete terms, Postgres has no way to kill a backend process via an SQL statement. If Mark had to resort to this, then there is no other way. If you want to try it out, please read the README (it discusses possible dangers associated with sending SIGTERM to backends). And I would certainly be interested in hearing what level of success (or otherwise) you have with it! Thanks, but I've already implemented my own, which is essentially identical in concept to yours, but simpler in the sense of being even less safe than yours -- I just let anyone send the signal, since I have no users other than my own app. I'll keep my version since it's embedded in my own plug-in. That way I won't have to keep remembering to modify the Postgres code when I upgrade. I like to keep Postgres stock. Craig ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Kill a session
Steinar H. Gunderson wrote: On Wed, Jul 12, 2006 at 08:43:18AM -0700, Craig A. James wrote: Then you killed the wrong backend... No queries run in postmaster. They all run in postgres backends. The postmaster does very little actual work, other than keeping track of everybody else. It turns out I was confused by this: ps(1) reports a process called postgres, but top(1) reports a process called postmaster, but they both have the same pid. I guess postmaster replaces its own name in the process table when it's executing a query, and it's not really the postmaster even though top(1) calls it postmaster. So kill -15 pid is NOT killing the process -- to kill the process, I have to use signal 9. But if I do that, ALL queries in progress are aborted. I might as well shut down and restart the database, which is an unacceptable solution for a web site. I don't follow your logic here. If you do kill -15 pid of the postmaster doing the work, the query should be aborted without taking down the entire cluster. I don't see why you'd need -9 (which is a really bad idea anyhow)... I've solved this mystery. kill -15 doesn't immediately kill the job -- it aborts the query, but it might take 15-30 seconds to clean up. This confused me, because the query I was using to test took about 30 seconds, so the SIGTERM didn't seem to make a difference. But when I used a harder query, one that would run for 5-10 minutes, SIGTERM still stopped it after 15 seconds, which isn't great but it's acceptable. Bottom line is that I was expecting instant death with SIGTERM, but instead got an agonizing, drawn out -- but safe -- death of the query. At least that's my deduction based on experiments. I haven't dug into the source to confirm. Thanks everyone for your answers. My kill this query feature is now acceptable. Craig ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Kill a session
Craig A. James [EMAIL PROTECTED] writes: Bottom line is that I was expecting instant death with SIGTERM, but instead got an agonizing, drawn out -- but safe -- death of the query. What was the query exactly? Our expectation is that all or at least most queries should respond to SIGINT or SIGTERM interrupts pretty rapidly, say on a less-than-a-second timescale. However there are various loops in the backend that fail to execute CHECK_FOR_INTERRUPTS sufficiently often :-(. We've been gradually finding and fixing these, and will be glad to fix your case if you provide enough details to pin it down. You might be interested in this current thread about a similar problem: http://archives.postgresql.org/pgsql-patches/2006-07/msg00039.php regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Kill a session
Magnus Hagander wrote: This raises the question: Why doesn't Postgres have a kill session command that works? Oracle has it, and it's invaluable; there is no substitute. Various writers to these PG lists have raised the question repeatedly. Is it just a matter that nobody has had the time to do it (which I respect!), or is there a reason why the Postgres team decided a kill session is a bad idea? I beleive the function to kill a backend is actually in the codebase, it's just commented out because it's considered dangerous. There are some possible issues (see -hackers archives) about sending SIGTERM without actually shutting down the whole cluster. Doing the client-side function to call is the easy part. In many cases you just need to cancel a query, in which case you can use pg_cancel_backend() for exmaple. If you need to actually kill it, your only supported way is to restart postgresql. In other words, are you confirming that there is no way to kill a query from another process, other than shutting down the database? My understanding of the documentation tells me I can't use cancel, because the process doing the killing isn't the original process. But in spite earlier posting in these forums that say the killing the backend was the way to go, this doesn't really work. First, even though the postgres backend job is properly killed, a postmaster job keeps running at 99% CPU, which is pretty useless. Killing the client's backend didn't kill the process actually doing the work! Then you killed the wrong backend... No queries run in postmaster. They all run in postgres backends. The postmaster does very little actual work, other than keeping track of everybody else. It turns out I was confused by this: ps(1) reports a process called postgres, but top(1) reports a process called postmaster, but they both have the same pid. I guess postmaster replaces its own name in the process table when it's executing a query, and it's not really the postmaster even though top(1) calls it postmaster. So kill -15 pid is NOT killing the process -- to kill the process, I have to use signal 9. But if I do that, ALL queries in progress are aborted. I might as well shut down and restart the database, which is an unacceptable solution for a web site. I'm back to my original question: How do you kill a runaway query without bringing down the whole database? Is there really no answer to this? Thanks, Craig ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Kill a session
On Wed, Jul 12, 2006 at 08:43:18AM -0700, Craig A. James wrote: Then you killed the wrong backend... No queries run in postmaster. They all run in postgres backends. The postmaster does very little actual work, other than keeping track of everybody else. It turns out I was confused by this: ps(1) reports a process called postgres, but top(1) reports a process called postmaster, but they both have the same pid. I guess postmaster replaces its own name in the process table when it's executing a query, and it's not really the postmaster even though top(1) calls it postmaster. So kill -15 pid is NOT killing the process -- to kill the process, I have to use signal 9. But if I do that, ALL queries in progress are aborted. I might as well shut down and restart the database, which is an unacceptable solution for a web site. I don't follow your logic here. If you do kill -15 pid of the postmaster doing the work, the query should be aborted without taking down the entire cluster. I don't see why you'd need -9 (which is a really bad idea anyhow)... I'm back to my original question: How do you kill a runaway query without bringing down the whole database? Is there really no answer to this? Kill it with -15. If you're worried about your CGI scripts, use sudo or some sort of client/server solution. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Kill a session
I beleive the function to kill a backend is actually in the codebase, it's just commented out because it's considered dangerous. There are some possible issues (see -hackers archives) about sending SIGTERM without actually shutting down the whole cluster. Doing the client-side function to call is the easy part. In many cases you just need to cancel a query, in which case you can use pg_cancel_backend() for exmaple. If you need to actually kill it, your only supported way is to restart postgresql. In other words, are you confirming that there is no way to kill a query from another process, other than shutting down the database? My understanding of the documentation tells me I can't use cancel, because the process doing the killing isn't the original process. You can't kill another backend, no. You can *cancel* a query on it and return it to idle state. See http://www.postgresql.org/docs/8.1/interactive/functions-admin.html, pg_cancel_backend(). So kill -15 pid is NOT killing the process -- to kill the process, I have to use signal 9. But if I do that, ALL queries in progress are aborted. I might as well shut down and restart the database, which is an unacceptable solution for a web site. I'm back to my original question: How do you kill a runaway query without bringing down the whole database? Is there really no answer to this? Runaway queries can be killed with pg_cancel_backend(), or from the commandline using kill -INT pid. The backend will still be around, but it will have cancelled the query. //Magnus ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Kill a session
Craig A. James wrote: Magnus Hagander wrote: This raises the question: Why doesn't Postgres have a kill session command that works? Oracle has it, and it's invaluable; there is no substitute. Various writers to these PG lists have raised the question repeatedly. Is it just a matter that nobody has had the time to do it (which I respect!), or is there a reason why the Postgres team decided a kill session is a bad idea? I beleive the function to kill a backend is actually in the codebase, it's just commented out because it's considered dangerous. There are some possible issues (see -hackers archives) about sending SIGTERM without actually shutting down the whole cluster. Doing the client-side function to call is the easy part. In many cases you just need to cancel a query, in which case you can use pg_cancel_backend() for exmaple. If you need to actually kill it, your only supported way is to restart postgresql. In other words, are you confirming that there is no way to kill a query from another process, other than shutting down the database? My understanding of the documentation tells me I can't use cancel, because the process doing the killing isn't the original process. But in spite earlier posting in these forums that say the killing the backend was the way to go, this doesn't really work. First, even though the postgres backend job is properly killed, a postmaster job keeps running at 99% CPU, which is pretty useless. Killing the client's backend didn't kill the process actually doing the work! Then you killed the wrong backend... No queries run in postmaster. They all run in postgres backends. The postmaster does very little actual work, other than keeping track of everybody else. It turns out I was confused by this: ps(1) reports a process called postgres, but top(1) reports a process called postmaster, but they both have the same pid. I guess postmaster replaces its own name in the process table when it's executing a query, and it's not really the postmaster even though top(1) calls it postmaster. So kill -15 pid is NOT killing the process -- to kill the process, I have to use signal 9. But if I do that, ALL queries in progress are aborted. I might as well shut down and restart the database, which is an unacceptable solution for a web site. I'm back to my original question: How do you kill a runaway query without bringing down the whole database? Is there really no answer to this? are you maybe looking for pg_cancel_backend() ? http://www.postgresql.org/docs/current/interactive/functions-admin.html#FUNCTIONS-ADMIN-SIGNAL-TABLE Stefan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Kill a session
Craig A. James wrote: There have been dozens, perhaps hundreds, of entries in the pg-admin, pg-general, and pg-performance lists regarding killing a session, but as far as I can tell, there is no Postgres solution. Did I miss something? This raises the question: Why doesn't Postgres have a kill session command that works? Oracle has it, and it's invaluable; there is no substitute. Various writers to these PG lists have raised the question repeatedly. Is it just a matter that nobody has had the time to do it (which I respect!), or is there a reason why the Postgres team decided a kill session is a bad idea? You are sure you read: http://www.postgresql.org/docs/8.1/interactive/protocol-flow.html#AEN60635 ? Regards Tino Wildenhain ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Kill a session
There have been dozens, perhaps hundreds, of entries in the pg-admin, pg-general, and pg-performance lists regarding killing a session, but as far as I can tell, there is no Postgres solution. Did I miss something? This raises the question: Why doesn't Postgres have a kill session command that works? Oracle has it, and it's invaluable; there is no substitute. Various writers to these PG lists have raised the question repeatedly. Is it just a matter that nobody has had the time to do it (which I respect!), or is there a reason why the Postgres team decided a kill session is a bad idea? [snip] I beleive the function to kill a backend is actually in the codebase, it's just commented out because it's considered dangerous. There are some possible issues (see -hackers archives) about sending SIGTERM without actually shutting down the whole cluster. Doing the client-side function to call is the easy part. In many cases you just need to cancel a query, in which case you can use pg_cancel_backend() for exmaple. If you need to actually kill it, your only supported way is to restart postgresql. But in spite earlier posting in these forums that say the killing the backend was the way to go, this doesn't really work. First, even though the postgres backend job is properly killed, a postmaster job keeps running at 99% CPU, which is pretty useless. Killing the client's backend didn't kill the process actually doing the work! Then you killed the wrong backend... Second, the KILLING SESSION message to stderr is only printed in the PG log file sporadically. This confuses me, since the KILLING SESSION is printed by a *different* process than the one being killed, so it shouldn't be affected. So what happens to fprintf()'s output? Most of the time, I just get unexpected EOF on client connection in the log which is presumably the postmaster complaining that the postgres child process died. No, that's the postgres chlid process complaining that your client (CGI?) died without sending a close message. I know the kill_session() is working because it returns true, and the job is in fact killed. But the query keeps running in postmaster (or is it something else, like a rollback?), and the stderr output disappears. No queries run in postmaster. They all run in postgres backends. The postmaster does very little actual work, other than keeping track of everybody else. //Magnus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match