Re: [HACKERS] backend hangs at immediate shutdown
As I promised yesterday, I'll show you the precise call stack:

#0  0x003fa0cf542e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x003fa0c7bed5 in _L_lock_9323 () from /lib64/libc.so.6
#2  0x003fa0c797c6 in malloc () from /lib64/libc.so.6
#3  0x003fa0c2fd99 in _nl_make_l10nflist () from /lib64/libc.so.6
#4  0x003fa0c2e0a5 in _nl_find_domain () from /lib64/libc.so.6
#5  0x003fa0c2d990 in __dcigettext () from /lib64/libc.so.6
#6  0x006f2a71 in errhint ()
#7  0x00634064 in quickdie ()
#8  signal handler called
#9  0x003fa0c77813 in _int_free () from /lib64/libc.so.6
#10 0x0070e329 in AllocSetDelete ()
#11 0x0070e8cb in MemoryContextDelete ()
#12 0x00571723 in FreeExprContext ()
#13 0x00571781 in FreeExecutorState ()
#14 0x005dc883 in evaluate_expr ()
#15 0x005ddca0 in simplify_function ()
#16 0x005de69f in eval_const_expressions_mutator ()
#17 0x00599143 in expression_tree_mutator ()
#18 0x005de452 in eval_const_expressions_mutator ()
#19 0x00599143 in expression_tree_mutator ()
#20 0x005de452 in eval_const_expressions_mutator ()
#21 0x005dfa2f in eval_const_expressions ()
#22 0x005cf16d in preprocess_expression ()
#23 0x005d2201 in subquery_planner ()
#24 0x005d23cf in standard_planner ()
#25 0x0063426a in pg_plan_query ()
#26 0x00634354 in pg_plan_queries ()
#27 0x00635310 in exec_simple_query ()
#28 0x00636333 in PostgresMain ()
#29 0x005f64e9 in PostmasterMain ()
#30 0x00596e20 in main ()

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
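The trace above shows a self-deadlock: the backend was inside free() (frame #9) and holding the allocator lock when SIGQUIT arrived, and quickdie() then reached malloc() through errhint() and gettext. As an illustration only, here is a minimal sketch of a handler that restricts itself to calls POSIX lists as async-signal-safe (write() and _exit()). The names write_all, sketch_quickdie and install_sketch_quickdie are hypothetical and are not PostgreSQL functions; the real quickdie() does considerably more.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Constant message: no translation and no allocation at signal time. */
static const char quit_msg[] =
    "terminating connection because of crash of another server process\n";

/*
 * Write a whole buffer using only write(2), which POSIX lists as
 * async-signal-safe.  Returns 0 on success, -1 on error.
 */
int
write_all(int fd, const char *buf, size_t len)
{
    while (len > 0)
    {
        ssize_t n = write(fd, buf, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -1;
        }
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Hypothetical SIGQUIT handler: report and leave; no malloc, gettext, stdio. */
void
sketch_quickdie(int signo)
{
    (void) signo;
    (void) write_all(STDERR_FILENO, quit_msg, sizeof(quit_msg) - 1);
    _exit(2);                   /* skips atexit handlers and stdio flushing */
}

/* Install the handler for SIGQUIT.  Returns 0 on success, -1 on failure. */
int
install_sketch_quickdie(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sketch_quickdie;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGQUIT, &sa, NULL);
}
```

Because the handler never takes the allocator lock, it cannot block on a lock that the interrupted code already holds. The cost is that the message is a fixed, untranslated string written to stderr rather than sent through the normal error-reporting path.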
Re: [HACKERS] backend hangs at immediate shutdown
From: Tom Lane t...@sss.pgh.pa.us
> MauMau maumau...@gmail.com writes:
>> How about the case where some backend crashes due to a bug in PostgreSQL? In this case, postmaster sends SIGQUIT to all backends, too. The instance is expected to disappear cleanly and quickly. Doesn't the hanging backend harm the restart of the instance?
>
> [ shrug... ] That isn't guaranteed, and never has been --- for instance, the process might have SIGQUIT blocked, perhaps as a result of third-party code we have no control over.

Are you concerned about user-defined C functions? I don't think they need to block signals. So I don't find it too restrictive to say "do not block or send signals in user-defined functions." If it's a real concern, it should be noted in the manual, rather than writing "do not use pg_ctl stop -mi as much as you can, because it can leave hanging backends."

>> How about using SIGKILL instead of SIGQUIT?
>
> Because then we couldn't notify clients at all. One practical disadvantage of that is that it would become quite hard to tell from the outside which client session actually crashed, which is frequently useful to know.

How is the message below useful to determine which client session actually crashed? The message doesn't contain information about the crashed session. Are you talking about log_line_prefix?

ERROR: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.

However, it is not quickdie() but LogChildExit() that emits the information that tells which session crashed. So I don't think quickdie()'s message is very helpful.

> I think if we want to make it bulletproof we'd have to do what the OP suggested and switch to SIGKILL. I'm not enamored of that for the reasons I mentioned --- but one idea that might dodge the disadvantages is to have the postmaster wait a few seconds and then SIGKILL any backends that hadn't exited.

I believe that SIGKILL is the only simple option. Consider again: the purpose of pg_ctl stop -mi is to shut down the instance immediately and reliably. If it is not reliable, what can we do instead?

Regards
MauMau
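Tom's remark that a process "might have SIGQUIT blocked" can be illustrated with a small sketch. It blocks SIGQUIT, sends the signal to itself, and checks that the signal is only pending rather than delivered. The function name blocked_sigquit_stays_pending is hypothetical, and raise() here merely stands in for the postmaster's kill(); this is a demonstration under those assumptions, not backend code. SIGKILL, by contrast, cannot be blocked, caught, or ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/*
 * Block SIGQUIT, send it to ourselves, and report whether it is left
 * pending instead of being delivered.
 * Returns 1 if pending, 0 if not, -1 on error.
 */
int
blocked_sigquit_stays_pending(void)
{
    sigset_t    block, old_mask, pending;
    struct sigaction ign, old_act;
    int         result;

    sigemptyset(&block);
    sigaddset(&block, SIGQUIT);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) != 0)
        return -1;

    /* Stand-in for the postmaster's kill(pid, SIGQUIT). */
    if (raise(SIGQUIT) != 0)
        return -1;

    if (sigpending(&pending) != 0)
        return -1;
    result = sigismember(&pending, SIGQUIT);

    /*
     * Clean up so the demo survives: setting the action to SIG_IGN
     * discards the pending signal, after which it is safe to unblock
     * and restore the previous disposition.
     */
    memset(&ign, 0, sizeof(ign));
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    sigaction(SIGQUIT, &ign, &old_act);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGQUIT, &old_act, NULL);

    return result;
}
```

As long as the signal stays blocked, no handler runs, so from the postmaster's point of view such a backend simply does not react to SIGQUIT.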
Re: [HACKERS] backend hangs at immediate shutdown
> This isn't an area that admits of quick-fix solutions --- everything we might do has disadvantages. Also, the lack of complaints to date shows that the problem is not so large as to justify panic responses. I'm not really inclined to mess around with a tradeoff that's been working pretty well for a dozen years or more.

What about adding a caution to the docs, something like:

    pg_ctl -m -i stop may cause a PostgreSQL hang if native language support is enabled.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] backend hangs at immediate shutdown
On 2013-01-31 08:27:13 +0900, Tatsuo Ishii wrote:
>> This isn't an area that admits of quick-fix solutions --- everything we might do has disadvantages. Also, the lack of complaints to date shows that the problem is not so large as to justify panic responses. I'm not really inclined to mess around with a tradeoff that's been working pretty well for a dozen years or more.
>
> What about adding a caution to the docs, something like:
>
>     pg_ctl -m -i stop may cause a PostgreSQL hang if native language support is enabled.

That doesn't entirely solve the problem, see quote and reply in 6845.1359561...@sss.pgh.pa.us

I think adding errmsg_raw() or somesuch that doesn't allocate any memory and only accepts constant strings could solve the problem more completely, at the obvious price of not allowing translated strings directly. Those could be pretranslated during startup, but that's mighty ugly.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
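The "pretranslated during startup" idea might look roughly like the sketch below: the translated text is copied into a fixed static buffer while it is still safe to allocate, and the emitter used at signal time touches only that buffer and write(). errmsg_raw() does not exist in PostgreSQL; the names pretranslate_quit_message and emit_quit_message and the 256-byte limit are assumptions made for illustration. As noted elsewhere in the thread, this would not remove the other allocations done by the error-reporting code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define QUIT_MSG_MAX 256

/* Filled once at startup; only read at signal time. */
static char   quit_msg_buf[QUIT_MSG_MAX];
static size_t quit_msg_len = 0;

/*
 * Call during startup, in normal (non-signal) context.  The caller passes
 * the already translated text, e.g. the result of gettext(), which is free
 * to allocate at this point.  Text longer than the buffer is truncated.
 * Returns the number of bytes stored.
 */
size_t
pretranslate_quit_message(const char *translated)
{
    size_t      len = strlen(translated);

    if (len > QUIT_MSG_MAX - 1)
        len = QUIT_MSG_MAX - 1;
    memcpy(quit_msg_buf, translated, len);
    quit_msg_buf[len] = '\0';
    quit_msg_len = len;
    return len;
}

/*
 * Hypothetical errmsg_raw()-style emitter: uses only the static buffer and
 * write(2), so it performs no allocation and no translation lookup.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t
emit_quit_message(int fd)
{
    return write(fd, quit_msg_buf, quit_msg_len);
}
```

The design trades flexibility for safety: every message that might be needed inside a signal handler has to be known and translated in advance, which is the ugliness mentioned above.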
Re: [HACKERS] backend hangs at immediate shutdown
Andres Freund and...@2ndquadrant.com writes:
> On 2013-01-31 08:27:13 +0900, Tatsuo Ishii wrote:
>> What about adding a caution to the docs, something like:
>>
>>     pg_ctl -m -i stop may cause a PostgreSQL hang if native language support is enabled.
>
> That doesn't entirely solve the problem, see quote and reply in 6845.1359561...@sss.pgh.pa.us
>
> I think adding errmsg_raw() or somesuch that doesn't allocate any memory and only accepts constant strings could solve the problem more completely, at the obvious price of not allowing translated strings directly.

I really doubt that this would make a measurable difference in the probability of failure. The OP's case looks like it might not have occurred if we weren't translating, but (a) that's not actually proven, and (b) there are any number of other, equally low-probability, reasons to have a problem here. Please note for instance that elog.c would still be doing a whole lot of palloc's even if the passed strings were not copied.

I think if we want to make it bulletproof we'd have to do what the OP suggested and switch to SIGKILL. I'm not enamored of that for the reasons I mentioned --- but one idea that might dodge the disadvantages is to have the postmaster wait a few seconds and then SIGKILL any backends that hadn't exited.

regards, tom lane
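The "wait a few seconds and then SIGKILL" idea can be sketched for a single child process as follows. This is not postmaster code: stop_with_escalation, the 10 ms polling interval, and the return-value convention are assumptions chosen for the example, and a real postmaster would track many children from its main loop rather than sleep in a function like this.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Ask a child to quit with SIGQUIT, wait up to grace_ms milliseconds,
 * then fall back to SIGKILL, which cannot be caught, blocked, or ignored.
 *
 * Returns 0 if the child exited within the grace period, 1 if SIGKILL
 * was sent, and -1 on error.
 */
int
stop_with_escalation(pid_t pid, int grace_ms)
{
    const struct timespec tick = {0, 10 * 1000 * 1000};    /* 10 ms */
    int         waited_ms = 0;
    int         status;

    if (kill(pid, SIGQUIT) != 0)
        return -1;

    while (waited_ms < grace_ms)
    {
        pid_t       r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return 0;           /* child is gone; clients could be notified */
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
        waited_ms += 10;
    }

    /* Grace period over: the child may be stuck, e.g. in its handler. */
    if (kill(pid, SIGKILL) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return 1;
}
```

A well-behaved backend still gets the chance to send its message to the client, while a backend that hangs in its handler, or never receives the signal, is removed once the grace period expires.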
Re: [HACKERS] backend hangs at immediate shutdown
>> What about adding a caution to the docs, something like:
>>
>>     pg_ctl -m -i stop may cause a PostgreSQL hang if native language support is enabled.
>
> That doesn't entirely solve the problem, see quote and reply in 6845.1359561...@sss.pgh.pa.us

Oh, I see now.

> I think adding errmsg_raw() or somesuch that doesn't allocate any memory and only accepts constant strings could solve the problem more completely, at the obvious price of not allowing translated strings directly. Those could be pretranslated during startup, but that's mighty ugly.

Are you suggesting calling errmsg_raw() instead of errmsg() in quickdie()?

Tom said:

> That would reduce our exposure slightly, but hardly to zero. For instance, if SIGQUIT happened in the midst of handling a regular error, ErrorContext might be pretty full already, necessitating further malloc requests.

If I understand this correctly, I don't think errmsg_raw() solves this particular problem.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp