Re: [HACKERS] backend hangs at immediate shutdown

2013-01-31 Thread MauMau

As I promised yesterday, I'll show you the precise call stack:

#0  0x003fa0cf542e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x003fa0c7bed5 in _L_lock_9323 () from /lib64/libc.so.6
#2  0x003fa0c797c6 in malloc () from /lib64/libc.so.6
#3  0x003fa0c2fd99 in _nl_make_l10nflist () from /lib64/libc.so.6
#4  0x003fa0c2e0a5 in _nl_find_domain () from /lib64/libc.so.6
#5  0x003fa0c2d990 in __dcigettext () from /lib64/libc.so.6
#6  0x006f2a71 in errhint ()
#7  0x00634064 in quickdie ()
#8  signal handler called
#9  0x003fa0c77813 in _int_free () from /lib64/libc.so.6
#10 0x0070e329 in AllocSetDelete ()
#11 0x0070e8cb in MemoryContextDelete ()
#12 0x00571723 in FreeExprContext ()
#13 0x00571781 in FreeExecutorState ()
#14 0x005dc883 in evaluate_expr ()
#15 0x005ddca0 in simplify_function ()
#16 0x005de69f in eval_const_expressions_mutator ()
#17 0x00599143 in expression_tree_mutator ()
#18 0x005de452 in eval_const_expressions_mutator ()
#19 0x00599143 in expression_tree_mutator ()
#20 0x005de452 in eval_const_expressions_mutator ()
#21 0x005dfa2f in eval_const_expressions ()
#22 0x005cf16d in preprocess_expression ()
#23 0x005d2201 in subquery_planner ()
#24 0x005d23cf in standard_planner ()
#25 0x0063426a in pg_plan_query ()
#26 0x00634354 in pg_plan_queries ()
#27 0x00635310 in exec_simple_query ()
#28 0x00636333 in PostgresMain ()
#29 0x005f64e9 in PostmasterMain ()
#30 0x00596e20 in main ()
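
Reading the stack bottom-up: the backend was inside _int_free() (frame #9), presumably holding the allocator's internal lock, when SIGQUIT arrived. quickdie() then reached malloc() through errhint() and gettext, and waited on a lock that the interrupted code can never release. The following is a minimal sketch of that pattern, not glibc or PostgreSQL code: arena_lock, model_quickdie() and handler_finds_lock_held() are hypothetical names, and the handler only probes the lock instead of blocking on it so that the program terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the allocator's internal lock (a model, not glibc's lock). */
static atomic_flag arena_lock = ATOMIC_FLAG_INIT;

/* Set by the handler when it finds the lock already taken. */
static volatile sig_atomic_t handler_saw_lock_held = 0;

/* Models quickdie() -> errhint() -> gettext -> malloc(). */
static void model_quickdie(int signo)
{
    (void) signo;
    if (atomic_flag_test_and_set(&arena_lock))
        handler_saw_lock_held = 1;      /* a real malloc() would wait here forever */
    else
        atomic_flag_clear(&arena_lock); /* lock was free: take it and release it */
}

/*
 * Deliver SIGQUIT either inside or outside the modelled free() critical
 * section and report whether the handler found the lock held.
 */
bool handler_finds_lock_held(bool signal_during_free)
{
    void (*prev)(int) = signal(SIGQUIT, model_quickdie);

    assert(prev != SIG_ERR);
    handler_saw_lock_held = 0;

    if (signal_during_free)
    {
        atomic_flag_test_and_set(&arena_lock);  /* "free()" takes the lock */
        raise(SIGQUIT);                         /* signal lands mid-free() */
        atomic_flag_clear(&arena_lock);         /* "free()" releases it */
    }
    else
        raise(SIGQUIT);                         /* signal lands outside free() */

    return handler_saw_lock_held != 0;
}
```

The handler only gets into trouble when the signal lands inside the critical section, which would explain why the hang is rare and timing-dependent.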




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] backend hangs at immediate shutdown

2013-01-31 Thread MauMau

From: Tom Lane t...@sss.pgh.pa.us

MauMau maumau...@gmail.com writes:

How about the case where some backend crashes due to a bug of PostgreSQL?
In this case, postmaster sends SIGQUIT to all backends, too.  The instance
is expected to disappear cleanly and quickly.  Doesn't the hanging backend
harm the restart of the instance?


[ shrug... ]  That isn't guaranteed, and never has been --- for
instance, the process might have SIGQUIT blocked, perhaps as a result
of third-party code we have no control over.


Are you concerned about user-defined C functions?  I don't think they need 
to block signals.  So I don't find it too restrictive to say "do not block 
or send signals in user-defined functions."  If it's a real concern, it 
should be noted in the manual, rather than writing "do not use pg_ctl 
stop -mi as much as you can, because it can leave hanging backends."



How about using SIGKILL instead of SIGQUIT?


Because then we couldn't notify clients at all.  One practical
disadvantage of that is that it would become quite hard to tell from
the outside which client session actually crashed, which is frequently
useful to know.


How is the message below useful to determine which client session actually 
crashed?  The message doesn't contain information about the crashed session. 
Are you talking about log_line_prefix?


ERROR:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server process exited 
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.


However, it is not quickdie() but LogChildExit() that emits useful 
information to tell which session crashed.  So I don't think quickdie()'s 
message is very helpful.
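
To illustrate the point: the information about which process crashed is available to the parent from the wait status of the reaped child, independently of anything the surviving backends print. The following is a rough sketch of that idea, not the actual LogChildExit() code; describe_child_exit() is a hypothetical helper and the message wording is only modelled on typical postmaster log lines.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Format a log line for a reaped child from its waitpid() status.
 * Returns the number of characters snprintf() would have written.
 */
int describe_child_exit(pid_t pid, int status, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    if (WIFEXITED(status))
        return snprintf(buf, buflen,
                        "server process (PID %ld) exited with exit code %d",
                        (long) pid, WEXITSTATUS(status));

    if (WIFSIGNALED(status))
        return snprintf(buf, buflen,
                        "server process (PID %ld) was terminated by signal %d: %s",
                        (long) pid, WTERMSIG(status),
                        strsignal(WTERMSIG(status)));

    return snprintf(buf, buflen,
                    "server process (PID %ld) exited with unrecognized status %d",
                    (long) pid, status);
}
```

The parent would call this with the pid and status returned by waitpid(), so the crashed session can be identified from the server log even if no backend manages to send its own message.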




I think if we want to make it bulletproof we'd have to do what the
OP suggested and switch to SIGKILL.  I'm not enamored of that for the
reasons I mentioned --- but one idea that might dodge the disadvantages
is to have the postmaster wait a few seconds and then SIGKILL any
backends that hadn't exited.


I believe that SIGKILL is the only simple and reliable choice.  Consider 
again: the purpose of pg_ctl stop -mi is to immediately and reliably shut 
down the instance.  If it is not reliable, what can we do instead?



Regards
MauMau





Re: [HACKERS] backend hangs at immediate shutdown

2013-01-30 Thread Tatsuo Ishii
 This isn't an area that admits of quick-fix solutions --- everything
 we might do has disadvantages.  Also, the lack of complaints to date
 shows that the problem is not so large as to justify panic responses.
 I'm not really inclined to mess around with a tradeoff that's been
 working pretty well for a dozen years or more.

What about adding a caution to the doc something like:

 pg_ctl -m i stop may cause a PostgreSQL hang if native language
 support is enabled.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] backend hangs at immediate shutdown

2013-01-30 Thread Andres Freund
On 2013-01-31 08:27:13 +0900, Tatsuo Ishii wrote:
  This isn't an area that admits of quick-fix solutions --- everything
  we might do has disadvantages.  Also, the lack of complaints to date
  shows that the problem is not so large as to justify panic responses.
  I'm not really inclined to mess around with a tradeoff that's been
  working pretty well for a dozen years or more.
 
 What about adding a caution to the doc something like:
 
 pg_ctl -m i stop may cause a PostgreSQL hang if native language
 support is enabled.

That doesn't entirely solve the problem, see quote and reply in
6845.1359561...@sss.pgh.pa.us

I think adding errmsg_raw() or somesuch that doesn't allocate any memory
and only accepts constant strings could solve the problem more
completely, at the obvious price of not allowing translated strings
directly.
Those could be pretranslated during startup, but that's mighty ugly.
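
A rough sketch of what such a path could look like, under loud assumptions: errmsg_raw() does not exist, and raw_message_init() and raw_message_emit() are hypothetical names. The text is copied into static storage once at startup (this is where a translated string could be captured), and the handler-time step uses only write(), which POSIX lists as async-signal-safe. Unlike a function plugged into ereport(), this bypasses elog.c entirely.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Fixed storage filled once at startup; nothing is allocated later. */
static char   crash_msg[256];
static size_t crash_msg_len = 0;

/*
 * Startup-time step: copy the (possibly already translated) text into
 * static storage.  strlen/memcpy/assert are fine here because this runs
 * in normal context, not in a signal handler.
 */
void raw_message_init(const char *translated)
{
    size_t n = strlen(translated);

    assert(n < sizeof(crash_msg));
    memcpy(crash_msg, translated, n);
    crash_msg_len = n;
}

/*
 * Handler-time step: emit the stored bytes using only write(2).
 * Returns the number of bytes written, or -1 on error; errno is preserved.
 */
ssize_t raw_message_emit(int fd)
{
    int    saved_errno = errno;
    size_t done = 0;

    while (done < crash_msg_len)
    {
        ssize_t n = write(fd, crash_msg + done, crash_msg_len - done);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;           /* interrupted: retry the write */
            errno = saved_errno;
            return -1;
        }
        done += (size_t) n;
    }
    errno = saved_errno;
    return (ssize_t) done;
}
```

This sketch writes plain bytes to a file descriptor and does not build a protocol-level message for the client, so it only illustrates the no-allocation constraint.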

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] backend hangs at immediate shutdown

2013-01-30 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-01-31 08:27:13 +0900, Tatsuo Ishii wrote:
 What about adding a caution to the doc something like:
 pg_ctl -m i stop may cause a PostgreSQL hang if native language support
 is enabled.

 That doesn't entirely solve the problem, see quote and reply in
 6845.1359561...@sss.pgh.pa.us

 I think adding errmsg_raw() or somesuch that doesn't allocate any memory
 and only accepts constant strings could solve the problem more
 completely, at the obvious price of not allowing translated strings
 directly.

I really doubt that this would make a measurable difference in the
probability of failure.  The OP's case looks like it might not have
occurred if we weren't translating, but (a) that's not actually proven,
and (b) there are any number of other, equally low-probability, reasons
to have a problem here.  Please note for instance that elog.c would
still be doing a whole lot of palloc's even if the passed strings were
not copied.

I think if we want to make it bulletproof we'd have to do what the
OP suggested and switch to SIGKILL.  I'm not enamored of that for the
reasons I mentioned --- but one idea that might dodge the disadvantages
is to have the postmaster wait a few seconds and then SIGKILL any
backends that hadn't exited.
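
The wait-then-SIGKILL idea can be sketched for a single child as follows. This is an illustration under stated assumptions, not postmaster code: spawn_idle_child() and stop_with_escalation() are hypothetical helpers, the grace period is an arbitrary parameter, and the hung backend is modelled by a child that has SIGQUIT blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Fork a child that idles forever.  If block_sigquit is nonzero the child
 * inherits a signal mask with SIGQUIT blocked, modelling a backend that
 * never reacts to SIGQUIT.
 */
pid_t spawn_idle_child(int block_sigquit)
{
    sigset_t set, old;
    pid_t    pid;

    sigemptyset(&set);
    if (block_sigquit)
        sigaddset(&set, SIGQUIT);
    sigprocmask(SIG_BLOCK, &set, &old);     /* mask is inherited across fork */

    pid = fork();
    if (pid == 0)
    {
        for (;;)
            pause();
    }
    sigprocmask(SIG_SETMASK, &old, NULL);   /* parent restores its own mask */
    return pid;
}

/*
 * Send SIGQUIT, wait up to grace_ms for the child to exit, then SIGKILL.
 * Returns the signal that terminated the child, 0 for a normal exit,
 * or -1 on error.
 */
int stop_with_escalation(pid_t pid, int grace_ms)
{
    const struct timespec tick = {0, 10 * 1000 * 1000};     /* 10 ms */
    int status;

    assert(pid > 0 && grace_ms >= 0);
    if (kill(pid, SIGQUIT) != 0)
        return -1;

    for (int waited = 0; waited < grace_ms; waited += 10)
    {
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* Grace period expired: SIGKILL cannot be blocked or caught. */
    if (kill(pid, SIGKILL) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A well-behaved child still gets the chance to run its SIGQUIT handler and notify its client; only a child that is still alive after the grace period is killed outright.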

regards, tom lane




Re: [HACKERS] backend hangs at immediate shutdown

2013-01-30 Thread Tatsuo Ishii
 What about adding a caution to the doc something like:
 
   pg_ctl -m i stop may cause a PostgreSQL hang if native language
 support is enabled.
 
 That doesn't entirely solve the problem, see quote and reply in
 6845.1359561...@sss.pgh.pa.us

Oh, I see now.

 I think adding errmsg_raw() or somesuch that doesn't allocate any memory
 and only accepts constant strings could solve the problem more
 completely, at the obvious price of not allowing translated strings
 directly.
 Those could be pretranslated during startup, but that's mighty ugly.

Are you suggesting calling errmsg_raw() instead of errmsg() in quickdie()?

Tom said:
 That would reduce our exposure slightly, but hardly to zero.  For
 instance, if SIGQUIT happened in the midst of handling a regular error,
 ErrorContext might be pretty full already, necessitating further malloc
 requests.

If I understand this correctly, I don't think errmsg_raw()
solves the particular problem.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

