Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Heikki Linnakangas
Ok, I've committed a minimal patch to pg_standby in CVS HEAD and 
REL8_3_STABLE to not interpret SIGQUIT as a signal for failover. I added 
a signal handler for SIGUSR1 to trigger failover; that should be 
considered the preferred signal for that, even though SIGINT still works 
too.


SIGQUIT is trapped to just die immediately, but without core dumping. As 
we still use SIGQUIT for immediate shutdown, any other archive_command 
or restore_command will still receive SIGQUIT on immediate shutdown, and 
by default dump core. Let's just live with that for now..


This should be mentioned in release notes, as any script that might be 
using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT 
instead. Where should I make a note of that so that we don't forget?


Heikki Linnakangas wrote:

Fujii Masao wrote:

Hi,

On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

Fujii Masao wrote:

On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

I'm leaning towards option 3, but I wonder if anyone sees a better
solution.

4. Use the shared memory to tell the startup process about the shutdown
state.
When a shutdown signal arrives, postmaster sets the corresponding 
shutdown

state to the shared memory before signaling to the child processes. The
startup
process check the shutdown state whenever executing system(), and
determine
how to exit according to that state. This solution doesn't change any
existing
behavior of pg_standby. What is your opinion?
That would only solve the problem for pg_standby. Other programs you 
might
use as a restore_command or archive_command like cp or rsync 
would still

core dump on the SIGQUIT.


Right. I've just understood your intention. I also agree with option 3 
if nobody
complains about lack of backward compatibility of pg_standby. If no, 
how about
using SIGUSR2 instead of SIGINT for immediate shutdown of only the 
archiver

and the startup process. SIGUSR2 by default terminates the process.
The archiver already uses SIGUSR2 for pgarch_waken_stop, so we need to
reassign that function to another signal (SIGINT is suitable, I think).
This solution doesn't need signal multiplexing. Thought?


Hmm, the startup/archiver process would then in turn need to kill the 
external command with SIGINT. I guess that would work.


There's a problem with my idea of just using SIGINT instead of SIGQUIT. 
Some (arguably bad-behaving) programs trap SIGINT and exit() with a 
return code. The startup process won't recognize that as killed by 
signal, and we're back to same problem we have with pg_standby that the 
startup process doesn't die but continues with the startup. Notably 
rsync seems to behave like that.


BTW, searching the archive, I found this long thread about this same issue:

http://archives.postgresql.org/pgsql-hackers/2006-11/msg00406.php

The idea of SIGUSR2 was mentioned there as well, as well as the idea of 
reimplementing system(3). The conclusion of that thread was the usage of 
setsid() and process groups, to ensure that the SIGQUIT is delivered to 
the archive/recovery_command.


I'm starting to feel that this is getting too complicated. Maybe we 
should just fix pg_standby to not trap SIGQUIT, and live with the core 
dumps...



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
-
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Heikki Linnakangas

Andrew Dunstan wrote:

Heikki Linnakangas wrote:
This should be mentioned in release notes, as any script that might be 
using SIGQUIT at the moment needs to be changed to use SIGUSR1 or 
SIGINT instead. Where should I make a note of that so that we don't 
forget?


Unless I'm missing it the use of signals to trigger failover is not 
documented AT ALL. So why anyone would expect such behaviour is 
something of a mystery.


Well, some people do read source code. If it was more widely known, I 
would hesitate more to change it, though.



Perhaps doing that would be even more important than release notes.


Agreed it should be documented.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
-
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Bruce Momjian
Heikki Linnakangas wrote:
 This should be mentioned in release notes, as any script that might be 
 using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT 
 instead. Where should I make a note of that so that we don't forget?

The CVS commit message.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
-
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Robert Haas
On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian br...@momjian.us wrote:
 Heikki Linnakangas wrote:
 This should be mentioned in release notes, as any script that might be
 using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT
 instead. Where should I make a note of that so that we don't forget?

 The CVS commit message.

Is there some reason we don't just put it in the release notes as
*part* of the commit?  Someone can always go back and edit it later.
It seems like that would be easier and less error-prone than grepping
the CVS commit logs for release notes...

...Robert
-
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian br...@momjian.us wrote:
 The CVS commit message.

 Is there some reason we don't just put it in the release notes as
 *part* of the commit?  Someone can always go back and edit it later.

That was suggested before, and I think we actually tried it for a few
months.  It didn't work.

Putting an item in the release notes *properly* is a whole lot more
work than putting a short bit of text in the CVS log (especially for
committers whose first language isn't English).  It would also
create a lot more merge-collision issues for unrelated patches.

It's less trouble overall to do the editing, organizing, and SGML-ifying
of all the release notes at once.  Also you end up with a better
product, assuming that whoever is doing the notes puts in reasonable
editorial effort.

regards, tom lane
-
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Robert Haas
On Wed, Mar 18, 2009 at 5:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian br...@momjian.us wrote:
 The CVS commit message.

 Is there some reason we don't just put it in the release notes as
 *part* of the commit?  Someone can always go back and edit it later.

 That was suggested before, and I think we actually tried it for a few
 months.  It didn't work.

 Putting an item in the release notes *properly* is a whole lot more
 work than putting a short bit of text in the CVS log (especially for
 committers whose first language isn't English).  It would also
 create a lot more merge-collision issues for unrelated patches.

Yeah, I wouldn't ask people to include it in the patches they post.
That would be a pain, and people would probably tend (with the best of
intentions) to inflate the relative importance of their own work.  I
was thinking that the committer could make a quick entry at the time
they actually committed the patch, so that the step you describe below
could start with something other than an email box.

 It's less trouble overall to do the editing, organizing, and SGML-ifying
 of all the release notes at once.  Also you end up with a better
 product, assuming that whoever is doing the notes puts in reasonable
 editorial effort.

If it works for the people who are doing it, good enough.

...Robert
-
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Andrew Dunstan



Heikki Linnakangas wrote:
Ok, I've committed a minimal patch to pg_standby in CVS HEAD and 
REL8_3_STABLE to not interpret SIGQUIT as a signal for failover. I 
added a signal handler for SIGUSR1 to trigger failover; that should be 
considered the preferred signal for that, even though SIGINT still 
works too.


SIGQUIT is trapped to just die immediately, but without core dumping. 
As we still use SIGQUIT for immediate shutdown, any other 
archive_command or restore_command will still receive SIGQUIT on 
immediate shutdown, and by default dump core. Let's just live with 
that for now..


This should be mentioned in release notes, as any script that might be 
using SIGQUIT at the moment needs to be changed to use SIGUSR1 or 
SIGINT instead. Where should I make a note of that so that we don't 
forget?





Unless I'm missing it the use of signals to trigger failover is not 
documented AT ALL. So why anyone would expect such behaviour is 
something of a mystery.


Perhaps doing that would be even more important than release notes.

cheers

andrew
-
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-04 Thread Heikki Linnakangas
Per discussion, here's a patch for pg_standby in REL8_3_STABLE. The 
signal handling is changed so that SIGQUIT no longer triggers failover, 
but immediately kills pg_standby, triggering FATAL death of the startup 
process too. That's what you want with immediate shutdown.


SIGUSR1 is now accepted as a signal to trigger failover. SIGINT is still 
accepted too, but that should be considered deprecated since we're 
likely to use SIGINT for immediate shutdown (for startup process) in 8.4.


We should document the use of signals to trigger failover in the 
manual... Any volunteers?


This should be noted in the release notes:

If you are using pg_standby, and if you are using signals (e.g killall 
-SIGINT pg_standby) to trigger failover, change your scripts to use 
SIGUSR1 instead of SIGQUIT or SIGINT. SIGQUIT no longer triggers 
failover, but aborts the recovery and shuts down the standby database. 
SIGINT is still accepted as failover trigger, but should be considered 
as deprecated and will also be changed to trigger immediate shutdown in 
a future release.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: pg_standby.c
===
RCS file: /cvsroot/pgsql/contrib/pg_standby/pg_standby.c,v
retrieving revision 1.10.2.3
diff -c -r1.10.2.3 pg_standby.c
*** pg_standby.c	6 Jan 2009 17:27:19 -	1.10.2.3
--- pg_standby.c	4 Mar 2009 09:13:34 -
***
*** 451,464 
  	signaled = true;
  }
  
  /* MAIN */
  int
  main(int argc, char **argv)
  {
  	int			c;
  
! 	(void) signal(SIGINT, sighandler);
! 	(void) signal(SIGQUIT, sighandler);
  
  	while ((c = getopt(argc, argv, cdk:lr:s:t:w:)) != -1)
  	{
--- 451,487 
  	signaled = true;
  }
  
+ /* We don't want SIGQUIT to core dump */
+ static void
+ sigquit_handler(int sig)
+ {
+ 	signal(SIGINT, SIG_DFL);
+ 	kill(getpid(), SIGINT);
+ }
+ 
+ 
  /* MAIN */
  int
  main(int argc, char **argv)
  {
  	int			c;
  
! 	/*
! 	 * You can send SIGUSR1 to trigger failover.
! 	 *
! 	 * Postmaster uses SIGQUIT to request immediate shutdown. The default
! 	 * action is to core dump, but we don't want that, so trap it and
! 	 * commit suicide without core dump.
! 	 *
! 	 * We used to use SIGINT and SIGQUIT to trigger failover, but that
! 	 * turned out to be a bad idea because postmaster uses SIGQUIT to
! 	 * request immediate shutdown. We still trap SIGINT, but that is
! 	 * deprecated. We will likely switch to using SIGINT for immediate
! 	 * shutdown in future releases.
! 	 */
! 	(void) signal(SIGUSR1, sighandler);
! 	(void) signal(SIGINT, sighandler); /* deprecated, use SIGUSR1 */
! 	(void) signal(SIGQUIT, sigquit_handler);
  
  	while ((c = getopt(argc, argv, cdk:lr:s:t:w:)) != -1)
  	{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-04 Thread Heikki Linnakangas

Fujii Masao wrote:

Hi,

On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

Fujii Masao wrote:

On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

I'm leaning towards option 3, but I wonder if anyone sees a better
solution.

4. Use the shared memory to tell the startup process about the shutdown
state.
When a shutdown signal arrives, postmaster sets the corresponding shutdown
state to the shared memory before signaling to the child processes. The
startup
process check the shutdown state whenever executing system(), and
determine
how to exit according to that state. This solution doesn't change any
existing
behavior of pg_standby. What is your opinion?

That would only solve the problem for pg_standby. Other programs you might
use as a restore_command or archive_command like cp or rsync would still
core dump on the SIGQUIT.


Right. I've just understood your intention. I also agree with option 3 if nobody
complains about lack of backward compatibility of pg_standby. If no, how about
using SIGUSR2 instead of SIGINT for immediate shutdown of only the archiver
and the startup process. SIGUSR2 by default terminates the process.
The archiver already uses SIGUSR2 for pgarch_waken_stop, so we need to
reassign that function to another signal (SIGINT is suitable, I think).
This solution doesn't need signal multiplexing. Thought?


Hmm, the startup/archiver process would then in turn need to kill the 
external command with SIGINT. I guess that would work.


There's a problem with my idea of just using SIGINT instead of SIGQUIT. 
Some (arguably bad-behaving) programs trap SIGINT and exit() with a 
return code. The startup process won't recognize that as killed by 
signal, and we're back to same problem we have with pg_standby that the 
startup process doesn't die but continues with the startup. Notably 
rsync seems to behave like that.


BTW, searching the archive, I found this long thread about this same issue:

http://archives.postgresql.org/pgsql-hackers/2006-11/msg00406.php

The idea of SIGUSR2 was mentioned there as well, as well as the idea of 
reimplementing system(3). The conclusion of that thread was the usage of 
setsid() and process groups, to ensure that the SIGQUIT is delivered to 
the archive/recovery_command.


I'm starting to feel that this is getting too complicated. Maybe we 
should just fix pg_standby to not trap SIGQUIT, and live with the core 
dumps...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-03 Thread Zdenek Kotala

Dne  2.03.09 08:59, Heikki Linnakangas napsal(a):

Fujii Masao wrote:

On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
I'm leaning towards option 3, but I wonder if anyone sees a better 
solution.


4. Use the shared memory to tell the startup process about the 
shutdown state.
When a shutdown signal arrives, postmaster sets the corresponding 
shutdown
state to the shared memory before signaling to the child processes. 
The startup
process check the shutdown state whenever executing system(), and 
determine
how to exit according to that state. This solution doesn't change any 
existing

behavior of pg_standby. What is your opinion?


That would only solve the problem for pg_standby. Other programs you 
might use as a restore_command or archive_command like cp or rsync 
would still core dump on the SIGQUIT.




I think that we could have two methods. Extended method will use share 
memory to say what child should do and standard which send appropriate 
signal to child. For example pg_ctl could use extended communication to 
better postmaster controlling.


Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-03 Thread Heikki Linnakangas

Zdenek Kotala wrote:

Dne  2.03.09 08:59, Heikki Linnakangas napsal(a):

Fujii Masao wrote:

On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
I'm leaning towards option 3, but I wonder if anyone sees a better 
solution.


4. Use the shared memory to tell the startup process about the 
shutdown state.
When a shutdown signal arrives, postmaster sets the corresponding 
shutdown
state to the shared memory before signaling to the child processes. 
The startup
process check the shutdown state whenever executing system(), and 
determine
how to exit according to that state. This solution doesn't change any 
existing

behavior of pg_standby. What is your opinion?


That would only solve the problem for pg_standby. Other programs you 
might use as a restore_command or archive_command like cp or rsync 
would still core dump on the SIGQUIT.




I think that we could have two methods. Extended method will use share 
memory to say what child should do and standard which send appropriate 
signal to child. For example pg_ctl could use extended communication to 
better postmaster controlling.


The problem isn't in the signaling between external tools like pg_ctl 
and postmaster, but the signaling between postmaster and the child 
processes.


Signal multiplexing would help by releasing some signals, but to kill a 
child process that can be executing an external command with system(3), 
we'd still want to use a signal that does the right thing for external 
commands, per usual Unix semantics. Also, the archiver process currently 
detaches itself from shared memory at start, so using shared memory 
doesn't seem like an improvement.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-03 Thread Fujii Masao
Hi,

On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Fujii Masao wrote:

 On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:

 I'm leaning towards option 3, but I wonder if anyone sees a better
 solution.

 4. Use the shared memory to tell the startup process about the shutdown
 state.
 When a shutdown signal arrives, postmaster sets the corresponding shutdown
 state to the shared memory before signaling to the child processes. The
 startup
 process check the shutdown state whenever executing system(), and
 determine
 how to exit according to that state. This solution doesn't change any
 existing
 behavior of pg_standby. What is your opinion?

 That would only solve the problem for pg_standby. Other programs you might
 use as a restore_command or archive_command like cp or rsync would still
 core dump on the SIGQUIT.

Right. I've just understood your intention. I also agree with option 3 if nobody
complains about lack of backward compatibility of pg_standby. If no, how about
using SIGUSR2 instead of SIGINT for immediate shutdown of only the archiver
and the startup process. SIGUSR2 by default terminates the process.
The archiver already uses SIGUSR2 for pgarch_waken_stop, so we need to
reassign that function to another signal (SIGINT is suitable, I think).
This solution doesn't need signal multiplexing. Thought?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-02 Thread Heikki Linnakangas

Fujii Masao wrote:

On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

I'm leaning towards option 3, but I wonder if anyone sees a better solution.


4. Use the shared memory to tell the startup process about the shutdown state.
When a shutdown signal arrives, postmaster sets the corresponding shutdown
state to the shared memory before signaling to the child processes. The startup
process check the shutdown state whenever executing system(), and determine
how to exit according to that state. This solution doesn't change any existing
behavior of pg_standby. What is your opinion?


That would only solve the problem for pg_standby. Other programs you 
might use as a restore_command or archive_command like cp or rsync 
would still core dump on the SIGQUIT.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-02 Thread ITAGAKI Takahiro

Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:

 1. Implement a custom version of system(3) using fork+exec that let's us 
 trap SIGQUIT and send e.g SIGTERM or SIGINT to the child instead. It 
 might be a bit tricky to get this right in a portable way; Windows would 
 certainly need a completely separate implementation.

I think the custom system() approach is the most ideal plan for us because
it could open the door for faster recovery; If there were an asynchronous
version of system(), startup process could parallelly execute both
restoring archived wal files and redoing operations in them.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-01 Thread Fujii Masao
Hi,

On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 We're using SIGQUIT to signal immediate shutdown request. Upon receiving
 SIGQUIT, postmaster in turn kills all the child processes with SIGQUIT and
 exits.

 This is a problem when child processes use system(3) to call other programs.
 We use system(3) in two places: to execute archive_command and
 restore_command. Fujii Masao identified this with pg_standby back in
 November:

 http://archives.postgresql.org/message-id/3f0b79eb0811280156s78a3730en73aca49b6e95d...@mail.gmail.com
 and recently discussed here
 http://archives.postgresql.org/message-id/3f0b79eb0902260919l2675aaafq10e5b2d49ebfa...@mail.gmail.com

 I'm starting a new thread to bring this to attention of those who haven't
 been following the hot standby stuff. pg_standby has a particular problem
 because it traps SIGQUIT to mean end recovery, promote standby to master,
 which it shouldn't do IMHO. But ignoring that for a moment, the problem is
 generic.

 SIGQUIT by default dumps core. That's not what we want to happen on
 immediate shutdown. All PostgreSQL processes trap SIGQUIT to exit
 immediately instead, but external commands will dump core. system(3) ignores
 SIGQUIT, so we can't trap it in the parent process; it is always relayed to
 the child.

 There's a few options on how to fix that:

 1. Implement a custom version of system(3) using fork+exec that let's us
 trap SIGQUIT and send e.g SIGTERM or SIGINT to the child instead. It might
 be a bit tricky to get this right in a portable way; Windows would certainly
 need a completely separate implementation.

 2. Use a signal other than SIGQUIT for immediate shutdown of child
 processes. We can't change the signal sent to postmaster for
 backwards-compatibility reasons, but the signal sent by postmaster to child
 processes we could change. We've already used all signals in normal
 backends, but perhaps we could rearrange them.

 3. Use SIGINT instead of SIGQUIT for immediate shutdown of the two child
 processes that use system(3): the archiver process and the startup process.
 Neither of them use SIGINT currently. SIGINT is ignored by system(3), like
 SIGQUIT, but the default action is to terminate the process rather than core
 dump. Unfortunately pg_standby traps SIGINT too to mean promote to master,
 but we could change it to use SIGUSR1 instead for that purpose. If someone
 has a script that uses killall -INT pg_standby to promote a standby server
 to master, it would need to be changed. Looking at the manual page of
 pg_standby, however, it seems that the kill-method of triggering a promotion
 isn't documented, so with a notice in release notes we could do that.

 I'm leaning towards option 3, but I wonder if anyone sees a better solution.

4. Use the shared memory to tell the startup process about the shutdown state.
When a shutdown signal arrives, postmaster sets the corresponding shutdown
state to the shared memory before signaling to the child processes. The startup
process check the shutdown state whenever executing system(), and determine
how to exit according to that state. This solution doesn't change any existing
behavior of pg_standby. What is your opinion?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Immediate shutdown and system(3)

2009-02-27 Thread Heikki Linnakangas
We're using SIGQUIT to signal immediate shutdown request. Upon receiving 
SIGQUIT, postmaster in turn kills all the child processes with SIGQUIT 
and exits.


This is a problem when child processes use system(3) to call other 
programs. We use system(3) in two places: to execute archive_command and 
restore_command. Fujii Masao identified this with pg_standby back in 
November:


http://archives.postgresql.org/message-id/3f0b79eb0811280156s78a3730en73aca49b6e95d...@mail.gmail.com
and recently discussed here
http://archives.postgresql.org/message-id/3f0b79eb0902260919l2675aaafq10e5b2d49ebfa...@mail.gmail.com

I'm starting a new thread to bring this to attention of those who 
haven't been following the hot standby stuff. pg_standby has a 
particular problem because it traps SIGQUIT to mean end recovery, 
promote standby to master, which it shouldn't do IMHO. But ignoring 
that for a moment, the problem is generic.


SIGQUIT by default dumps core. That's not what we want to happen on 
immediate shutdown. All PostgreSQL processes trap SIGQUIT to exit 
immediately instead, but external commands will dump core. system(3) 
ignores SIGQUIT, so we can't trap it in the parent process; it is always 
relayed to the child.


There's a few options on how to fix that:

1. Implement a custom version of system(3) using fork+exec that let's us 
trap SIGQUIT and send e.g SIGTERM or SIGINT to the child instead. It 
might be a bit tricky to get this right in a portable way; Windows would 
certainly need a completely separate implementation.


2. Use a signal other than SIGQUIT for immediate shutdown of child 
processes. We can't change the signal sent to postmaster for 
backwards-compatibility reasons, but the signal sent by postmaster to 
child processes we could change. We've already used all signals in 
normal backends, but perhaps we could rearrange them.


3. Use SIGINT instead of SIGQUIT for immediate shutdown of the two child 
processes that use system(3): the archiver process and the startup 
process. Neither of them use SIGINT currently. SIGINT is ignored by 
system(3), like SIGQUIT, but the default action is to terminate the 
process rather than core dump. Unfortunately pg_standby traps SIGINT too 
to mean promote to master, but we could change it to use SIGUSR1 
instead for that purpose. If someone has a script that uses killall 
-INT pg_standby to promote a standby server to master, it would need to 
be changed. Looking at the manual page of pg_standby, however, it seems 
that the kill-method of triggering a promotion isn't documented, so with 
a notice in release notes we could do that.


I'm leaning towards option 3, but I wonder if anyone sees a better solution.

This is all for CVS HEAD. In back-branches, I think we should just 
remove the signal handler for SIGQUIT from pg_standby and leave it at 
that. If you perform an immediate shutdown, you can get a core dump from 
archive_command or restore_command, but that's a minor inconvenience.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-02-27 Thread Greg Stark
On Fri, Feb 27, 2009 at 9:52 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 2. Use a signal other than SIGQUIT for immediate shutdown of child
 processes. We can't change the signal sent to postmaster for
 backwards-compatibility reasons, but the signal sent by postmaster to child
 processes we could change. We've already used all signals in normal
 backends, but perhaps we could rearrange them.

This isn't the first time we've run into the problem that we've run
out of signals. I think we need to multiplex all our event signals
onto a single signal and use some other mechanism to indicate the type
of message.

Perhaps we do need two signals though, so subprocesses don't need to
connect to shared memory to distinguish exit now from other events.
SIGINT for exit now and USR1 for every postgres-internal signal
using shared memory to determine the meaning sounds like the most
logical arrangement to me.

Do we really need a promote to master message at all? Is pg_standby
responsible for this or could the master write out the configuration
changes necessary itself?

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-02-27 Thread Heikki Linnakangas

Greg Stark wrote:

This isn't the first time we've run into the problem that we've run
out of signals. I think we need to multiplex all our event signals
onto a single signal and use some other mechanism to indicate the type
of message.


Yeah. A patch to do that was discussed a while ago, as Fujii's 
synchronous replication patch bumped into that as well. I don't feel 
like changing the signaling so dramatically right now, however.



Do we really need a promote to master message at all? Is pg_standby
responsible for this or could the master write out the configuration
changes necessary itself?


The way pg_standby works is that it keeps waiting for new WAL files to 
arrive, until it's told to stop and return a non-zero exit code. 
Non-zero exit code from restore_command basically means file not 
found, making the startup process to end recovery and start up the 
database. There's two ways to tell pg_standby to stop: create a trigger 
file with a particular name, or signal it with SIGINT or SIGQUIT.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Immediate shutdown and system(3)

2009-02-27 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Greg Stark wrote:
 This isn't the first time we've run into the problem that we've run
 out of signals. I think we need to multiplex all our event signals
 onto a single signal and use some other mechanism to indicate the type
 of message.

 Yeah. A patch to do that was discussed a while ago, as Fujii's 
 synchronous replication patch bumped into that as well. I don't feel 
 like changing the signaling so dramatically right now, however.

It's not really a feasible answer anyway for auxiliary processes that
have no need to be connected to shared memory.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers