Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Andrew Dunstan




Heikki Linnakangas wrote:
> Ok, I've committed a minimal patch to pg_standby in CVS HEAD and
> REL8_3_STABLE to not interpret SIGQUIT as a signal for failover. I
> added a signal handler for SIGUSR1 to trigger failover; that should be
> considered the preferred signal for that, even though SIGINT still
> works too.
>
> SIGQUIT is trapped to just die immediately, but without core dumping.
> As we still use SIGQUIT for immediate shutdown, any other
> archive_command or restore_command will still receive SIGQUIT on
> immediate shutdown, and by default dump core. Let's just live with
> that for now..
>
> This should be mentioned in release notes, as any script that might be
> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or
> SIGINT instead. Where should I make a note of that so that we don't
> forget?

Unless I'm missing it the use of signals to trigger failover is not
documented AT ALL. So why anyone would expect such behaviour is
something of a mystery.

Perhaps doing that would be even more important than release notes.

cheers

andrew
-
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Robert Haas

On Wed, Mar 18, 2009 at 5:12 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian  wrote:
>>> The CVS commit message.
>
>> Is there some reason we don't just put it in the release notes as
>> *part* of the commit?  Someone can always go back and edit it later.
>
> That was suggested before, and I think we actually tried it for a few
> months.  It didn't work.
>
> Putting an item in the release notes *properly* is a whole lot more
> work than putting a short bit of text in the CVS log (especially for
> committers whose first language isn't English).  It would also
> create a lot more merge-collision issues for unrelated patches.

Yeah, I wouldn't ask people to include it in the patches they post.
That would be a pain, and people would probably tend (with the best of
intentions) to inflate the relative importance of their own work.  I
was thinking that the committer could make a quick entry at the time
they actually committed the patch, so that the step you describe below
could start with something other than an email box.

> It's less trouble overall to do the editing, organizing, and SGML-ifying
> of all the release notes at once.  Also you end up with a better
> product, assuming that whoever is doing the notes puts in reasonable
> editorial effort.

If it works for the people who are doing it, good enough.

...Robert

Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Tom Lane

Robert Haas  writes:
> On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian  wrote:
>> The CVS commit message.

> Is there some reason we don't just put it in the release notes as
> *part* of the commit?  Someone can always go back and edit it later.

That was suggested before, and I think we actually tried it for a few
months.  It didn't work.

Putting an item in the release notes *properly* is a whole lot more
work than putting a short bit of text in the CVS log (especially for
committers whose first language isn't English).  It would also
create a lot more merge-collision issues for unrelated patches.

It's less trouble overall to do the editing, organizing, and SGML-ifying
of all the release notes at once.  Also you end up with a better
product, assuming that whoever is doing the notes puts in reasonable
editorial effort.

regards, tom lane

Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Robert Haas

On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian  wrote:
> Heikki Linnakangas wrote:
>> This should be mentioned in release notes, as any script that might be
>> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT
>> instead. Where should I make a note of that so that we don't forget?
>
> The CVS commit message.

Is there some reason we don't just put it in the release notes as
*part* of the commit?  Someone can always go back and edit it later.
It seems like that would be easier and less error-prone than grepping
the CVS commit logs for "release notes"...

...Robert

Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Bruce Momjian

Heikki Linnakangas wrote:
> This should be mentioned in release notes, as any script that might be 
> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT 
> instead. Where should I make a note of that so that we don't forget?

The CVS commit message.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Heikki Linnakangas


Andrew Dunstan wrote:
> Heikki Linnakangas wrote:
>> This should be mentioned in release notes, as any script that might be
>> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or
>> SIGINT instead. Where should I make a note of that so that we don't
>> forget?
>
> Unless I'm missing it the use of signals to trigger failover is not
> documented AT ALL. So why anyone would expect such behaviour is
> something of a mystery.

Well, some people do read source code. If it was more widely known, I
would hesitate more to change it, though.

> Perhaps doing that would be even more important than release notes.

Agreed it should be documented.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Immediate shutdown and system(3)

2009-03-18 Thread Heikki Linnakangas

Ok, I've committed a minimal patch to pg_standby in CVS HEAD and 
REL8_3_STABLE to not interpret SIGQUIT as a signal for failover. I added 
a signal handler for SIGUSR1 to trigger failover; that should be 
considered the preferred signal for that, even though SIGINT still works 
too.


SIGQUIT is trapped to just die immediately, but without core dumping. As 
we still use SIGQUIT for immediate shutdown, any other archive_command 
or restore_command will still receive SIGQUIT on immediate shutdown, and 
by default dump core. Let's just live with that for now..


This should be mentioned in release notes, as any script that might be 
using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT 
instead. Where should I make a note of that so that we don't forget?


Heikki Linnakangas wrote:
> Fujii Masao wrote:
>> Hi,
>>
>> On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas wrote:
>>> Fujii Masao wrote:
>>>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas wrote:
>>>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>>>> solution.
>>>>
>>>> 4. Use the shared memory to tell the startup process about the shutdown
>>>> state.
>>>> When a shutdown signal arrives, postmaster sets the corresponding shutdown
>>>> state to the shared memory before signaling to the child processes. The
>>>> startup process check the shutdown state whenever executing system(), and
>>>> determine how to exit according to that state. This solution doesn't
>>>> change any existing behavior of pg_standby. What is your opinion?
>>>
>>> That would only solve the problem for pg_standby. Other programs you might
>>> use as a restore_command or archive_command like "cp" or "rsync" would still
>>> core dump on the SIGQUIT.
>>
>> Right. I've just understood your intention. I also agree with option 3 if nobody
>> complains about lack of backward compatibility of pg_standby. If no, how about
>> using SIGUSR2 instead of SIGINT for immediate shutdown of only the archiver
>> and the startup process. SIGUSR2 by default terminates the process.
>> The archiver already uses SIGUSR2 for pgarch_waken_stop, so we need to
>> reassign that function to another signal (SIGINT is suitable, I think).
>> This solution doesn't need signal multiplexing. Thought?
>
> Hmm, the startup/archiver process would then in turn need to kill the
> external command with SIGINT. I guess that would work.
>
> There's a problem with my idea of just using SIGINT instead of SIGQUIT.
> Some (arguably bad-behaving) programs trap SIGINT and exit() with a
> return code. The startup process won't recognize that as "killed by
> signal", and we're back to same problem we have with pg_standby that the
> startup process doesn't die but continues with the startup. Notably
> rsync seems to behave like that.
>
> BTW, searching the archive, I found this long thread about this same issue:
>
> http://archives.postgresql.org/pgsql-hackers/2006-11/msg00406.php
>
> The idea of SIGUSR2 was mentioned there as well, as well as the idea of
> reimplementing system(3). The conclusion of that thread was the usage of
> setsid() and process groups, to ensure that the SIGQUIT is delivered to
> the archive/recovery_command.
>
> I'm starting to feel that this is getting too complicated. Maybe we
> should just fix pg_standby to not trap SIGQUIT, and live with the core
> dumps...



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Immediate shutdown and system(3)

2009-03-04 Thread Heikki Linnakangas


Fujii Masao wrote:
> Hi,
>
> On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas wrote:
>> Fujii Masao wrote:
>>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas wrote:
>>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>>> solution.
>>>
>>> 4. Use the shared memory to tell the startup process about the shutdown
>>> state.
>>> When a shutdown signal arrives, postmaster sets the corresponding shutdown
>>> state to the shared memory before signaling to the child processes. The
>>> startup process check the shutdown state whenever executing system(), and
>>> determine how to exit according to that state. This solution doesn't
>>> change any existing behavior of pg_standby. What is your opinion?
>>
>> That would only solve the problem for pg_standby. Other programs you might
>> use as a restore_command or archive_command like "cp" or "rsync" would still
>> core dump on the SIGQUIT.
>
> Right. I've just understood your intention. I also agree with option 3 if nobody
> complains about lack of backward compatibility of pg_standby. If no, how about
> using SIGUSR2 instead of SIGINT for immediate shutdown of only the archiver
> and the startup process. SIGUSR2 by default terminates the process.
> The archiver already uses SIGUSR2 for pgarch_waken_stop, so we need to
> reassign that function to another signal (SIGINT is suitable, I think).
> This solution doesn't need signal multiplexing. Thought?


Hmm, the startup/archiver process would then in turn need to kill the 
external command with SIGINT. I guess that would work.


There's a problem with my idea of just using SIGINT instead of SIGQUIT. 
Some (arguably bad-behaving) programs trap SIGINT and exit() with a 
return code. The startup process won't recognize that as "killed by 
signal", and we're back to same problem we have with pg_standby that the 
startup process doesn't die but continues with the startup. Notably 
rsync seems to behave like that.


BTW, searching the archive, I found this long thread about this same issue:

http://archives.postgresql.org/pgsql-hackers/2006-11/msg00406.php

The idea of SIGUSR2 was mentioned there as well, as well as the idea of 
reimplementing system(3). The conclusion of that thread was the usage of 
setsid() and process groups, to ensure that the SIGQUIT is delivered to 
the archive/recovery_command.
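
As a rough illustration of the setsid()/process group idea (this is not the code from that older thread; `signal_command_group` is a hypothetical helper), the sketch below runs a shell command in its own session and then signals the whole process group, so that the shell and anything it spawned all receive the signal:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start "sh -c cmd" in a new session, send signo to its whole process
 * group, and return the shell's wait status (-1 on failure).
 */
int
signal_command_group(const char *cmd, int signo)
{
    int         ready[2];
    int         status = -1;
    char        byte;
    pid_t       pid;

    if (pipe(ready) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    if (pid == 0)
    {
        close(ready[0]);
        setsid();               /* new session; process group ID == our PID */
        if (write(ready[1], "x", 1) != 1)
            _exit(126);
        close(ready[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127);
    }

    /* Parent: wait until the child has called setsid() before signaling. */
    close(ready[1]);
    if (read(ready[0], &byte, 1) != 1)
        byte = 0;               /* child died early; waitpid() reports it */
    close(ready[0]);

    kill(-pid, signo);          /* negative PID addresses the whole group */
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
        ;
    return status;
}
```

A call such as `signal_command_group("sleep 5; true", SIGTERM)` terminates both the shell and the `sleep` it started.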


I'm starting to feel that this is getting too complicated. Maybe we 
should just fix pg_standby to not trap SIGQUIT, and live with the core 
dumps...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-04 Thread Heikki Linnakangas

Per discussion, here's a patch for pg_standby in REL8_3_STABLE. The 
signal handling is changed so that SIGQUIT no longer triggers failover, 
but immediately kills pg_standby, triggering FATAL death of the startup 
process too. That's what you want with immediate shutdown.


SIGUSR1 is now accepted as a signal to trigger failover. SIGINT is still 
accepted too, but that should be considered deprecated since we're 
likely to use SIGINT for immediate shutdown (for startup process) in 8.4.


We should document the use of signals to trigger failover in the 
manual... Any volunteers?


This should be noted in the release notes:

If you are using pg_standby, and if you are using signals (e.g "killall 
-SIGINT pg_standby") to trigger failover, change your scripts to use 
SIGUSR1 instead of SIGQUIT or SIGINT. SIGQUIT no longer triggers 
failover, but aborts the recovery and shuts down the standby database. 
SIGINT is still accepted as failover trigger, but should be considered 
as deprecated and will also be changed to trigger immediate shutdown in 
a future release.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: pg_standby.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/pg_standby/pg_standby.c,v
retrieving revision 1.10.2.3
diff -c -r1.10.2.3 pg_standby.c
*** pg_standby.c	6 Jan 2009 17:27:19 -	1.10.2.3
--- pg_standby.c	4 Mar 2009 09:13:34 -
***************
*** 451,464 ****
  	signaled = true;
  }
  
  /* MAIN */
  int
  main(int argc, char **argv)
  {
  	int			c;
  
! 	(void) signal(SIGINT, sighandler);
! 	(void) signal(SIGQUIT, sighandler);
  
  	while ((c = getopt(argc, argv, "cdk:lr:s:t:w:")) != -1)
  	{
--- 451,487 ----
  	signaled = true;
  }
  
+ /* We don't want SIGQUIT to core dump */
+ static void
+ sigquit_handler(int sig)
+ {
+ 	signal(SIGINT, SIG_DFL);
+ 	kill(getpid(), SIGINT);
+ }
+ 
+ 
  /* MAIN */
  int
  main(int argc, char **argv)
  {
  	int			c;
  
! 	/*
! 	 * You can send SIGUSR1 to trigger failover.
! 	 *
! 	 * Postmaster uses SIGQUIT to request immediate shutdown. The default
! 	 * action is to core dump, but we don't want that, so trap it and
! 	 * commit suicide without core dump.
! 	 *
! 	 * We used to use SIGINT and SIGQUIT to trigger failover, but that
! 	 * turned out to be a bad idea because postmaster uses SIGQUIT to
! 	 * request immediate shutdown. We still trap SIGINT, but that is
! 	 * deprecated. We will likely switch to using SIGINT for immediate
! 	 * shutdown in future releases.
! 	 */
! 	(void) signal(SIGUSR1, sighandler);
! 	(void) signal(SIGINT, sighandler); /* deprecated, use SIGUSR1 */
! 	(void) signal(SIGQUIT, sigquit_handler);
  
  	while ((c = getopt(argc, argv, "cdk:lr:s:t:w:")) != -1)
  	{


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-03 Thread Fujii Masao

Hi,

On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
 wrote:
> Fujii Masao wrote:
>>
>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>>  wrote:
>>>
>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>> solution.
>>
>> 4. Use the shared memory to tell the startup process about the shutdown
>> state.
>> When a shutdown signal arrives, postmaster sets the corresponding shutdown
>> state to the shared memory before signaling to the child processes. The
>> startup
>> process check the shutdown state whenever executing system(), and
>> determine
>> how to exit according to that state. This solution doesn't change any
>> existing
>> behavior of pg_standby. What is your opinion?
>
> That would only solve the problem for pg_standby. Other programs you might
> use as a restore_command or archive_command like "cp" or "rsync" would still
> core dump on the SIGQUIT.

Right. I've just understood your intention. I also agree with option 3 if nobody
complains about lack of backward compatibility of pg_standby. If not, how about
using SIGUSR2 instead of SIGINT for immediate shutdown of only the archiver
and the startup process? SIGUSR2 by default terminates the process.
The archiver already uses SIGUSR2 for pgarch_waken_stop, so we need to
reassign that function to another signal (SIGINT is suitable, I think).
This solution doesn't need signal multiplexing. Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-03 Thread Heikki Linnakangas


Zdenek Kotala wrote:
> On 2.03.09 08:59, Heikki Linnakangas wrote:
>> Fujii Masao wrote:
>>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas wrote:
>>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>>> solution.
>>>
>>> 4. Use the shared memory to tell the startup process about the
>>> shutdown state.
>>> When a shutdown signal arrives, postmaster sets the corresponding
>>> shutdown state to the shared memory before signaling to the child
>>> processes. The startup process check the shutdown state whenever
>>> executing system(), and determine how to exit according to that state.
>>> This solution doesn't change any existing behavior of pg_standby.
>>> What is your opinion?
>>
>> That would only solve the problem for pg_standby. Other programs you
>> might use as a restore_command or archive_command like "cp" or "rsync"
>> would still core dump on the SIGQUIT.
>
> I think that we could have two methods: an extended method that uses
> shared memory to tell the child what to do, and a standard one that
> sends the appropriate signal to the child. For example, pg_ctl could use
> the extended communication to control the postmaster better.


The problem isn't in the signaling between external tools like pg_ctl 
and postmaster, but the signaling between postmaster and the child 
processes.


Signal multiplexing would help by releasing some signals, but to kill a 
child process that can be executing an external command with system(3), 
we'd still want to use a signal that does the right thing for external 
commands, per usual Unix semantics. Also, the archiver process currently 
detaches itself from shared memory at start, so using shared memory 
doesn't seem like an improvement.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-03 Thread Zdenek Kotala


On 2.03.09 08:59, Heikki Linnakangas wrote:
> Fujii Masao wrote:
>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas wrote:
>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>> solution.
>>
>> 4. Use the shared memory to tell the startup process about the
>> shutdown state.
>> When a shutdown signal arrives, postmaster sets the corresponding
>> shutdown state to the shared memory before signaling to the child
>> processes. The startup process check the shutdown state whenever
>> executing system(), and determine how to exit according to that state.
>> This solution doesn't change any existing behavior of pg_standby.
>> What is your opinion?
>
> That would only solve the problem for pg_standby. Other programs you
> might use as a restore_command or archive_command like "cp" or "rsync"
> would still core dump on the SIGQUIT.

I think that we could have two methods: an extended method that uses
shared memory to tell the child what to do, and a standard one that
sends the appropriate signal to the child. For example, pg_ctl could use
the extended communication to control the postmaster better.


Zdenek


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-02 Thread ITAGAKI Takahiro

Heikki Linnakangas  wrote:

> 1. Implement a custom version of system(3) using fork+exec that lets us
> trap SIGQUIT and send e.g. SIGTERM or SIGINT to the child instead. It
> might be a bit tricky to get this right in a portable way; Windows would
> certainly need a completely separate implementation.

I think the custom system() approach is the best plan for us because
it could open the door to faster recovery: if there were an asynchronous
version of system(), the startup process could restore archived WAL files
and redo the operations in them in parallel.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-02 Thread Heikki Linnakangas


Fujii Masao wrote:
> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas wrote:
>> I'm leaning towards option 3, but I wonder if anyone sees a better solution.
>
> 4. Use the shared memory to tell the startup process about the shutdown state.
> When a shutdown signal arrives, postmaster sets the corresponding shutdown
> state to the shared memory before signaling to the child processes. The startup
> process check the shutdown state whenever executing system(), and determine
> how to exit according to that state. This solution doesn't change any existing
> behavior of pg_standby. What is your opinion?

That would only solve the problem for pg_standby. Other programs you
might use as a restore_command or archive_command like "cp" or "rsync"
would still core dump on the SIGQUIT.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Immediate shutdown and system(3)

2009-03-01 Thread Fujii Masao

Hi,

On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
 wrote:
> We're using SIGQUIT to signal immediate shutdown request. Upon receiving
> SIGQUIT, postmaster in turn kills all the child processes with SIGQUIT and
> exits.
>
> This is a problem when child processes use system(3) to call other programs.
> We use system(3) in two places: to execute archive_command and
> restore_command. Fujii Masao identified this with pg_standby back in
> November:
>
> http://archives.postgresql.org/message-id/3f0b79eb0811280156s78a3730en73aca49b6e95d...@mail.gmail.com
> and recently discussed here
> http://archives.postgresql.org/message-id/3f0b79eb0902260919l2675aaafq10e5b2d49ebfa...@mail.gmail.com
>
> I'm starting a new thread to bring this to attention of those who haven't
> been following the hot standby stuff. pg_standby has a particular problem
> because it traps SIGQUIT to mean "end recovery, promote standby to master",
> which it shouldn't do IMHO. But ignoring that for a moment, the problem is
> generic.
>
> SIGQUIT by default dumps core. That's not what we want to happen on
> immediate shutdown. All PostgreSQL processes trap SIGQUIT to exit
> immediately instead, but external commands will dump core. system(3) ignores
> SIGQUIT, so we can't trap it in the parent process; it is always relayed to
> the child.
>
> There are a few options on how to fix that:
>
> 1. Implement a custom version of system(3) using fork+exec that lets us
> trap SIGQUIT and send e.g. SIGTERM or SIGINT to the child instead. It might
> be a bit tricky to get this right in a portable way; Windows would certainly
> need a completely separate implementation.
>
> 2. Use a signal other than SIGQUIT for immediate shutdown of child
> processes. We can't change the signal sent to postmaster for
> backwards-compatibility reasons, but the signal sent by postmaster to child
> processes we could change. We've already used all signals in normal
> backends, but perhaps we could rearrange them.
>
> 3. Use SIGINT instead of SIGQUIT for immediate shutdown of the two child
> processes that use system(3): the archiver process and the startup process.
> Neither of them use SIGINT currently. SIGINT is ignored by system(3), like
> SIGQUIT, but the default action is to terminate the process rather than core
> dump. Unfortunately pg_standby traps SIGINT too to mean "promote to master",
> but we could change it to use SIGUSR1 instead for that purpose. If someone
> has a script that uses "killall -INT pg_standby" to promote a standby server
> to master, it would need to be changed. Looking at the manual page of
> pg_standby, however, it seems that the kill-method of triggering a promotion
> isn't documented, so with a notice in release notes we could do that.
>
> I'm leaning towards option 3, but I wonder if anyone sees a better solution.

4. Use shared memory to tell the startup process about the shutdown state.
When a shutdown signal arrives, postmaster writes the corresponding shutdown
state to shared memory before signaling the child processes. The startup
process checks the shutdown state whenever it executes system(), and determines
how to exit according to that state. This solution doesn't change any existing
behavior of pg_standby. What is your opinion?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Immediate shutdown and system(3)

2009-02-27 Thread Tom Lane

Heikki Linnakangas  writes:
> Greg Stark wrote:
>> This isn't the first time we've run into the problem that we've run
>> out of signals. I think we need to multiplex all our event signals
>> onto a single signal and use some other mechanism to indicate the type
>> of message.

> Yeah. A patch to do that was discussed a while ago, as Fujii's 
> synchronous replication patch bumped into that as well. I don't feel 
> like changing the signaling so dramatically right now, however.

It's not really a feasible answer anyway for auxiliary processes that
have no need to be connected to shared memory.

regards, tom lane


Re: [HACKERS] Immediate shutdown and system(3)

2009-02-27 Thread Heikki Linnakangas


Greg Stark wrote:
> This isn't the first time we've run into the problem that we've run
> out of signals. I think we need to multiplex all our event signals
> onto a single signal and use some other mechanism to indicate the type
> of message.

Yeah. A patch to do that was discussed a while ago, as Fujii's
synchronous replication patch bumped into that as well. I don't feel
like changing the signaling so dramatically right now, however.

> Do we really need a "promote to master" message at all? Is pg_standby
> responsible for this or could the master write out the configuration
> changes necessary itself?

The way pg_standby works is that it keeps waiting for new WAL files to
arrive, until it's told to stop and return a non-zero exit code.
A non-zero exit code from restore_command basically means "file not
found", which makes the startup process end recovery and start up the
database. There are two ways to tell pg_standby to stop: create a trigger
file with a particular name, or signal it with SIGINT or SIGQUIT.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Immediate shutdown and system(3)

2009-02-27 Thread Greg Stark

On Fri, Feb 27, 2009 at 9:52 AM, Heikki Linnakangas
 wrote:
>
> 2. Use a signal other than SIGQUIT for immediate shutdown of child
> processes. We can't change the signal sent to postmaster for
> backwards-compatibility reasons, but the signal sent by postmaster to child
> processes we could change. We've already used all signals in normal
> backends, but perhaps we could rearrange them.

This isn't the first time we've run into the problem that we've run
out of signals. I think we need to multiplex all our event signals
onto a single signal and use some other mechanism to indicate the type
of message.

Perhaps we do need two signals though, so subprocesses don't need to
connect to shared memory to distinguish "exit now" from other events.
SIGINT for "exit now" and USR1 for every postgres-internal signal
using shared memory to determine the meaning sounds like the most
logical arrangement to me.

Do we really need a "promote to master" message at all? Is pg_standby
responsible for this or could the master write out the configuration
changes necessary itself?

-- 
greg


[HACKERS] Immediate shutdown and system(3)

2009-02-27 Thread Heikki Linnakangas

We're using SIGQUIT to signal immediate shutdown request. Upon receiving
SIGQUIT, postmaster in turn kills all the child processes with SIGQUIT
and exits.

This is a problem when child processes use system(3) to call other
programs. We use system(3) in two places: to execute archive_command and
restore_command. Fujii Masao identified this with pg_standby back in
November:

http://archives.postgresql.org/message-id/3f0b79eb0811280156s78a3730en73aca49b6e95d...@mail.gmail.com
and recently discussed here
http://archives.postgresql.org/message-id/3f0b79eb0902260919l2675aaafq10e5b2d49ebfa...@mail.gmail.com

I'm starting a new thread to bring this to attention of those who
haven't been following the hot standby stuff. pg_standby has a
particular problem because it traps SIGQUIT to mean "end recovery,
promote standby to master", which it shouldn't do IMHO. But ignoring
that for a moment, the problem is generic.

SIGQUIT by default dumps core. That's not what we want to happen on
immediate shutdown. All PostgreSQL processes trap SIGQUIT to exit
immediately instead, but external commands will dump core. system(3)
ignores SIGQUIT, so we can't trap it in the parent process; it is always
relayed to the child.
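
For reference, a caller can tell from the return value of system(3) whether the command was killed by a signal. The helper below is a minimal sketch (the name `command_killed_by` is made up here), using only the standard wait-status macros:

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run a shell command with system(3).
 * Returns the number of the signal that killed it, 0 if it exited on
 * its own (with any exit code), or -1 if system() itself failed.
 */
int
command_killed_by(const char *cmd)
{
    int         status = system(cmd);

    if (status == -1)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}
```

For example, `command_killed_by("kill -TERM $$")` makes the shell kill itself and yields SIGTERM, while `command_killed_by("exit 3")` yields 0.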

There are a few options on how to fix that:

1. Implement a custom version of system(3) using fork+exec that lets us
trap SIGQUIT and send e.g. SIGTERM or SIGINT to the child instead. It
might be a bit tricky to get this right in a portable way; Windows would
certainly need a completely separate implementation.

2. Use a signal other than SIGQUIT for immediate shutdown of child
processes. We can't change the signal sent to postmaster for
backwards-compatibility reasons, but the signal sent by postmaster to
child processes we could change. We've already used all signals in
normal backends, but perhaps we could rearrange them.

3. Use SIGINT instead of SIGQUIT for immediate shutdown of the two child
processes that use system(3): the archiver process and the startup
process. Neither of them uses SIGINT currently. SIGINT is ignored by
system(3), like SIGQUIT, but the default action is to terminate the
process rather than core dump. Unfortunately pg_standby traps SIGINT too
to mean "promote to master", but we could change it to use SIGUSR1
instead for that purpose. If someone has a script that uses "killall
-INT pg_standby" to promote a standby server to master, it would need to
be changed. Looking at the manual page of pg_standby, however, it seems
that the kill-method of triggering a promotion isn't documented, so with
a notice in release notes we could do that.

I'm leaning towards option 3, but I wonder if anyone sees a better solution.

This is all for CVS HEAD. In back-branches, I think we should just
remove the signal handler for SIGQUIT from pg_standby and leave it at
that. If you perform an immediate shutdown, you can get a core dump from
archive_command or restore_command, but that's a minor inconvenience.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

