Re: We shouldn't signal process groups with SIGQUIT

2023-03-01 Thread Thomas Munro
On Thu, Mar 2, 2023 at 1:09 PM Andres Freund  wrote:
> On 2023-03-02 12:29:28 +1300, Thomas Munro wrote:
> > ... Huh... what am I missing?  I
> > thought the only risk was handlers running in the opposite of send
> > order because they 'overlapped', not non-handler code being allowed to
> > run in between.
>
> I see ProcessInterrupts() being called too - but it's independent of the
> changes we discuss here.  The reason for it is the CFI() at the end of
> errfinish().

Ahh, right, I see.




Re: We shouldn't signal process groups with SIGQUIT

2023-03-01 Thread Michael Paquier
On Wed, Mar 01, 2023 at 03:34:30PM -0800, Andres Freund wrote:
> On 2023-02-28 13:45:41 +0900, Michael Paquier wrote:
>> From what I can see, SIGTERM is actually received by the backends
>> before SIGQUIT, and I can also see that the backends have enough room
>> to process CFIs in some cases, especially short queries, even before 
>> reaching quickdie() and its exit().  So the window between SIGTERM and
>> SIGQUIT is not as long as one would think.
> 
> What do you mean with the last ssentence? Why would one think that the window
> between them is long? Do you mean that it's not as short?

That should have been worded as "short".  In what I looked at, both
signal handlers are processed in the same millisecond, still the
backend can have time to process a full CFI between the SIGTERM and
SIGQUIT handlers, before the SIGQUIT handler has the time to exit().
--
Michael


signature.asc
Description: PGP signature


Re: We shouldn't signal process groups with SIGQUIT

2023-03-01 Thread Andres Freund
Hi,

On 2023-03-02 12:29:28 +1300, Thomas Munro wrote:
> In theory you could straighten this out by asking what else is pending
> so that we imposed our own priority, if that were a problem, but there
> is something I don't understand: you said we could handle SIGTERM and
> then make it all the way to CFI() (= non-signal handler code) before
> handling a SIGQUIT that was sent first.  Huh... what am I missing?  I
> thought the only risk was handlers running in the opposite of send
> order because they 'overlapped', not non-handler code being allowed to
> run in between.

I see ProcessInterrupts() being called too - but it's independent of the
changes we discuss here.  The reason for it is the CFI() at the end of
errfinish().

Note that ProcessInterrupts() immediately returns, due to the
HOLD_INTERRUPTS() at the start of quickdie().

FWIW, here's the strace output of a backend, enriched with a few debug
write()s.

epoll_wait(5, 0x55b25764fd70, 1, -1)= -1 EINTR (Interrupted system call)
--- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=759211, si_uid=1000} ---
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=759211, si_uid=1000} ---
write(2, "start die\n", 10) = 10
kill(759218, SIGURG)= 0
write(2, "end die\n", 8)= 8
rt_sigreturn({mask=[QUIT URG]}) = 0
write(2, "start quickdie\n", 15)= 15
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], 
NULL, 8) = 0
sendto(10, "N\0\0\0tSWARNING\0VWARNING\0C57P01\0Mt"..., 117, 0, NULL, 0) = 117
write(2, "ProcessInterrupts\n", 18) = 18
write(2, "ProcessInterrupts held off\n", 27) = 27
write(2, "end quickdie\n", 13)  = 13
exit_group(2)   = ?
+++ exited with 2 +++


We do way too many non-signal safe things in quickdie(). But I'm not sure what
the alternative is, given we probably do want to send something to the client.

Greetings,

Andres Freund




Re: We shouldn't signal process groups with SIGQUIT

2023-03-01 Thread Andres Freund
Hi,

On 2023-02-28 13:45:41 +0900, Michael Paquier wrote:
> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> > Just naively hacking this behaviour change into the current code, would 
> > yield
> > sending SIGQUIT to postgres, and then SIGTERM to the whole process
> > group. Which seems like a reasonable order?  quickdie() should _exit()
> > immediately in the signal handler, so we shouldn't get to processing the
> > SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
> > with SIGTERM being processed first, the SIGQUIT handler should be executed
> > long before the next CFI().
> 
> I have been poking a bit at that, and did a change as simple as this
> one in signal_child():
>  #ifdef HAVE_SETSID
> +   if (signal == SIGQUIT)
> +   signal = SIGTERM;

FWIW, one thing that kept me from actually proposing a patch is that I thought
it might be useful to write a test for this, but that I didn't yet have the
cycles to look into that.


> From what I can see, SIGTERM is actually received by the backends
> before SIGQUIT, and I can also see that the backends have enough room
> to process CFIs in some cases, especially short queries, even before 
> reaching quickdie() and its exit().  So the window between SIGTERM and
> SIGQUIT is not as long as one would think.

What do you mean with the last ssentence? Why would one think that the window
between them is long? Do you mean that it's not as short?

Greetings,

Andres Freund




Re: We shouldn't signal process groups with SIGQUIT

2023-03-01 Thread Thomas Munro
On Tue, Feb 28, 2023 at 5:45 PM Michael Paquier  wrote:
> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> > Just naively hacking this behaviour change into the current code, would 
> > yield
> > sending SIGQUIT to postgres, and then SIGTERM to the whole process
> > group. Which seems like a reasonable order?  quickdie() should _exit()
> > immediately in the signal handler, so we shouldn't get to processing the
> > SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
> > with SIGTERM being processed first, the SIGQUIT handler should be executed
> > long before the next CFI().
>
> I have been poking a bit at that, and did a change as simple as this
> one in signal_child():
>  #ifdef HAVE_SETSID
> +   if (signal == SIGQUIT)
> +   signal = SIGTERM;
>
> From what I can see, SIGTERM is actually received by the backends
> before SIGQUIT, and I can also see that the backends have enough room
> to process CFIs in some cases, especially short queries, even before
> reaching quickdie() and its exit().  So the window between SIGTERM and
> SIGQUIT is not as long as one would think.

Pop quiz: in what order do signal handlers run, if SIGQUIT and SIGTERM
are both pending when a process wakes up or unblocks?  I *think* the
answer on all typical implementation that follow conventions going
back to ancient Unix (but not standardised, so you can't count on
it!*), is that pending signals are delivered in order of the bits in
the pending signals bitmap from lowest to highest, and SIGQUIT <
SIGTERM (again: tradition, not standard), and then:

1.  If the handlers block each other via their sa_mask so that they
are serialised (note: ours don't) then you'll see the SIGQUIT handler
run and then the SIGTERM handler, for example if you do kill(self,
SIGTERM), kill(self, SIGQUIT), sigprocmask(SIG_SETMASK, _all,
NULL).

2.  If the handlers don't block each other (our case), then their
stack frames will be set up in that order (you might say they start in
that order but are immediately interrupted by the next one before they
can do anything), so they then run in the reverse order, SIGTERM
first.  I guess that is what you saw?

In theory you could straighten this out by asking what else is pending
so that we imposed our own priority, if that were a problem, but there
is something I don't understand: you said we could handle SIGTERM and
then make it all the way to CFI() (= non-signal handler code) before
handling a SIGQUIT that was sent first.  Huh... what am I missing?  I
thought the only risk was handlers running in the opposite of send
order because they 'overlapped', not non-handler code being allowed to
run in between.

*The standard explicitly says that delivery order is unspecified,
except for realtime signals which are aren't using.




Re: We shouldn't signal process groups with SIGQUIT

2023-02-27 Thread Michael Paquier
On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> Just naively hacking this behaviour change into the current code, would yield
> sending SIGQUIT to postgres, and then SIGTERM to the whole process
> group. Which seems like a reasonable order?  quickdie() should _exit()
> immediately in the signal handler, so we shouldn't get to processing the
> SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
> with SIGTERM being processed first, the SIGQUIT handler should be executed
> long before the next CFI().

I have been poking a bit at that, and did a change as simple as this
one in signal_child():
 #ifdef HAVE_SETSID
+   if (signal == SIGQUIT)
+   signal = SIGTERM;

From what I can see, SIGTERM is actually received by the backends
before SIGQUIT, and I can also see that the backends have enough room
to process CFIs in some cases, especially short queries, even before 
reaching quickdie() and its exit().  So the window between SIGTERM and
SIGQUIT is not as long as one would think.
--
Michael


signature.asc
Description: PGP signature


Re: We shouldn't signal process groups with SIGQUIT

2023-02-22 Thread Michael Paquier
On Wed, Feb 22, 2023 at 09:39:55AM -0500, Tom Lane wrote:
> Michael Paquier  writes:
>> What would be the advantage of doing that for groups other than
>> -StartupPID and -PgArchPID?  These are the two groups of processes we
>> need to worry about, AFAIK.
> 
> No, we have the issue for regular backends too, since they could be
> executing COPY FROM PROGRAM or the like (not to mention that functions
> in plperlu, plpythonu, etc could spawn child processes).

Indeed, right.  I completely forgot about these cases.
--
Michael


signature.asc
Description: PGP signature


Re: We shouldn't signal process groups with SIGQUIT

2023-02-22 Thread Tom Lane
Michael Paquier  writes:
> What would be the advantage of doing that for groups other than
> -StartupPID and -PgArchPID?  These are the two groups of processes we
> need to worry about, AFAIK.

No, we have the issue for regular backends too, since they could be
executing COPY FROM PROGRAM or the like (not to mention that functions
in plperlu, plpythonu, etc could spawn child processes).

regards, tom lane




Re: We shouldn't signal process groups with SIGQUIT

2023-02-21 Thread Michael Paquier
On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> Just naively hacking this behaviour change into the current code, would yield
> sending SIGQUIT to postgres, and then SIGTERM to the whole process
> group. Which seems like a reasonable order?  quickdie() should _exit()
> immediately in the signal handler, so we shouldn't get to processing the
> SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
> with SIGTERM being processed first, the SIGQUIT handler should be executed
> long before the next CFI().

I can see the sense in this argument and this order should work, still
adding more complication for the backends does not sound that
appealing to me.

What would be the advantage of doing that for groups other than
-StartupPID and -PgArchPID?  These are the two groups of processes we
need to worry about, AFAIK.
--
Michael


signature.asc
Description: PGP signature


Re: We shouldn't signal process groups with SIGQUIT

2023-02-15 Thread Nathan Bossart
On Wed, Feb 15, 2023 at 10:12:58AM -0800, Andres Freund wrote:
> On 2023-02-15 09:57:41 -0800, Nathan Bossart wrote:
>> Oh, that's nifty.  Any reason not to enable send_abort_for_crash, too?
> 
> I think it'd be too noisy. Right now you get just a core dump of the crashed
> process, but if we set send_abort_for_crash we'd end up with a lot of core
> dumps, making it harder to know what to look at.
> 
> We should never need the send_abort_for_kill path, so I don't think the noise
> issue applies to the same degree.

Makes sense.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: We shouldn't signal process groups with SIGQUIT

2023-02-15 Thread Andres Freund
Hi,

On 2023-02-15 09:57:41 -0800, Nathan Bossart wrote:
> On Tue, Feb 14, 2023 at 04:20:59PM -0800, Andres Freund wrote:
> > On 2023-02-14 14:23:32 -0800, Nathan Bossart wrote:
> >> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> >> > Not really related: I do wonder how often we end up self deadlocking in
> >> > quickdie(), due to the ereport() not beeing reentrant. We'll "fix" it 
> >> > soon
> >> > after, due to postmasters SIGKILL.  Perhaps we should turn on
> >> > send_abort_for_kill on CI?
> >> 
> >> +1, this seems like it'd be useful in general.  I'm guessing this will
> >> require a bit of work on the CI side to generate the backtrace.
> > 
> > They're already generated for all current platforms.  It's possible that 
> > debug
> > info for some system libraries is missing, but the most important one (like
> > libc) should be available.
> > 
> > Since yesterday the cfbot website allows to easily find the coredumps, too:
> > http://cfbot.cputube.org/highlights/core-7.html
> 
> Oh, that's nifty.  Any reason not to enable send_abort_for_crash, too?

I think it'd be too noisy. Right now you get just a core dump of the crashed
process, but if we set send_abort_for_crash we'd end up with a lot of core
dumps, making it harder to know what to look at.

We should never need the send_abort_for_kill path, so I don't think the noise
issue applies to the same degree.

Greetings,

Andres




Re: We shouldn't signal process groups with SIGQUIT

2023-02-15 Thread Nathan Bossart
On Tue, Feb 14, 2023 at 04:20:59PM -0800, Andres Freund wrote:
> On 2023-02-14 14:23:32 -0800, Nathan Bossart wrote:
>> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
>> > Not really related: I do wonder how often we end up self deadlocking in
>> > quickdie(), due to the ereport() not beeing reentrant. We'll "fix" it soon
>> > after, due to postmasters SIGKILL.  Perhaps we should turn on
>> > send_abort_for_kill on CI?
>> 
>> +1, this seems like it'd be useful in general.  I'm guessing this will
>> require a bit of work on the CI side to generate the backtrace.
> 
> They're already generated for all current platforms.  It's possible that debug
> info for some system libraries is missing, but the most important one (like
> libc) should be available.
> 
> Since yesterday the cfbot website allows to easily find the coredumps, too:
> http://cfbot.cputube.org/highlights/core-7.html

Oh, that's nifty.  Any reason not to enable send_abort_for_crash, too?

> There's definitely some work to be done to make the contents of the backtraces
> more useful though.  E.g. printing out the program name, the current directory
> (although that doesn't seem to be doable for all programs). For everything but
> windows that's in src/tools/ci/cores_backtrace.sh.

Got it, thanks for the info.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: We shouldn't signal process groups with SIGQUIT

2023-02-14 Thread Andres Freund
Hi,

On 2023-02-14 14:23:32 -0800, Nathan Bossart wrote:
> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> > Not really related: I do wonder how often we end up self deadlocking in
> > quickdie(), due to the ereport() not beeing reentrant. We'll "fix" it soon
> > after, due to postmasters SIGKILL.  Perhaps we should turn on
> > send_abort_for_kill on CI?
> 
> +1, this seems like it'd be useful in general.  I'm guessing this will
> require a bit of work on the CI side to generate the backtrace.

They're already generated for all current platforms.  It's possible that debug
info for some system libraries is missing, but the most important one (like
libc) should be available.

Since yesterday the cfbot website allows to easily find the coredumps, too:
http://cfbot.cputube.org/highlights/core-7.html

There's definitely some work to be done to make the contents of the backtraces
more useful though.  E.g. printing out the program name, the current directory
(although that doesn't seem to be doable for all programs). For everything but
windows that's in src/tools/ci/cores_backtrace.sh.

Greetings,

Andres Freund




Re: We shouldn't signal process groups with SIGQUIT

2023-02-14 Thread Nathan Bossart
On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> Not really related: I do wonder how often we end up self deadlocking in
> quickdie(), due to the ereport() not beeing reentrant. We'll "fix" it soon
> after, due to postmasters SIGKILL.  Perhaps we should turn on
> send_abort_for_kill on CI?

+1, this seems like it'd be useful in general.  I'm guessing this will
require a bit of work on the CI side to generate the backtrace.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: We shouldn't signal process groups with SIGQUIT

2023-02-14 Thread Andres Freund
Hi,

On 2023-02-14 15:38:24 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > ISTM that signal_child() should downgrade SIGQUIT to SIGTERM when sending to
> > the process group. That way we'd maintain the current behaviour for postgres
> > itself, but stop core-dumping archive/restore scripts (as well as other
> > subprocesses that e.g. trusted PLs might create).
> 
> Yeah, I had been thinking along the same lines.  One issue
> is that that means the backend itself will get SIGQUIT and SIGTERM
> in close succession.  We need to make sure that that won't cause
> problems.  It might be prudent to think about what order to send
> the two signals in.

I hope we already deal with that reasonably well - I think it's not uncommon
for that to happen, regardless of this change.

Just naively hacking this behaviour change into the current code, would yield
sending SIGQUIT to postgres, and then SIGTERM to the whole process
group. Which seems like a reasonable order?  quickdie() should _exit()
immediately in the signal handler, so we shouldn't get to processing the
SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
with SIGTERM being processed first, the SIGQUIT handler should be executed
long before the next CFI().


Not really related: I do wonder how often we end up self deadlocking in
quickdie(), due to the ereport() not beeing reentrant. We'll "fix" it soon
after, due to postmasters SIGKILL.  Perhaps we should turn on
send_abort_for_kill on CI?

Greetings,

Andres Freund




Re: We shouldn't signal process groups with SIGQUIT

2023-02-14 Thread Tom Lane
Andres Freund  writes:
> ISTM that signal_child() should downgrade SIGQUIT to SIGTERM when sending to
> the process group. That way we'd maintain the current behaviour for postgres
> itself, but stop core-dumping archive/restore scripts (as well as other
> subprocesses that e.g. trusted PLs might create).

Yeah, I had been thinking along the same lines.  One issue
is that that means the backend itself will get SIGQUIT and SIGTERM
in close succession.  We need to make sure that that won't cause
problems.  It might be prudent to think about what order to send
the two signals in.

regards, tom lane