On Wed, Nov 11, 2020 at 02:34:57PM -0600, Derek Martin wrote: > [...] I would also agree that Mutt's exit status handling is > incomplete. That probably needs to be fixed. I'd need to review > the code in more detail to be confident about whether > blocking/ignoring SIGCHLD is appropriate or not.
So I've just had a look at this, and I have some comments. A proper
fix would be...complicated. Bottom line, Mutt's code IS definitely
causing part of the headache, and I'd hoped to provide a proposal to
fix it with this message, but it would take more of a rework than I
have time for.
THAT SAID, my read of the code strongly suggests my initial guess
about what's wrong with Matthias' case was correct. That means fixing
this bug won't actually fix that problem--it can only make it clearer
to the user what happened--and it also means that Matthias should be
able to fix the problem by fixing the broken sendmail script Mutt is
calling.
Specifically, Mutt uses alarm() to stop waiting too long for a rogue
sendmail process, and the mutt-users post indicated that SIGALRM was
what stopped waitpid() from continuing to wait. So, the script that
Mutt launched never completed within SendmailWait seconds, whatever
that is, and Mutt decided to stop waiting for it. SIGALRM doesn't
just happen at random--that's really the only plausible explanation
and it matches the code. The rest of what I said then would
necessarily have happened: The intermediate fork exited (see more
below about sendlib.c), which closed the script's STDIN, which caused
sendmail to get EOF reading the message and finally send it. This
also explains why waitpid() was getting ECHILD: The child was in fact
not dead (see more below under background.c).
So to be clear, (I'm as certain as I can be without access to the code
that) Matthias' problem is in Mattias' code, not Mutt. Mutt just
makes it worse/confusing by doing poor error handling.
Now, turning attention back to Mutt's code, a few notes about the bad,
the worse, and the ugly... :)
system.c:
Mostly looks fine, BUT resets the disposition of SIGCHLD to SIG_DFL,
which seems right, but differs from signal.c where it sets it to
chld_handler (which just sets a flag). See below.
signal.c:
Sets the handler for SIGCHLD to chld_handler. This is largely
pointless. It just sets SigChld = 1. That's generally the correct
way to handle a signal, but in this case it does nothing useful
since Mutt calls waitpid all over the code.
The only effect of setting SigChld is it triggers a redraw in
update_bg_menu, but that can easily be eliminated by making the
decision to do a redraw depend entirely on unconditionally calling
mutt_background_process_waitpid, eliminating the SigChld check. In
one place, mutt_background_compose_menu arguably abuses SigChld (the
global flag, not the signal) to force a redraw. Not exactly
incorrect... just a little gross, and completely unnecessary.
Plus, unless I missed something, it appears that if _mutt_system()
is ever called, this stops happening anyway, since it resets
SIGCHLD's disposition to SIG_DFL (the signal handler will no longer
ever execute), which likely proves that the above suggestion is
already working. And with SIGCHLD set that way, there's no cause
to block it. This is The Right Thing™.
background.c:
background_edit_landing_page() has this code:
while (!done)
{
wait_rc = waitpid (bg_pid, NULL, WNOHANG);
if ((wait_rc > 0) ||
((wait_rc < 0) && (errno == ECHILD)))
{
rc = 0;
break;
}
...
Checking for ECHILD is unnecessary if children are handled properly.
Or, said another way, if you're not ignoring SIGCHLD (i.e. SIG_IGN)
then the only way you can get ECHILD is if the child IS NOT DEAD.
In this context (background editing), that scenario makes no sense.
Since you're not ignoring your children, they will zombify until you
ask for their status, so their PID can not be reused, and they will
still be in the process table until you call waitpid(). The only
other potential way you could have this happen is if you called
waitpid() twice on the same pid. Don't do that. :)
This also eats the exit status of the editor, so mutt can't do
anything intelligent with it later (like report that the editor
failed/crashed).
Worse, mutt_background_compose_menu() calls waitpid and doesn't even
check the return value. Not sure what the point of this is, but I'd
assume the user would want to know if that process failed or
crashed, and that's not happening.
sendlib.c:
Properly handling the return from waitpid() is a little tricky, and
like most code I've seen, this code doesn't. This looks like the
cause of the behavior Matthias saw, though it probably won't happen
if there isn't also something wrong with his send script. The
NATURE of the bug is that it causes Mutt to misreport to the user
what happened.
Fixing this almost certainly won't fix Matthias' problem, but may at
least help make it clear why it's happening. There's also a bit of
trickiness in what to do about some of the cases... But in this
instance, basically Mutt should report to the user that the process
was killed by a signal, what the signal was, and indicate that it
can't know if sending the message was actually successful.
Trouble is, the way send_msg() forks twice, it provides no good
means for communicating the full status back to the parent Mutt
process. When you call exit(status), The OS hands the least
significant bits of status to you, plus a bunch of extra bits. If
an intermediate process gets back an exit status, and then calls
exit with that same exit status, a bunch of important information is
lost. Mutt could provide a different means of communicating this
status back down the process tree to the parent Mutt process, but...
doesn't.
So... What I would do, if I were going to fix this is:
- Get rid of SigChld and the chld_handler, and remove all instances
of blocking or ignoring SIGCHLD. Leave SIGCHLD set to SIG_DFL.
- Fix all calls to waid or waitpid to look for at least 3 cases:
+ WIFEXITED (exited "normally" with some exit status)
+ WIFSIGNALED (killed by a signal, WSTOPSIG() gives the sigal)
+ "else" (stopped, continued, any OS-specific cases). You could
split these out to separate cases, but prolly not useful.
These cases aren't the same and shouldn't overlap, so not checking
for them produces undefined behavior in Mutt.
Some calls to waitpid are not even checking their return status.
- drop the extra fork in send_msg() and/or implement a different
means of communicating the FULL exit status (not just the exit
code) so you can use the W* macros on it
- It may make sense that sometimes, you can't do much of anything
with the status returned by waitpid(). Mutt already has a
mechanism implemented to save those for later, but many of the
calls are not even using that. Any time this happens, Mutt is
throwing away status that the user NEEDS in order to make an
informed decision about how to handle the exception. Though,
mostly I think if you can't use it where it was obtained, the code
flow needs a redesign.
--
Derek D. Martin http://www.pizzashack.org/ GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in
undeliverable mail due to spam prevention. Sorry for the inconvenience.
signature.asc
Description: PGP signature
