On Mon, 6 Apr 2009, David Houlder wrote:
I don't think longjmp() is async signal safe.

There is "safe" and there is "safe"....

There is "safe" in the sense of being able to return back to what the program was doing. But in this case, the program has no intention of returning. It's reached an exception; it wants to take a specific action and then exit.

In the case of imapd:

imapd got some form of "time to die" signal: a hangup, a termination, a kiss of death. imapd has determined that whatever it was doing, it was NOT an update to the mailbox; it is perfectly alright to abort whatever it was.

So, imapd has no intention of continuing what it is doing.  imapd simply
wants to do the following:
 (1) If the mailbox is in traditional UNIX format, it wants to save any
     unsaved changes.
 (2) it wants to syslog that it is exiting, and why.
 (3) it wants to exit.

For 15 or so years, imapd simply did this in the signal handler. That worked well; and older versions of libc explicitly supported signal handlers doing this. You could screw up your context as long as you didn't try to go back. Let me emphasize: libc explicitly supported you doing this.

Then glibc came along and applied mutexes. Suddenly in newer versions of Linux, imapd would be hanging in the syslog() because it may have been doing a printf() in the main line.

And the answer from the glibc developers was that you couldn't do syslog() in a signal handler. You have to continue what the program was doing, and somehow in gawdknowswhat code figure out that the signal happened and take the error path.

The problem was, the server ended up getting hung, typically in TCP wait on a socket that was dead but somehow failing to fault the IOT on it.

So going back to what the program was doing wasn't working out.

Lo and behold, in looking at glibc code it appeared that longjmp() unwound the mutexes. And it seems to work.

But now we have these weird corruptions, which have nothing to do with anything since it isn't even writing the file at that point! It's almost as if glibc randomly picked a file descriptor, seeked to 0, and piddled some stuff there.

At this point, it looks like the whole exercise is futile. Since glibc has broken how signal handlers used to work, the only way out is not to try to log why the server terminated. Just vanish without a trace. Similarly, don't even try to save updates in traditional UNIX mailbox format... even though we KNOW that the server wasn't doing anything to the file at the time, and therefore that file descriptor is completely clean.

It's a shame that Linux (and I guess BSD) does not have useful signals any longer. For nearly 40 years, it has been commonplace for a signal handler to take an abort action with logging without going back to what it was doing. That apparently has been "improved" into abolition.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
_______________________________________________
Imap-uw mailing list
Imap-uw@u.washington.edu
http://mailman2.u.washington.edu/mailman/listinfo/imap-uw
