I apologise for going off-topic, but this is an explanation of why I said that signal handling is not reliable. The only relevance to Python is that Python should avoid relying on signals if possible, and try to be a little defensive if not. Signals will USUALLY do what is expected, but not always :-(
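To make "a little defensive" concrete: a sketch of mine (not from this thread, Linux-oriented, using only the stdlib `signal` module; `sigtimedwait` needs Python 3.3+). The idea is never to block forever on a delivery that might be lost — wait with a timeout and re-check the real condition afterwards:

```python
import signal

# Block SIGUSR1 so it can only be consumed synchronously below.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})

def wait_for_event(timeout=0.5):
    """Defensive wait: treat the signal as a hint, not a guarantee."""
    info = signal.sigtimedwait({signal.SIGUSR1}, timeout)
    if info is None:
        # Timed out -- do NOT conclude "no event"; the caller should
        # poll the actual state it cares about and try again.
        return False
    return True
```

If a SIGUSR1 is already pending, `wait_for_event()` returns immediately; if a delivery was lost, the worst case is one timeout's delay rather than a hang.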
Anything further by Email, please.

Greg Ewing <[EMAIL PROTECTED]> wrote:
> > This one looks like an oversight in Python code, and so is a bug,
> > but it is important to note that signals do NOT work reliably under
> > any Unix or Microsoft system.
>
> That's a rather pessimistic way of putting it. In my experience,
> signals in Unix mostly do what they're meant to do quite reliably --
> it's just a matter of understanding what they're meant to do.

Yes, it is pessimistic, but I am afraid that my experience is that it is so :-( That doesn't deny your point that they MOSTLY do 'work', but car drivers MOSTLY don't need to wear seat belts, either. I am talking about high-RAS objectives, and ones where very rare failure modes can become common (e.g. HPC and other specialist uses).

More commonly, there are plain bugs in the implementations which are sanctioned by the standards (Linux is relatively disdainful of such legalistic games). Because the standards say that everything is undefined behaviour, many vendors' support mechanisms will refuse to accept bug reports unless you push like hell. And, as some of these bugs are DIABOLICALLY difficult to explain, let alone demonstrate, they can remain lurking for years or decades.

> There may be bugs in certain systems that cause signals to get lost
> under obscure circumstances, but that's no reason for Python to make
> the situation worse by introducing bugs of its own.

100% agreed.

> > Two related signals received between two 'checkpoints' (i.e. when
> > the signal is tested and cleared). You may only get one of them,
> > and 'related' does not mean 'the same'.
>
> I wasn't aware that this could happen between different signals. If
> it can, there must be some rationale as to why the second signal is
> considered redundant. Otherwise there's a bug in either the design or
> the implementation.

Nope. There is often a clash between POSIX and the hardware, or a case where a 'superior' signal overrides an 'inferior' one.
I have seen SIGKILL flush some other signals, for example. And, on some systems, SIGFPE may be divided into the basic hardware exceptions; if you catch SIGFPE as such, all of those may be cleared. I don't think that many (any?) current systems do that. And it is actually specified to occur for the SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU, SIGCONT group.

> > A second signal received while the first is being 'handled' by the
> > operating system or language run-time system.
>
> That one sounds odd to me. I would expect a signal received during
> the execution of a handler to be flagged and cause the handler to be
> called again after it returns. But then I'm used to the BSD signal
> model, which is relatively sane.

It's nothing to do with the BSD model, which may be saner but still isn't 100% reliable; it occurs at a lower layer. At the VERY lowest level, when a genuine hardware event causes an interrupt, the FLIH (first-level interrupt handler) runs in God mode (EVERYTHING disabled) until it classifies what is going on. This is a ubiquitous misdesign of modern hardware, but that is off-topic. Hardware 'signals' from other CPUs/devices may well get lost if they occur in that window. And there are other, less extreme, causes at higher levels in the operating system.

Unix and Microsoft do NOT have a reliable signal delivery model, where the sender of a signal checks if the recipient has got it and retries if not. Some operating systems do - but I don't think that BSD does.

> > A signal sent while the operating system is doing certain things to
> > the application (including, sometimes, when it is swapped out or
> > deep in I/O.)
>
> That sounds like an outright bug. I can't think of any earthly reason
> why the handler shouldn't be called eventually, if it remains
> installed and the process lives long enough.

See above. It gets lost at a low level.
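The hardware-level losses are hard to demonstrate, but the user-visible half of the story — that standard POSIX signals are a pending *bit*, not a queue — is easy to show from Python. A small Linux-oriented demonstration of mine (not from this thread):

```python
import os
import signal

calls = 0

def handler(signum, frame):
    global calls
    calls += 1

signal.signal(signal.SIGUSR1, handler)

# Hold the signal pending while we send it five times...
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
for _ in range(5):
    os.kill(os.getpid(), signal.SIGUSR1)

# ...then release it: the five sends collapse into one delivery,
# because a pending standard signal is a single flag per signal number.
signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
print(calls)  # typically 1, not 5
```

POSIX real-time signals (SIGRTMIN..SIGRTMAX) do queue, which is one reason they exist; the ordinary signals discussed here do not.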
That is why you can cause serious time drift on an "IBM PC" (most modern ones) by hammering the video card or generating streams of floating-point fixups. Most people don't notice, because xntp or equivalent fixes it up.

And there are worse problems. I could start on cross-CPU TLB and ECC handling on large shared-memory systems. I managed to get an Origin into a state where it wouldn't even power down from the power-off button, and I had to flip breakers, due to THAT one! I have reason to believe that all largish SMP systems have similar problems. Again, it is possible to design an operating system to avoid those issues, but we are talking about mainstream ones, and they don't.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: [EMAIL PROTECTED]
Tel.: +44 1223 334761    Fax: +44 1223 334679

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com