Grrk. I have done this myself, and been involved in one of the VERY few commercial projects that attempted to do it properly (IBM CEL, the other recent one being VMS). I am afraid that there are a lot of misapprehensions here.
Several people have said things like:

> The thing to model this on, I think, would be the
> BSD sigmask mechanism, which lets you selectively
> block certain signals to create a critical section
> of code. A context manager could be used to make
> its use easier and less error-prone (i.e. harder
> to block async exceptions and then forget to unblock
> them).

No, no, no! That is TRULY horrible! It works fairly well for things like device drivers, which are both structurally simple and have no higher-level recovery mechanism, so that a failure turning into a hard hang is not catastrophic. But it is precisely what you DON'T want for complex applications, especially when a thread may need to call an external service 'non-interruptibly'.

Think of updating a complex object in a multi-file database, for example. Interrupting half-way through leaves the database in a mess, but blocking interrupts while (possibly remote) file updates complete is asking for a hang. You also see it in horrible GUI (including raw mode text) programs that won't accept interrupts until you have completed the action they think you have started. One of the major advantages of networked systems is that you can usually log in remotely and kill -9 the damn process!

The way that I, IBM and DEC approached it was by the classic callback mechanism, with a carefully designed way of promoting unhandled exceptions/interrupts. For example, the following is roughly what I did (somewhat extended, as I didn't do all of this for all exceptions):

    An event set a defined flag, which could be tested (and cleared) by the thread. If a second, similar event arrived (or it was not handled after a suitable time), the event was escalated.

    If so, a handler was called that HAD to return (again within a specific time). If a second, similar event arrived or the handler didn't return by a suitable time, the event was escalated.

    If so, another handler was called that COULDN'T return. If another event arrived, the handler returned, or it failed to close down the thread, the event was escalated.

    If so, the thread's built-in environment was closed down without giving the thread a chance to intervene. If that failed, the event was escalated.

    If so, the thread was frozen and process termination started. If clean termination failed, the event was escalated.

    If so, the run-time system produced a dump and killed itself.

You can implement a BSD-style ignore by having an initial handler that just clears the flag and returns, but a third interrupt before it does so will force close-down. There was also a facility to escalate an exception at the point of generation, which could be useful for forcible closedown.

There are a zillion variations of the above, but all mainframe experience is that callbacks are the only sane way to approach the problem IN APPLICATIONS. In kernel code, that is not so, which is why so many of the computer scientists design BSD-style handling (i.e. they think of kernel programming rather than very complex application programming).

> Unconditionally killing a whole process is no big
> problem because all the resources it's using get
> cleaned up by the OS, and the effect on other
> processes is minimal and well-defined (pipes and
> sockets get EOF, etc.). But killing a thread can
> leave the rest of the program in an awkward state.

I wish that were so :-(

Sockets, terminals etc. are stateful devices, and killing a process can leave them in a very unclean state. It is one of the most common causes of unkillable processes (the process can't go until its files do, and the socket is jammed).

Many people can witness the horrible effects of ptys being left in 'echo off' or worse states, the X focus being left in a stuck override redirect window and so on.

But you also have the multi-file database problem, which also applies to shared memory segments.
Even if the process dies cleanly, it may be part of an application whose state is global across many processes. One common example is adding or deleting a user, where an unclean kill can leave the system in a very weird state.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: [EMAIL PROTECTED]
Tel.: +44 1223 334761    Fax: +44 1223 334679
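
For illustration only, the escalation ladder described in the message above can be sketched in Python. This is a minimal sketch of the bookkeeping, not the IBM CEL or VMS implementation and not code from the original post. The names `Level`, `EscalationLadder`, `deliver`, `tick`, `clear` and `escalate` are hypothetical, the single per-rung timeout is an assumption, and the sketch only tracks which rung is active; it does not call handlers or touch real threads.

```python
import enum
import time


class Level(enum.IntEnum):
    """Rungs of the escalation ladder, mildest first (hypothetical names)."""
    IDLE = 0          # nothing pending
    FLAG = 1          # flag set; the thread may test and clear it
    SOFT_HANDLER = 2  # handler that HAS to return
    HARD_HANDLER = 3  # handler that must NOT return
    CLOSE_ENV = 4     # runtime closes the thread's environment
    TERMINATE = 5     # thread frozen, process termination started
    DUMP = 6          # runtime produces a dump and kills itself


class EscalationLadder:
    """Tracks one class of event for one thread."""

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout  # assumed deadline per rung, in seconds
        self.clock = clock
        self.level = Level.IDLE
        self.since = None       # time at which the current rung was entered

    def escalate(self):
        """Move up one rung; also usable when a rung fails or to force close-down."""
        if self.level < Level.DUMP:
            self.level = Level(self.level + 1)
        self.since = self.clock()
        return self.level

    def deliver(self):
        """A similar event arrives: set the flag, or escalate if one is pending."""
        return self.escalate()

    def tick(self):
        """Escalate if the current rung has been pending for too long."""
        if self.level is not Level.IDLE and self.clock() - self.since >= self.timeout:
            self.escalate()
        return self.level

    def clear(self):
        """The thread cleared the flag, or its returning handler came back in time."""
        if self.level in (Level.FLAG, Level.SOFT_HANDLER):
            self.level = Level.IDLE
            self.since = None
            return True
        return False  # later rungs cannot be cancelled by the thread


# Usage: two interrupts in a row reach the returning handler, which then clears.
ladder = EscalationLadder(timeout=5.0)
ladder.deliver()  # first interrupt: flag set
ladder.deliver()  # second interrupt before the flag was cleared
ladder.clear()    # returning handler finished in time: back to idle
```

In this sketch a BSD-style ignore corresponds to calling `clear()` from the first handler, and a third `deliver()` arriving before that happens moves the ladder to the non-returning handler, which matches the behaviour described in the message.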