On Mon, 2004-12-06 at 06:21, Duncan Grisby wrote: > Hi, > > Does anyone know of a deadlock detector for Python? I don't think it > would be too hard to hook into the threading module and instrument > mutexes so they can be tested for deadlocks. I've googled around but I > haven't found anything.
In general, accurate deadlock detection is intractable. Like many problems of this class you have two choices - either reduce the scope of the problem to a non-general one or try a heuristic to guess. As I recall, deadlock prevention is similar to the halting problem; the only question you can answer is "which category am I in:" A) I know for sure there are no deadlocks B) I don't know, maybe there are, maybe there arn't. In the halting problem, the answer to your question is B until you actually halt, in which case the answer to your problem is obvious. Here is a quick and dirty heuristics to filter some programs as being in bin A or bin B. First, for bin B. Instrument your mutex so that every time you lock, it creates a directed edge in a global system wide graph from your current mutex (mutex_N) to the next to most recently locked mutex you are currently holding for the locking thread. If your graph becomes cyclic, you might be in bin B (well, you were never in bin A to being with.) Throw a "I've lost faith in my inability to deadlock" exception. If you can *prove* that there is a strict topological order between nested mutex invocations, then you are in bin A. The degree of rigor in your proof is up to you, your level of comfort and how many people die if your code deadlocks (your personal website and medical instruments have different standards.) Not everybody trusts themselves. There are a number of alternative approaches, including having a single global critical section lock (ancient linux smp code) or designing your mutex operation to imply the release of all held locks. Of course, if you must hold more than one lock at a time, your mutex function can take a list of locks that it will atomically apply. The proper design of this is left as an exercise for the reader. Unless you thought of this from the beginning, retrofitting safe locks into your existing large project will be expensive. The next possibility won't work for python, but it is useful to keep in mind. The halting problem has a small caveat, it is applicable to "general" Turing machines. Place restrictions on your Turing machine that makes it not a Turing machine and the problem goes away. In real time systems (oh my, cat < aborted.phd.thesis.tex > mail [EMAIL PROTECTED] ) where you have to compute the longest time any piece of code will take to execute this sort of analysis is common place. Get rid of function pointers (including weakly typed OOPLs like Python.) You don't have to worry about limiting loop counts like in a RTL, because we arn't interested in timing information. Oh, and ditch recursion. Maybe you don't have to, but I'm not sure. Now walk through your code taking each possible path. You can collapse loops to each meaningful combination (depends on the nature of your languages loop construct), collapse paths that don't have any mutex operations. You get the idea. Unless mutex calls are rare, or your code simple, you might spend a while. Largely this problem is intractable, even with simplifications, but it is done which is why safety critical programs are (well, should be) small and their languages not very expressive (as in finite state machine, and not in the "but my computer is a FSM sense.") Adam DePrince -- http://mail.python.org/mailman/listinfo/python-list