Hi folks - wondering if anyone has any pointers on troubleshooting garbage collection. My colleagues and I are running into an interesting problem:
Intermittently, we get into a situation where the garbage collection code runs in an infinite loop. The data structures within the garbage collector have been corrupted, but it is unclear how or why. The problem is unpredictable and extremely difficult to reproduce consistently.

The infinite loop itself occurs in gcmodule.c, in update_refs. After hitting this in the debugger a couple of times, it appears that one of the nodes in the second or third generation list contains a pointer to the first generation head node. The first generation was cleared shortly before the call into this function, so its prev and next pointers both point back to the head itself. Once the loop reaches that node, it can never get back to the head of the list it started from, so it spins forever.

Chances are another module we're depending on has done something hinky with GC. The challenge is tracking that down. If anyone has seen something like this before and has either pointers to specific GC usage issues that can create this behavior, or some additional thoughts on tricks to track it down to the offending module, they would be most appreciated.

You can assume we've done some of the "usual" things - hacking up gcmodule to spit out information when the condition occurs, various headstands and gymnastics in an attempt to identify reliable steps to reproduce - the challenge is the layers of indirection that we think are likely present between the manifestation of the problem and the module that produced it.

Many thanks,
Dave