Hi,
I'm still struggling to analyze the distcc 2.17 crashes.
Martin, maybe you could have a look at this if you have time? I'm sure you can help me find out what's wrong with the new timeout code.
I've tried to instrument distcc 2.17 with Insure++. I did get some runtime errors, such as uninitialized reads, but they're unrelated and I'd rather fix the crash before looking into them.
The result is that distcc crashes Insure++ just like it crashes Valgrind:
### Unix/Signal.cc:332: panic: received signal 11 while in runtime
### @(#)$RCSfile: Signal.cc,v $ $Revision: 32.52 $ $Date: 2003/07/28 16:15:14 $
### ThisThread.cc:593: abort
### @(#)$RCSfile: ThisThread.cc,v $ $Revision: 32.119.2.3 $ $Date: 2003/08/01 22:37:30 $
At least this seems to indicate that something's wrong in the timeout signal handler. Also the comments from the Valgrind team were:
| That's an ENTER instruction with a non-zero nesting level. It
| sounds like a pretty unusual instruction to be using - are you
| sure your program isn't jumping through a bad pointer somewhere?
|
| If the pointer is undefined and valgrind realises that it is
| undefined then it should warn you. It could be well defined but
| bogus however, in which case valgrind wouldn't be able to help.
See also this thread on the Valgrind-users mailing list: http://thread.gmane.org/gmane.comp.debugging.valgrind/1801
It seems distcc is jumping to some bogus address somewhere in the timeout handlers. Do you have a clue where that could be happening?
Regards,
Dimitri
__ distcc mailing list http://distcc.samba.org/
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/distcc