The only place in distcc where longjmp is called is timeout.c.
I'm not at ease with longjmp() / setjmp() since I haven't used these functions for a long time. I'm wondering whether it's a problem with variable "timeout_jmpbuf". Maybe it's not initialized properly.
The code in timeout.c looks like:
    dcc_timeout_arm(const int timeout, int phase)
    {
        static enum dcc_phase saved_phase;

        saved_phase = phase;

        if (setjmp(timeout_jmpbuf)) {
            /* setjmp return through here if it timed out. */
            rs_log_error("%s timeout", dcc_get_phase_name(saved_phase));
            return EXIT_TIMEOUT;
        }
The crash always happens after a "Connect timeout" message, probably some time after the line "setjmp(timeout_jmpbuf)" has been executed. Does this ring a bell?
Yes. Maybe the signal handler is doing something unsafe. In particular, http://www.opengroup.org/onlinepubs/009695399/functions/sigaction.html says:
... Note that longjmp() and siglongjmp() are not in the list of reentrant functions. This is because the code executing after longjmp() and siglongjmp() can call any unsafe functions with the same danger as calling those unsafe functions directly from the signal handler. Applications that use longjmp() and siglongjmp() from within signal handlers require rigorous protection in order to be portable. ...
The list of safe functions is in http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html
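For illustration, here is a minimal sketch of a handler that stays within those rules: it only sets a flag and never jumps out. This is not distcc's code; the names timed_out, on_alarm and install_alarm_handler are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag, not a distcc variable.  sig_atomic_t is the one
 * object type a handler may portably write to. */
volatile sig_atomic_t timed_out = 0;

/* The handler does nothing except record that the alarm fired. */
static void on_alarm(int signo)
{
    (void) signo;
    timed_out = 1;
}

/* Install the handler for SIGALRM.  Returns 0 on success, -1 on error. */
int install_alarm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;    /* no SA_RESTART, so a blocked call fails with EINTR */

    return sigaction(SIGALRM, &sa, NULL);
}
```

With a handler like this, the code that called alarm() checks timed_out after a blocking call returns -1 with errno set to EINTR, and does its logging and cleanup in normal context rather than inside or after a jump from the handler.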
I suspect the timeout code in the new distcc is flawed and needs to be
rethought, but I haven't looked at it myself. Could it be done without
longjmp?
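One possible shape for a longjmp-free timeout, as a rough sketch only: let poll() do the waiting, so no signal is involved at all. The helper name wait_readable and its millisecond parameter are assumptions for this example, not anything that exists in distcc.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable or timeout_ms
 * milliseconds pass.  Returns 0 if fd is ready, -1 otherwise; on
 * timeout errno is set to ETIMEDOUT. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        /* Simplification: an interrupted wait restarts the full timeout. */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);

    if (rc == 0) {
        errno = ETIMEDOUT;      /* nothing became readable in time */
        return -1;
    }
    if (rc < 0)
        return -1;              /* poll() itself failed */

    return 0;
}
```

The same idea should carry over to the connect case: put the socket in non-blocking mode, call connect(), then wait for POLLOUT with a timeout and check SO_ERROR with getsockopt(). The caller gets an ordinary error return on timeout, so nothing has to unwind the stack from a signal handler.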
- Dan
__ distcc mailing list http://distcc.samba.org/