Hello,

I am using distcc to distribute jobs across several computers on a university network. Some of them regularly run into problems such as a very high load caused by X sessions that were not closed properly or by stray cups daemon processes. When this happens, some machines still accept connections from normal users like me but do not complete the login, so ssh or rsh sessions simply hang. Only root can log in and terminate the high-load processes or reboot the machine, and I cannot ask my system administrator to do this several times a day.
In these cases distcc seems to have a similar problem. The graphical monitor shows a job in "Connect" status for several seconds or even minutes without any progress. All other machines receive their jobs, finish them and get new ones; only this one machine hangs. Once everything else has completed, I can terminate the make run with CTRL-C and start it again, and this time the last job gets finished on another machine.

So I wonder whether distcc could do the following for me: if a distributed job remains in "Connect" status for longer than a certain time (perhaps a user-defined number of seconds, with a default of 10 seconds), distcc should kill the job, mark the machine as unavailable and redistribute the job, just as if the machine were not reachable at all. If "Send" status is reached before the time limit, everything should proceed as before.

Is this easy to integrate, or would it require a lot of work?

Best regards,
Christian Breimann
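
P.S. To make the requested behaviour concrete, here is a rough sketch in Python. It is not distcc code and does not reflect how distcc is implemented; the function names (connect_with_timeout, pick_host) and the "unavailable" set are made up for illustration. Port 3632 is the usual distccd port, and the 10 second default is the value suggested above. Note that the timeout here only bounds the TCP connect step. Whether that matches what the monitor calls "Connect" is an assumption on my part.

```python
import socket


def connect_with_timeout(host, port, timeout):
    """Return a connected socket, or None if the connect fails or times out."""
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:  # covers refused connections, unreachable hosts and timeouts
        return None


def pick_host(hosts, unavailable, port=3632, timeout=10.0,
              connect=connect_with_timeout):
    """Try hosts in order; record hosts that fail and move on to the next one.

    `unavailable` is a set that is updated in place, so later jobs
    skip machines that already failed to connect in time.
    """
    for host in hosts:
        if host in unavailable:
            continue
        conn = connect(host, port, timeout)
        if conn is not None:
            return host, conn
        unavailable.add(host)  # treat a slow host like an unreachable one
    return None, None  # nothing usable; the caller could compile locally
```

A caller would keep one "unavailable" set for the whole make run and call pick_host(hosts, unavailable) for each job, so a hung machine costs at most one timeout instead of blocking until CTRL-C.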