On Wed, 2006-03-08 at 09:32 +0100, Thomas Mueller wrote:
> Hi Geo,
>
> > But even working init.d scripts and working daemons can be killed or
> > stopped for lots of reasons, and unless you duplicate the work of init,
> > or manually intervene, they won't start back up.
>
> I read that argument several times now but I still think init doesn't
> help here.
>
> I have to monitor my processes - it doesn't matter how I start them.
> What if my webserver doesn't answer requests but is still running? Init
> doesn't care.
> That's why I use monit [1] and let it check if the process is running
> and if it serves a specific HTML, PHP, CGI page. If any test fails I
> always try to kill all the processes and restart them.
> With init I could only leave out the 'restart' part.

So using inittab _does_ help. Init can help in another way, too: it can
keep monit running. After all, what happens if monit is killed?

"Hung" processes are another matter: they are the result of broken code
or a broken design. The problem I've been pointing out is that processes
can be killed for any reason, and no amount of defensive programming in
the program itself can protect against those kills.

The only process guaranteed to be running is pid=1, aka init.
Fortunately, init can monitor processes and restart them if they die for
any reason.

> Additionally monit watches the resources a process uses and can warn or
> restart it based on the resource usage.

So can setrlimit/ulimit. With ulimit, the process gets killed, and init
restarts it.

I also advocate enforced limits. Enforced limits are very good. I think
daemons should setrlimit themselves whenever possible (or, if they are
started as root, the startup script can carry a ulimit line).

Sometimes deadlock (a kind of hang) can be handled using ulimit -x (the
file-lock limit in bash); other times (spins) the hang can be handled
using ulimit -t (the CPU-time limit).

Other times still, programs like monit can be helpful in monitoring the
usability of a service, but since usability is directly impacted by
software bugs within the service provider, I wonder how long people want
to run such software...

... but I suppose that's a different discussion :)

> [1] http://www.tildeslash.com/monit/
> [2] http://www.rsbac.org/

These are good things to know about. I don't expect that init can
replace _every_ monitoring and system management function (I'm not that
ludicrous), but I'm specifically looking for ways that it cannot replace
init.d.

As a side note, do you have any links on how to get rsbac to enforce
limits, or rather, what other kinds of limits it can enforce? I found
some links on the site referring to RES and doing essentially a sitewide
per-user or per-grandfather-pid rlimit, but are there others? Examples?

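To make the "daemons should setrlimit themselves" point concrete, here is a minimal sketch, written in Python only for brevity. The function name cap_cpu_seconds and the 3600-second figure are my own illustrations, not anything from monit, rsbac, or a particular daemon. It lowers the soft CPU-time limit (the same limit ulimit -t sets from a shell) and leaves the hard limit alone.

```python
import resource


def cap_cpu_seconds(seconds):
    """Set this process's soft CPU-time limit; hypothetical helper name.

    Once the process has used more CPU time than the soft limit, the
    kernel sends it SIGXCPU, whose default action is to terminate it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    # The soft limit may not exceed the hard limit; RLIM_INFINITY means
    # the hard limit is uncapped, so any value is acceptable.
    if hard != resource.RLIM_INFINITY:
        seconds = min(seconds, hard)
    # Only the soft limit changes; the hard limit is passed back as-is.
    resource.setrlimit(resource.RLIMIT_CPU, (seconds, hard))
    return resource.getrlimit(resource.RLIMIT_CPU)[0]
```

A daemon would call something like cap_cpu_seconds(3600) early in startup. If it later spins, the kernel kills it, and an inittab entry with the respawn action (the form is id:runlevels:respawn:command, with the id, runlevels, and command chosen per system) would have init start it again.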
-- Internet Connection High Quality Web Hosting http://www.internetconnection.net/