On Fri, 11 Sep 2009, Tracy Reed wrote:

> On Fri, Sep 11, 2009 at 05:55:59PM -0400, [email protected] spake thusly:
>> Ha! What happens if something else gets killed instead (sshd? iptables?
>> syslogd?)...then things get really ugly...not only is apache running badly,
>
> Important stuff is monitored and restarted by configuration
> management. So far I haven't had it be a problem.
Hi Tracy.

Good luck with that. The OOM killer could easily hit part of your management system (even if it only uses sshd to allow a remote system to log in and restart the app).

Figuring out how the OOM killer should decide what to whack is a much bigger problem than most people realise. Have a look at the discussions on LKML that span years. The bottom line is that for a general purpose server there are no really good solutions to this problem.

The best way to manage an OOM condition is to avoid it in the first place[1]. If the OOM killer activates, all bets are off about the stability of the system. Security might even be impacted. Even if the OOM killer algorithm tries to spare important processes (Linux does), there is no way of knowing how bad the OOM is or how long it will last. No process is immune.

Having the box reboot on OOM isn't necessarily a bad idea in some situations, but what if the OOM condition hits all the servers in the farm at the same time[2]?

[1] At this point I'm thinking about a quote from the movie WarGames ;)

[2] Not as unrealistic as it might seem at first. The OOM might be a by-product of external stimuli like an attempted DoS against the servers. The servers might help the attackers by DoSing themselves.

Cheers,

Rob

--
I tried to change the world but they had a no-return policy
http://www.practicalsysadmin.com

_______________________________________________
Discuss mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
