SolarWinds, as you can send in the specific MIB you need monitored and a week later it's in the general release.
Sent from my iPhone

On 2011-12-13, at 7:11 PM, "Robert Brockway" <rob...@timetraveller.org> wrote:

> On Mon, 12 Dec 2011, Eric J Esslinger wrote:
>
>> I'm not looking to monitor a massive infrastructure: 3 web sites, 2 mail
>> servers (pop,imap,submission port, https webmail), 4 dns servers
>> (including lookups to ensure they're not listening but not talking), and
>> one inbound mx. A few network points to ping to ensure connectivity
>> throughout my system. Scheduled notification windows (for example,
>> during work hours I don't want my phone pinged unless it's everything
>> going offline. Off hours I do. Secondary notifications if problem
>> persists to other users, or in the event of many triggers. That sort of
>> thing). Sensitivity settings (If web server 1 shows down for 5 min,
>> that's not a big deal. Another one if it doesn't respond to repeated
>> queries within 1 minute is a big deal) A Weekly summary of issues would
>> be nice. (especially the 'well it was down for a short bit but we didn't
>> notify as per settings') I don't have a lot of money to throw at
>> this. I
>
> Hi Eric. The feature set you are describing should be in any monitoring
> system worthy of the name. I've used Nagios to good effect for the best
> part of the last 12 years or so. Before that I used Big Brother, which
> sucked in various ways.
>
> I did an evaluation on a wide variety of FOSS monitoring systems 2-3 years
> ago and Nagios won at the time (again). Generally I found the
> alternatives had problems that I considered to be quite serious (such as
> being overly complicated or doing checks so frequently that they loaded
> the systems they were supposed to be monitoring[1]).
>
> I'm currently trialing Icinga, a fork of Nagios.
>
> Puppet can be set up to manage Nagios/Icinga config which cuts down on the
> admin overhead.
>
> Nagios/Icinga can be hooked up to Collectd to provide performance data as
> well as alert monitoring.
>
> One concern about external monitoring services is the level of visibility
> they need to have in to your network to adequately monitor them.
>
> My recommendation is to do a proper risk assessment on the available
> options.
>
>> DO have detailed internal monitoring of our systems but sometimes that
>> is not entirely useful, due to the fact that there are a few 'single
>> points of failure' within our network/notification system, not to
>> mention if the monitor itself goes offline it's not exactly going to be
>> able to tell me about it. (and that happened once, right before the mail
>> server decided to stop receiving mail).
>
> There are a couple of ways to deal with this. Some monitoring
> applications can fail-over to a standby server if the primary fails. But
> this isn't even really necessary. You will arguably gain higher
> reliability by running multiple _independent_ monitors and have them
> monitor each other[2]. I have often used this approach.
>
> The principal aim here is to guarantee that you are alerted to any single
> failure (a production service, system or a monitor). Multiple
> simultaneous failures could still produce a blackspot. It is possible to
> design a system that will discover multiple simultaneous failures, but it
> takes more effort and resources.
>
>
> [1] Sometimes I wonder if the people developing certain systems have any
> operational experience at all.
>
> [2] A system designed to fail-over on certain conditions may fail to
> fail-over, ah, so to speak.
>
> Cheers,
>
> Rob
>
> --
> Email: rob...@timetraveller.org Linux counter ID #16440
> IRC: Solver (OFTC & Freenode)
> Web: http://www.practicalsysadmin.com
> Director, Software in the Public Interest (http://spi-inc.org/)
> Free & Open Source: The revolution that quietly changed the world
> "One ought not to believe anything, save that which can be proven by
> nature and the force of reason" -- Frederick II (26 December 1194 –
> 13 December 1250)
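
For anyone who wants to see the "independent monitors watching each other" idea from the quoted message in concrete terms, here is a rough sketch in Python. It is only a toy model, not how Nagios or Icinga implement anything: the names (web1, mail1, mon_a, mon_b), the Monitor class and both functions are made up for illustration. It assumes each monitor checks every service plus its peer, and that a monitor that is down cannot send alerts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Monitor:
    name: str
    targets: List[str]  # services and peer monitors this monitor checks


def alerts(monitors: List[Monitor], up: Dict[str, bool]) -> Dict[str, List[str]]:
    """Map each live monitor to the targets it currently sees as down."""
    report = {}
    for m in monitors:
        if not up.get(m.name, False):
            continue  # a dead monitor cannot raise any alert
        down = [t for t in m.targets if not up.get(t, False)]
        if down:
            report[m.name] = down
    return report


def every_single_failure_detected(monitors: List[Monitor], names: List[str]) -> bool:
    """True if failing any one name on its own triggers at least one alert."""
    for victim in names:
        up = {n: n != victim for n in names}
        if not alerts(monitors, up):
            return False
    return True


# Hypothetical layout: two services, two monitors that also watch each other.
SERVICES = ["web1", "mail1"]
MONITORS = [
    Monitor("mon_a", SERVICES + ["mon_b"]),
    Monitor("mon_b", SERVICES + ["mon_a"]),
]
ALL = SERVICES + [m.name for m in MONITORS]

if __name__ == "__main__":
    # Any single failure (service or monitor) is reported by someone.
    print(every_single_failure_detected(MONITORS, ALL))
    # Both monitors down at once: nobody is left to alert (the blackspot).
    print(alerts(MONITORS, {n: n in SERVICES for n in ALL}))
```

In this model every single failure is caught, including the loss of one monitor, while losing both monitors at the same time produces no alert at all, which matches the blackspot caveat in the quoted message.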