Re: [lopsa-discuss] Metrics vs Monitoring...

Lamont Granquist Tue, 14 Jan 2014 15:27:42 -0800


On Mon, 13 Jan 2014, Kelvin Ku wrote:

> You can eliminate a lot of active checks if you watch the logs for normal
> activity. (You can even set up your alerts so that, instead of just calling
> a person, they first do a monitoring probe in case the traffic has just
> dropped off.)
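A minimal sketch of that idea in Python, assuming you already track the timestamp of the last normal-activity log line. `should_page`, `quiet_limit_s` and the `probe` callable are hypothetical names, not part of any particular monitoring tool:

```python
from typing import Callable


def should_page(last_log_ts: float, now: float, quiet_limit_s: float,
                probe: Callable[[], bool]) -> bool:
    """Decide whether silence in the logs is worth paging a human.

    last_log_ts   -- time (seconds) of the most recent normal-activity log line
    quiet_limit_s -- how long the logs may stay silent before we get suspicious
    probe         -- hypothetical active check; returns True if the service answers
    """
    if now - last_log_ts <= quiet_limit_s:
        # The logs show normal activity: no active check needed at all.
        return False
    # The logs went quiet. Traffic may simply have dropped off, so run an
    # active probe first and only page if the service really doesn't answer.
    return not probe()
```

The active probe only runs when the passive signal goes quiet, which is where most of the saving in active checks comes from.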

> One thing to remember: your load balancer's test is not testing to see if
> the product works, just that the webserver works. You need other tests to
> make sure that all the web hits you are getting aren't just generating a
> 'database error, try again later' response ;-)
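A rough sketch of a check that looks at the response body as well as the status code. The marker strings and function names are illustrative assumptions; a real check would look for whatever error text your application actually emits:

```python
import urllib.request

# Assumption: application-specific text that means "the page loaded but the
# product is broken". Replace with what your app really returns.
ERROR_MARKERS = ("database error", "try again later")


def response_is_healthy(status: int, body: str, markers=ERROR_MARKERS) -> bool:
    """A 200 is not enough; the body must not contain a known error message."""
    if status != 200:
        return False
    lowered = body.lower()
    return not any(marker in lowered for marker in markers)


def check_url(url: str, timeout: float = 5.0) -> bool:
    """Fetch url and apply response_is_healthy to what comes back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return response_is_healthy(resp.status, body)
    except OSError:
        # HTTPError, URLError and socket timeouts are all OSError subclasses.
        return False
```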

Yes, the best active checks you have of a webserver or a database are the clients of that service. If they are wrapping all their calls in timers and reporting success, failure, and perc99 times, then if you are not getting any failures and the perc99 times are within your SLAs, the webserver/database is probably up -- in fact, that is probably the definition of up or down. Those are the alerts that should be wired up to page people into action at 3am.
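One way this could look on the client side, as a sketch: `ClientStats` and `sla_breached` are hypothetical names, perc99 is computed with the nearest-rank method over samples kept in memory, and the SLA threshold is whatever yours is:

```python
import math
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class ClientStats:
    """Latency and success/failure counters for calls a client makes to one service."""

    def __init__(self) -> None:
        self.latencies: List[float] = []  # seconds, one entry per call
        self.successes = 0
        self.failures = 0

    def call(self, fn: Callable[[], T]) -> T:
        """Run fn, timing it and counting it as a success or a failure."""
        start = time.monotonic()
        try:
            result = fn()
        except Exception:
            self.failures += 1
            raise  # the caller still sees the original error
        else:
            self.successes += 1
            return result
        finally:
            # Runs on both paths, so failed calls are timed too.
            self.latencies.append(time.monotonic() - start)

    def perc99(self) -> float:
        """Nearest-rank 99th percentile of recorded latencies (0.0 if none)."""
        if not self.latencies:
            return 0.0
        ordered = sorted(self.latencies)
        rank = math.ceil(99 * len(ordered) / 100)  # 1-based rank
        return ordered[rank - 1]


def sla_breached(stats: ClientStats, sla_p99_s: float, max_failures: int = 0) -> bool:
    """Treat the service as down if there are too many failures or p99 is over SLA."""
    return stats.failures > max_failures or stats.perc99() > sla_p99_s
```

In practice you would ship these counters to your metrics system rather than keep every sample in a list; the in-memory list just keeps the sketch self-contained.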

Then there's trending of resources like disk space, and other problems that will become issues if they aren't addressed. Those should be yellow alerts, or should flap yellow/green long enough that they can be caught and addressed during normal business hours before they cause an impact.
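A sketch of turning a disk-usage trend into a yellow (not paging) status, assuming samples of (hour, space used) in the same units as the capacity. The least-squares projection and the seven-day warning window are illustrative choices, not anything prescribed above:

```python
from typing import List, Optional, Tuple


def hours_until_full(samples: List[Tuple[float, float]], capacity: float) -> Optional[float]:
    """Fit a straight line to (hour, used) samples and project when used hits capacity.

    Returns hours from the last sample until full, or None if usage is flat,
    shrinking, or there is not enough data to fit a line.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    full_at = (capacity - intercept) / slope
    return max(0.0, full_at - samples[-1][0])


def disk_status(samples: List[Tuple[float, float]], capacity: float,
                warn_within_hours: float = 7 * 24) -> str:
    """'yellow' if the disk is projected to fill within the window, else 'green'."""
    eta = hours_until_full(samples, capacity)
    if eta is not None and eta <= warn_within_hours:
        return "yellow"
    return "green"
```

A week-long window leaves room for the status to sit yellow over a weekend and still be dealt with during business hours.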

In a large enough site, monitoring stuff like CPU utilization and wiring it up to pagers becomes a tedious job of dealing with false alerts. Often those go off for services that are designed to grind CPU, and there's no impact to SLAs. I've generally wound up only displaying CPU-grinding hosts (useful information when trying to find the root cause of an outage), not alerting on CPU, and alerting only on actual app performance/availability metrics.
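A sketch of that split, with hypothetical metric names: SLA-facing metrics page, resource trends go yellow, and CPU is only surfaced as a ranked list of grinding hosts for whoever is chasing a root cause:

```python
from typing import Dict, List, Tuple

# Hypothetical metric names mapped to what happens when they look bad.
ALERT_POLICY: Dict[str, str] = {
    "client_error_rate": "page",    # SLA-facing: wake someone up
    "client_p99_latency": "page",   # SLA-facing: wake someone up
    "disk_usage_trend": "yellow",   # fix during business hours
    "cpu_utilization": "display",   # dashboard only, never alerts
}


def cpu_grinding_hosts(cpu_by_host: Dict[str, float],
                       threshold_pct: float = 90.0,
                       limit: int = 10) -> List[Tuple[str, float]]:
    """Hosts at or above threshold_pct CPU, busiest first, for a dashboard."""
    busy = [(host, pct) for host, pct in cpu_by_host.items() if pct >= threshold_pct]
    busy.sort(key=lambda item: item[1], reverse=True)
    return busy[:limit]
```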

_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
