There's kind of a cool tool for connecting Nagios to PagerDuty called
Flapjack - designed to avoid flooding people with pages when things go
south.
http://flapjack.io/
On 08/25/2014 05:02 PM, Lawrence K. Chen, P.Eng. wrote:
On 08/25/14 13:44, Warner wrote:
On Mon, Aug 25, 2014 at 06:09:10AM -0700, Nathan Clemons([email protected]) wrote:
We're looking to set up small teams in nagios and rotate between
primary and secondary contacts, vs having one global on call person.
(Ie, two networking folks, two vmware folks, two Unix folks, etc.)
What kind of solutions have folks tried for this? Pagerduty seems
excessively priced for this kind of task, especially when we're trying
to trim opex costs. When I worked at /. we used sendmail aliases to
control the paging and just ran a script from cron to adjust the list
to the next person in line on Monday morning.
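A minimal sketch of that kind of weekly rotation, in Python rather than whatever the original cron script was written in. The alias name and the team members are hypothetical, and the real script would also have to rewrite the aliases file and rebuild the alias database (for sendmail, typically by running newaliases), which is left out here:

```python
from collections import deque


def rotate_on_call(members):
    """Return a new list with the current primary moved to the back."""
    queue = deque(members)
    queue.rotate(-1)  # shift everyone one place toward the front
    return list(queue)


def render_alias(alias, members):
    """Format one alias line: first entry is primary, second is secondary."""
    primary, secondary = members[0], members[1]
    return f"{alias}: {primary}, {secondary}"


if __name__ == "__main__":
    # Hypothetical team; a real script would read last week's order from disk.
    team = ["alice", "bob", "carol"]
    team = rotate_on_call(team)
    print(render_alias("unix-oncall", team))
```

Run from cron on Monday morning, something like this would promote the secondary to primary each week; with a two-person team it simply swaps the two.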
In the past, I've used qmail dot files and shell scripts. Standardized
the contacts on e-mail aliases. That can work well.
Since then, I've become a big fan of Pager Duty. Not having to maintain
a separate schedule, having a central point for notifications, and
additional bells and whistles such as notification when going on call
are huge wins.
Both approaches work well. Pager Duty does have value though, I wouldn't
write it off.
Warner
I don't know much about pagerduty, except one group on campus that shares our
Nagios server is using it.
So far, their Perl script to tie into Nagios hasn't left a good impression on me.
A couple of weeks after I had set it up, I noticed it had spawned 1000s of copies
of itself and our server was close to death....after clearing them, they would just
start building up again. I thought about making a promise to deal with it in the
short term, though I couldn't recall if CFEngine 2.2 had the capability or what
its syntax might be. Saw there were some notifications queued, and that they
were all hanging on that....seems the first gets stuck on it, and the rest
get stuck on the first process still being there.
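One generic way to keep a hung notifier from piling up like this is to run it under a hard timeout, so a stuck process is killed instead of blocking everything queued behind it. This is only a sketch in Python, not the CFEngine promise considered above and not anything the PagerDuty script does itself; the function name and the timeout value are assumptions:

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return its exit code, or None if it ran too long.

    subprocess.run kills the child and waits for it when the timeout
    expires, so a hung process does not linger.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

A wrapper like this could stand in front of the notification command, with a None result logged as a failed delivery rather than left hanging.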
In trying to see what it was doing...found it's trying to post to some https
URL through LWP. Except it still seems that after 10+ years...LWP
https through a proxy is still busted, so don't know why this script would
be expected to work....
And, a proxy is needed because the server is in private IP space (eventually
our entire datacenter will be....though sounds like it'll all be behind our
F5, but it's been WIP for almost a couple of years now.)
In the meantime it's a largely neglected/forgotten Squid proxy server that I
threw up back in 2007 to replace the one that everybody depended on, but
nobody claimed ownership for when the last of some UltraServer 2's were
decommissioned. It's running in a Solaris Zone, which has been moved and
undergone upgrade-on-attach a few times....
After a couple of days, I opted for an earlier suggestion I had found online.
I used a Perl module of LWP-Proxy-Connect (still waiting to see if it'll get
accepted into FreeBSD Ports) to make the script work. Just a one line change
to the pagerduty script, IIRC, and it started working again....
That was until I let CFEngine loose again, and it reverted it :)
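For comparison, here is roughly what an HTTPS POST through a forward proxy with an explicit timeout looks like in Python's standard library, where urllib tunnels https requests through the proxy with CONNECT. This is a sketch, not the PagerDuty script or the LWP fix described above; the proxy address, endpoint URL and payload fields are all made up for illustration:

```python
import json
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends http and https requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def build_event_request(endpoint, payload):
    """Build a JSON POST request for the given endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(opener, request, timeout_seconds=10):
    """Send the request; the timeout keeps a dead proxy from hanging us."""
    with opener.open(request, timeout=timeout_seconds) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical proxy and endpoint; substitute real values before use.
    opener = build_proxy_opener("http://proxy.example.invalid:3128")
    request = build_event_request(
        "https://events.example.invalid/notify",
        {"host": "web01", "state": "CRITICAL"},
    )
    print(send_event(opener, request))
```

The explicit timeout matters as much as the proxy support here: without it, one unreachable proxy is enough to leave every notification process waiting.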
While I was working on it, the group using it finally logged a couple of
tickets...one about "unable to fork" errors, and one that they had stopped getting
notifications, where they thought there should've been some on the weekend.
(They were the ones killing my Nagios server.)
Later, they added that it had worked up to the Friday before....
Finally they admitted that they had changed it that Friday from using email to
posting to the web for notifications. (Had I known, I might have just suggested
they switch it back :)
Hadn't really thought about our notifications from this Nagios server now
being dependent on our smtp server....our old server had been in the
datacenter range that is completely open to the world....so it did its own
mail delivery (especially important when it used to largely inform us of
problems with campus email...) Though it's getting hard for me to handle
notifications in a timely/safe manner....
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/