Hi all.

I attended the Nagios World Conference North America last week and thought I'd dish out some kudos where they are due, and also condense the information for any newcomers who might get lucky when looking for solutions to particular problems.

Overall, the standard of the conference was very, very high. It was the first Nagios conference I've gone to where I learned something new. A rare occasion indeed, so many thanks to Ethan, Mary and Nagios Enterprises for arranging such a high-quality event. I won't mention their talks, since I don't want to inflate their egos too much, but check out the one on visualizations by Mike Guthrie. Pretty cool stuff :)

Much of the focus was on scaling up Nagios. mod_gearman and livestatus seem to be the best known and most used projects for achieving that goal. Reading status files is just too slow when viewing the UI, and a single server just doesn't scale to enough checks (yet).

DNX also seemed very well investigated and used in some places, although a documentation mishap seems to have led many potential users away from it. For those wondering, DNX can indeed distribute checks to workers based on hostgroups, just as mod_gearman can. It's just not well documented.

LivestatusSlave also got a lot of interest, although it didn't seem to be as widely used as any of the other three.

Kudos to Sven Nierlein (mod_gearman author/maintainer), Mathias Kettner (mk_livestatus author/maintainer) and Lars Michelsen (LivestatusSlave author). Your stuff is being used in production for positively *huge* installs, so well done guys :) I sure hope you go to the conference next year so you can talk about future development and gather even more interest for your projects.

Merlin wasn't much discussed, although the DNX maintainers (and I) recommend it as the only sane way to get redundancy and automagic load balancing. That's probably because of the misconception that you're required to run a separate UI and a fork of Nagios when using it. At least that's what my slightly hurt ego wants to believe ;)

The general tip for running large installations is to offload the various spool directories to ramdisk, along with status.dat and objects.cache (since they're read quite frequently).
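
For anyone wanting to check where their own install stands, here is a minimal sketch (mine, not something shown at the conference) that takes /proc/mounts-style text and reports whether given paths sit on a ramdisk (tmpfs/ramfs) and, if not, whether the underlying mount carries the noatime option. The paths and the sample mount table are made-up placeholders; on a real box the relevant locations are whatever your nagios.cfg sets for status_file, object_cache_file and check_result_path.

```python
# Sketch only: SAMPLE_MOUNTS and all paths below are hypothetical examples.
# Note: /proc/mounts escapes spaces in mount points as \040; not handled here.

SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /var ext4 rw,noatime 0 0
tmpfs /var/nagios/ramdisk tmpfs rw,size=512m 0 0
"""


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type, options) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mounts.append((fields[1], fields[2], set(fields[3].split(","))))
    return mounts


def mount_for(path, mounts):
    """Return the entry whose mount point is the longest prefix of path."""
    best, best_len = None, -1
    for point, fs_type, options in mounts:
        prefix = point if point.endswith("/") else point + "/"
        # ">=" lets a later mount on the same point shadow an earlier one
        if (path == point or path.startswith(prefix)) and len(point) >= best_len:
            best, best_len = (point, fs_type, options), len(point)
    return best


def report(paths, mounts_text):
    """Map each path to 'ramdisk', 'disk, noatime', 'disk, no noatime' or 'unknown'."""
    mounts = parse_mounts(mounts_text)
    result = {}
    for path in paths:
        entry = mount_for(path, mounts)
        if entry is None:
            result[path] = "unknown"
        elif entry[1] in ("tmpfs", "ramfs"):
            result[path] = "ramdisk"
        elif "noatime" in entry[2]:
            result[path] = "disk, noatime"
        else:
            result[path] = "disk, no noatime"
    return result


if __name__ == "__main__":
    example_paths = [
        "/var/nagios/ramdisk/status.dat",  # hypothetical status_file location
        "/var/nagios/rrd/host.rrd",        # hypothetical rrd file location
        "/etc/nagios/nagios.cfg",
    ]
    for path, verdict in report(example_paths, SAMPLE_MOUNTS).items():
        print(f"{path}: {verdict}")
```

To run it against a live Linux system, pass open("/proc/mounts").read() as the second argument to report() instead of the sample text.
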
Work is under way to make that unnecessary by simply getting rid of disk I/O as much as possible. There was pretty much universal head-nodding as these tips were reiterated in one talk after another, so it seems the attending part of the Nagios community has reached consensus that that's the best way to do it. Mounting all disks with the noatime option is also a very good tip that'll get your disk write operations (the slow ones) down to a fraction of what they were before you latched that option on.

Many have large headaches with getting various graphing solutions to scale properly. Some resorted to using Fusion I/O cards with exabyte performance (quite expensive...), since using ramdisk to store the tens or hundreds of gigabytes of rrd-files generated in large installs isn't really an option. It would be nice to hear Joerg Linge's (author of PNP4Nagios) take on other paths to increase performance next year. It seems his project is the most widely used for graphing, so getting it to perform exceptionally well would be time well spent.

Apart from that, there were plenty of other good presentations and very awesome drinking^H^H^H^H^H^H^H mingle sessions. I highly recommend you attend next year if you're managing a Nagios install at $dayjob, or if you're working on a Nagios addon project and want to get immediate feedback on what users are looking for.

-- 
Andreas Ericsson                   andreas.erics...@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null