On 22.11.2011 10:31, Andreas Ericsson wrote: > On 11/22/2011 01:02 AM, Michael Friedrich wrote: >> On 21.11.2011 20:56, Andreas Ericsson wrote: >>> On 11/01/2011 02:05 PM, Michael Friedrich wrote: >>>> hi, >>>> >>>> recently we've been debugging on team icinga in the middle of >>>> notifications and macros, and while investigating on a users problem, >>>> we've digged a bit deeper into the notification viability checks, >>>> resulting in deeper analysis of an Opsview patch to reduce the >>>> notification load significantly by moving the viability checks from >>>> the actual notification into the creation of the contacts notified, >>>> passing only a list of 'qualified' contacts to the actual >>>> notification logic. the only thing to remark over here is that the >>>> checks against the valid notification_period now happen sooner, and >>>> not actually when the notification is sent to each contact. >>>> >>>> while implementing that patch into current code (needs some macro >>>> passing with current code), we did remember nagios bug #98 where the >>>> $NOTIFICATIONRECEIPIENTS$ macro is demanded to be only populated with >>>> the actual contacts to be notified, but not all of those assigned to >>>> the host/service. while this is considered to be a real bug, further >>>> investigation showed that thanks to the viability checks before >>>> calling add_notification(), contacts won't be added to that macro as >>>> the macro logic happens within that function too. >>>> >>>> so by applying the attached git patch, you will a. reduce >>>> notification load and b. fix the $NOTIFICATIONRECEIPIENTS$ macro >>>> holding all contacts, but not the viable contacts. >>>> >>>> since the code remains actually the same on icinga and nagios in this >>>> stage, the tests can be found at the icinga dev tracker as usual. >>>> https://dev.icinga.org/issues/1744 >>>> https://dev.icinga.org/issues/2023 >>>> >>> >>> I've started looking into this patch right now. It's good to get that >>> issue (#98) fixed, but I fail to see any noticeable performance >>> improvement. All contacts potentionally viable for being contacted are >>> still looked at, but the difference with this patch is that it checks >>> the viability before shipping it off to add_notification(), which does >>> fix issue 98 but at the expense of quite a lot of code duplication. >> >> normally, all contacts would have been added to the notification_list in >> memory, even those not actually passing the viability checks. but at >> this stage of the code, nobody is aware of that so the list gets >> populated either way by calling add_notification(). >> >> /* add all individual contacts for this host */ >> ^^^ >> >> having that notification_list created, this remains fully linked in >> memory. let's say, you have a bunch of some 1k contacts for that >> service, and actually the alarm would hit only those in the nonworkhours >> or workhours timeperiod and only on critical, for the ops team e.g. >> so by looping through the notification_list, you will encounter *all* >> contacts for that host, only the duplicates have been removed. >> >> /* notify each contact (duplicates have been removed) */ >> >> then you'll fire up the actual notification with calling >> notify_contact_of_host - and actually in there, the current core checks >> the viability for the contact to be notified. >> >> you are right, if each contact gets notified 24x7 on all >> notification_options, the algorithm stays the same. but if you happen to >> have a lot of different contacts assigned to hosts and services, not >> getting notified each time a notification is triggered, the overall >> amount of looping through notification_list will be shorter and save >> some cpu cycles, and probably on larger systems, a bit more than just >> some as this means a reduction of the looping for each contact to be >> checked to be notified on the actual end-of-the-line. >> > > Right, but all contacts are still checked for viability, so the amount > of looping is reduced once for all those who aren't viable, while the > number of viability checks (which I presume is the expensive part) will > remain the same.
from that point of view you are absolutely right. thanks for clarification. > >> furthermore, where do you get the idea of code duplication from? the >> only changes made by this patch is actually moving the viability checks >> and therefore passing an additional function parameter which makes the >> diff a bit more bloated than it should be. >> > > The fact that the patch introduces eight locations with identical code > headed with "check now if contact can be notified". > > The proper way to do this would be to introduce create_recipient_list(), > passing it all the variables it needs to produce a list of recipients > that have duplicates removed *and* are viable for being contacted. If a > lot of code still has to be duplicated (as in the patch), more helpers > in the form of add_recipient_for_service(&mac, srv, cntct) would be > nifty so the viability check can be moved there without breaking the > abi for create_notification_list_from_{host,service}(). i see. sounds like some hacky hours ;) > > I'm in the middle of a release at $dayjob so I've had to postpone that > until next week or so. I wouldn't mind if you beat me to it ;) hehe. you are not alone. i'm in the middle of release ($icinga) and preparations ($osmc) for next week, so maybe i'd just get onto some devs in nuremberg and we'll try hack together ;-) -- DI (FH) Michael Friedrich Vienna University Computer Center Universitaetsstrasse 7 A-1010 Vienna, Austria email: michael.friedr...@univie.ac.at phone: +43 1 4277 14359 mobile: +43 664 60277 14359 fax: +43 1 4277 14338 web: http://www.univie.ac.at/zid http://www.aco.net Lead Icinga Core Developer http://www.icinga.org ------------------------------------------------------------------------------ All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity, and more. Splunk takes this data and makes sense of it. IT sense. And common sense. http://p.sf.net/sfu/splunk-novd2d _______________________________________________ Nagios-users mailing list Nagios-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null