>> Doing some testing, I notice that the polling interval doesn't seem  
>> to update with the failure_interval after a montrap is received  
>> causing the service to fail.
>>
>> If you manually test the service, it updates the polling interval  
>> correctly, and after mon does the first check of the service after  
>> the montrap is received, interval is also updated. But not when the  
>> actual trap is received.
>>
>> Is this intentional, by design or a bug? I would think that the  
>> wanted behaviour would be to update the interval (call  
>> reset_timer()?) when a montrap is received as well?
>
>
> Since traps are asynchronous, updating the polling interval is  
> generally meaningless (or at least lacking context).
>

Thank you for the reply. I understand your point; however, in our 
setup it is useful. Perhaps a config example can help illustrate:

server side:

service somelocalcheck
  interval 60m
  failure_interval 5m
  monitor rmon -s somelocalcheck
  period wd {Sun-Sat}
    alertevery 60m
    alert mainalertscript
    alertafter 5


on the agent:

service somelocalcheck
  interval 1m
  monitor somelocalcheck
  period wd {Sun-Sat}
    alertevery 60m
    alert remote.alert -P 2583 -H server
    upalert remote.alert -P 2583 -H server


This way the local agent runs the monitors and checks 
something(TM) often, and sends a montrap when it detects 
a problem.

On the server side, however, the check works as a heartbeat, 
checking whether the local service is still alive. But this 
is only performed once every hour.

Since it's the server that triggers the email/SMS alerts, 
it is useful in this setup to have the first montrap reset 
the timer and make it start using the failure_interval.
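
To make the difference concrete, here is a rough sketch in
Python (not mon's actual Perl code). The Service class, the
handle_trap() function and the reset_on_trap switch are all
hypothetical names made up for illustration; only the interval
and failure_interval values come from the server config above.
Using min() so that an already imminent check is not pushed
further out is also an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class Service:
    interval: int          # normal polling interval, in seconds
    failure_interval: int  # polling interval while failing, in seconds
    timer: int             # seconds left until the next scheduled check
    failing: bool = False


def handle_trap(svc: Service, trap_is_failure: bool, reset_on_trap: bool) -> None:
    """Record the result carried by a trap; optionally reschedule the check."""
    was_failing = svc.failing
    svc.failing = trap_is_failure
    if reset_on_trap and trap_is_failure and not was_failing:
        # Hypothetical behaviour asked for in this thread: the first
        # failure trap switches the service over to failure_interval.
        svc.timer = min(svc.timer, svc.failure_interval)


# Server-side somelocalcheck: interval 60m, failure_interval 5m.
# Assume the trap arrives 10 minutes into the cycle, so 50 minutes remain.
observed = Service(interval=3600, failure_interval=300, timer=3000)
handle_trap(observed, trap_is_failure=True, reset_on_trap=False)

wanted = Service(interval=3600, failure_interval=300, timer=3000)
handle_trap(wanted, trap_is_failure=True, reset_on_trap=True)

print("minutes until next server check, timer untouched:", observed.timer // 60)
print("minutes until next server check, timer reset:", wanted.timer // 60)
```

In the first case the server does not poll again until the
remaining 50 minutes have passed; in the second it polls again
after 5 minutes and keeps using failure_interval from there.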

Not sure if this makes it easier to understand why we would 
prefer to have this behaviour.



Anders Synstad
Basefarm AS
_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
