>> Doing some testing, I notice that the polling interval doesn't seem
>> to update with the failure_interval after a montrap is received
>> causing the service to fail.
>>
>> If you manually test the service, it updates the polling interval
>> correctly, and after mon does the first check of the service after
>> the montrap is received, interval is also updated. But not when the
>> actual trap is received.
>>
>> Is this intentional, by design or a bug? I would think that the
>> wanted behaviour would be to update the interval (call
>> reset_timer()?) when a montrap is received as well?
>
>
> Since traps are asynchronous, updating the polling interval is
> generally meaningless (or at least lacking context).
>
Thank you for the reply. I understand your point; however, in our
setup it is useful. Perhaps a config example can help illustrate:

server side:

    service somelocalcheck
        interval 60m
        failure_interval 5m
        monitor rmon -s somelocalcheck
        period wd {Sun-Sat}
            alertevery 60m
            alert mainalertscript
            alertafter 5

on the agent:

    service somelocalcheck
        interval 1m
        monitor somelocalcheck
        period wd {Sun-Sat}
            alertevery 60m
            alert remote.alert -P 2583 -H server
            upalert remote.alert -P 2583 -H server

This way the local agent runs the monitors and checks
something(TM) often, and sends a montrap when it detects
a problem.

On the server side, however, the check works as a heartbeat,
verifying that the local service is still alive. But this is
only performed once every hour.

Since it's the server that triggers the email/SMS alerts,
it is useful in this setup to have the first montrap reset
the timer so that the server starts using the failure_interval.
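
To put numbers on the timing, here is a rough Python sketch using the
60m interval and 5m failure_interval from the server config above. It is
only a toy model of the scheduling, not mon's actual code: the function
name, its parameters, and the "take the earlier of the two times" rule
are my own assumptions for illustration.

```python
def next_server_check(last_check, trap_at, interval=60, failure_interval=5,
                      reset_on_trap=False):
    """Minute at which the server next runs its own check after a failure trap.

    All times are in minutes. Hypothetical model, not taken from mon itself.
    """
    # Without a reset, the next check stays on the normal heartbeat schedule.
    scheduled = last_check + interval
    if reset_on_trap:
        # Assumed behaviour: the trap restarts the timer using failure_interval,
        # but never pushes an already-imminent check further away.
        return min(scheduled, trap_at + failure_interval)
    return scheduled


# Server last checked at minute 0; the agent's trap arrives at minute 10.
without_reset = next_server_check(0, 10)
with_reset = next_server_check(0, 10, reset_on_trap=True)
print(without_reset, with_reset)
```

In this model the server's next check happens at minute 60 without the
reset, and at minute 15 with it.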

I'm not sure whether this makes it easier to understand why we
would prefer this behaviour.

Anders Synstad
Basefarm AS