Re: monit not catching failed ping test

mart...@tildeslash.com Fri, 08 Mar 2019 13:05:36 -0800

The interval between checks is 120 seconds, so with these settings it can take
up to ~2 minutes to detect an error.

You can lower the interval, for example to 5 seconds, for faster error
detection.
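A minimal monitrc sketch of that change follows. It reuses the anonymized host from this thread (host2_ping at 192.168.1.2), and the 5-second value is only an example, not a recommendation for every setup:

```
# Poll all services every 5 seconds instead of every 120 seconds
set daemon 5

# Ping check unchanged from the original post; the verbose output quoted
# below shows it already runs with count 3, size 64 and a 5 s timeout
check host host2_ping with address 192.168.1.2
    if failed ping then alert
```

After editing, "monit -t" checks the control file syntax and "monit reload" makes the running daemon re-read it. With 22 remote hosts, a cycle in which several hosts are unreachable may take longer than 5 seconds, because an unanswered ping has to wait out its timeout.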

Best regards,
Martin


> On 8 Mar 2019, at 22:00, Fant, Andrew (NIH/NIDA) [E] <andrew.f...@nih.gov> 
> wrote:
> 
> In the monitrc file, I have:
>  
> set daemon   120
>  
> As for the monit -vI output, it lists 22 remote host checks in total. A
> shortened, anonymized copy of it is:
>  
> Adding 'allow localhost' -- host resolved to [::ffff:127.0.0.1]
> Adding credentials for user 'admin'
> Runtime constants:
>  Control file       = /etc/monitrc
>  Log file           = syslog
>  Pid file           = /etc/monit/monit.pid
>  Id file            = /etc/monit/monit.id
>  State file         = /etc/monit/monit.state
>  Debug              = True
>  Log                = True
>  Use syslog         = True
>  Is Daemon          = True
>  Use process engine = True
>  Limits             = {
>                     =   programOutput:     512 B
>                     =   sendExpectBuffer:  256 B
>                     =   fileContentBuffer: 512 B
>                     =   httpContentBuffer: 1 MB
>                     =   networkTimeout:    5 s
>                     =   programTimeout:    5 m
>                     =   stopTimeout:       30 s
>                     =   startTimeout:      30 s
>                     =   restartTimeout:    30 s
>                     = }
>  On reboot          = start
>  Poll time          = 120 seconds with start delay 0 seconds
>  Event queue        = base directory /var/monitor with 1000 slots
>  M/Monit(s)         = http://[host1.local]:8080/collector with timeout 5 s with credentials
>  Start monit httpd  = True
>  httpd bind address = localhost
>  httpd portnumber   = 2812
>  httpd signature    = Enabled
>  httpd auth. style  = Basic Authentication and Host/Net allow list
>  
> The service list contains the following entries:
>  
> System Name           = host1
>  Monitoring mode      = active
>  On reboot            = start
>  
> Remote Host Name      = host2_ping
>  Address              = 192.168.1.2
>  Monitoring mode      = active
>  On reboot            = start
>  Ping                 = if failed [count 3 size 64 with timeout 5 s] then alert
>  
> -------------------------------------------------------------------------------
>  
> Hopefully this will be of some use.
>  
>  
> --                                        
> Andrew Fant                      |            Systems Administrator
> andrew.f...@nih.gov              |      Lei Shi Lab, NIH/NIDA/IRP
> (443)740-2849                   |
>  
> From: "mart...@tildeslash.com" <mart...@tildeslash.com>
> Reply-To: This is the general mailing list for monit <monit-general@nongnu.org>
> Date: Friday, March 8, 2019 at 3:26 PM
> To: This is the general mailing list for monit <monit-general@nongnu.org>
> Subject: Re: monit not catching failed ping test
>  
> Hello, 
>  
> monit checks services at the interval given by the "set daemon <x>" setting.
> If that interval is long, it takes longer to detect an error, and if a check
> is blocked by some service timeout or action, the actual interval between
> checks can be longer still.
>  
> Could you please check the "set daemon" setting and run monit in debug mode?
>  
> 1.) stop monit
> 2.) start it in the foreground with verbose output: monit -vI
>  
> Best regards,
> Martin
>  
> 
> 
> On 8 Mar 2019, at 16:49, Fant, Andrew (NIH/NIDA) [E] <andrew.f...@nih.gov>
> wrote:
>  
> Good morning.
>      I have a small monitoring setup with M/Monit 3.7.2, using monit 5.25.2
> as the agent. There are a couple of systems that I cannot install monit on,
> but I still need to know about any downtime on them, so I have added them as
> ping checks in the monitrc on the host where I installed M/Monit. Yesterday,
> one of those remote systems went down, but neither monit nor M/Monit raised
> an alert for it, and both still show its status as OK. Using anonymized
> information, the entry in the monitrc on host1 is:
>  
> CHECK HOST host2_ping with ADDRESS 192.168.1.2
>         IF FAILED ping THEN ALERT
>  
> And from the command line on host1:
>  
> host1% monit status host2_ping
> Monit 5.25.2 uptime: 48d 19h 8m
>  
> Remote Host 'host2_ping'
>   status                       OK
>   monitoring status            Monitored
>   monitoring mode              active
>   on reboot                    start
>   ping response time           -
>   data collected               Fri, 08 Mar 2019 10:41:33
>  
> But:
>  
> host1% ping host2
> PING host2.example.org (192.168.1.2) 56(84) bytes of data.
> From host1.example.org (192.168.1.1) icmp_seq=1 Destination Host Unreachable
> From host1.example.org (192.168.1.1) icmp_seq=2 Destination Host Unreachable
> From host1.example.org (192.168.1.1) icmp_seq=3 Destination Host Unreachable
>  
> Clearly there is a disconnect between the OS-provided ping utility and what
> monit is seeing. It is probably a simple configuration error, but I cannot
> see what I did wrong. Can someone please point me in the right direction?
>  
> Thank you
>  
> -- 
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general 
