Running the command from the script locally (on Mac):

$ /usr/bin/snmpwalk -t 5 -Oe -Oq -Os -v 1 -c public localhost if
Timeout: No Response from localhost
$ echo $?
1

Looks like the script should parse the output from snmpwalk and provide
some hint if an unexpected result is reported.

Cheers

On Sat, Feb 4, 2017 at 6:40 AM, Lars George <[email protected]> wrote:
> Hi,
>
> I tried the supplied `healthcheck.sh`, but did not have snmpd running.
> That caused the script to take a long time to error out, which exceeded
> the 10 seconds the check was meant to run. That resets the check and
> it keeps reporting the error, but never stops the servers:
>
> 2017-02-04 05:55:08,962 INFO
> [regionserver/slave-1.internal.larsgeorge.com/10.0.10.10:16020]
> hbase.HealthCheckChore: Health Check Chore runs every 10sec
> 2017-02-04 05:55:08,975 INFO
> [regionserver/slave-1.internal.larsgeorge.com/10.0.10.10:16020]
> hbase.HealthChecker: HealthChecker initialized with script at
> /opt/hbase/bin/healthcheck.sh, timeout=60000
>
> ...
>
> 2017-02-04 05:55:50,435 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.HealthCheckChore: Health status at 412837hrs, 55mins, 50sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:55:50,436 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore: CompactionChecker missed its start time
> 2017-02-04 05:55:50,437 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore:
> slave-1.internal.larsgeorge.com,16020,1486216506007-MemstoreFlusherChore
> missed its start time
> 2017-02-04 05:55:50,438 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:56:20,522 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.HealthCheckChore: Health status at 412837hrs, 56mins, 20sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:56:20,523 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:56:50,600 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.HealthCheckChore: Health status at 412837hrs, 56mins, 50sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:56:50,600 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:57:20,681 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.HealthCheckChore: Health status at 412837hrs, 57mins, 20sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:57:20,681 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:57:50,763 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.HealthCheckChore: Health status at 412837hrs, 57mins, 50sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:57:50,764 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:58:20,844 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.HealthCheckChore: Health status at 412837hrs, 58mins, 20sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:58:20,844 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:58:50,923 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.HealthCheckChore: Health status at 412837hrs, 58mins, 50sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:58:50,923 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:59:21,017 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.HealthCheckChore: Health status at 412837hrs, 59mins, 21sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:59:21,018 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
>
> That seems like a bug, no?
>
> Lars
>
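
For what it's worth, here is a rough sketch of the kind of check I have in mind. It is not the code in the shipped healthcheck.sh: the function name check_snmp and the SNMP_CMD variable are made up for illustration, and only the snmpwalk flags are taken from the command in this thread. The idea is simply to look at the exit status and the captured output before trying to interpret anything, so that a missing snmpd produces an explicit hint.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of the shipped healthcheck.sh.
# SNMP_CMD is an assumed variable; its default is the command from this thread.
SNMP_CMD=${SNMP_CMD:-"/usr/bin/snmpwalk -t 5 -Oe -Oq -Os -v 1 -c public localhost if"}

check_snmp() {
  local output status
  # Capture stdout and stderr together so a message such as
  # "Timeout: No Response from localhost" ends up in the hint.
  # $SNMP_CMD is left unquoted on purpose so it splits into command and args.
  output=$($SNMP_CMD 2>&1)
  status=$?

  if [ "$status" -ne 0 ]; then
    # Non-zero exit (1 in the run above): report it instead of parsing further.
    echo "ERROR snmpwalk failed (exit $status): $output"
    return 1
  fi

  if [ -z "$output" ]; then
    # The command succeeded but there is nothing to parse.
    echo "ERROR snmpwalk returned no data"
    return 1
  fi

  echo "OK: snmpwalk responded"
  return 0
}
```

A caller would run check_snmp first and only go on to parse the interface counters when it returns 0. Separately, as far as I recall, net-snmp retries a request several times by default, so -t 5 can block for much longer than 5 seconds; if that is what is happening here, lowering the retry count with -r may be worth a look, but I have not verified that against this setup.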
