Alexandros Kosiaris has submitted this change and it was merged.

Change subject: Simplify check_puppetrun.
......................................................................


Simplify check_puppetrun.

Previously it had different modes; now it just checks everything.

Now:

It reports staleness first.  If fresh, it reports compile failures.
If compile is working, it reports errors.  If no errors, it reports
time since last run.
Also differentiate between the runlockfile and the adminlockfile
introduced in puppet 3.X, and use symbols instead of magic numbers.
Finally, warn that puppet is disabled unless told otherwise.
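
The check order can be sketched as a single function. This is only an illustration, not code from the patch: the helper name check_puppetrun_status and its keyword arguments are hypothetical, and it returns an exit code and message instead of calling puts and exit. The order follows the diff in this change, where the disabled and failure checks run before the staleness thresholds.

```ruby
# Hypothetical helper (not part of the patch): returns [exit_code, message]
# so the ordering of the checks can be exercised without exiting.
def check_puppetrun_status(enabled:, enabled_only:, failcount:, age:, warn:, crit:)
  unless enabled
    # --only-enabled suppresses the alert; otherwise a disabled agent warns.
    return [0, "OK: Puppet is currently disabled, not alerting"] if enabled_only
    return [1, "WARNING: Puppet is currently disabled"]
  end

  # Symbols replace the old magic numbers (99 for total failure, 0 on error).
  return [2, "CRITICAL: Complete puppet failure"] if failcount == :failed
  return [3, "UNKNOWN: Failed to check"] if %i[unknown nostatefile].include?(failcount)

  # From here on failcount is an Integer.
  return [2, "CRITICAL: Puppet has #{failcount} failures"] if failcount > 0
  return [2, "CRITICAL: Puppet last ran #{age} seconds ago, expected < #{crit}"] if age >= crit
  return [1, "WARNING: Puppet last ran #{age} seconds ago, expected < #{warn}"] if age >= warn

  [0, "OK: Puppet is currently enabled, last run #{age} seconds ago with #{failcount} failures"]
end
```

The exit codes are the usual Nagios plugin values: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.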

Change-Id: I5a4439b18758a0915bc4ac6666f8f22435fb1689
---
M modules/base/files/monitoring/check_puppetrun
1 file changed, 54 insertions(+), 62 deletions(-)

Approvals:
  Andrew Bogott: Looks good to me, but someone else must approve
  Alexandros Kosiaris: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/modules/base/files/monitoring/check_puppetrun b/modules/base/files/monitoring/check_puppetrun
index d16a6f3..efc159e 100755
--- a/modules/base/files/monitoring/check_puppetrun
+++ b/modules/base/files/monitoring/check_puppetrun
@@ -12,7 +12,8 @@
 require 'optparse'
 require 'yaml'
 
-lockfile = "/var/lib/puppet/state/puppetdlock"
+runlockfile = "/var/lib/puppet/state/agent_catalog_run.lock"
+adminlockfile = "/var/lib/puppet/state/agent_disabled.lock"
 statefile = "/var/lib/puppet/state/state.yaml"
 summaryfile = "/var/lib/puppet/state/last_run_summary.yaml"
 enabled = true
@@ -23,28 +24,27 @@
 warn = 0
 crit = 0
 enabled_only = false
-failures = false
 
 opt = OptionParser.new
 
-opt.on("--critical [CRIT]", "-c", Integer, "Critical threshold, time or failed resources") do |f|
+opt.on("--critical [CRIT]", "-c", Integer, "Critical staleness threshold, time in seconds") do |f|
     crit = f.to_i
 end
 
-opt.on("--warn [WARN]", "-w", Integer, "Warning thresold, time of failed resources") do |f|
+opt.on("--warn [WARN]", "-w", Integer, "Warning staleness threshold, time in seconds") do |f|
     warn = f.to_i
-end
-
-opt.on("--check-failures", "-f", "Check for failed resources instead of time since run") do |f|
-    failures = true
 end
 
 opt.on("--only-enabled", "-e", "Only alert if Puppet is enabled") do |f|
     enabled_only = true
 end
 
-opt.on("--lock-file [FILE]", "-l", "Location of the lock file, default #{lockfile}") do |f|
-    lockfile = f
+opt.on("--runlock-file [FILE]", "-l", "Location of the run lock file, default #{runlockfile}") do |f|
+    runlockfile = f
+end
+
+opt.on("--adminlock-file [FILE]", "-a", "Location of the admin lock file, default #{adminlockfile}") do |f|
+    adminlockfile = f
 end
 
 opt.on("--state-file [FILE]", "-t", "Location of the state file, default #{statefile}") do |f|
@@ -62,12 +62,12 @@
     exit 3
 end
 
-if File.exists?(lockfile)
-    if File::Stat.new(lockfile).zero?
+if File.exists?(adminlockfile)
        enabled = false
-    else
+end
+
+if File.exists?(runlockfile)
        running = true
-    end
 end
 
 lastrun = File.stat(statefile).mtime.to_i if File.exists?(statefile)
@@ -81,63 +81,55 @@
         # are treated as huge failures. The yaml file will be valid but
         # it wont have anything but last_run in it
         unless summary.include?("events")
-            failcount = 99
+            failcount = :failed
         else
             # and unless there are failures, the events hash just wont have the failure count
             failcount = summary["events"]["failure"] || 0
         end
     rescue
-        failcount = 0
+        failcount = :unknown
         summary = nil
     end
+else
+    failcount = :nostatefile
 end
 
 time_since_last_run = Time.now.to_i - lastrun
 
-unless failures
-    if enabled_only && enabled == false
-        puts "OK: Puppet is currently disabled, not alerting. Last run #{time_since_last_run} seconds ago with #{failcount} failures"
-        exit 0
-    end
-
-    if time_since_last_run >= crit
-        puts "CRITICAL: Puppet last ran #{time_since_last_run} seconds ago, expected < #{crit}"
-        exit 2
-
-    elsif time_since_last_run >= warn
-        puts "WARNING: Puppet last ran #{time_since_last_run} seconds ago, expected < #{warn}"
-        exit 1
-
-    else
-        if enabled
-            puts "OK: Puppet is currently enabled, last run #{time_since_last_run} seconds ago with #{failcount} failures"
-        else
-            puts "OK: Puppet is currently disabled, last run #{time_since_last_run} seconds ago with #{failcount} failures"
-        end
-
-        exit 0
-    end
-else
-    if enabled_only && enabled == false
-        puts "OK: Puppet is currently disabled, not alerting. Last run #{time_since_last_run} seconds ago with #{failcount} failures"
-        exit 0
-    end
-
-    if failcount >= crit
-        puts "CRITICAL: Puppet last ran had #{failcount} failures, expected < #{crit}"
-        exit 2
-
-    elsif failcount >= warn
-        puts "WARNING: Puppet last ran had #{failcount} failures, expected < #{warn}"
-        exit 1
-
-    else
-        if enabled
-            puts "OK: Puppet is currently enabled, last run #{time_since_last_run} seconds ago with #{failcount} failures"
-        else
-            puts "OK: Puppet is currently disabled, last run #{time_since_last_run} seconds ago with #{failcount} failures"
-        end
-
-        exit 0
-    end
+if enabled_only && enabled == false
+    puts "OK: Puppet is currently disabled, not alerting. Last run #{time_since_last_run} seconds ago with #{failcount} failures"
+    exit 0
 end
+
+if not enabled
+    puts "WARNING: Puppet is currently disabled, last run #{time_since_last_run} seconds ago with #{failcount} failures"
+    exit 1
+end
+
+if failcount == :failed
+    puts "CRITICAL: Complete puppet failure"
+    exit 2
+end
+
+if failcount == :unknown or failcount == :nostatefile
+    puts "UNKNOWN: Failed to check. Probably failed to read the state file"
+    exit 3
+end
+
+if failcount > 0
+    puts "CRITICAL: Puppet has #{failcount} failures"
+    exit 2
+end
+
+if time_since_last_run >= crit
+    puts "CRITICAL: Puppet last ran #{time_since_last_run} seconds ago, expected < #{crit}"
+    exit 2
+end
+
+if time_since_last_run >= warn
+    puts "WARNING: Puppet last ran #{time_since_last_run} seconds ago, expected < #{warn}"
+    exit 1
+end
+
+puts "OK: Puppet is currently enabled, last run #{time_since_last_run} seconds ago with #{failcount} failures"
+exit 0

-- 
To view, visit https://gerrit.wikimedia.org/r/143332
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5a4439b18758a0915bc4ac6666f8f22435fb1689
Gerrit-PatchSet: 2
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Andrew Bogott <[email protected]>
Gerrit-Reviewer: Alexandros Kosiaris <[email protected]>
Gerrit-Reviewer: Andrew Bogott <[email protected]>
Gerrit-Reviewer: jenkins-bot <>
