Faidon Liambotis has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/97008


Change subject: Add check_graphite & a reqstats 5xx check
......................................................................

Add check_graphite & a reqstats 5xx check

There are a bazillion check_graphite checks out there, but this one from
Disqus seems to be the most reasonable and well-written one, and it is in
a language we're familiar with (Python). Import it into our repository
from https://github.com/disqus/nagios-plugins/, together with its
accompanying LICENSE file.
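For anyone unfamiliar with the plugin: check_graphite queries Graphite's /render endpoint with format=json and inspects the returned datapoints. What follows is a minimal Python 3 sketch of that request/parse cycle, not the plugin itself (which is Python 2 and uses urllib/urllib2). build_render_url and extract_values are hypothetical helper names, and the sample payload is made up for illustration.

```python
from numbers import Real
from urllib.parse import urlencode


def build_render_url(base_url, targets, from_, until='now'):
    """Build a Graphite /render URL the way check_graphite's Graphite class
    does: one 'target' parameter per series, plus 'from', 'until' and
    'format=json'."""
    params = [('target', t) for t in targets]
    params += [('from', from_), ('until', until), ('format', 'json')]
    return base_url.rstrip('/') + '/render?' + urlencode(params)


def extract_values(render_json):
    """Map each returned series to its numeric values.

    Graphite's JSON output is a list of {"target": ..., "datapoints":
    [[value, timestamp], ...]}; gaps come back as null (None) and are
    dropped here, mirroring the isinstance(x, Real) filter in the plugin.
    """
    return {
        series['target']: [v for v, _ts in series.get('datapoints', [])
                           if isinstance(v, Real)]
        for series in render_json
    }


if __name__ == '__main__':
    print(build_render_url('http://graphite.wikimedia.org/',
                           ['reqstats.5xx'], '-2hours'))
    # -> http://graphite.wikimedia.org/render?target=reqstats.5xx&from=-2hours&until=now&format=json
    # Hypothetical payload, shaped like a format=json response.
    sample = [{'target': 'reqstats.5xx',
               'datapoints': [[120.0, 1385000000], [None, 1385000060]]}]
    print(extract_values(sample))
    # -> {'reqstats.5xx': [120.0]}
```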

Use it to provision a check for reqstats_5xx. The monitor_service and
associated configs suck at the moment, so this means writing a new,
inflexible "check_reqstats_5xx" command, and we'll need to add a similar
command for every Graphite metric we want to check, but such is life.
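The monitor_service define hands Icinga a single check_command string, so the Graphite URL, time window and thresholds travel as "!"-separated arguments that Icinga substitutes into the command's $ARG1$ to $ARGn$ macros. A small Python 3 sketch of that substitution, assuming $USER1$ points at /usr/lib/nagios/plugins (where icinga.pp installs the script). expand_check_command is a hypothetical helper, and the check_graphite_threshold entry is a hypothetical generic command, not part of this change, shown only to illustrate what "less inflexible" could look like.

```python
def expand_check_command(check_command, command_lines,
                         user1='/usr/lib/nagios/plugins'):
    """Expand 'name!arg1!arg2...' the way Icinga fills $ARGn$ macros.

    Only $USER1$ and $ARGn$ are handled; user1 defaults to an assumed
    plugin directory.
    """
    parts = check_command.split('!')
    name, args = parts[0], parts[1:]
    line = command_lines[name].replace('$USER1$', user1)
    # '$ARG1$' cannot match inside '$ARG10$', so a plain replace is safe.
    for i, arg in enumerate(args, start=1):
        line = line.replace('$ARG%d$' % i, arg)
    return line


COMMANDS = {
    # Mirrors the command_line added to templates/icinga/checkcommands.cfg.erb.
    'check_reqstats_5xx': '$USER1$/check_graphite -U $ARG1$ --from $ARG2$ '
                          '-t reqstats.5xx -W $ARG3$ -C $ARG4$',
    # Hypothetical generic variant (NOT part of this change): metric is $ARG2$.
    'check_graphite_threshold': '$USER1$/check_graphite -U $ARG1$ -t $ARG2$ '
                                '--from $ARG3$ -W $ARG4$ -C $ARG5$',
}

if __name__ == '__main__':
    print(expand_check_command(
        'check_reqstats_5xx!http://graphite.wikimedia.org!-2hours!250!500',
        COMMANDS))
    # -> /usr/lib/nagios/plugins/check_graphite -U http://graphite.wikimedia.org --from -2hours -t reqstats.5xx -W 250 -C 500
```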

Use a time window of 2 hours, warn at 250 req/min and critical at 500
req/min. Start with that and let's tune it as we go -- if it works out
nicely, we could even make it a paging check along the way.
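Note that with -W 250 -C 500 and no -c/--count (which defaults to 0), the plugin goes CRITICAL as soon as any single datapoint in the two-hour window is above 500, and WARNING if any is above 250; a lone spike therefore keeps the check alerting until it ages out of the window. The sketch below mirrors that decision from generate_output() in plain Python 3. evaluate() is a hypothetical name, and it assumes the default --over comparison (strictly greater than).

```python
from numbers import Real

# Nagios/Icinga exit codes, as in the plugin's NAGIOS_STATUSES.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(datapoints, warn=250.0, crit=500.0, count=0):
    """Return the status check_graphite would report for one target.

    Mirrors generate_output(): CRITICAL wins over WARNING, and a status
    fires once at least `count` datapoints are strictly above its
    threshold (count=0 behaves like count=1).
    """
    values = [x for x in datapoints if isinstance(x, Real)]  # drop None gaps
    crit_oob = [x for x in values if x > crit]
    warn_oob = [x for x in values if x > warn]
    if crit_oob and len(crit_oob) >= count:
        return CRITICAL
    if warn_oob and len(warn_oob) >= count:
        return WARNING
    return OK


if __name__ == '__main__':
    print(evaluate([120, 180, None, 260]))  # -> 1 (one point over 250)
    print(evaluate([120, 510, 130]))        # -> 2 (a single spike is enough)
    print(evaluate([260, 130], count=3))    # -> 0 (too few points over 250)
```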

Change-Id: I433dc48dc18ba07a25dc2348526103d621e0f2d8
---
A files/icinga/LICENSE.check_graphite
A files/icinga/check_graphite
M manifests/misc/icinga.pp
M manifests/role/graphite.pp
M templates/icinga/checkcommands.cfg.erb
5 files changed, 502 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/08/97008/1

diff --git a/files/icinga/LICENSE.check_graphite b/files/icinga/LICENSE.check_graphite
new file mode 100644
index 0000000..a991377
--- /dev/null
+++ b/files/icinga/LICENSE.check_graphite
@@ -0,0 +1,202 @@
+
+                              Apache License
+                        Version 2.0, January 2004
+                     http://www.apache.org/licenses/
+
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+1. Definitions.
+
+   "License" shall mean the terms and conditions for use, reproduction,
+   and distribution as defined by Sections 1 through 9 of this document.
+
+   "Licensor" shall mean the copyright owner or entity authorized by
+   the copyright owner that is granting the License.
+
+   "Legal Entity" shall mean the union of the acting entity and all
+   other entities that control, are controlled by, or are under common
+   control with that entity. For the purposes of this definition,
+   "control" means (i) the power, direct or indirect, to cause the
+   direction or management of such entity, whether by contract or
+   otherwise, or (ii) ownership of fifty percent (50%) or more of the
+   outstanding shares, or (iii) beneficial ownership of such entity.
+
+   "You" (or "Your") shall mean an individual or Legal Entity
+   exercising permissions granted by this License.
+
+   "Source" form shall mean the preferred form for making modifications,
+   including but not limited to software source code, documentation
+   source, and configuration files.
+
+   "Object" form shall mean any form resulting from mechanical
+   transformation or translation of a Source form, including but
+   not limited to compiled object code, generated documentation,
+   and conversions to other media types.
+
+   "Work" shall mean the work of authorship, whether in Source or
+   Object form, made available under the License, as indicated by a
+   copyright notice that is included in or attached to the work
+   (an example is provided in the Appendix below).
+
+   "Derivative Works" shall mean any work, whether in Source or Object
+   form, that is based on (or derived from) the Work and for which the
+   editorial revisions, annotations, elaborations, or other modifications
+   represent, as a whole, an original work of authorship. For the purposes
+   of this License, Derivative Works shall not include works that remain
+   separable from, or merely link (or bind by name) to the interfaces of,
+   the Work and Derivative Works thereof.
+
+   "Contribution" shall mean any work of authorship, including
+   the original version of the Work and any modifications or additions
+   to that Work or Derivative Works thereof, that is intentionally
+   submitted to Licensor for inclusion in the Work by the copyright owner
+   or by an individual or Legal Entity authorized to submit on behalf of
+   the copyright owner. For the purposes of this definition, "submitted"
+   means any form of electronic, verbal, or written communication sent
+   to the Licensor or its representatives, including but not limited to
+   communication on electronic mailing lists, source code control systems,
+   and issue tracking systems that are managed by, or on behalf of, the
+   Licensor for the purpose of discussing and improving the Work, but
+   excluding communication that is conspicuously marked or otherwise
+   designated in writing by the copyright owner as "Not a Contribution."
+
+   "Contributor" shall mean Licensor and any individual or Legal Entity
+   on behalf of whom a Contribution has been received by Licensor and
+   subsequently incorporated within the Work.
+
+2. Grant of Copyright License. Subject to the terms and conditions of
+   this License, each Contributor hereby grants to You a perpetual,
+   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+   copyright license to reproduce, prepare Derivative Works of,
+   publicly display, publicly perform, sublicense, and distribute the
+   Work and such Derivative Works in Source or Object form.
+
+3. Grant of Patent License. Subject to the terms and conditions of
+   this License, each Contributor hereby grants to You a perpetual,
+   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+   (except as stated in this section) patent license to make, have made,
+   use, offer to sell, sell, import, and otherwise transfer the Work,
+   where such license applies only to those patent claims licensable
+   by such Contributor that are necessarily infringed by their
+   Contribution(s) alone or by combination of their Contribution(s)
+   with the Work to which such Contribution(s) was submitted. If You
+   institute patent litigation against any entity (including a
+   cross-claim or counterclaim in a lawsuit) alleging that the Work
+   or a Contribution incorporated within the Work constitutes direct
+   or contributory patent infringement, then any patent licenses
+   granted to You under this License for that Work shall terminate
+   as of the date such litigation is filed.
+
+4. Redistribution. You may reproduce and distribute copies of the
+   Work or Derivative Works thereof in any medium, with or without
+   modifications, and in Source or Object form, provided that You
+   meet the following conditions:
+
+   (a) You must give any other recipients of the Work or
+       Derivative Works a copy of this License; and
+
+   (b) You must cause any modified files to carry prominent notices
+       stating that You changed the files; and
+
+   (c) You must retain, in the Source form of any Derivative Works
+       that You distribute, all copyright, patent, trademark, and
+       attribution notices from the Source form of the Work,
+       excluding those notices that do not pertain to any part of
+       the Derivative Works; and
+
+   (d) If the Work includes a "NOTICE" text file as part of its
+       distribution, then any Derivative Works that You distribute must
+       include a readable copy of the attribution notices contained
+       within such NOTICE file, excluding those notices that do not
+       pertain to any part of the Derivative Works, in at least one
+       of the following places: within a NOTICE text file distributed
+       as part of the Derivative Works; within the Source form or
+       documentation, if provided along with the Derivative Works; or,
+       within a display generated by the Derivative Works, if and
+       wherever such third-party notices normally appear. The contents
+       of the NOTICE file are for informational purposes only and
+       do not modify the License. You may add Your own attribution
+       notices within Derivative Works that You distribute, alongside
+       or as an addendum to the NOTICE text from the Work, provided
+       that such additional attribution notices cannot be construed
+       as modifying the License.
+
+   You may add Your own copyright statement to Your modifications and
+   may provide additional or different license terms and conditions
+   for use, reproduction, or distribution of Your modifications, or
+   for any such Derivative Works as a whole, provided Your use,
+   reproduction, and distribution of the Work otherwise complies with
+   the conditions stated in this License.
+
+5. Submission of Contributions. Unless You explicitly state otherwise,
+   any Contribution intentionally submitted for inclusion in the Work
+   by You to the Licensor shall be under the terms and conditions of
+   this License, without any additional terms or conditions.
+   Notwithstanding the above, nothing herein shall supersede or modify
+   the terms of any separate license agreement you may have executed
+   with Licensor regarding such Contributions.
+
+6. Trademarks. This License does not grant permission to use the trade
+   names, trademarks, service marks, or product names of the Licensor,
+   except as required for reasonable and customary use in describing the
+   origin of the Work and reproducing the content of the NOTICE file.
+
+7. Disclaimer of Warranty. Unless required by applicable law or
+   agreed to in writing, Licensor provides the Work (and each
+   Contributor provides its Contributions) on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+   implied, including, without limitation, any warranties or conditions
+   of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+   PARTICULAR PURPOSE. You are solely responsible for determining the
+   appropriateness of using or redistributing the Work and assume any
+   risks associated with Your exercise of permissions under this License.
+
+8. Limitation of Liability. In no event and under no legal theory,
+   whether in tort (including negligence), contract, or otherwise,
+   unless required by applicable law (such as deliberate and grossly
+   negligent acts) or agreed to in writing, shall any Contributor be
+   liable to You for damages, including any direct, indirect, special,
+   incidental, or consequential damages of any character arising as a
+   result of this License or out of the use or inability to use the
+   Work (including but not limited to damages for loss of goodwill,
+   work stoppage, computer failure or malfunction, or any and all
+   other commercial damages or losses), even if such Contributor
+   has been advised of the possibility of such damages.
+
+9. Accepting Warranty or Additional Liability. While redistributing
+   the Work or Derivative Works thereof, You may choose to offer,
+   and charge a fee for, acceptance of support, warranty, indemnity,
+   or other liability obligations and/or rights consistent with this
+   License. However, in accepting such obligations, You may act only
+   on Your own behalf and on Your sole responsibility, not on behalf
+   of any other Contributor, and only if You agree to indemnify,
+   defend, and hold each Contributor harmless for any liability
+   incurred by, or claims asserted against, such Contributor by reason
+   of your accepting any such warranty or additional liability.
+
+END OF TERMS AND CONDITIONS
+
+APPENDIX: How to apply the Apache License to your work.
+
+   To apply the Apache License to your work, attach the following
+   boilerplate notice, with the fields enclosed by brackets "[]"
+   replaced with your own identifying information. (Don't include
+   the brackets!)  The text should be enclosed in the appropriate
+   comment syntax for the file format. We also recommend that a
+   file or class name and description of purpose be included on the
+   same "printed page" as the copyright notice for easier
+   identification within third-party archives.
+
+Copyright 2011 DISQUS
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
diff --git a/files/icinga/check_graphite b/files/icinga/check_graphite
new file mode 100644
index 0000000..bedb7c1
--- /dev/null
+++ b/files/icinga/check_graphite
@@ -0,0 +1,286 @@
+#!/usr/bin/env python
+"""
+check_graphite.py
+~~~~~~~~~~~~~~~~~
+
+:copyright: (c) 2012 DISQUS.
+:license: Apache License 2.0, see LICENSE for more details.
+"""
+
+import json
+import optparse
+import urllib
+import urllib2
+import sys
+
+from numbers import Real
+
+NAGIOS_STATUSES = {
+    'OK': 0,
+    'WARNING': 1,
+    'CRITICAL': 2,
+    'UNKNOWN': 3
+}
+
+class Graphite(object):
+
+    def __init__(self, url, targets, _from, _until):
+        self.url = url.rstrip('/')
+        self.targets = targets
+        self._from = _from
+        self._until = _until
+        params = [('target', t) for t in self.targets] +\
+            [('from', self._from)] +\
+            [('until', self._until)] +\
+            [('format', 'json')]
+        self.full_url = self.url + '/render?' +\
+            urllib.urlencode(params)
+
+    def check_datapoints(self, datapoints, check_func, **kwargs):
+        """Find alerting datapoints
+
+        Args:
+            datapoints (list): The list of datapoints to check
+            check_func (function): The function to find out of bounds datapoints
+
+        Kwargs:
+            bounds (list): Compare against `datapoints` to find out of bounds list
+            compare (list): Used for comparison if `datapoints` is out of bounds
+            threshold (float): `check_func` is called for each datapoint against `threshold`
+            beyond (float): Return datapoint if `beyond` value in bounds list (percentage).
+
+        Returns:
+            The list of out of bounds datapoints
+        """
+        if 'threshold' in kwargs:
+            return [x for x in datapoints if isinstance(x, Real) and check_func(x, kwargs['threshold'])]
+        elif 'bounds' in kwargs:
+            if 'compare' in kwargs:
+                return [datapoints[x] for x in xrange(len(datapoints)) if all([datapoints[x], kwargs['bounds'][x], kwargs['compare'][x]]) and check_func(datapoints[x] / kwargs['bounds'][x], kwargs['beyond']) and check_func(datapoints[x], kwargs['compare'][x])]
+            else:
+                return [datapoints[x] for x in xrange(len(datapoints)) if all([datapoints[x], kwargs['bounds'][x]]) and check_func(datapoints[x], kwargs['bounds'][x])]
+
+    def fetch_metrics(self):
+        try:
+            response = urllib2.urlopen(self.full_url)
+
+            if response.code != 200:
+                return None
+            else:
+                return json.loads(response.read())
+        except (urllib2.URLError, TypeError):
+            return None
+
+    def generate_output(self, datapoints, *args, **kwargs):
+        """Generate check output
+
+        Args:
+            datapoints (list): The list of datapoints to check
+            warn_oob (list): Optional list of datapoints considered in warning state
+            crit_oob (list): Mandatory list of datapoints considered in critical state
+
+        Kwargs:
+            count (int): Number of metrics that would generate an alert
+            warning (float): The check's warning threshold
+            critical (float): The check's critical threshold
+            target (str): The target for `datapoints`
+
+        Returns:
+            A dictionary of datapoints grouped by their status ('CRITICAL', 'WARNING', 'OK')
+        """
+        check_output = dict(OK=[], WARNING=[], CRITICAL=[])
+        count = kwargs['count']
+        warning = kwargs.get('warning', 0)
+        critical = kwargs.get('critical', 0)
+        target = kwargs.get('target', 'timeseries')
+
+        if len(args) > 1:
+            (warn_oob, crit_oob) = args
+        else:
+            crit_oob = [x for x in args[0] if isinstance(x, Real)]
+            warn_oob = []
+
+        if self.has_numbers(crit_oob) and len(crit_oob) >= count:
+            check_output['CRITICAL'].append('%s [crit=%f|datapoints=%s]' %\
+                (target, critical, ','.join(['%s' % str(x) for x in crit_oob])))
+        elif self.has_numbers(warn_oob) and len(warn_oob) >= count:
+            check_output['WARNING'].append('%s [warn=%f|datapoints=%s]' %\
+                (target, warning, ','.join(['%s' % str(x) for x in warn_oob])))
+        else:
+            check_output['OK'].append('%s [warn=%0.3f|crit=%f|datapoints=%s]' %\
+                (target, warning, critical, ','.join(['%s' % str(x) for x in datapoints])))
+
+        return check_output
+
+    def has_numbers(self, lst):
+        try:
+            return any([isinstance(x, Real) for x in lst])
+        except TypeError:
+            return False
+
+
+if __name__ == '__main__':
+    parser = optparse.OptionParser()
+    parser.add_option('-U', '--graphite-url', dest='graphite_url',
+                      default='http://localhost/',
+                      metavar='URL',
+                      help='Graphite URL [%default]')
+    parser.add_option('-t', '--target', dest='target',
+                      action='append',
+                      help='Target to check')
+    parser.add_option('--compare', dest='compare',
+                      metavar='SERIES',
+                      help='Compare TARGET against SERIES')
+    parser.add_option('--from', dest='_from',
+                      help='From timestamp/date')
+    parser.add_option('--until', dest='_until',
+                      default='now',
+                      help='Until timestamp/date [%default]')
+    parser.add_option('-c', '--count', dest='count',
+                      default=0,
+                      type='int',
+                      help='Alert on at least COUNT metrics [%default]')
+    parser.add_option('--beyond', dest='beyond',
+                      default=0.7,
+                      type='float',
+                      help='Alert if metric is PERCENTAGE beyond comparison value [%default]')
+    parser.add_option('--percentile', dest='percentile',
+                      default=0,
+                      type='int',
+                      metavar='PERCENT',
+                      help='Use nPercentile Graphite function on the target (returns one datapoint)')
+    parser.add_option('--empty-ok', dest='empty_ok',
+                      default=False,
+                      action='store_true',
+                      help='Empty data from Graphite is OK')
+    parser.add_option('--confidence', dest='confidence_bands',
+                      default=False,
+                      action='store_true',
+                      help='Use holtWintersConfidenceBands Graphite function on the target')
+    parser.add_option('--over', dest='over',
+                      default=True,
+                      action='store_true',
+                      help='Over specified WARNING or CRITICAL threshold [%default]')
+    parser.add_option('--under', dest='under',
+                      default=False,
+                      action='store_true',
+                      help='Under specified WARNING or CRITICAL threshold [%default]')
+    parser.add_option('-W', dest='warning',
+                      type='float',
+                      metavar='VALUE',
+                      help='Warning if datapoints beyond VALUE')
+    parser.add_option('-C', dest='critical',
+                      type='float',
+                      metavar='VALUE',
+                      help='Critical if datapoints beyond VALUE')
+
+    (options, args) = parser.parse_args()
+
+    if not all([getattr(options, option) for option in ('_from', 'target')]):
+        parser.print_help()
+        sys.exit(NAGIOS_STATUSES['UNKNOWN'])
+
+    real_from = options._from
+
+    if options.under:
+        check_func = lambda x, y: x < y
+        options.over = False
+    else:
+        check_func = lambda x, y: x > y
+
+    if options.confidence_bands:
+        targets = [options.target[0], 'holtWintersConfidenceBands(%s)' % options.target[0]]
+        check_threshold = None
+        from_slice = int(options._from) * -1
+        real_from = '-2w'
+
+        if options.compare:
+            targets.append(options.compare)
+    else:
+        if not all([getattr(options, option) for option in ('critical', 'warning')]):
+            parser.print_help()
+            sys.exit(NAGIOS_STATUSES['UNKNOWN'])
+
+        if options.percentile:
+            targets = ['nPercentile(%s, %d)' % (options.target[0], options.percentile)]
+        else:
+            targets = options.target
+
+        try:
+            warn = float(options.warning)
+            crit = float(options.critical)
+        except ValueError:
+            print 'ERROR: WARNING or CRITICAL threshold is not a number\n'
+            parser.print_help()
+            sys.exit(NAGIOS_STATUSES['UNKNOWN'])
+
+    check_output = {}
+    graphite = Graphite(options.graphite_url, targets, real_from, options._until)
+    metric_data = graphite.fetch_metrics()
+
+    if metric_data:
+        if options.confidence_bands:
+            actual = [x[0] for x in metric_data[0].get('datapoints', [])][from_slice:]
+            target_name = metric_data[0]['target']
+            kwargs = {}
+            kwargs['beyond'] = options.beyond
+
+            if options.over:
+                kwargs['bounds'] = [x[0] for x in metric_data[1].get('datapoints', [])][from_slice:]
+            elif options.under:
+                kwargs['bounds'] = [x[0] for x in metric_data[2].get('datapoints', [])][from_slice:]
+
+            if options.compare:
+                kwargs['compare'] = [x[0] for x in metric_data[3].get('datapoints', [])][from_slice:]
+
+                if not graphite.has_numbers(kwargs['compare']):
+                    print 'CRITICAL: No compare target output from Graphite!'
+                    sys.exit(NAGIOS_STATUSES['CRITICAL'])
+
+            if graphite.has_numbers(actual) and graphite.has_numbers(kwargs['bounds']):
+                points_oob = graphite.check_datapoints(actual, check_func, **kwargs)
+                check_output[target_name] = graphite.generate_output(actual,
+                                                                     points_oob,
+                                                                     count=options.count,
+                                                                     target=target_name)
+
+            else:
+                print 'CRITICAL: No output from Graphite for target(s): %s' % ', '.join(targets)
+                sys.exit(NAGIOS_STATUSES['CRITICAL'])
+        else:
+            for target in metric_data:
+                datapoints = [x[0] for x in target.get('datapoints', []) if isinstance(x[0], Real)]
+                if not graphite.has_numbers(datapoints) and not options.empty_ok:
+                    print 'CRITICAL: No output from Graphite for target(s): %s' % ', '.join(targets)
+                    sys.exit(NAGIOS_STATUSES['CRITICAL'])
+
+                crit_oob = graphite.check_datapoints(datapoints, check_func, threshold=crit)
+                warn_oob = graphite.check_datapoints(datapoints, check_func, threshold=warn)
+                check_output[target['target']] = graphite.generate_output(datapoints,
+                                                                          warn_oob,
+                                                                          crit_oob,
+                                                                          count=options.count,
+                                                                          target=target['target'],
+                                                                          warning=warn,
+                                                                          critical=crit)
+    else:
+        if options.empty_ok and isinstance(metric_data, list):
+            print 'OK: No output from Graphite for target(s): %s' % ', '.join(targets)
+            sys.exit(NAGIOS_STATUSES['OK'])
+
+        print 'CRITICAL: No output from Graphite for target(s): %s' % ', '.join(targets)
+        sys.exit(NAGIOS_STATUSES['CRITICAL'])
+
+    for target, messages in check_output.iteritems():
+        if messages['CRITICAL']:
+            exit_code = NAGIOS_STATUSES['CRITICAL']
+        elif messages['WARNING']:
+            exit_code = NAGIOS_STATUSES['WARNING']
+        else:
+            exit_code = NAGIOS_STATUSES['OK']
+
+        for status_code in ['CRITICAL', 'WARNING', 'OK']:
+            if messages[status_code]:
+                print '\n'.join(['%s: %s' % (status_code, status) for status in messages[status_code]])
+
+    sys.exit(exit_code)
diff --git a/manifests/misc/icinga.pp b/manifests/misc/icinga.pp
index f711d70..6ddc492 100644
--- a/manifests/misc/icinga.pp
+++ b/manifests/misc/icinga.pp
@@ -591,6 +591,11 @@
       owner => 'root',
       group => 'root',
       mode => '0755';
+    '/usr/lib/nagios/plugins/check_graphite':
+      source => 'puppet:///files/icinga/check_graphite',
+      owner => 'root',
+      group => 'root',
+      mode => '0755';
   }
 
   # Include check_elasticsearch from elasticsearch module
diff --git a/manifests/role/graphite.pp b/manifests/role/graphite.pp
index 6ac57c3..000f519 100644
--- a/manifests/role/graphite.pp
+++ b/manifests/role/graphite.pp
@@ -75,4 +75,9 @@
         description  => 'Graphite Carbon',
         nrpe_command => '/sbin/carbonctl check',
     }
+
+    monitor_service { 'reqstats_5xx':
+        description   => 'Req/min 5xx',
+        check_command => 'check_reqstats_5xx!http://graphite.wikimedia.org!-2hours!250!500',
+    }
 }
diff --git a/templates/icinga/checkcommands.cfg.erb b/templates/icinga/checkcommands.cfg.erb
index 4778154..f5c8dcd 100644
--- a/templates/icinga/checkcommands.cfg.erb
+++ b/templates/icinga/checkcommands.cfg.erb
@@ -600,6 +600,10 @@
        command_line    $USER1$/check_elasticsearch -H $HOSTADDRESS$
 }
 
+define command{
+       command_name    check_reqstats_5xx
+       command_line    $USER1$/check_graphite -U $ARG1$ --from $ARG2$ -t reqstats.5xx -W $ARG3$ -C $ARG4$
+}
 
 
 # Checks whether a host belongs to given dsh group(s)

-- 
To view, visit https://gerrit.wikimedia.org/r/97008
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I433dc48dc18ba07a25dc2348526103d621e0f2d8
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Faidon Liambotis <fai...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
