Re: SLA monitoring and reporting to customers

2007-03-19 Thread william(at)elan.net



On Sun, 18 Mar 2007, Rubens Kuhl Jr. wrote:




 What open-source or low-budget tools are operators using for SLA
 monitoring when the reports (current state and historical) should be
 available to customers ?

Please define SLA in terms of monitoring.


- 99.x% availability (defined by packet loss and response time) monthly
- A certain number of hours from service interruption to service recovery


So what you're looking for is a number for a monthly report to be 
calculated based on known downtime as measured by monitoring software.
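
As a rough illustration of that calculation, here is a minimal Python sketch that turns a list of outages recorded by the monitoring software into the two numbers above: a monthly availability percentage and the longest time from interruption to recovery. The function names and the sample outage data are made up for illustration, and the sketch assumes outages do not overlap each other.

```python
from datetime import datetime, timedelta

def clip(start, end, period_start, period_end):
    """Clip an outage to the reporting period; return its duration."""
    s, e = max(start, period_start), min(end, period_end)
    return e - s if e > s else timedelta(0)

def monthly_report(outages, period_start, period_end):
    """Return (availability in percent, longest outage) for the period.

    Assumes the outages do not overlap each other.
    """
    durations = [clip(s, e, period_start, period_end) for s, e in outages]
    total = (period_end - period_start).total_seconds()
    down = sum(d.total_seconds() for d in durations)
    availability = 100.0 * (total - down) / total
    longest = max(durations, default=timedelta(0))
    return availability, longest

# Hypothetical outage log for March 2007.
outages = [
    (datetime(2007, 3, 5, 2, 0), datetime(2007, 3, 5, 2, 45)),
    (datetime(2007, 3, 20, 14, 0), datetime(2007, 3, 20, 16, 0)),
]
availability, longest = monthly_report(
    outages, datetime(2007, 3, 1), datetime(2007, 4, 1)
)
print(f"availability: {availability:.3f}%  longest outage: {longest}")
```

The two results map onto the two SLA terms: the percentage is compared against the 99.x% target, and the longest outage against the agreed recovery time.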



 Looking at NANOG archives, NAGIOS is the most prevalent tool, but its
 authorization mechanisms are somewhat below what I would like: customers
 should not be able to change anything in either the configuration or
 the SLA software state.

You can set it up so that a customer only sees the status of the
services he or she has access to, by adding the customer as a contact
for those hosts or services.


There are 2 main issues on my reading of
http://nagios.sourceforge.net/docs/2_0/cgiauth.html
- Users can issue commands for hosts/services they are contacts for.
They could acknowledge an outage even when we should know about it.


If they acknowledge an outage you'll know about it (acknowledgement
notification). I also don't necessarily see it as bad for the user of
some service to acknowledge that a service you monitor (say HTTP) is
down and to tell you that they purposely took apache down.

But I guess what you're asking for is an additional permission list
for nagios users that grants view-only access...



- Some devices of interest to a customer are not specific to a
customer: a switch, a router. If customers are considered contacts for
such devices, they can issue commands for them.


Depends on how you set it up. The setup that I use is that each
router or switch port is a separate service and can have a separate
list of associated users; they will see no other data about the
switch and cannot issue commands for anything other than that port.
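
A per-port setup like this can be sketched as a small Python generator that writes one Nagios service definition per port, each with its own contact group. This is only an illustration under assumptions: the `generic-service` template, the `check_port_status` command, the `noc` group and the host, port and group names are all hypothetical, and `contact_groups` is used because it is the directive available in the Nagios 2.x object format.

```python
# One service definition per switch port, each with its own contact group.
SERVICE_TEMPLATE = """define service{{
    use                   generic-service
    host_name             {host}
    service_description   {description}
    check_command         {check_command}
    contact_groups        {contact_groups}
    }}
"""

def port_service(host, port, customer_group, noc_group="noc"):
    """Render a service definition for a single port.

    check_port_status is a hypothetical check command; substitute
    whatever command your installation uses to poll a port.
    """
    return SERVICE_TEMPLATE.format(
        host=host,
        description=f"Port {port}",
        check_command=f"check_port_status!{port}",
        contact_groups=f"{noc_group},{customer_group}",
    )

# Hypothetical mapping of switch ports to customer contact groups.
ports = {1: "customer-a", 2: "customer-b"}
config = "\n".join(port_service("switch1", p, g) for p, g in ports.items())
print(config)
```

With definitions like these, a customer in `customer-a` is a contact only for "Port 1" on `switch1`, not for the switch host itself or for other customers' ports.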


Do you think that your customers should or
should not have such access to your central nagios system?


That's something I would like to hear opinions on, but even with NAGIOS
such an issue could be solved by having one NOC-only NAGIOS and one
customers-only NAGIOS. Using NagiosQL would probably make
replication easier.


Yes, that can be done. But maintaining separate parallel systems is
actually a pain. I also would like to hear opinions on whether a more
complex user permission system would be good to have for the nagios
web interface, and if so what those permissions should be.


 I'm looking for something more like Cacti, where customers can be
 contained to only see some of the generated graphs.

Would you be satisfied with a graphing extension to nagios that
replicates the nagios security mechanism, where a customer can see
graphs for the services he/she is listed as a contact for?


Is it http://nagiosgraph.sourceforge.net/ ? Can a user be a
nagiosgraph contact without being a NAGIOS contact ?


I'm actually asking because I wrote my own web interface (see
ngraph.cgi at http://www.elan.net/~william/nagios/), originally
for nagiosgrapher, but it is now being decoupled from any particular
graphing package and I plan to have it support multiple nagios
data collection backend systems.

The next step on the TODO list is user access and authentication, which
is supposed to replicate how nagios itself does it by allowing
only authenticated users who are contacts for the service to see
the graphs, BUT you do have an opportunity here to say what else
such an interface should support as far as user access rights control.
(BTW, the current cgi does support specifying users who would have
access to graphs but not nagios itself - however, such a user would
then have access to all graphs...)

--
William Leibzon
Elan Networks
[EMAIL PROTECTED]


Re: SLA monitoring and reporting to customers

2007-03-19 Thread william(at)elan.net



On Sun, 18 Mar 2007, virendra rode // wrote:



william(at)elan.net wrote:



On Sun, 18 Mar 2007, Rubens Kuhl Jr. wrote:


What open-source or low-budget tools are operators using for SLA
monitoring when the reports (current state and historical) should be
available to customers ?


Please define SLA in terms of monitoring.

---
I would say,

- availability


OK - network connection UP/DOWN, with a list of when it's down
and for how long, and an SLA based on the amount of time it's been
down, or more commonly on time_up/(time_up+time_down)*100


- response time / latency


ok ping latency graph for user view with SLA based on maximum
average latency over given time period
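
To make "maximum average latency over a given time period" concrete, here is a minimal Python sketch. It assumes evenly spaced ping samples in milliseconds, averages them over consecutive non-overlapping windows, and compares the worst window against an SLA limit. The sample values, the window size and the 50 ms limit are invented for illustration; lost probes would need separate handling.

```python
def max_window_average(samples, window):
    """Return the highest average over consecutive, non-overlapping
    windows of `window` samples (the last window may be shorter)."""
    averages = []
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        averages.append(sum(chunk) / len(chunk))
    return max(averages)

# Hypothetical round-trip times in ms, one sample per polling interval.
rtt_ms = [20, 22, 21, 80, 95, 90, 23, 21, 22]
sla_limit_ms = 50  # assumed contractual limit

worst = max_window_average(rtt_ms, window=3)
violated = worst > sla_limit_ms
print(f"worst window average: {worst:.1f} ms, violated: {violated}")
```

Averaging per window before taking the maximum keeps a single slow ping from counting as a violation while still flagging a sustained latency spike.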


- utilization


How is that part of an SLA? Or do you mean you guarantee that
your own upstream network connection would not be overutilized?


- accuracy and errors


Accuracy of what? What type of errors, packet drops?


- five nines, six nines, take your pick and define your own holy grail.


$ echo '60*24*365*(1-0.99999)' | bc -l
5.25600

You wish to tell me you guarantee the network connection to a customer
will be down for no more than 5 minutes during the year? Yeh, right :)
(but don't let me discourage any of you in trying to achieve it!)
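
The same arithmetic for the other "nines" can be sketched in a few lines of Python. The function name is made up for illustration, and a 365-day year is assumed, as in the calculation above; five nines comes out at roughly 5.26 minutes per year.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 365-day year, as in the bc example

def allowed_downtime_minutes(availability_pct, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget in minutes for a given availability target."""
    return period_minutes * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}%: {allowed_downtime_minutes(pct):.2f} minutes/year")
```

Passing a different `period_minutes` (for example 60 * 24 * 30) gives the budget for a monthly SLA instead of a yearly one.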

--
William Leibzon
Elan Networks
[EMAIL PROTECTED]


Re: SLA monitoring and reporting to customers

2007-03-19 Thread virendra rode //


william(at)elan.net wrote:
 
 On Sun, 18 Mar 2007, virendra rode // wrote:
 

 william(at)elan.net wrote:


 On Sun, 18 Mar 2007, Rubens Kuhl Jr. wrote:

 What open-source or low-budget tools are operators using for SLA
 monitoring when the reports (current state and historical) should be
 available to customers ?

 Please define SLA in terms of monitoring.
 - ---
 I would say,

 - - availability
 
 OK - network connection UP/DOWN, with a list of when it's down
 and for how long, and an SLA based on the amount of time it's been
 down, or more commonly on time_up/(time_up+time_down)*100
 
 - - response time / latency
 
 ok ping latency graph for user view with SLA based on maximum
 average latency over given time period
 
 - - utilization
 
 How is that part of an SLA? Or do you mean you guarantee that
 your own upstream network connection would not be overutilized?
---
Utilization matters when an object (e.g. CPU, interface, temperature,
routing table) exceeds a specified threshold, which could make it
unavailable and should trigger an event.


 
 - - accuracy and errors
 
 accuracy of what? what type of errors, packet drops?
---
Availability and reachability, because we care about uptime, correct?


 
 - - five nines, six nines , take your pick and define your own holy
 grail.
 
 $ echo '60*24*365*(1-0.99999)' | bc -l
 5.25600
 
 You wish to tell me you guarantee the network connection to a customer
 will be down for no more than 5 minutes during the year? Yeh, right :)
 (but don't let me discourage any of you in trying to achieve it!)


regards,
/virendra





RE: SLA monitoring and reporting to customers

2007-03-19 Thread Gregori Parker

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
william(at)elan.net
Sent: Monday, March 19, 2007 3:20 AM
To: virendra rode //
Cc: Rubens Kuhl Jr.; NANOG list
Subject: Re: SLA monitoring and reporting to customers

 How is that part of an SLA? Or do you mean you guarantee that
 your own upstream network connection would not be overutilized?
 ...
 accuracy of what? what type of errors, packet drops?

SLAs are simply contracts that two parties negotiate...any number of
metrics can be decided upon as the 'agreed level of service'.  Different
kinds of providers obviously have different metrics that are important -
while the availability of services is pretty ubiquitous, accuracy and
utilization might not make sense for an ISP SLA...then again they're
often deal-breakers for an ASP SLA.

 You wish to tell me you guarantee the network connection to a customer
 will be down for no more than 5 minutes during the year? Yeh, right :)
 (but don't let me discourage any of you in trying to achieve it!)

Maintenance Windows / Planned Downtime are nearly always present and
defined in an SLA, and should be excluded from the calculation of x
number of 9's.  Furthermore, all SLAs I've come across also include
'Emergency Windows' which can happen anytime given a pre-determined
amount of forewarning.  Limits of duration and frequency of these
windows should obviously be agreed upon in any good SLA.  Bottom line:
it's good practice for an SLA to define exactly what metrics are being
used, who is measuring them (read: third party, e.g. Keynote), how they
are measuring them (software/tools), what constitutes a violation and
what the recompense should be.
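
Excluding agreed windows from the calculation can be sketched as follows in Python. The sketch makes two assumptions that an actual SLA would have to spell out: maintenance time is removed from both the downtime and the measured period, and neither the windows nor the outages overlap among themselves. All names and sample times are hypothetical.

```python
from datetime import datetime

def overlap_seconds(a_start, a_end, b_start, b_end):
    """Seconds during which intervals a and b overlap (0 if none)."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return max((end - start).total_seconds(), 0.0)

def sla_availability(period, outages, maintenance):
    """Availability in percent, not counting maintenance windows."""
    p_start, p_end = period
    total = (p_end - p_start).total_seconds()
    # Maintenance time inside the period is not part of the SLA clock.
    excluded = sum(overlap_seconds(p_start, p_end, m0, m1)
                   for m0, m1 in maintenance)
    down = 0.0
    for o0, o1 in outages:
        s, e = max(o0, p_start), min(o1, p_end)
        in_period = overlap_seconds(p_start, p_end, o0, o1)
        in_maintenance = sum(overlap_seconds(s, e, m0, m1)
                             for m0, m1 in maintenance)
        down += in_period - in_maintenance
    return 100.0 * (1 - down / (total - excluded))

period = (datetime(2007, 3, 1), datetime(2007, 4, 1))
maintenance = [(datetime(2007, 3, 10, 2, 0), datetime(2007, 3, 10, 4, 0))]
outages = [
    (datetime(2007, 3, 10, 3, 30), datetime(2007, 3, 10, 4, 30)),
    (datetime(2007, 3, 20, 14, 0), datetime(2007, 3, 20, 14, 15)),
]
result = sla_availability(period, outages, maintenance)
print(f"{result:.4f}%")
```

In this example only the 30 minutes of the first outage that fall outside the maintenance window count against the SLA, together with the whole 15-minute second outage.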

- Gregori



SLA monitoring and reporting to customers

2007-03-18 Thread Rubens Kuhl Jr.


What open-source or low-budget tools are operators using for SLA
monitoring when the reports (current state and historical) should be
available to customers ?

Looking at NANOG archives, NAGIOS is the most prevalent tool, but its
authorization mechanisms are somewhat below what I would like: customers
should not be able to change anything in either the configuration or
the SLA software state.

I'm looking for something more like Cacti, where customers can be
contained to only see some of the generated graphs.

Thanks for any input,
Rubens


Re: SLA monitoring and reporting to customers

2007-03-18 Thread william(at)elan.net



On Sun, 18 Mar 2007, Rubens Kuhl Jr. wrote:


What open-source or low-budget tools are operators using for SLA
monitoring when the reports (current state and historical) should be
available to customers ?


Please define SLA in terms of monitoring.


Looking at NANOG archives, NAGIOS is the most prevalent tool, but its
authorization mechanisms are somewhat below what I would like: customers
should not be able to change anything in either the configuration or
the SLA software state.


You can set it up so that a customer only sees the status of the
services he or she has access to, by adding the customer as a contact
for those hosts or services. Do you think that your customers should or
should not have such access to your central nagios system?


I'm looking for something more like Cacti, where customers can be
contained to only see some of the generated graphs.


Would you be satisfied with a graphing extension to nagios that
replicates the nagios security mechanism, where a customer can see
graphs for the services he/she is listed as a contact for?

--
William Leibzon
Elan Networks
[EMAIL PROTECTED]


RE: SLA monitoring and reporting to customers

2007-03-18 Thread Ray Burkholder

 
 What open-source or low-budget tools are operators using for 
 SLA monitoring when the reports (current state and 
 historical) should be available to customers ?
 

Here is one way to do it on the cheap.

I have worked with Cricket and genDevConfig extensively.  genDevConfig will
scan a router and automatically create the Cricket SNMP commands to pull the
IP SLA statistics out, or whatever other statistics you are interested in.
The scan parameters are stored in a Cricket config file and the data in
RRD files.

A custom Perl script, with or without some Mason templating, could be used
along with a connection to a backend PostgreSQL database for user
authentication.

It should be relatively easy to create two tables:  
  a) userid, username or email, password
  b) userid, router, interface/id for sla

Then that data can be used in a Perl script to generate a page of customer
specific graphs in a user-authenticated web site.
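
The two tables and the lookup can be sketched as below. Note the substitutions: this is Python rather than the Perl suggested above, and it uses an in-memory SQLite database as a stand-in for PostgreSQL so that the example is self-contained. Table, column and function names are hypothetical, and the password is stored as a salted PBKDF2 hash rather than in clear text.

```python
import hashlib
import hmac
import os
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL backend
conn.executescript("""
CREATE TABLE users (
    userid   INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    salt     BLOB NOT NULL,
    pwhash   BLOB NOT NULL
);
CREATE TABLE user_sla (
    userid    INTEGER NOT NULL REFERENCES users(userid),
    router    TEXT NOT NULL,
    interface TEXT NOT NULL
);
""")

def hash_password(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def add_user(username, password):
    salt = os.urandom(16)
    cur = conn.execute(
        "INSERT INTO users (username, salt, pwhash) VALUES (?, ?, ?)",
        (username, salt, hash_password(password, salt)),
    )
    return cur.lastrowid

def graphs_for(username, password):
    """Return the (router, interface) pairs the user may see,
    or an empty list if authentication fails."""
    row = conn.execute(
        "SELECT userid, salt, pwhash FROM users WHERE username = ?",
        (username,),
    ).fetchone()
    if row is None:
        return []
    userid, salt, pwhash = row
    if not hmac.compare_digest(hash_password(password, salt), pwhash):
        return []
    return conn.execute(
        "SELECT router, interface FROM user_sla WHERE userid = ? "
        "ORDER BY router, interface",
        (userid,),
    ).fetchall()

# Hypothetical customer with one monitored interface.
uid = add_user("customer-a", "s3cret")
conn.execute("INSERT INTO user_sla VALUES (?, ?, ?)",
             (uid, "router1", "Serial0/0"))
print(graphs_for("customer-a", "s3cret"))
```

The page generator would then render only the graphs whose router and interface appear in the returned list.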

Ray.






Re: SLA monitoring and reporting to customers

2007-03-18 Thread virendra rode //


Ray Burkholder wrote:
 What open-source or low-budget tools are operators using for 
 SLA monitoring when the reports (current state and 
 historical) should be available to customers ?

 
 Here is one way to do it on the cheap.
 
 I have worked with Cricket and genDevConfig extensively.  genDevConfig will
 scan a router and automatically create the Cricket SNMP commands to pull the
 IP SLA statistics out, or whatever other statistics you are interested in.
 The scan parameters are stored in a Cricket config file and the data in
 RRD files.
 
 A custom Perl script, with or without some Mason templating could be used
 along with a connection to a backend Postgresql database for user
 authentication.
 
 It should be relatively easy to create two tables:  
   a) userid, username or email, password
   b) userid, router, interface/id for sla
 
 Then that data can be used in a Perl script to generate a page of customer
 specific graphs in a user-authenticated web site.
 
 Ray.
---
Generally, if you are responsible for meeting an SLA, you have to take
outages into account.



regards,
/virendra




Re: SLA monitoring and reporting to customers

2007-03-18 Thread virendra rode //


william(at)elan.net wrote:
 
 
 On Sun, 18 Mar 2007, Rubens Kuhl Jr. wrote:
 
 What open-source or low-budget tools are operators using for SLA
 monitoring when the reports (current state and historical) should be
 available to customers ?
 
 Please define SLA in terms of monitoring.
---
I would say,

- availability
- response time / latency
- utilization
- accuracy and errors
- five nines, six nines, take your pick and define your own holy grail.

 
 Looking at NANOG archives, NAGIOS is the most prevalent tool, but its
 authorization mechanisms are somewhat below what I would like: customers
 should not be able to change anything in either the configuration or
 the SLA software state.
 
 You can set it up so that a customer only sees the status of the
 services he or she has access to, by adding the customer as a contact
 for those hosts or services. Do you think that your customers should or
 should not have such access to your central nagios system?
---
Correct, one can define a user privilege mode that controls what can be
drilled into.



regards,
/virendra


 
 I'm looking for something more like Cacti, where customers can be
 contained to only see some of the generated graphs.
 
 Would you be satisfied with a graphing extension to nagios that
 replicates the nagios security mechanism, where a customer can see
 graphs for the services he/she is listed as a contact for?
 