Re: [openib-general] Port error rate detection

2007-02-27 Thread Troy Benjegerdes
On Mon, Feb 19, 2007 at 03:53:36PM -0500, Steven Carter wrote:
 I have a Nagios module that alerts on connectivity, port errors, 
 speed/width problems.  I would like to give it the ability to change the 
 severity of the alert depending on whether errors are just present or if 
 they are increasing faster than a specified rate.  The intent is to 
 equip the module to keep the state of the last query and possibly 
 history, but I wanted to make sure that I was not re-inventing the wheel 
 first.  Is there an attribute or utility that I am overlooking that will 
 help me do this?

One other thing you might want to take a look at is the Fountain/Goanna
node monitoring setup... It's not really anything like the proposed
performance manager, but it might get you what you need. (And we'd like
some feedback on what it should do differently ;)

http://www.scl.ameslab.gov/Projects/Monitor/

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] Port error rate detection

2007-02-20 Thread Hal Rosenstock
On Mon, 2007-02-19 at 15:53, Steven Carter wrote:
 I have a Nagios module that alerts on connectivity, port errors, 
 speed/width problems.  I would like to give it the ability to change the 
 severity of the alert depending on whether errors are just present or if 
 they are increasing faster than a specified rate.  The intent is to 
 equip the module to keep the state of the last query and possibly 
 history, but I wanted to make sure that I was not re-inventing the wheel 
 first.  Is there an attribute or utility that I am overlooking that will 
 help me do this?

Not currently (to my knowledge). The thresholding of rate aspect is
similar to what will be supported in the proposed PerfManager.

-- Hal

 Thanks,
 
 Steven.
 



Re: [openib-general] Port error rate detection

2007-02-20 Thread Steven Carter
Hal Rosenstock wrote:
 On Mon, 2007-02-19 at 15:53, Steven Carter wrote:
   
 I have a Nagios module that alerts on connectivity, port errors, 
 speed/width problems.  I would like to give it the ability to change the 
 severity of the alert depending on whether errors are just present or if 
 they are increasing faster than a specified rate.  The intent is to 
 equip the module to keep the state of the last query and possibly 
 history, but I wanted to make sure that I was not re-inventing the wheel 
 first.  Is there an attribute or utility that I am overlooking that will 
 help me do this?
 

 Not currently (to my knowledge). The thresholding of rate aspect is
 similar to what will be supported in the proposed PerfManager.
   
I noticed that in your RFC.  How are you planning on presenting the data 
to other agents (e.g. Nagios, Openview, MRTG, etc.)?  One comment that I 
should have made on your RFC is that I wonder if it is necessary to 
include the data analysis/reduction part.  Just having a central 
location that collects the values and presents them via SNMP is extremely
useful since there are a plethora of monitoring apps (free and
commercial) that do what you are proposing.  That way, a network
manager can leverage existing tools currently used for monitoring 
Ethernet Nodes, Hosts, etc.  You can still include a last change 
attribute with each counter so that simple utilities (like the one that 
I am writing) can get an idea of how quickly errors are occurring.
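
A rough sketch of the "last change attribute with each counter" idea might look like the following. The class name TrackedCounter and its methods are made up for illustration and are not part of any OFA tool or of the PerfManager RFC; it only shows how a simple utility could tell how recently a counter moved.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedCounter:
    """A counter value plus the time at which it was last seen to change.

    Hypothetical helper for illustration only.
    """
    value: int = 0
    last_change: Optional[float] = None  # None until the first change is seen

    def update(self, new_value, now=None):
        """Record a fresh reading; return True if the value changed."""
        now = time.time() if now is None else now
        if new_value != self.value:
            self.value = new_value
            self.last_change = now
            return True
        return False

    def seconds_since_change(self, now=None):
        """Age of the last change in seconds, or None if never changed."""
        now = time.time() if now is None else now
        if self.last_change is None:
            return None
        return now - self.last_change
```

A poller would call update() with each new reading, and a consumer could treat a small seconds_since_change() as a sign that errors are still accumulating.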

Steven.

 -- Hal

   
 Thanks,

 Steven.




Re: [openib-general] Port error rate detection

2007-02-20 Thread Hal Rosenstock
On Tue, 2007-02-20 at 09:44, Steven Carter wrote:
 Hal Rosenstock wrote:
  On Mon, 2007-02-19 at 15:53, Steven Carter wrote:

  I have a Nagios module that alerts on connectivity, port errors, 
  speed/width problems.  I would like to give it the ability to change the 
  severity of the alert depending on whether errors are just present or if 
  they are increasing faster than a specified rate.  The intent is to 
  equip the module to keep the state of the last query and possibly 
  history, but I wanted to make sure that I was not re-inventing the wheel 
  first.  Is there an attribute or utility that I am overlooking that will 
  help me do this?
  
 
  Not currently (to my knowledge). The thresholding of rate aspect is
  similar to what will be supported in the proposed PerfManager.

 I noticed that in your RFC.  How are you planning on presenting the data 
 to other agents (e.g. Nagios, Openview, MRTG, etc.)?  One comment that I 
 should have made on your RFC is that I wonder if it is necessary to 
 include the data analysis/reduction part.

I think it is necessary, because there is too much data to push up the
tree to one manager.

 Just having a central location that collects the values and presents them
 via SNMP is extremely useful since there are a plethora of monitoring apps
 (free and commercial) that do what you are proposing.

In general, this information can be exported via SNMP or whatever the
management infrastructure is.

BTW, are there SNMP MIBs for all of this information? To my knowledge,
some of these were started but never completed. Also, the MIBs were
geared toward the agents rather than the managers (in the PerfMgt arena).

-- Hal

 That way, a network manager can leverage existing tools currently used for
 monitoring Ethernet Nodes, Hosts, etc.  You can still include a last change
 attribute with each counter so that simple utilities (like the one that
 I am writing) can get an idea of how quickly errors are occurring.

 Steven.
 
  -- Hal
 

  Thanks,
 
  Steven.
 



Re: [openib-general] Port error rate detection

2007-02-20 Thread Hal Rosenstock
On Tue, 2007-02-20 at 10:25, Steven Carter wrote:
 Hal Rosenstock wrote:
  On Tue, 2007-02-20 at 09:44, Steven Carter wrote:

  Hal Rosenstock wrote:
  
  On Mon, 2007-02-19 at 15:53, Steven Carter wrote:


  I have a Nagios module that alerts on connectivity, port errors, 
  speed/width problems.  I would like to give it the ability to change the 
  severity of the alert depending on whether errors are just present or if 
  they are increasing faster than a specified rate.  The intent is to 
  equip the module to keep the state of the last query and possibly 
  history, but I wanted to make sure that I was not re-inventing the wheel 
  first.  Is there an attribute or utility that I am overlooking that will 
  help me do this?
  
  
  Not currently (to my knowledge). The thresholding of rate aspect is
  similar to what will be supported in the proposed PerfManager.


  I noticed that in your RFC.  How are you planning on presenting the data 
  to other agents (e.g. Nagios, Openview, MRTG, etc.)?  One comment that I 
  should have made on your RFC is that I wonder if it is necessary to 
  include the data analysis/reduction part.
  
 
  I think it is necessary, because there is too much data to push up the
  tree to one manager.

 I agree, but does the data need to be pushed to one node?  If you go
 with a distributed approach where information is aggregated per network
 device (switch or group of switches),

The proposal includes a distributed approach.

 then a third-party monitoring 
 server can collect and present it in the same way that it does for an 
 Ethernet network.  That way, you do not need to pass information up to a 
 central node.  You can just have a third party monitoring application 
 collect and present the information.  I guess it just depends on how 
 much you want to leverage existing monitoring solutions and/or how much 
 capability you want inherent in the OFA software.

Third party monitoring agents can hook in at the intermediate nodes in
the collection hierarchy if that is what is desired.
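
To illustrate the kind of data reduction an intermediate node could do before passing anything up the tree, here is a minimal sketch. It is not taken from the PerfManager RFC; the function names, the summary layout, and the port naming are all assumptions made for the example.

```python
def reduce_ports(port_deltas, threshold):
    """Summarize per-port error deltas for forwarding up the hierarchy.

    port_deltas maps a (hypothetical) port name to the error increase
    seen during the last collection interval. Only ports whose increase
    exceeds threshold are forwarded individually.
    """
    noisy = {port: d for port, d in port_deltas.items() if d > threshold}
    return {
        "ports": len(port_deltas),
        "total_errors": sum(port_deltas.values()),
        "over_threshold": noisy,
    }


def merge_summaries(summaries):
    """Combine summaries from several lower-level collectors into one."""
    merged = {"ports": 0, "total_errors": 0, "over_threshold": {}}
    for s in summaries:
        merged["ports"] += s["ports"]
        merged["total_errors"] += s["total_errors"]
        merged["over_threshold"].update(s["over_threshold"])
    return merged
```

A third-party monitoring agent attached to an intermediate node would see that node's summary, while the top of the tree only receives the merged totals and the list of ports that crossed the threshold.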

  Just having a central location that collects the values and presents them
  via SNMP is extremely useful since there are a plethora of monitoring apps
  (free and commercial) that do what you are proposing.
  
 I should have said 'a location' and not 'a central location'.  Since 
 most monitoring applications support multiple agents, it is not 
 necessary to aggregate the information into one place.
 
  In general, this information can be exported via SNMP or whatever the
  management infrastructure is.
 
  BTW, are there SNMP MIBs for all of this information? To my knowledge,
  some of these were started but never completed. Also, the MIBs were
  geared toward the agents rather than the managers (in the PerfMgt arena).

 There are standard MIBs (e.g. MIB-II's ifTable) that can present most of
 the useful information (in/out octets, errors, etc.)

Not most of the useful IB information.

 , but I would suspect that you would have to supplement that with a private
 MIB, as most other technologies/vendors have.

Yes, as this may be data out of a non-IBTA-specified manager, it is
likely a private MIB unless one goes for all the agent (PMA) data. There
was a proposed MIB for the PMA at the IETF IPoIB WG.

-- Hal

 Steven.
 
  -- Hal
 

  That way, a network manager can leverage existing tools currently used for
  monitoring Ethernet Nodes, Hosts, etc.  You can still include a last change
  attribute with each counter so that simple utilities (like the one that
  I am writing) can get an idea of how quickly errors are occurring.
  
 

  Steven.
 
  
  -- Hal
 


  Thanks,
 
  Steven.
 



[openib-general] Port error rate detection

2007-02-19 Thread Steven Carter
I have a Nagios module that alerts on connectivity, port errors, 
speed/width problems.  I would like to give it the ability to change the 
severity of the alert depending on whether errors are just present or if 
they are increasing faster than a specified rate.  The intent is to 
equip the module to keep the state of the last query and possibly 
history, but I wanted to make sure that I was not re-inventing the wheel 
first.  Is there an attribute or utility that I am overlooking that will 
help me do this?
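
For what it is worth, the state-keeping approach described above could be sketched roughly as follows. This is only an illustration under some assumptions: the function names, the JSON state file, and the port key format are invented for the example, and the error count is assumed to come from whatever the module already uses to read port counters. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
import json
import time
from pathlib import Path

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL = 0, 1, 2


def classify(prev, curr, max_rate):
    """Return a Nagios status from two (timestamp, error_count) samples.

    prev may be None on the first run. max_rate is in errors per second.
    """
    curr_time, curr_errors = curr
    if curr_errors == 0:
        return OK
    if prev is None:
        return WARNING  # errors present, but no history to compute a rate
    prev_time, prev_errors = prev
    elapsed = curr_time - prev_time
    delta = curr_errors - prev_errors
    if elapsed <= 0 or delta < 0:
        # Clock went backwards or the counter was reset: no usable rate,
        # so fall back to "errors are present".
        return WARNING
    return CRITICAL if delta / elapsed > max_rate else WARNING


def check_port(state_file, port_key, curr_errors, max_rate, now=None):
    """Compare curr_errors with the saved sample for port_key, then save it."""
    now = time.time() if now is None else now
    path = Path(state_file)
    state = json.loads(path.read_text()) if path.exists() else {}
    prev = tuple(state[port_key]) if port_key in state else None
    status = classify(prev, (now, curr_errors), max_rate)
    state[port_key] = [now, curr_errors]  # remember this query for next time
    path.write_text(json.dumps(state))
    return status
```

Errors that are merely present give WARNING, and the alert is only raised to CRITICAL when the count has grown faster than max_rate since the previous query. Keeping a longer history would mean storing a list of samples per port instead of just the last one.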

Thanks,

Steven.
