Re: [openib-general] Port error rate detection
On Mon, Feb 19, 2007 at 03:53:36PM -0500, Steven Carter wrote:
> I have a Nagios module that alerts on connectivity, port errors, speed/width problems. I would like to give it the ability to change the severity of the alert depending on whether errors are just present or if they are increasing faster than a specified rate. The intent is to equip the module to keep the state of the last query and possibly history, but I wanted to make sure that I was not re-inventing the wheel first. Is there an attribute or utility that I am overlooking that will help me do this?

One other thing you might want to take a look at is the Fountain/Goanna node monitoring setup... It's not really anything like the proposed performance manager, but it might get you what you need. (And we'd like some feedback on what it should do differently ;)

http://www.scl.ameslab.gov/Projects/Monitor/

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Port error rate detection
On Mon, 2007-02-19 at 15:53, Steven Carter wrote:
> I have a Nagios module that alerts on connectivity, port errors, speed/width problems. I would like to give it the ability to change the severity of the alert depending on whether errors are just present or if they are increasing faster than a specified rate. The intent is to equip the module to keep the state of the last query and possibly history, but I wanted to make sure that I was not re-inventing the wheel first. Is there an attribute or utility that I am overlooking that will help me do this?

Not currently (to my knowledge). The thresholding of rate aspect is similar to what will be supported in the proposed PerfManager.

-- Hal

> Thanks,
> Steven.
Re: [openib-general] Port error rate detection
Hal Rosenstock wrote:
> On Mon, 2007-02-19 at 15:53, Steven Carter wrote:
>> I have a Nagios module that alerts on connectivity, port errors, speed/width problems. I would like to give it the ability to change the severity of the alert depending on whether errors are just present or if they are increasing faster than a specified rate. The intent is to equip the module to keep the state of the last query and possibly history, but I wanted to make sure that I was not re-inventing the wheel first. Is there an attribute or utility that I am overlooking that will help me do this?
> Not currently (to my knowledge). The thresholding of rate aspect is similar to what will be supported in the proposed PerfManager.

I noticed that in your RFC. How are you planning on presenting the data to other agents (e.g. Nagios, OpenView, MRTG, etc.)?

One comment that I should have made on your RFC is that I wonder if it is necessary to include the data analysis/reduction part. Just having a central location that collects the values and presents them via SNMP is extremely useful, since there is a plethora of monitoring apps (free and commercial) that do what you are proposing. That way, a network manager can leverage existing tools currently used for monitoring Ethernet nodes, hosts, etc. You can still include a last-change attribute with each counter so that simple utilities (like the one that I am writing) can get an idea of how quickly errors are occurring.

Steven.

> -- Hal
>> Thanks,
>> Steven.
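The "last change attribute" suggested above could look roughly like the following. This is only a sketch: the `TrackedCounter` name and its fields are hypothetical and do not come from the PerfManager RFC or any OFA code. It records when a counter last moved, so a simple utility can tell a stale error count from one that is still growing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedCounter:
    """Hypothetical counter record carrying a last-change timestamp."""

    value: int = 0
    last_change: Optional[float] = None  # time of the most recent value change

    def update(self, new_value: int, now: float) -> None:
        """Record a new reading; move last_change only if the value changed."""
        if new_value != self.value:
            self.value = new_value
            self.last_change = now

    def seconds_since_change(self, now: float) -> Optional[float]:
        """Return how long the value has been stable, or None if it never changed."""
        if self.last_change is None:
            return None
        return now - self.last_change
```

A poller would call `update()` on every query and alert more gently as `seconds_since_change()` grows.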
Re: [openib-general] Port error rate detection
On Tue, 2007-02-20 at 09:44, Steven Carter wrote:
> Hal Rosenstock wrote:
>> On Mon, 2007-02-19 at 15:53, Steven Carter wrote:
>>> I have a Nagios module that alerts on connectivity, port errors, speed/width problems. I would like to give it the ability to change the severity of the alert depending on whether errors are just present or if they are increasing faster than a specified rate. The intent is to equip the module to keep the state of the last query and possibly history, but I wanted to make sure that I was not re-inventing the wheel first. Is there an attribute or utility that I am overlooking that will help me do this?
>> Not currently (to my knowledge). The thresholding of rate aspect is similar to what will be supported in the proposed PerfManager.
> I noticed that in your RFC. How are you planning on presenting the data to other agents (e.g. Nagios, OpenView, MRTG, etc.)?
> One comment that I should have made on your RFC is that I wonder if it is necessary to include the data analysis/reduction part.

I think it is because there is too much data to push up the tree to one manager.

> Just having a central location that collects the values and presents it via SNMP is extremely useful since there are a plethora of monitoring apps (free and commercial) that do what you are proposing.

In general, this information can be exported via SNMP or whatever the management infrastructure is. BTW, are there SNMP MIBs for all of this information? To my knowledge, some of these were started but never completed. Also, the MIBs were geared at the agents rather than the managers (in the PerfMgt arena).

-- Hal

> That way, a network manager can leverage existing tools currently used for monitoring Ethernet Nodes, Hosts, etc. You can still include a last change attribute with each counter so that simple utilities (like the one that I am writing) can get an idea of how quickly errors are occurring.
> Steven.
>> -- Hal
>>> Thanks,
>>> Steven.
Re: [openib-general] Port error rate detection
On Tue, 2007-02-20 at 10:25, Steven Carter wrote:
> Hal Rosenstock wrote:
>> On Tue, 2007-02-20 at 09:44, Steven Carter wrote:
>>> Hal Rosenstock wrote:
>>>> On Mon, 2007-02-19 at 15:53, Steven Carter wrote:
>>>>> I have a Nagios module that alerts on connectivity, port errors, speed/width problems. I would like to give it the ability to change the severity of the alert depending on whether errors are just present or if they are increasing faster than a specified rate. The intent is to equip the module to keep the state of the last query and possibly history, but I wanted to make sure that I was not re-inventing the wheel first. Is there an attribute or utility that I am overlooking that will help me do this?
>>>> Not currently (to my knowledge). The thresholding of rate aspect is similar to what will be supported in the proposed PerfManager.
>>> I noticed that in your RFC. How are you planning on presenting the data to other agents (e.g. Nagios, OpenView, MRTG, etc.)?
>>> One comment that I should have made on your RFC is that I wonder if it is necessary to include the data analysis/reduction part.
>> I think it is because there is too much data to push up the tree to one manager.
> I agree, but does the data need to be pushed to one node? If you go with a distributed approach where information is aggregated per network device (switch or group of switches),

The proposal includes a distributed approach.

> then a third-party monitoring server can collect and present it in the same way that it does for an Ethernet network. That way, you do not need to pass information up to a central node. You can just have a third party monitoring application collect and present the information. I guess it just depends on how much you want to leverage existing monitoring solutions and/or how much capability you want inherent in the OFA software.

Third party monitoring agents can hook in at the intermediate nodes in the collection hierarchy if that is what is desired.

>>> Just having a central location that collects the values and presents it via SNMP is extremely useful since there are a plethora of monitoring apps (free and commercial) that do what you are proposing.
> I should have said 'a location' and not 'a central location'. Since most monitoring applications support multiple agents, it is not necessary to aggregate the information into one place.
>> In general, this information can be exported via SNMP or whatever the management infrastructure is. BTW, are there SNMP MIBs for all of this information? To my knowledge, some of these were started but never completed. Also, the MIBs were geared at the agents rather than the managers (in the PerfMgt arena).
> There are standard MIBs (e.g. mib-2's ifTable) that can present most of the useful information (in/out octets, errors, etc.)

Not most of the useful IB information.

> , but I would suspect that you would have to supplement that with a private MIB as most other technologies/vendors have.

Yes, as this may be data out of a non-IBTA-specified manager, it is likely a private MIB unless one goes for all the agent (PMA) data. There was a proposed MIB for the PMA at the IETF IPoIB WG.

-- Hal

> Steven.
>> -- Hal
>>> That way, a network manager can leverage existing tools currently used for monitoring Ethernet Nodes, Hosts, etc. You can still include a last change attribute with each counter so that simple utilities (like the one that I am writing) can get an idea of how quickly errors are occurring.
>>> Steven.
>>>> -- Hal
>>>>> Thanks,
>>>>> Steven.
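To make the "aggregated per network device" idea from this exchange concrete, here is a small sketch. It assumes the collector already holds per-port error counts keyed by a hypothetical `(device, port)` pair; the function name and data layout are illustrative only and are not part of the PerfManager proposal.

```python
from collections import defaultdict


def aggregate_by_device(port_errors):
    """Sum per-port error counts into one total per device.

    port_errors maps a (device, port) tuple to an error count.
    """
    totals = defaultdict(int)
    for (device, _port), errors in port_errors.items():
        totals[device] += errors
    return dict(totals)
```

A third-party monitoring server could then poll one summary per switch (or group of switches) instead of every port.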
[openib-general] Port error rate detection
I have a Nagios module that alerts on connectivity, port errors, speed/width problems. I would like to give it the ability to change the severity of the alert depending on whether errors are just present or if they are increasing faster than a specified rate. The intent is to equip the module to keep the state of the last query and possibly history, but I wanted to make sure that I was not re-inventing the wheel first. Is there an attribute or utility that I am overlooking that will help me do this?

Thanks,
Steven.
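A minimal sketch of the approach described in this message: keep the last sample in a state file and raise the severity when errors grow faster than a threshold. Everything here is an assumption for illustration (the JSON state file, the `port_key` naming, the function names). Only the exit codes 0, 1 and 2 for OK, WARNING and CRITICAL follow the usual Nagios plugin convention. How the error count is read from the fabric is left out.

```python
import json
import time
from pathlib import Path

# Conventional Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(prev, curr, max_rate_per_sec):
    """Return a status from two (timestamp, error_count) samples.

    prev may be None on the first run, when no rate can be computed.
    """
    curr_time, curr_errors = curr
    if curr_errors == 0:
        return OK
    if prev is None:
        return WARNING  # errors present, but no history yet
    prev_time, prev_errors = prev
    elapsed = curr_time - prev_time
    delta = curr_errors - prev_errors
    if elapsed <= 0 or delta < 0:
        # Clock skew or a counter reset: treat as having no usable history.
        return WARNING
    rate = delta / elapsed
    return CRITICAL if rate > max_rate_per_sec else WARNING


def check_port(state_file, port_key, error_count, max_rate_per_sec, now=None):
    """Compare the current count with the stored sample, then store the new one."""
    now = time.time() if now is None else now
    path = Path(state_file)
    state = json.loads(path.read_text()) if path.exists() else {}
    prev = state.get(port_key)
    status = classify(tuple(prev) if prev else None,
                      (now, error_count), max_rate_per_sec)
    state[port_key] = [now, error_count]
    path.write_text(json.dumps(state))
    return status
```

Errors that are present but not growing past the threshold map to WARNING; growth faster than `max_rate_per_sec` maps to CRITICAL. Keeping more than one previous sample would allow smoothing over several intervals.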