Re: [j-nsp] Solarwinds Monitoring Problem
Thanks to everyone for their help on this...

We feel we've traced it back to a Solarwinds issue. It's happened two more times on two additional EX switches, which at first glance would really point the finger at a JunOS-related issue - BUT, when I restart the Windows 2008 server hosting the Solarwinds system, you can ping these devices again with no problem. It's not 100% conclusive, but the fact that I can ping these EX switches from any other location while Solarwinds is reporting them as down has had me puzzled.

Thanks again for everyone's input.. appreciate it...

Paul

___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [j-nsp] Solarwinds Monitoring Problem
And I doubt the Solarwinds app is pushing that kind of ICMP traffic to a single host for monitoring. Now, if something else was already hitting it up...

Dan Farrell
Applied Innovations Corp.
da...@appliedi.net

-----Original Message-----
From: juniper-nsp-boun...@puck.nether.net [mailto:juniper-nsp-boun...@puck.nether.net] On Behalf Of Chris Morrow
Sent: Sunday, June 06, 2010 11:45 AM
To: juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] Solarwinds Monitoring Problem

ex's have a default (unchangeable) 1kpps limit toward the RE...
Re: [j-nsp] Solarwinds Monitoring Problem
> -----Original Message-----
> From: juniper-nsp-boun...@puck.nether.net [mailto:juniper-nsp-boun...@puck.nether.net] On Behalf Of Jensen Tyler
> Sent: Monday, June 07, 2010 10:45 AM
> To: Paul Stewart
> Cc: 'juniper-nsp'
> Subject: Re: [j-nsp] Solarwinds Monitoring Problem
>
> I have seen the same issue with Solarwinds across many devices. I think
> Solarwinds only sends 1 ICMP message. If that message is lost it declares
> the node down. Ours has come back up on the next polling interval though.
> We also run NSM Express and haven't seen an issue with false alarms.
>
> On a side note solarwinds has a knob for tuning your polling settings.
> Might look at your timeouts.
>
> Jensen Tyler
> Network Engineer
> Fiberutilities Group, LLC

You might also want to set the size of the ICMP message within the Solarwinds NPM Advanced Options. I recall a problem where ICMP pings under a certain size were sometimes dropped by various vendors' gear, not just Juniper's.

-evt
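For anyone who wants to experiment with ping sizes outside of NPM, here is a rough sketch of how an ICMP echo request of a chosen payload size is put together. It only builds the bytes and does not send anything; the function names and the identifier/sequence values are made up for illustration, and nothing here reflects how Solarwinds itself constructs its probes.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload_size: int) -> bytes:
    """Build an ICMP echo request (type 8, code 0) carrying payload_size data bytes."""
    payload = bytes(i & 0xFF for i in range(payload_size))
    # First pass: header with a zero checksum field, used to compute the checksum.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    # Second pass: same header with the real checksum filled in.
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload


# Hypothetical sizes to try; the ICMP message is always 8 header bytes plus payload.
for size in (0, 32, 56):
    packet = build_echo_request(identifier=0x1234, sequence=1, payload_size=size)
    print(f"payload {size:3d} bytes -> ICMP message {len(packet)} bytes")
```

Actually sending these needs a raw socket (and usually admin rights), so in practice the size option of the normal ping utility is the easier way to test whether small pings get dropped.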
Re: [j-nsp] Solarwinds Monitoring Problem
I have seen the same issue with Solarwinds across many devices. I think Solarwinds only sends 1 ICMP message. If that message is lost it declares the node down. Ours has come back up on the next polling interval though. We also run NSM Express and haven't seen an issue with false alarms.

On a side note solarwinds has a knob for tuning your polling settings. Might look at your timeouts.

Jensen Tyler
Network Engineer
Fiberutilities Group, LLC
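To illustrate why a single lost ICMP message matters, here is a minimal sketch of "retry before declaring the node down". The probe is a stand-in callable rather than a real ping, and the 5% loss rate, the three attempts, and all names are assumptions for the demo, not Solarwinds settings.

```python
import random
from typing import Callable


def node_is_up(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Return True as soon as one probe succeeds; False only if every attempt fails."""
    for _ in range(attempts):
        if probe():
            return True
    return False


def lossy_probe(loss_rate: float, rng: random.Random) -> Callable[[], bool]:
    """Fake probe for a healthy node whose replies are lost with probability loss_rate."""
    return lambda: rng.random() >= loss_rate


rng = random.Random(0)
probe = lossy_probe(0.05, rng)  # hypothetical 5% packet loss
polls = 10_000

# Count how often a healthy node would be reported down under each policy.
false_down_single = sum(not node_is_up(probe, attempts=1) for _ in range(polls))
false_down_triple = sum(not node_is_up(probe, attempts=3) for _ in range(polls))
print(f"1 attempt : {false_down_single} false 'node down' results in {polls} polls")
print(f"3 attempts: {false_down_triple} false 'node down' results in {polls} polls")
```

With independent losses, the chance of a false alarm drops from the loss rate itself to roughly the loss rate cubed when three attempts are allowed, which is the reasoning behind tuning the polling settings.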
Re: [j-nsp] Solarwinds Monitoring Problem
Thank you - appreciate the information...

We're looking at it currently and yes it only sends 1 ICMP message by default .. we'll adjust and go from there..

Thanks again!

Paul
Re: [j-nsp] Solarwinds Monitoring Problem
On 06/06/10 11:37, sth...@nethelp.no wrote:
>> Is there default rate limiting of ICMP traffic in JunOS?
> monitoring systems and stuff we've developed ourselves. Mind you, we
> don't have any EX switches.

ex's have a default (unchangeable) 1kpps limit toward the RE...
Re: [j-nsp] Solarwinds Monitoring Problem
> Is there default rate limiting of ICMP traffic in JunOS?

There is a default limiting of all the traffic to the RE. Since you have a problem (missing ICMP echo replies while doing SNMP queries) that *might* be due to such limiting, I would strongly suggest that you do some packet sniffing and find out exactly what your NMS is trying to do.

We have a sizable Juniper M/MX-based network here, and have never seen the problem you describe - this is with a combination of commercial monitoring systems and stuff we've developed ourselves. Mind you, we don't have any EX switches.

Steinar Haug, Nethelp consulting, sth...@nethelp.no
Re: [j-nsp] Solarwinds Monitoring Problem
Thank you... unless I'm reading it wrong it looks ok:

icmp:
    0 drops due to rate limit
    67129 calls to icmp_error
    0 errors not generated because old message was icmp
    Output Histogram
        53894 echo reply
        67129 destination unreachable
    0 messages with bad code fields
    0 messages less than the minimum length
    35 messages with bad checksum
    0 messages with bad source address
    1 messages with bad length
    0 echo drops with broadcast or multicast destinaton address
    0 timestamp drops with broadcast or multicast destination address
    Input Histogram
        20 echo reply
        248 destination unreachable
        53894 echo
        58 time exceeded
    53894 message responses generated

Is there default rate limiting of ICMP traffic in JunOS?

Take care,

Paul

-----Original Message-----
From: juniper-nsp-boun...@puck.nether.net [mailto:juniper-nsp-boun...@puck.nether.net] On Behalf Of Ihsan Junaidi Ibrahim
Sent: June-06-10 10:42 AM
To: juniper-nsp
Subject: Re: [j-nsp] Solarwinds Monitoring Problem

Hi,

If you do a show system statistics icmp, do you see any drops resulting from rate limiting?
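If you want to watch these counters over time rather than eyeball them, a small parser along these lines might help. This is only a sketch: it assumes the output is indented with the histogram entries nested one level deeper than the other counters (the archive flattened the original layout), the sample is a shortened copy of the counters quoted above, and the function and key names are made up.

```python
import re
from typing import Dict

# Shortened sample; the indentation is an assumption about the original layout.
SAMPLE = """\
icmp:
    0 drops due to rate limit
    67129 calls to icmp_error
    Output Histogram
        53894 echo reply
        67129 destination unreachable
    35 messages with bad checksum
    Input Histogram
        20 echo reply
        53894 echo
    53894 message responses generated
"""


def parse_icmp_counters(text: str) -> Dict[str, int]:
    """Map counter descriptions to values, prefixing histogram entries with their section."""
    counters: Dict[str, int] = {}
    section, section_indent = None, 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        # A line at or above the section header's indentation ends that section.
        if section is not None and indent <= section_indent:
            section = None
        match = re.match(r"(\d+)\s+(.*)", stripped)
        if match:
            key = match.group(2)
            if section:
                key = f"{section}/{key}"
            counters[key] = int(match.group(1))
        elif stripped.endswith("Histogram"):
            section, section_indent = stripped, indent
    return counters


counters = parse_icmp_counters(SAMPLE)
print("rate-limit drops:", counters["drops due to rate limit"])
print("echo requests in:", counters["Input Histogram/echo"])
print("echo replies out:", counters["Output Histogram/echo reply"])
```

Comparing "echo requests in" against "echo replies out" between two samples is a quick way to see whether the box answered every ping it received, independent of what the monitoring server reports.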
Re: [j-nsp] Solarwinds Monitoring Problem
Hi,

If you do a show system statistics icmp, do you see any drops resulting from rate limiting?

--
Thank you for your time,
Ihsan Junaidi Ibrahim
Re: [j-nsp] Solarwinds Monitoring Problem
Great... and guess what we're getting ready to deploy? ;) We have an NSM Express system sitting in the box ready to go soon...

Our problem though doesn't appear to be SNMP itself - just problems pinging the hosts. During the time that Solarwinds says "site is down" you can't ping the box, however SNMP still functions...

Cheers,

Paul
Re: [j-nsp] Solarwinds Monitoring Problem
Paul

We have seen the same thing on our EX series 3200 and 4200. We have not seen it on our MX480s yet. Our logs showed that the SNMP daemon had stopped. Opened a case with JTAC and they mentioned (after 2 months I might add) that if you use Juniper's NMS (which we do) that might cause those symptoms due to excessive polling. We junked the NMS and it hasn't seemed to happen since.

Jeff
[j-nsp] Solarwinds Monitoring Problem
Hi folks...

I'm starting here to see if anyone has seen this behaviour before by chance.

We're in a migration to Solarwinds for monitoring of our network resources. On the network are several Juniper devices (and lots more coming soon).

Every so often (about once a month or so), the Solarwinds system triggers a "node down" alarm. When this occurs, it's showing a Juniper device (which varies) as "down". "Down" here simply means it's not pingable.

The behaviour we're seeing is that from the Solarwinds server we suddenly cannot ping the remote Juniper device - however - we continue to monitor SNMP successfully on that device. These Juniper devices have been MX480, EX3200 and EX4200 to date. During these outages I have been able to ping these devices from any other location on our network except the Solarwinds server.

If I reboot the Solarwinds server, the alarm clears, so I thought this is clearly an issue with the monitoring system ... but ... recently I rebooted one of the Juniper switches and the issue cleared as well.

Logs on the Juniper devices are clean - nothing indicating a problem. The Solarwinds system doesn't show anything of interest...

Thoughts? ;) I'm thinking of setting up another open source monitoring solution just to further eliminate the Juniper side of this...

Paul