Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?
Richard,

We are running into similar issues with NSR. Are you running with GRES
enabled, or have you removed that as well?

Jasper

On Thu, May 23, 2013 at 12:44 AM, Richard A Steenbergen wrote:
> On Tue, May 21, 2013 at 09:01:57PM -0400, Clarke Morledge wrote:
> > I was curious to know if anyone has run into any issues with large
> > routing tables on an MX causing ISSU upgrades to fail?
> >
> > On several occasions, I have been able to successfully do an
> > In-Service Software Upgrade (ISSU) in a lab environment, but then it
> > fails to work in production.
> >
> > I find it difficult to replicate the issue in a lab, since in
> > production I am dealing with lots of routes as compared to a small
> > lab. Does anyone have any experience where the backup RE gets its new
> > software, then reboots, but because it takes a long time to populate
> > the routing kernel database on the newly upgraded RE, it appears to
> > time out?
> >
> > I have seen behavior like this with upgrades moving from 10.x to a
> > newer 10.y and from 10.x to 11.y.
>
> We had that issue for many years. There is a hard-coded timeout in the
> NSR process which is very easy to hit if you have a box with a large
> number of routes.
>
> We had a case open on it for about 1.5 years, but Juniper refused to
> actually fix it ("it works fine in the lab"), and eventually we just
> gave up and declared ISSU to be dead. There are way too many other
> bugs with it anyway; even turning on NSR caused nothing but problems.
>
> --
> Richard A Steenbergen       http://www.e-gerbil.net/ras
> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

___
juniper-nsp mailing list
juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
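For readers following along: the GRES and NSR features that Jasper asks about are enabled with roughly the following Junos configuration. This is a minimal sketch, not a statement of what any poster here is running; unified ISSU requires both features, and NSR in turn requires synchronized commits.

```
set chassis redundancy graceful-switchover
set routing-options nonstop-routing
set system commit synchronize
```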
Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?
On 05/23/2013 09:25 AM, Mike Azevedo wrote:
> I have an MX960, full routes, performed ISSU. Did not have a timeout
> problem.
>
> However, like RAS alluded to, there were other issues... Once the
> backup routing-engine upgrades and takes the primary RE position, the
> used-to-be primary upgrades itself. You would think everything is
> fine with a new primary RE, but the chassis goes into alarm, still
> saying the backup is active, as if it had switched due to a failure
> event. JTAC says I have to switch it back to the old primary to get
> the alarm to clear. Why can't I run the 'new' primary for a while?

You can. Just switch the master/backup relationship in the
configuration, and the alarm clears.

--
Byron Hicks
Lonestar Education and Research Network
office: 972-883-4645
google: 972-746-2549
aim/skype: byronhicks
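Byron's suggestion amounts to making the new master the configured default rather than switching back. A minimal Junos sketch; the RE numbering below is an assumption, so adjust it to whichever RE actually ended up master after the ISSU:

```
# Assumed: RE1 holds mastership after the upgrade. Making it the
# configured default master clears the "backup RE active" alarm.
set chassis redundancy routing-engine 1 master
set chassis redundancy routing-engine 0 backup
commit synchronize
```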
Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?
I have an MX960, full routes, performed ISSU. Did not have a timeout
problem.

However, like RAS alluded to, there were other issues... Once the backup
routing-engine upgrades and takes the primary RE position, the
used-to-be primary upgrades itself. You would think everything is fine
with a new primary RE, but the chassis goes into alarm, still saying the
backup is active, as if it had switched due to a failure event. JTAC
says I have to switch it back to the old primary to get the alarm to
clear. Why can't I run the 'new' primary for a while?

- Original Message -
From: "Richard A Steenbergen"
To: "Clarke Morledge"
Cc: juniper-nsp@puck.nether.net
Sent: Wednesday, May 22, 2013 5:44:03 PM
Subject: Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?

On Tue, May 21, 2013 at 09:01:57PM -0400, Clarke Morledge wrote:
> I was curious to know if anyone has run into any issues with large
> routing tables on an MX causing ISSU upgrades to fail?
>
> On several occasions, I have been able to successfully do an
> In-Service Software Upgrade (ISSU) in a lab environment, but then it
> fails to work in production.
>
> I find it difficult to replicate the issue in a lab, since in
> production I am dealing with lots of routes as compared to a small
> lab. Does anyone have any experience where the backup RE gets its new
> software, then reboots, but because it takes a long time to populate
> the routing kernel database on the newly upgraded RE, it appears to
> time out?
>
> I have seen behavior like this with upgrades moving from 10.x to a
> newer 10.y and from 10.x to 11.y.

We had that issue for many years. There is a hard-coded timeout in the
NSR process which is very easy to hit if you have a box with a large
number of routes.

We had a case open on it for about 1.5 years, but Juniper refused to
actually fix it ("it works fine in the lab"), and eventually we just
gave up and declared ISSU to be dead. There are way too many other bugs
with it anyway; even turning on NSR caused nothing but problems.

--
Richard A Steenbergen       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?
On Tue, May 21, 2013 at 09:01:57PM -0400, Clarke Morledge wrote:
> I was curious to know if anyone has run into any issues with large
> routing tables on an MX causing ISSU upgrades to fail?
>
> On several occasions, I have been able to successfully do an
> In-Service Software Upgrade (ISSU) in a lab environment, but then it
> fails to work in production.
>
> I find it difficult to replicate the issue in a lab, since in
> production I am dealing with lots of routes as compared to a small
> lab. Does anyone have any experience where the backup RE gets its new
> software, then reboots, but because it takes a long time to populate
> the routing kernel database on the newly upgraded RE, it appears to
> time out?
>
> I have seen behavior like this with upgrades moving from 10.x to a
> newer 10.y and from 10.x to 11.y.

We had that issue for many years. There is a hard-coded timeout in the
NSR process which is very easy to hit if you have a box with a large
number of routes.

We had a case open on it for about 1.5 years, but Juniper refused to
actually fix it ("it works fine in the lab"), and eventually we just
gave up and declared ISSU to be dead. There are way too many other bugs
with it anyway; even turning on NSR caused nothing but problems.

--
Richard A Steenbergen       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?
On Wednesday, May 22, 2013 08:38:40 AM Saku Ytti wrote:
> We banned ISSU from our network due to a poor hit/miss
> ratio; not uncommonly, something strange would happen
> after ISSU, like a firewall filter being programmed
> wrong, or routes blackholing.
>
> But as far as I understand, ISSU on routers isn't useful
> in other vendors' gear either.
>
> Even if ISSU worked in JNPR, it wouldn't be that
> useful to us, as it can cause blackholing for several
> seconds, which is not something we can do intentionally
> without announced maintenance windows; and if we do
> announce a maintenance window, we might as well do a
> full reload.

+1.

I've never tried implementing ISSU in any networks I've run/built,
because on paper it looks both rosy and dark at the same time. As many
have said on this and other operational lists in the past, since most
ISSU runs would happen in a maintenance window anyway, why not keep your
life simple and just run upgrades vanilla?

For those running IOS XR, ISSU sounds like a great idea, but even SMUs
that are documented as hitless have hit us many times. That said, I'm
hearing some good news for IOS XR 5 re: reduction of software upgrade
times. I digress...

Cheers,

Mark.
Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?
Hello Clarke,

You can use a Linux box with exabgp to easily inject as many routes as
you want into your lab routers.

Rgds,
Christian

On 22/05/2013 03:01, Clarke Morledge wrote:
> I was curious to know if anyone has run into any issues with large
> routing tables on an MX causing ISSU upgrades to fail?
>
> On several occasions, I have been able to successfully do an
> In-Service Software Upgrade (ISSU) in a lab environment, but then it
> fails to work in production.
>
> I find it difficult to replicate the issue in a lab, since in
> production I am dealing with lots of routes as compared to a small
> lab. Does anyone have any experience where the backup RE gets its new
> software, then reboots, but because it takes a long time to populate
> the routing kernel database on the newly upgraded RE, it appears to
> time out?
>
> I have seen behavior like this with upgrades moving from 10.x to a
> newer 10.y and from 10.x to 11.y.
>
> Clarke Morledge
> College of William and Mary
> Information Technology - Network Engineering
> Jones Hall (Room 18)
> Williamsburg VA 23187
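A sketch of how that route injection might look: exabgp can run an external "process" whose stdout lines are treated as API commands, so a short script can announce an arbitrary number of synthetic prefixes. The next-hop, prefix range, and count below are illustrative assumptions, not values from this thread.

```python
#!/usr/bin/env python3
"""Generate synthetic route announcements for exabgp's process API.

exabgp consumes lines like "announce route <prefix> next-hop <ip>" from
a helper process's stdout. NEXT_HOP and COUNT are assumptions for
illustration; scale COUNT up to approximate a full table.
"""
import sys

NEXT_HOP = "192.0.2.1"  # hypothetical lab next-hop (TEST-NET-1)
COUNT = 1000            # raise toward full-table scale for a real test


def routes(count, next_hop=NEXT_HOP):
    """Yield exabgp announce commands for synthetic /24 prefixes."""
    emitted = 0
    for a in range(1, 224):          # stay out of 0/8 and 224/4 space
        for b in range(256):
            for c in range(256):
                if emitted >= count:
                    return
                yield f"announce route {a}.{b}.{c}.0/24 next-hop {next_hop}"
                emitted += 1


if __name__ == "__main__":
    for line in routes(COUNT):
        sys.stdout.write(line + "\n")
```

In exabgp's own configuration the script would be wired in as a `process` associated with the lab neighbor, with the announce lines taking effect as they are printed.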
Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?
On (2013-05-21 21:01 -0400), Clarke Morledge wrote:
> I was curious to know if anyone has run into any issues with large
> routing tables on an MX causing ISSU upgrades to fail?

We banned ISSU from our network due to a poor hit/miss ratio; not
uncommonly, something strange would happen after ISSU, like a firewall
filter being programmed wrong, or routes blackholing.

But as far as I understand, ISSU on routers isn't useful in other
vendors' gear either.

Even if ISSU worked in JNPR, it wouldn't be that useful to us, as it can
cause blackholing for several seconds, which is not something we can do
intentionally without announced maintenance windows; and if we do
announce a maintenance window, we might as well do a full reload.

--
  ++ytti
[j-nsp] ISSU timeouts on MX upgrades due to large routing tables?
I was curious to know if anyone has run into any issues with large
routing tables on an MX causing ISSU upgrades to fail?

On several occasions, I have been able to successfully do an In-Service
Software Upgrade (ISSU) in a lab environment, but then it fails to work
in production.

I find it difficult to replicate the issue in a lab, since in production
I am dealing with lots of routes as compared to a small lab. Does anyone
have any experience where the backup RE gets its new software, then
reboots, but because it takes a long time to populate the routing kernel
database on the newly upgraded RE, it appears to time out?

I have seen behavior like this with upgrades moving from 10.x to a newer
10.y and from 10.x to 11.y.

Clarke Morledge
College of William and Mary
Information Technology - Network Engineering
Jones Hall (Room 18)
Williamsburg VA 23187
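For reference, the unified ISSU operation under discussion is started from the master RE with a command along these lines. The package path is a hypothetical placeholder, not one from this thread; with `reboot`, the old master is upgraded and rebooted automatically after the switchover:

```
request system software in-service-upgrade /var/tmp/<jinstall-package>.tgz reboot
```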