Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?

2013-05-24 Thread Jasper Jans
Richard,

We are running into similar issues with NSR. Are you running with GRES
enabled, or have you removed that as well?
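
For readers following along, these are the knobs in question. A sketch of the relevant Junos configuration statements (paths from memory; verify against your release, and note that NSR requires GRES plus synchronized commits):

```
set chassis redundancy graceful-switchover    <- GRES
set routing-options nonstop-routing           <- NSR (requires GRES)
set system commit synchronize
```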

Jasper


On Thu, May 23, 2013 at 12:44 AM, Richard A Steenbergen wrote:

> On Tue, May 21, 2013 at 09:01:57PM -0400, Clarke Morledge wrote:
> > I was curious to know if anyone has run into any issues with large
> > routing tables on an MX causing ISSU upgrades to fail?
> >
> > On several occasions, I have been able to successfully do an
> > In-Service Software Upgrade (ISSU) in a lab environment, but then it
> > fails to work in production.
> >
> > I find it difficult to replicate the issue in a lab, since in
> > production I am dealing with far more routes than in a small lab.
> > Has anyone seen a case where the backup RE gets its new software and
> > reboots, but then appears to time out because it takes so long to
> > populate the routing kernel database on the newly upgraded RE?
> >
> > I have seen behavior like this with upgrades moving from 10.x to a
> > newer 10.y and from 10.x to 11.y.
>
> We had that issue for many years. There is a hard-coded timeout in the
> NSR process which is very easy to hit if you have a box with a large
> number of routes.
>
> We had a case open on it for about 1.5 years, but Juniper refused to
> actually fix it ("it works fine in the lab"), and eventually we just
> gave up and declared ISSU to be dead. There are way too many other bugs
> with it anyway; even turning on NSR caused nothing but problems.
>
> --
> Richard A Steenbergen  http://www.e-gerbil.net/ras
> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?

2013-05-23 Thread Byron Hicks
On 05/23/2013 09:25 AM, Mike Azevedo wrote:
> I have an MX960, full routes, performed ISSU. Did not have a timeout
> problem.
> 
> However, like Ras alluded to, there are other issues... Once the backup
> routing engine upgrades and takes over as primary RE, the former
> primary upgrades itself. You would think everything is fine with a
> new primary RE, but the chassis goes into alarm, still reporting that
> the backup is active as if it had switched over due to a failure
> event. JTAC says I have to switch it back to the old primary to get
> the alarm to clear. Why can't I run the 'new' primary for a while?

You can.  Just switch the master/backup relationship in the
configuration, and the alarm clears.
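
For example (a sketch; exact knob names can vary by release), mastership preference lives under [edit chassis redundancy], and mastership can also be toggled operationally:

```
# Configuration: prefer RE0 as master again
set chassis redundancy routing-engine 0 master
set chassis redundancy routing-engine 1 backup

# Or, from the current master, switch mastership operationally:
request chassis routing-engine master switch
```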

-- 
Byron Hicks
Lonestar Education and Research Network
office: 972-883-4645
google: 972-746-2549
aim/skype: byronhicks


Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?

2013-05-23 Thread Mike Azevedo
I have an MX960, full routes, performed ISSU. Did not have a timeout problem.

However, like Ras alluded to, there are other issues... Once the backup
routing engine upgrades and takes over as primary RE, the former primary
upgrades itself. You would think everything is fine with a new primary RE,
but the chassis goes into alarm, still reporting that the backup is active
as if it had switched over due to a failure event. JTAC says I have to
switch it back to the old primary to get the alarm to clear. Why can't I
run the 'new' primary for a while?

- Original Message -

From: "Richard A Steenbergen"  
To: "Clarke Morledge"  
Cc: juniper-nsp@puck.nether.net 
Sent: Wednesday, May 22, 2013 5:44:03 PM 
Subject: Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables? 

On Tue, May 21, 2013 at 09:01:57PM -0400, Clarke Morledge wrote: 
> I was curious to know if anyone has run into any issues with large 
> routing tables on an MX causing ISSU upgrades to fail? 
> 
> On several occasions, I have been able to successfully do an 
> In-Service Software Upgrade (ISSU) in a lab environment, but then it 
> fails to work in production. 
> 
> I find it difficult to replicate the issue in a lab, since in 
> production I am dealing with far more routes than in a small lab. 
> Has anyone seen a case where the backup RE gets its new software and 
> reboots, but then appears to time out because it takes so long to 
> populate the routing kernel database on the newly upgraded RE? 
> 
> I have seen behavior like this with upgrades moving from 10.x to a 
> newer 10.y and from 10.x to 11.y. 

We had that issue for many years. There is a hard-coded timeout in the 
NSR process which is very easy to hit if you have a box with a large 
number of routes. 

We had a case open on it for about 1.5 years, but Juniper refused to 
actually fix it ("it works fine in the lab"), and eventually we just 
gave up and declared ISSU to be dead. There are way too many other bugs 
with it anyway; even turning on NSR caused nothing but problems. 

-- 
Richard A Steenbergen  http://www.e-gerbil.net/ras 
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC) 



Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?

2013-05-22 Thread Richard A Steenbergen
On Tue, May 21, 2013 at 09:01:57PM -0400, Clarke Morledge wrote:
> I was curious to know if anyone has run into any issues with large 
> routing tables on an MX causing ISSU upgrades to fail?
> 
> On several occasions, I have been able to successfully do an 
> In-Service Software Upgrade (ISSU) in a lab environment, but then it 
> fails to work in production.
> 
> I find it difficult to replicate the issue in a lab, since in 
> production I am dealing with far more routes than in a small lab. 
> Has anyone seen a case where the backup RE gets its new software and 
> reboots, but then appears to time out because it takes so long to 
> populate the routing kernel database on the newly upgraded RE?
> 
> I have seen behavior like this with upgrades moving from 10.x to a 
> newer 10.y and from 10.x to 11.y.

We had that issue for many years. There is a hard-coded timeout in the 
NSR process which is very easy to hit if you have a box with a large 
number of routes.

> We had a case open on it for about 1.5 years, but Juniper refused to 
> actually fix it ("it works fine in the lab"), and eventually we just 
> gave up and declared ISSU to be dead. There are way too many other bugs 
> with it anyway; even turning on NSR caused nothing but problems.

-- 
Richard A Steenbergen  http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)


Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?

2013-05-22 Thread Mark Tinka
On Wednesday, May 22, 2013 08:38:40 AM Saku Ytti wrote:

> We banned ISSU from our network due to its poor hit/miss
> ratio; not uncommonly, something strange would happen
> after ISSU, like a firewall filter being programmed wrong
> or routes blackholing.
> 
> But as far as I understand, ISSU on routers isn't useful
> in other vendors' gear either.
> 
> Even if ISSU worked in JNPR, it wouldn't be that
> useful to us, as it can cause blackholing for several
> seconds, which is not something we can do intentionally
> without an announced maintenance window, and if we do
> announce a maintenance window, we might as well do a
> full reload.

+1.

I've never tried implementing ISSU in any networks I've 
run/built because, on paper, it looks both rosy and dark at 
the same time.

As many have said on this and other operational lists in the 
past, since most ISSU runs would happen in a maintenance 
window anyway, why not keep your life simple and just run 
upgrades vanilla?

For those running IOS XR, ISSU sounds like a great idea, but 
even SMUs that are documented as hitless have hit us many 
times. That said, I'm hearing some good news about IOS XR 5 
re: reduced software upgrade times. I digress...

Cheers,

Mark.



Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?

2013-05-22 Thread Christian

Hello Clarke,
You can use a Linux box running exabgp to easily inject as many routes 
as you want into your lab routers.
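
As a sketch of that approach (prefix count, addressing, and helper names here are illustrative), ExaBGP can run a script as a "process" and apply the `announce route ...` lines it reads from the script's stdout:

```python
import ipaddress


def announce_lines(count, base="10.0.0.0/8"):
    """Yield ExaBGP 'announce route' commands for `count` test /24
    prefixes carved out of `base`."""
    supernet = ipaddress.ip_network(base)
    for i, net in enumerate(supernet.subnets(new_prefix=24)):
        if i >= count:
            break
        yield f"announce route {net} next-hop self"


def run(count=50000):
    """Entry point when ExaBGP runs this file as a process: print the
    announcements, then sleep forever so the BGP session (and the
    announced routes) stays up."""
    import sys
    import time
    for line in announce_lines(count):
        sys.stdout.write(line + "\n")
    sys.stdout.flush()
    while True:
        time.sleep(60)
```

Peer the Linux box with the lab MX, reference the script from ExaBGP's `process` configuration section, and scale `count` up toward a full-table-sized RIB to exercise the kernel replication that seems to be timing out.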

Rgds,

Christian

On 22/05/2013 03:01, Clarke Morledge wrote:
I was curious to know if anyone has run into any issues with large 
routing tables on an MX causing ISSU upgrades to fail?


On several occasions, I have been able to successfully do an 
In-Service Software Upgrade (ISSU) in a lab environment, but then it 
fails to work in production.


I find it difficult to replicate the issue in a lab, since in 
production I am dealing with far more routes than in a small lab. 
Has anyone seen a case where the backup RE gets its new software and 
reboots, but then appears to time out because it takes so long to 
populate the routing kernel database on the newly upgraded RE?


I have seen behavior like this with upgrades moving from 10.x to a 
newer 10.y and from 10.x to 11.y.


Clarke Morledge
College of William and Mary
Information Technology - Network Engineering
Jones Hall (Room 18)
Williamsburg VA 23187




Re: [j-nsp] ISSU timeouts on MX upgrades due to large routing tables?

2013-05-21 Thread Saku Ytti
On (2013-05-21 21:01 -0400), Clarke Morledge wrote:

> I was curious to know if anyone has run into any issues with large
> routing tables on an MX causing ISSU upgrades to fail?

We banned ISSU from our network due to its poor hit/miss ratio; not
uncommonly, something strange would happen after ISSU, like a firewall
filter being programmed wrong or routes blackholing.

But as far as I understand, ISSU on routers isn't useful in other
vendors' gear either.

Even if ISSU worked in JNPR, it wouldn't be that useful to us, as it
can cause blackholing for several seconds, which is not something we can
do intentionally without an announced maintenance window, and if we do
announce a maintenance window, we might as well do a full reload.

-- 
  ++ytti


[j-nsp] ISSU timeouts on MX upgrades due to large routing tables?

2013-05-21 Thread Clarke Morledge
I was curious to know if anyone has run into any issues with large routing 
tables on an MX causing ISSU upgrades to fail?


On several occasions, I have been able to successfully do an 
In-Service Software Upgrade (ISSU) in a lab environment, but then it 
fails to work in production.


I find it difficult to replicate the issue in a lab, since in production I 
am dealing with far more routes than in a small lab. Has anyone seen a 
case where the backup RE gets its new software and reboots, but then 
appears to time out because it takes so long to populate the routing 
kernel database on the newly upgraded RE?


I have seen behavior like this with upgrades moving from 10.x to a newer 
10.y and from 10.x to 11.y.


Clarke Morledge
College of William and Mary
Information Technology - Network Engineering
Jones Hall (Room 18)
Williamsburg VA 23187