RE: [j-nsp] Krt queue issues

2012-10-03 Thread Jensen Tyler
Look into Static route retain. Should keep the route in the forwarding table.

From the Juniper site:
<<<
Route Retention

By default, static routes are not retained in the forwarding table when the 
routing process shuts down. When the routing process starts up again, any 
routes configured as static routes must be added to the forwarding table again. 
To avoid this latency, routes can be flagged as retain, so that they are kept 
in the forwarding table even after the routing process shuts down. Retention 
ensures that the routes are always in the forwarding table, even immediately 
after a system reboot.
>>>
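
A minimal sketch of what flagging a static default as retain could look like in Junos configuration. The next hop 192.0.2.1 is a placeholder from the documentation address range, not an address taken from this thread.

```
routing-options {
    static {
        route 0.0.0.0/0 {
            /* placeholder next hop (documentation range); replace with your own */
            next-hop 192.0.2.1;
            /* keep this route in the forwarding table when the routing process shuts down */
            retain;
        }
    }
}
```

Whether retention helps across a cold boot, rather than just a restart of the routing process, is worth verifying on your own platform.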

Thanks,

Jensen Tyler
Sr Engineering Manager
Fiberutilities Group, LLC


-----Original Message-----
From: juniper-nsp-boun...@puck.nether.net 
[mailto:juniper-nsp-boun...@puck.nether.net] On Behalf Of Benny Amorsen
Sent: Wednesday, October 03, 2012 8:32 AM
To: Jared Mauch
Cc: Saku Ytti; juniper-...@puck.nether.net
Subject: Re: [j-nsp] Krt queue issues

Jared Mauch writes:

> As far as the fallback 'default' route, if you are purchasing transit 
> from someone, you could consider a last-resort default pointed at 
> them. You can exclude routes like 10/8 etc by routing these to discard
> + install on your devices.

That only helps if the default gets installed first, though. If the default has 
to wait at boot in the krt-queue behind the 300k+ Internet-routes, I have not 
really gained anything...

I suppose it is likely that a static default would be installed before the BGP 
sessions even come up.
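
The last-resort default that Jared describes might be sketched as below. The transit next hop 198.51.100.1 is a placeholder from the documentation address range, and 10.0.0.0/8 stands in for whatever prefixes should not follow the default. Static routes are normally installed in the forwarding table by default, so no explicit install statement is shown.

```
routing-options {
    static {
        /* last-resort default towards the transit provider (placeholder address) */
        route 0.0.0.0/0 next-hop 198.51.100.1;
        /* keep private address space from following the default upstream */
        route 10.0.0.0/8 discard;
    }
}
```

This only illustrates the configuration; it does not by itself answer the ordering question of whether the default is installed ahead of the BGP routes at boot.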


/Benny
___
juniper-nsp mailing list juniper-...@puck.nether.net 
https://puck.nether.net/mailman/listinfo/juniper-nsp



RE: [j-nsp] Krt queue issues

2012-10-03 Thread Naslund, Steve
I think route retention might help in the event the table was cleared or
the routing process restarted, but I don't think it will help with a boot
because the table structures are being built as part of the system
initialization.  In reality, I would expect the static routes to get
installed very early as soon as the routing process comes up.  Since you
will need a route to your BGP neighbor (even though it may be directly
connected, it is still a route), routing has to be up BEFORE BGP
establishes and by definition your static routes will have to be up
before your BGP routes are ready.  How well your router responds to
traffic during an initial boot and during a 300,000 route update is
another story.  My experience with very large routers and tables is that
you will have a hard time guaranteeing user traffic will pass with very
much performance during an event like a full table rebuild.  Luckily
with the bandwidth we have these days and the CPU power on the routers,
it does not take that long to pull in a full internet table and begin
handling traffic.

Steven Naslund





Re: [j-nsp] Krt queue issues

2013-01-08 Thread Tim Vollebregt
Hi,

What we do nowadays as a workaround is to configure a default route towards 
a core router on 8 x 10G before doing maintenance on an MX box. It is installed 
before the BGP sessions come up. This would cause some packet loss during 
outages at peak hours, but it is fine during maintenance hours. 

I've seen cases where it took up to 30 minutes before the full table was 
installed correctly in the PFEs.

Currently this issue/bug is holding back our Juniper deployments. As far as I 
know, Juniper created a project group for this bug, and so far they have been 
able to reproduce the issue. It looks like the issue is now being taken seriously.

Tim

On Oct 3, 2012, at 11:50 PM, Naslund, Steve wrote:

> I think route retention might help in the event the table was cleared or
> the routing process restarted, but I don't think it will help with a boot
> because the table structures are being built as part of the system
> initialization.  In reality, I would expect the static routes to get
> installed very early as soon as the routing process comes up.  Since you
> will need a route to your BGP neighbor (even though it may be directly
> connected, it is still a route), routing has to be up BEFORE BGP
> establishes and by definition your static routes will have to be up
> before your BGP routes are ready.  How well your router responds to
> traffic during an initial boot and during a 300,000 route update is
> another story.  My experience with very large routers and tables is that
> you will have a hard time guaranteeing user traffic will pass with very
> much performance during an event like a full table rebuild.  Luckily
> with the bandwidth we have these days and the CPU power on the routers,
> it does not take that long to pull in a full internet table and begin
> handling traffic.
> 
> Steven Naslund
> 




Re: [j-nsp] Krt queue issues

2013-01-08 Thread Richard A Steenbergen
On Tue, Jan 08, 2013 at 03:45:10PM +0100, Tim Vollebregt wrote:
> Hi,
> 
> What we do nowadays as a workaround is to configure a default route 
> towards a core router on 8 x 10G before doing maintenance on an MX box. 
> It is installed before the BGP sessions come up. This would cause some 
> packet loss during outages at peak hours, but it is fine during 
> maintenance hours.
> 
> I've seen cases where it took up to 30 minutes before the full table 
> was installed correctly in the PFEs.
> 
> Currently this issue/bug is holding back our Juniper deployments. As 
> far as I know, Juniper created a project group for this bug, and so far 
> they have been able to reproduce the issue. It looks like the issue is 
> now being taken seriously.

PR 836197

I actually have very good luck reproducing it:

http://cluepon.net/ras/rpdstall.png

The issue appears to be that when rpd is busy processing incoming BGP 
updates (such as when you turn up a large number of peers 
simultaneously), it starves the rest of the process of the CPU time it 
needs to actually handle and install the routes. The graph above 
shows a plot of the total BGP paths, the number of routes in the 
"pending" state, and the number of routes actually installed into the 
forwarding hardware. This is a very simplified example (nothing but IBGP 
sessions with very simple policies here, not even any EBGP neighbors), 
using the latest top of the line routing engine, so in real life the 
issue is much worse.

As you can see, while rpd is still busy receiving and processing the 
incoming updates, the number of pending routes rises and doesn't fall, 
and the number of routes installed in the PFE stays almost nonexistent. 
A few routes actually manage to squeak in before all of the BGP sessions 
come up, which is why it has any at all for the period between 0 and 330 
seconds. After the router finishes receiving the BGP paths, the pending 
routes clear very quickly, and then the FIB installation process begins. 
8 minutes after turning up the BGP sessions, this router finally has a 
full table installed in hardware. The pending routes actually clear much 
quicker than this once the BGP routes stop coming in; I need to update 
this graph with a higher resolution to show it. :)
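
To watch the same three quantities on a live router, the following operational-mode commands are a reasonable starting point. This is only a sketch: output formats differ between releases, and the thread does not say how the data for the graph was actually collected.

```
show bgp summary
show krt queue
show krt state
show route forwarding-table summary
```

The first command gives per-peer and total path counts, the two krt commands show what is still waiting to be pushed from rpd to the kernel, and the last one summarises what has reached the forwarding table. Sampling them at intervals while the sessions come up should make a stall of this kind visible.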

Juniper actually DOES have a fix for this issue, tweaking the scheduler 
in rpd so that the router still processes BGP routes even when it's 
spending a lot of time receiving new routes. Unfortunately they haven't 
yet decided to prioritize implementing this fix, so it's still stuck in 
development. If this issue drives you as insane as it does me, I highly 
encourage you to talk to your account team about PR 836197 and why 8-20+ 
minutes to install routes to the FIB is not acceptable to you.

-- 
Richard A Steenbergen       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)



Re: [j-nsp] Krt queue issues

2013-01-08 Thread bas
Hi,

On Tue, Jan 8, 2013 at 10:20 PM, Richard A Steenbergen wrote:
> PR 836197

That looks like a spanking new PR number to me.
The highest PR number I found in 12.2 release notes was 82.
Rather strange that they didn't have an earlier PR number, while the
issue has existed for such a long time.

> If this issue drives you as insane as it does me, I highly
> encourage you to talk to your account team about PR 836197

Done.

I can't read PR836197 online as it is not public.
Can you post it without liability?
If you would be liable do not post it.. Also do _not_ email me off
list with the PR description...

Thanks.



Re: [j-nsp] Krt queue issues

2013-01-08 Thread Richard A Steenbergen
On Tue, Jan 08, 2013 at 11:10:16PM +0100, bas wrote:
> Hi,
> 
> On Tue, Jan 8, 2013 at 10:20 PM, Richard A Steenbergen wrote:
> > PR 836197
> 
> That looks like a spanking new PR number to me.
> The highest PR number I found in 12.2 release notes was 82.
> Rather strange that they didn't have an earlier PR number, while the
> issue has existed for such a long time.

Oh I have a pile of PRs about a mile long, including some that I opened 
on this issue 5+ years ago. But I'm not going to harp on the complete 
absurdity of how long it has taken to finally figure this thing out, or 
the number of people who have seen this issue while they've claimed all 
along that nobody else sees it. I'm just going to focus on fixing it. 
This is the PR that they've chosen for implementing the actual fix, so 
that's what I'm going with for the sake of simplicity. :)

> I can't read PR836197 online as it is not public.
> Can you post it without liability?
> If you would be liable do not post it.. Also do _not_ email me off
> list with the PR description...

Neither can I, but the basic description of the issue is what I said 
before. :)

-- 
Richard A Steenbergen       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)