Re: [j-nsp] Krt queue issues

2012-10-03 Thread Benny Amorsen
Jared Mauch  writes:

> As far as the fallback 'default' route, if you are purchasing transit
> from someone, you could consider a last-resort default pointed at
> them. You can exclude routes like 10/8 etc by routing these to discard
> + install on your devices.

That only helps if the default gets installed first, though. If the
default has to wait at boot in the krt-queue behind the 300k+
Internet-routes, I have not really gained anything...

I suppose it is likely that a static default would be installed before
the BGP sessions even come up.


/Benny
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Krt queue issues

2012-10-02 Thread Saku Ytti
On (2012-10-02 15:20 -0400), Clarke Morledge wrote:

> routing table feed you can have before you start to hit this issue
> on the MX80?  Are there other load factors involved?

Yes, there are other factors besides the number of BGP peers, but I cannot
reliably identify them.

> I assuming that the RE-1300 on the MX chassis units do not suffer
> from this, correct?

It's inherent to JunOS, but it's a lot harder to trigger it on faster
control-planes. MX80 and EX are more likely to see it.

-- 
  ++ytti


Re: [j-nsp] Krt queue issues

2012-10-02 Thread Clarke Morledge

A very interesting thread.

Does anyone have a good feel for how many BGP neighbors with a full 
routing table feed you can have before you start to hit this issue on the 
MX80?  Are there other load factors involved?


I am assuming that the RE-1300 on the MX chassis units does not suffer from 
this, correct?


As a workaround, could you have a script that brings up BGP neighbors in 
an orderly sequence after a reboot?
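
A minimal sketch of what such a script could look like, assuming two
hypothetical hooks that are not part of any Juniper API: activate(neighbor),
which enables one BGP neighbor, and krt_queue_depth(), which returns how many
routes are still waiting to be pushed to the FIB. How those hooks talk to the
router is left open here.

```python
import time

def staged_bringup(neighbors, activate, krt_queue_depth,
                   batch_size=1, poll_interval=5.0, timeout=1800.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Activate neighbors in batches, letting the krt queue drain in between.

    activate and krt_queue_depth are hypothetical callables supplied by the
    caller. Returns the neighbors activated so far; stops early if the queue
    has not drained within `timeout` seconds after a batch.
    """
    activated = []
    for i in range(0, len(neighbors), batch_size):
        for neighbor in neighbors[i:i + batch_size]:
            activate(neighbor)
            activated.append(neighbor)
        deadline = clock() + timeout
        # Wait until everything learned so far has reached the FIB.
        while krt_queue_depth() > 0:
            if clock() >= deadline:
                return activated
            sleep(poll_interval)
    return activated
```

Ordering the list so that the transit session comes first and the peering
sessions follow would be one way to use it; whether that actually shortens
the blackholing window on an MX80 is untested.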


Clarke Morledge
College of William and Mary
Information Technology - Network Engineering
Jones Hall (Room 18)
Williamsburg VA 23187




Re: [j-nsp] Krt queue issues

2012-10-02 Thread Jared Mauch

On Oct 2, 2012, at 10:49 AM, Benny Amorsen wrote:

> "Darren O'Connor"  writes:
> 
>> Indeed, this is the worst thing this router can do. I have redundant
>> routers sitting there doing absolutely nothing as this router's
>> control-plane says everything is fine.
> 
> I'm looking at using MX80 as an Internet transit router too...
> 
> Do you know if it is possible to prioritize which routes get installed
> first into the FIB? In that case, a default route could be used to catch
> the wrongly-blackholed traffic. It is not particularly elegant or in
> keeping with being otherwise default-free, of course.

So, I've observed a lot of other interesting bugs in JunOS when running on a
system with a slower processor.  These are the types of bugs they didn't
see in the lab until they installed the same "slower" RE that we were using.
Just some odd timing regression.

I have reason to believe some of this will get better in the long term, but
until then you will need to spend some time convincing JTAC and the developers
to look into the suboptimal performance of the system under load.  (We spent a
long time doing this in the past and they eventually found some code that had a
poorly constructed set of arguments to an if statement.  This resulted in it
always being true (or was it false?).)

As far as the fallback 'default' route, if you are purchasing transit from 
someone, you could consider a last-resort default pointed at them.  You can 
exclude routes like 10/8 etc by routing these to discard + install on your 
devices.
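
To illustrate why this works, here is a small longest-prefix-match sketch.
The table contents and next-hop names ("transit", "discard", "peer-A") are
made up for the example, and the lookup is a plain linear scan rather than
anything resembling a real FIB.

```python
import ipaddress

def lookup(fib, dst):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

# Hypothetical FIB shortly after boot: only the static routes are installed.
fib_at_boot = {
    "0.0.0.0/0": "transit",   # last-resort default toward the transit provider
    "10.0.0.0/8": "discard",  # keep private space from following the default
}

# Later, a BGP-learned specific has made it through the krt queue.
fib_later = dict(fib_at_boot)
fib_later["192.0.2.0/24"] = "peer-A"
```

While the specific is missing, lookup(fib_at_boot, "192.0.2.1") falls back to
the default instead of finding no route at all, and once the specific is
installed it wins because it is the longer match.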

- Jared


Re: [j-nsp] Krt queue issues

2012-10-02 Thread Benny Amorsen
"Darren O'Connor"  writes:

> Indeed, this is the worst thing this router can do. I have redundant
> routers sitting there doing absolutely nothing as this router's
> control-plane says everything is fine.

I'm looking at using MX80 as an Internet transit router too...

Do you know if it is possible to prioritize which routes get installed
first into the FIB? In that case, a default route could be used to catch
the wrongly-blackholed traffic. It is not particularly elegant or in
keeping with being otherwise default-free, of course.


/Benny



Re: [j-nsp] Krt queue issues

2012-10-01 Thread Jonathan Lassoff
It's sadly a known issue for which there is no easy fix.

When turning up new adjacencies, I generally hack in policy to avoid
announcing any routes at first, until the box has had a while to learn and
pick up the tables. Only then do I start announcing space and sinking
traffic through the router.

However, this does little to help after an unexpected reboot.
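
The "wait, then announce" rule can be written down as a simple check. This is
only a sketch: the three counters are assumed to be collected by some other
means, and the thresholds are arbitrary illustrative parameters.

```python
def ready_to_announce(rib_active_routes, fib_routes, krt_pending,
                      min_routes, max_gap=0):
    """Return True once it looks safe to start attracting traffic.

    Assumed inputs (hypothetical, gathered elsewhere):
      rib_active_routes  active routes the routing process has selected
      fib_routes         routes actually installed in forwarding
      krt_pending        routes still waiting in the krt queue
      min_routes         how many routes count as "has learned the tables"
      max_gap            tolerated difference between RIB and FIB counts
    """
    learned_enough = rib_active_routes >= min_routes
    queue_drained = krt_pending == 0
    in_sync = rib_active_routes - fib_routes <= max_gap
    return learned_enough and queue_drained and in_sync
```

A check like this would have to run on or against the router itself after a
reboot to help in the unexpected-reboot case, which is exactly the part that
manual policy changes do not cover.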

--j

On Mon, Oct 1, 2012 at 5:26 AM, Darren O'Connor wrote:

> Hi Saku.
>
> Indeed, this is the worst thing this router can do. I have redundant
> routers sitting there doing absolutely nothing as this router's
> control-plane says everything is fine.
>
> Juniper aren't really telling me anything, only that the links I've shows
> are 'corner cases' - no comment on the comment I got from JTAC.
>
> I do happen to have a spare Brocade XMR that I might just use instead.
>
> Juniper should just come out straight.
>
> > Date: Mon, 1 Oct 2012 15:15:39 +0300
> > From: s...@ytti.fi
> > To: juniper-nsp@puck.nether.net
> > Subject: Re: [j-nsp] Krt queue issues
> >
> > On (2012-10-01 08:38 +0100), Darren O'Connor wrote:
> >
> > Hi Darren,
> >
> > > So to me this means this problem is a software issue, not hardware.
> > > And it's not yet fixed. Hence spending the money on a new box would be
> > > of no use.
> >
> > Certainly not hardware issue, cisco boxes running significantly lower
> > performance RPs wont do this (well at this scale). Like crappy old
> > sup720-3bxl.
> >
> > IMHO this is absolutely worst thing router can do, not have RIB and FIB in
> > sync, since then all your investments in redundant network was lost. You
> > have working backup paths, but they cannot be used due to your router
> > sucking traffic it cannot handle yet.
> > If they can't fix the FIB programming, they should also stop accepting
> > routes from routing protocols, even if BGP convergence takes 30min, it's
> > much more preferable to FIB/RIB desync.
> >
> > > This particular router for me will be connected to a peering point so
> > > will have around 200 neighbours, as well as a transit link with a full
> > > BGP table.
> >
> > JunOS is exceedingly poorly performing platform in control-plane,
> > especially with PPC control-plane. 200 neighbours on MX80 does not sound
> > like a good idea right now. You probably should have gone with bigger MX
> > where you'd get XEON.
> >
> > MX80 has faster CPU than RSP720, but RSP720 runs circles around MX80 :/
> > --
> >   ++ytti


Re: [j-nsp] Krt queue issues

2012-10-01 Thread Richard A Steenbergen
On Mon, Oct 01, 2012 at 03:15:39PM +0300, Saku Ytti wrote:
> 
> JunOS is exceedingly poorly performing platform in control-plane, 
> especially with PPC control-plane. 200 neighbours on MX80 does not sound 
> like a good idea right now. You probably should have gone with bigger MX 
> where you'd get XEON.
> 
> MX80 has faster CPU than RSP720, but RSP720 runs circles around MX80 :/ 

Indeed. For extra fun, try watching your "show route forwarding-table 
summary" after you reboot, and see how long it takes for your router to 
actually get a full table installed. The more BGP load you have on the 
device, the more you'll see that it totally stalls the installation of 
routes into the FIB. At this point I can't describe it as anything less 
than a major architectural flaw which Juniper is completely powerless to 
fix.
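
For anyone who wants to put numbers on this, here is a rough sketch that
takes periodic readings of the installed route count and reports how long the
FIB took to fill and where it stalled. The sample format, the target count
and the 60-second stall threshold are all assumptions for illustration; how
the readings are collected from the router is not shown.

```python
def fib_fill_report(samples, target, stall_seconds=60):
    """Summarise FIB fill progress after a reboot.

    samples: list of (seconds_since_boot, installed_routes), sorted by time.
    Returns (time_to_target or None, [(stall_start, stall_end), ...]),
    where a stall is a period of at least stall_seconds with no growth
    while the table is still below target.
    """
    time_to_target = None
    if samples and samples[0][1] >= target:
        time_to_target = samples[0][0]
    stalls = []
    stall_start = None
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        if time_to_target is None and n1 >= target:
            time_to_target = t1
        if n1 <= n0 and n1 < target:
            # No progress in this interval: open a stall window if needed.
            if stall_start is None:
                stall_start = t0
        else:
            if stall_start is not None and t0 - stall_start >= stall_seconds:
                stalls.append((stall_start, t0))
            stall_start = None
    if stall_start is not None and samples[-1][0] - stall_start >= stall_seconds:
        stalls.append((stall_start, samples[-1][0]))
    return time_to_target, stalls
```

Feeding it one reading every 30 seconds or so should be enough to see whether
the stall grows with the number of BGP sessions.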

-- 
Richard A Steenbergen       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)


Re: [j-nsp] Krt queue issues

2012-10-01 Thread Darren O'Connor
Hi Saku.

Indeed, this is the worst thing this router can do. I have redundant routers 
sitting there doing absolutely nothing as this router's control-plane says 
everything is fine.

Juniper aren't really telling me anything, only that the links I've shown are 
'corner cases' - no comment on the comment I got from JTAC.

I do happen to have a spare Brocade XMR that I might just use instead. 

Juniper should just come out straight.

> Date: Mon, 1 Oct 2012 15:15:39 +0300
> From: s...@ytti.fi
> To: juniper-nsp@puck.nether.net
> Subject: Re: [j-nsp] Krt queue issues
> 
> On (2012-10-01 08:38 +0100), Darren O'Connor wrote:
> 
> Hi Darren,
> 
> > So to me this means this problem is a software issue, not hardware. And 
> > it's not yet fixed. Hence spending the money on a new box would be of no 
> > use.
> 
> Certainly not hardware issue, cisco boxes running significantly lower
> performance RPs wont do this (well at this scale). Like crappy old
> sup720-3bxl.
> 
> IMHO this is absolutely worst thing router can do, not have RIB and FIB in
> sync, since then all your investments in redundant network was lost. You
> have working backup paths, but they cannot be used due to your router
> sucking traffic it cannot handle yet.
> If they can't fix the FIB programming, they should also stop accepting
> routes from routing protocols, even if BGP convergence takes 30min, it's
> much more preferable to FIB/RIB desync.
> 
> > This particular router for me will be connected to a peering point so will 
> > have around 200 neighbours, as well as a transit link with a full BGP table.
> 
> JunOS is exceedingly poorly performing platform in control-plane,
> especially with PPC control-plane. 200 neighbours on MX80 does not sound
> like a good idea right now. You probably should have gone with bigger MX
> where you'd get XEON.
> 
> MX80 has faster CPU than RSP720, but RSP720 runs circles around MX80 :/
> -- 
>   ++ytti


Re: [j-nsp] Krt queue issues

2012-10-01 Thread Saku Ytti
On (2012-10-01 08:38 +0100), Darren O'Connor wrote:

Hi Darren,

> So to me this means this problem is a software issue, not hardware. And it's 
> not yet fixed. Hence spending the money on a new box would be of no use.

Certainly not a hardware issue; Cisco boxes running significantly lower
performance RPs won't do this (well, at this scale). Like crappy old
sup720-3bxl.

IMHO this is absolutely the worst thing a router can do, not having RIB and
FIB in sync, since then all your investment in a redundant network is lost.
You have working backup paths, but they cannot be used due to your router
sucking in traffic it cannot handle yet.
If they can't fix the FIB programming, they should also stop accepting
routes from routing protocols. Even if BGP convergence takes 30min, that is
far preferable to FIB/RIB desync.

> This particular router for me will be connected to a peering point so will 
> have around 200 neighbours, as well as a transit link with a full BGP table.

JunOS is an exceedingly poorly performing platform in the control-plane,
especially with a PPC control-plane. 200 neighbours on MX80 does not sound
like a good idea right now. You probably should have gone with a bigger MX
where you'd get a Xeon.

MX80 has a faster CPU than RSP720, but RSP720 runs circles around MX80 :/
-- 
  ++ytti


[j-nsp] Krt queue issues

2012-10-01 Thread Darren O'Connor
Hi all.

I'm looking at replacing my ageing m7i's with MX80s. I have run into a few 
issues where the RIB is not moved to the FIB in a timely fashion and the router 
effectively black holes traffic for up to 20 minutes while it empties the krt 
queue.

My hope was that with a beefier MX80 this problem would be gone. However, I've
been reading reports that this is not the case.

Here is one such example on an MX480:
http://www.gossamer-threads.com/lists/nsp/juniper/35385
This is another: http://www.gossamer-threads.com/lists/nsp/juniper/20588

After extensive talking with JTAC I got this response:

"I had read the 20588 thread previously. It is expected behavior for:

1. routes to be in KRT queue

2. and to be blackholed as they are in there, since PFE is not programmed
with them.

It's not a bug, it's a design limitation. There's a unix socket between the RE
and PFE which has limited bandwidth to process what's called
"ifstate" by developers. There are features upcoming which will help
decrease the blackholing window, like RLI 17564."
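
As a back-of-the-envelope check on what "limited bandwidth" means here: if
the RE-to-PFE channel is treated as a pipe with a fixed install rate, the
blackholing window is just the backlog divided by that rate. The sketch below
assumes a constant rate, which is a simplification, and the 300,000-route
backlog is an assumed round figure for a full table, not a measurement.

```python
def drain_time_seconds(pending_routes, routes_per_second):
    """Time to empty the krt queue at a constant install rate."""
    if routes_per_second <= 0:
        raise ValueError("routes_per_second must be positive")
    return pending_routes / routes_per_second

def implied_rate(pending_routes, drain_seconds):
    """Install rate implied by an observed drain time."""
    if drain_seconds <= 0:
        raise ValueError("drain_seconds must be positive")
    return pending_routes / drain_seconds

# Assumed backlog of 300,000 routes draining over the 20 minutes seen above.
rate = implied_rate(300_000, 20 * 60)
```

Under those assumptions the effective rate comes out at 250 routes per
second, which gives a feel for how far the install path lags behind what BGP
can learn.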


So to me this means this problem is a software issue, not hardware. And it's 
not yet fixed. Hence spending the money on a new box would be of no use.

This particular router for me will be connected to a peering point so will have 
around 200 neighbours, as well as a transit link with a full BGP table.

Can anyone shed any light on this problem? 

Thanks

Darren
  